I am working on a tool to tell whether all processing has completed for a given input filename. Since the flow can change the "filename" attribute, querying by filename is not a reliable way to get all the events related to an input file. My current solution recursively calls the API for each child flowfile of a flowfile. This is acceptable for small lineages, but it scales poorly when a flowfile can be split into hundreds of children and have several thousand descendant flowfiles.
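To make the cost concrete, here is a minimal sketch of the traversal described above. It is not my actual tool and does not use real NiFi client code: `fetch_children` is a hypothetical callable standing in for whatever request returns the child flowfile UUIDs of one flowfile, and the in-memory `lineage` dict exists only for the demo. The walk is written with a queue rather than recursion, but the request pattern is the same.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def collect_descendants(
    root_uuid: str,
    fetch_children: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Return every flowfile UUID reachable from root_uuid, excluding the root.

    fetch_children is a hypothetical stand-in for one provenance API
    request that returns the child UUIDs of a single flowfile.
    """
    seen: Set[str] = set()
    queue = deque([root_uuid])
    while queue:
        current = queue.popleft()
        # One API round trip per flowfile visited.
        for child in fetch_children(current):
            if child != root_uuid and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Demo with an in-memory lineage instead of real API calls.
lineage = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
calls: List[str] = []


def fake_fetch(uuid: str) -> List[str]:
    calls.append(uuid)  # record each simulated request
    return lineage.get(uuid, [])


descendants = collect_descendants("a", fake_fetch)
```

The number of requests equals the number of flowfiles in the lineage (the root plus every descendant), which is why this becomes a problem at several thousand descendants.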
There are a few things that could be added to the provenance API to make it friendlier for this use case:
The ability to query for a list of provenance events
Pagination for dealing with large responses
The ability to query for all descendants of a flowfile
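As a rough idea of how a client might consume the requested features, here is a sketch of paging through a large result set. Everything API-related in it is an assumption: `fetch_page(offset, limit)` is a hypothetical callable standing in for a paginated query (for example, a descendants query) that does not exist today, and the offset/limit style is only one possible design.

```python
from typing import Callable, Dict, Iterator, List


def iter_events(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 100,
) -> Iterator[Dict]:
    """Yield events one at a time from a hypothetical paginated query."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += len(page)


# Demo with an in-memory list instead of a real API.
all_events = [{"eventId": i} for i in range(250)]
requests: List[int] = []


def fake_fetch_page(offset: int, limit: int) -> List[Dict]:
    requests.append(offset)  # record each simulated request
    return all_events[offset:offset + limit]


events = list(iter_events(fake_fetch_page, page_size=100))
```

With something along these lines, collecting several thousand descendants would take a handful of requests instead of one request per flowfile.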