Is provenance data preserved when processors are deleted?

Eric Secules
Hello everyone,

If I am upgrading a process group to the latest version, do you know whether provenance is preserved for processors that may get deleted in the upgrade?
I have noticed that if I delete my process group and re-download it from the registry, I am no longer able to see the provenance data from flowfiles that went through the first process group.

What is the best way to view and archive provenance data for older versions of flows? For background, I am running NiFi in a Docker container.
I think I might have to archive the currently running container and bring the new version up in a new container.

Thanks,
Eric

Re: Is provenance data preserved when processors are deleted?

Mike Thomsen
One way to do it would be to set up a SiteToSiteProvenanceReporting task and have it send the data to another NiFi instance. That instance can post all of the provenance data into a NoSQL database like Mongo or Elasticsearch very quickly.
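As a rough illustration of the receiving side, the Python sketch below turns one site-to-site payload into documents for a NoSQL store. It assumes the reporting task delivers a JSON array of provenance events and that each event carries a unique eventId field; verify both against the reporting task's documentation for your NiFi version. to_documents is a hypothetical helper, and the actual write to the database (for example with a PutMongo processor on the receiving instance) is left out.

```python
import json
from typing import Any

# Assumed key: each exported provenance event is taken to carry a unique
# "eventId"; check the reporting task's schema for your NiFi version.
ID_FIELD = "eventId"


def to_documents(payload: str) -> list[dict[str, Any]]:
    """Turn one site-to-site payload (assumed to be a JSON array of
    provenance events) into documents for a NoSQL store."""
    events = json.loads(payload)
    if isinstance(events, dict):  # tolerate a single event object
        events = [events]
    docs = []
    for event in events:
        doc = dict(event)  # copy so the parsed event is left untouched
        # Use the event id as the document key: if the store is written
        # with upserts, a re-sent batch replaces rather than duplicates.
        doc["_id"] = event[ID_FIELD]
        docs.append(doc)
    return docs


if __name__ == "__main__":
    # Hypothetical sample events, not real NiFi output.
    sample = json.dumps([
        {"eventId": "e-1", "eventType": "RECEIVE", "componentName": "ListenHTTP"},
        {"eventId": "e-2", "eventType": "SEND", "componentName": "PutFile"},
    ])
    for doc in to_documents(sample):
        print(doc["_id"], doc["eventType"], doc["componentName"])
```

Keying documents on the event id is a design choice, not a requirement: it only makes re-delivery harmless if the database write is an upsert rather than a plain insert.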

On Mon, May 4, 2020 at 5:47 PM Eric Secules <[hidden email]> wrote:
Hello everyone,

If I am upgrading a process group to the latest version, do you know whether provenance is preserved for processors that may get deleted in the upgrade?
I have noticed that if I delete my process group and re-download it from the registry, I am no longer able to see the provenance data from flowfiles that went through the first process group.

What is the best way to view and archive provenance data for older versions of flows? For background, I am running NiFi in a Docker container.
I think I might have to archive the currently running container and bring the new version up in a new container.

Thanks,
Eric

Re: Is provenance data preserved when processors are deleted?

Eric Secules
What information is transmitted by SiteToSiteProvenanceReporting? Is it the content, the attributes, and the path the flowfile takes through the system? Is there any way to connect NiFi's provenance view to the NoSQL database instead of the internal provenance storage?

On Mon, May 4, 2020 at 3:07 PM Mike Thomsen <[hidden email]> wrote:
One way to do it would be to set up a SiteToSiteProvenanceReporting task and have it send the data to another NiFi instance. That instance can post all of the provenance data into a NoSQL database like Mongo or Elasticsearch very quickly.

Re: Is provenance data preserved when processors are deleted?

Mike Thomsen
It copies all of the provenance data, and no, unfortunately there's no way yet to back the provenance repository with one of those NoSQL databases.
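Because the exported events record where a flowfile went (though not its content), a flowfile's route can still be rebuilt from the archive after processors are deleted, as long as the events were exported before the deletion. The Python sketch below does that for one flowfile. It assumes each archived event is a dict with entityId (the flowfile UUID), eventOrdinal (an increasing event number, assumed comparable within one node) and componentName fields; those names are assumptions to check against your exported records, and flowfile_path is a hypothetical helper.

```python
from typing import Any


def flowfile_path(events: list[dict[str, Any]], flowfile_uuid: str) -> list[str]:
    """Ordered names of the components one flowfile passed through.

    Assumed field names: "entityId" (flowfile UUID), "eventOrdinal"
    (increasing event number) and "componentName".
    """
    mine = [e for e in events if e.get("entityId") == flowfile_uuid]
    mine.sort(key=lambda e: e["eventOrdinal"])  # oldest event first
    path: list[str] = []
    for e in mine:
        name = e.get("componentName", "<unknown>")
        # Collapse consecutive events emitted by the same component.
        if not path or path[-1] != name:
            path.append(name)
    return path


if __name__ == "__main__":
    # Hypothetical archived events, not real NiFi output.
    archived = [
        {"entityId": "ff-1", "eventOrdinal": 7, "componentName": "PutFile"},
        {"entityId": "ff-1", "eventOrdinal": 3, "componentName": "ListenHTTP"},
        {"entityId": "ff-2", "eventOrdinal": 4, "componentName": "ListenHTTP"},
        {"entityId": "ff-1", "eventOrdinal": 5, "componentName": "UpdateAttribute"},
    ]
    print(" -> ".join(flowfile_path(archived, "ff-1")))
```

This follows a single UUID only. NiFi gives new UUIDs to flowfiles created by FORK or CLONE events, so following full lineage would also need the events' parent and child identifiers (assumed to be exported under names like parentIds and childIds).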

On Mon, May 4, 2020 at 6:40 PM Eric Secules <[hidden email]> wrote:
What information is transmitted by SiteToSiteProvenanceReporting? Is it the content, the attributes, and the path the flowfile takes through the system? Is there any way to connect NiFi's provenance view to the NoSQL database instead of the internal provenance storage?

Re: Is provenance data preserved when processors are deleted?

Eric Secules
Thanks Mike, 

So does the content get sent over the wire too, or just a content URI? I see that content gets aged out according to the nifi.content.repository properties. Given that the default retention periods are so short, would NiFi crumble on a long-running system if the retention period were years and the available disk space were huge? Shipping the provenance info off to MongoDB or something similar isn't as attractive, because we lose the provenance web UI and the ability to view the configuration of a processor that a flowfile went through.

Thanks,
Eric

On Mon, May 4, 2020 at 4:13 PM Mike Thomsen <[hidden email]> wrote:
It copies all of the provenance data, and no, unfortunately there's no way yet to back the provenance repository with one of those NoSQL databases.

Re: Is provenance data preserved when processors are deleted?

Andy LoPresto
Eric,

The provenance exported via the reporting task does not contain the flowfile content. 

NiFi wasn’t designed as a long-term store for content or provenance data, but given appropriate resources, you can certainly increase the retention limits significantly. This is not an endorsement, but there are other metadata storage systems, such as Apache Atlas [1], that you may want to look at for longer retention and for some of the features you’re looking for, like a UI for lineage graphs.
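Whether multi-year retention is practical comes down largely to disk. The provenance repository is bounded by nifi.provenance.repository.max.storage.time and nifi.provenance.repository.max.storage.size, and the oldest events are aged off once either limit is exceeded. The Python sketch below does the back-of-the-envelope arithmetic; the events-per-day and bytes-per-event inputs are hypothetical and should be measured on your own system (for example, repository size divided by the number of events over a known period).

```python
GIB = 1024 ** 3  # bytes per GiB


def required_storage_gib(events_per_day: float, bytes_per_event: float,
                         retention_days: float) -> float:
    """Disk needed to keep retention_days of provenance, in GiB."""
    return events_per_day * bytes_per_event * retention_days / GIB


def max_retention_days(size_limit_gib: float, events_per_day: float,
                       bytes_per_event: float) -> float:
    """Days of provenance that fit under a size cap such as
    nifi.provenance.repository.max.storage.size."""
    return size_limit_gib * GIB / (events_per_day * bytes_per_event)


if __name__ == "__main__":
    # Hypothetical workload: 1 million events/day at ~1 KiB each
    # (event record plus its share of the index).
    rate, size = 1_000_000, 1024
    print(f"3 years needs ~{required_storage_gib(rate, size, 3 * 365):.0f} GiB")
    print(f"a 500 GiB cap holds ~{max_retention_days(500, rate, size):.0f} days")
```

Flowfile content is not covered by this estimate; it is aged out separately under the nifi.content.repository archive settings.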



Andy LoPresto
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On May 5, 2020, at 11:39 PM, Eric Secules <[hidden email]> wrote:

Thanks Mike, 

So does the content get sent over the wire too, or just a content URI? I see that content gets aged out according to the nifi.content.repository properties. Given that the default retention periods are so short, would NiFi crumble on a long-running system if the retention period were years and the available disk space were huge? Shipping the provenance info off to MongoDB or something similar isn't as attractive, because we lose the provenance web UI and the ability to view the configuration of a processor that a flowfile went through.

Thanks,
Eric
