Batch Dependency in NIFI - GetFile

KhajaAsmath Mohammed
Hi,

I have a use case where data is read from a file location using GetFile and loaded into a database. I would like to fire a trigger once the database load has succeeded for all the files.

I tried the Wait/Notify approach, but it does not work here because it operates on individual files. Let's say I have 1300 files: the trigger should fire only after all 1300 files have completed.

The count is dynamic and changes from run to run. Any suggestions on how to approach this, please?

Thanks,
Asmath
Re: Batch Dependency in NIFI - GetFile

Boris Tyukin
Dependencies in NiFi are something I wish worked better. I feel your pain for sure! I wish there was an easier way, as we all have to do ETL batch-type dependencies eventually.

I also tried Wait/Notify but it was a very confusing setup and felt a bit overengineered for what I wanted to do. 

The best option I came up with that is still simple is this:

1. Use a GenerateFlowFile processor to schedule your flow (let's say 7am every day). It generates one flowfile, and we also record some helpful audit attributes on it for our framework.
2. This flowfile triggers the rest of the flow. In your case GetFile will produce 1300 flowfiles today or 1320 tomorrow.
3. Once you have the count of files, initialize the attributes for the MergeContent processor (the number of files is the number of fragments).
4. Do your thing here (in your case, the database load).
5. The final step is the MergeContent processor, which waits for all the fragments/files to finish and only then proceeds further. Here you can also set a timeout in case something went wrong and you got 1290 files instead of the expected 1300.
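
To make steps 3 and 5 a bit more concrete, here is a rough sketch in plain Python of the bookkeeping involved. It is not NiFi code and does not call any NiFi API. It only simulates the idea of stamping every file with fragment attributes and then holding the batch until the count is reached or a timeout expires. As far as I know, the attributes MergeContent reads in its Defragment merge strategy are `fragment.identifier`, `fragment.index` and `fragment.count`, and the timeout corresponds to its Max Bin Age property, but please check the processor documentation for your NiFi version. The names `stamp_fragments`, `BatchBin` and the batch id below are made up for illustration.

```python
import time
from dataclasses import dataclass, field


def stamp_fragments(batch_id, filenames):
    """Step 3: tag every file in the batch with the three fragment attributes."""
    total = len(filenames)
    return [
        {
            "filename": name,
            "fragment.identifier": batch_id,  # same value for the whole batch
            "fragment.index": index,          # unique position within the batch
            "fragment.count": total,          # dynamic: 1300 today, 1320 tomorrow
        }
        for index, name in enumerate(filenames)
    ]


@dataclass
class BatchBin:
    """Step 5: hold finished fragments of one batch until all have arrived."""
    max_age_seconds: float  # hypothetical stand-in for a bin timeout
    created_at: float = field(default_factory=time.monotonic)
    seen: dict = field(default_factory=dict)
    expected: int = 0

    def add(self, attrs):
        """Record one fragment whose database load succeeded."""
        self.expected = attrs["fragment.count"]
        self.seen[attrs["fragment.index"]] = attrs
        return self.status()

    def status(self, now=None):
        """Return 'complete', 'timed_out' or 'waiting' for this batch."""
        now = time.monotonic() if now is None else now
        if self.expected and len(self.seen) == self.expected:
            return "complete"
        if now - self.created_at > self.max_age_seconds:
            return "timed_out"
        return "waiting"


if __name__ == "__main__":
    names = [f"file_{i}.csv" for i in range(1300)]
    flowfiles = stamp_fragments("daily-load", names)
    batch = BatchBin(max_age_seconds=3600)
    for flowfile in flowfiles:
        # In the real flow this happens after the database load of that file.
        state = batch.add(flowfile)
    print(state)
```

The point of the sketch is that the expected count travels with every file, so the final step does not need to know the batch size in advance. The `timed_out` state is where you would route to a failure path if only 1290 of the expected 1300 files arrive.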


On Tue, Jun 2, 2020 at 11:16 AM KhajaAsmath Mohammed <[hidden email]> wrote: