NiFi-light for analysts

NiFi-light for analysts

Boris Tyukin
Hi guys,

I am thinking about increasing the footprint of NiFi in my org by extending it to less technical roles. I have a few questions:

1) Are there any plans to support easy dependencies at some point? We are aware of all the current options (Wait/Notify, Kafka, MergeRecord/MergeContent, etc.), and all of them are still hard to use and not reliable. For non-technical roles, we really need a dead-simple way to define classical dependencies, like run task C only after tasks A and B have finished. I realize this is a challenge because of NiFi's whole FlowFile concept (which we do love, being on the technical side of the house), but I really do not want to bring in yet another ETL/scheduling tool.

2) Is it fairly easy to build and support our own custom "NiFi-light" distribution, where we remove all the processors that we do not want to expose to non-technical people? The idea is to remove all the processors that consume CPU/RAM, to push them toward our Big Data systems instead of using NiFi to do the actual processing. We would like to leave those capabilities to our data engineering team, while shifting our analysts to an ELT/ELTL paradigm so they can run SQL and benefit from Big Data engines.

3) What would be the recommended setup for multiple decentralized teams? Separate NiFi instances, where each team supports its own jobs and our admin supports all the instances? Or one large NiFi cluster that everyone works on? We do not want teams to step on each other's jobs, see each other's failure alerts/bulletins, etc. We want each environment to look like the team's own. I am not sure NiFi policies are mature enough to provide this sort of isolation.

Thanks,
Boris

Re: NiFi-light for analysts

Mark Payne
Hey Boris,

There’s a good bit to unpack here but I’ll try to answer each question.

1) I would say that the target audience for NiFi really is a person with a pretty technical role. Not developers, necessarily, though. We do see a lot of developers using it, as well as data scientists, data engineers, sys admins, etc. So while there may be quite a few tasks that a non-technical person can achieve, it may be hard to expose the platform to someone without a technical background.

That said, I do believe that you’re right about the notion of flow dependencies. I’ve done some work recently to help improve this. For example, NIFI-7476 [1] makes it possible to configure a Process Group in such a way that only a single FlowFile at a time is allowed into the group. And the data is optionally held within the group until that FlowFile has completed processing, even if it’s split up into many parts. Additionally, NIFI-7509 [2] updates the List* processors so that they can use an optional Record Writer. This makes it possible to get a full listing of a directory from ListFile as a single FlowFile. Or a listing of all items in an S3 bucket or an Azure Blob Store, etc. So when that is combined with NIFI-7476, it makes it very easy to process an entire directory of files or an entire bucket, etc. and wait until all processing is complete before data is transferred on to the next task. (Additionally, NIFI-7552 updates this to add attributes indicating FlowFile counts for each Output Port so it’s easy to determine if there were any “processing failures” etc.).

So with all of the above said, I don’t think it necessarily solves, in a simple and generic sense, the requirement to complete Task A, then Task B, and then Task C. But it does put us far closer. This may still be achievable with some nesting of Process Groups, etc., but it won’t be as straightforward as I’d like, and it would perhaps add significant latency if only a single FlowFile at a time is allowed through the Process Group. Perhaps that can be addressed in the future by having the ability to bulk-transfer all FlowFiles from Queue A to Queue B, and then allowing a “Batch Input” option on a Process Group instead of just “Streaming” vs. “Single FlowFile at a Time.” I do think there will be some future improvements along these lines, though.
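For what it's worth, the Process Group settings from NIFI-7476 can also be applied programmatically rather than through the UI. Below is a rough sketch against the NiFi REST API; treat the endpoint path, the `flowfileConcurrency` / `flowfileOutboundPolicy` field names, and their values as assumptions drawn from that JIRA, not verified documentation, and note that a real call must echo back the component's current revision:

```shell
#!/bin/sh
# Sketch: configure a Process Group to admit a single FlowFile at a time
# and hold output until everything inside has finished (NIFI-7476 style).
# NIFI_API and PG_ID are placeholders; field names/values are assumptions.
NIFI_API="${NIFI_API:-http://localhost:8080/nifi-api}"
PG_ID="${PG_ID:-example-process-group-id}"

# "version": 0 only works for a component that has never been modified;
# otherwise fetch the group first and reuse the revision it returns.
PAYLOAD=$(cat <<EOF
{
  "revision": { "version": 0 },
  "component": {
    "id": "${PG_ID}",
    "flowfileConcurrency": "SINGLE_FLOWFILE_PER_NODE",
    "flowfileOutboundPolicy": "BATCH_OUTPUT"
  }
}
EOF
)
echo "$PAYLOAD"

# Only attempt the request when a live instance is explicitly configured.
if [ -n "${NIFI_LIVE:-}" ]; then
  curl -s -X PUT -H 'Content-Type: application/json' \
    -d "$PAYLOAD" "${NIFI_API}/process-groups/${PG_ID}"
fi
```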

2) This should be fairly straightforward. It would basically just be a matter of creating an assembly like the nifi-assembly module, but one that doesn’t include all of the NARs.
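If maintaining a custom assembly feels too heavy, a cruder sketch of the same idea is to trim NAR bundles out of the standard binary distribution before handing it to analysts. The `lib/` layout below matches the NiFi 1.x convenience binaries; the bundle-name prefixes in `DROP` are purely examples, and the fake demo layout exists only so the sketch runs end to end:

```shell
#!/bin/sh
set -eu
# Demo environment: if NIFI_HOME is not set, fabricate a fake lib/ layout
# so the sketch is runnable. Point NIFI_HOME at a real install to use it.
if [ -z "${NIFI_HOME:-}" ]; then
  NIFI_HOME=$(mktemp -d)
  mkdir -p "$NIFI_HOME/lib"
  touch "$NIFI_HOME/lib/nifi-standard-nar-1.11.4.nar" \
        "$NIFI_HOME/lib/nifi-hadoop-nar-1.11.4.nar" \
        "$NIFI_HOME/lib/nifi-hive-nar-1.11.4.nar"
fi

DROP="nifi-hadoop nifi-hive nifi-spark"   # example bundle-name prefixes

mkdir -p "$NIFI_HOME/lib.removed"
for prefix in $DROP; do
  # Move rather than delete, so a bundle can be restored later.
  for nar in "$NIFI_HOME"/lib/"$prefix"*.nar; do
    [ -e "$nar" ] || continue    # no bundle matched this prefix
    mv "$nar" "$NIFI_HOME/lib.removed/"
  done
done
ls "$NIFI_HOME/lib" "$NIFI_HOME/lib.removed"
```

The assembly route is still cleaner for repeated builds, since a trimmed binary has to be re-trimmed on every upgrade.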

3) This probably boils down to some trade-offs and what makes most sense for your organization. A single, large NiFi deployment makes it much easier for the sys admins, generally. The NiFi policies should provide the needed multi-tenancy in terms of authorization. But it doesn’t really offer much in terms of resource isolation. So, if resource isolation is important to you, then using separate NiFi deployments is likely desirable.

Hope this helps!
-Mark





On Jun 28, 2020, at 1:04 PM, Boris Tyukin <[hidden email]> wrote:



Re: NiFi-light for analysts

Boris Tyukin
Hi Mark, thanks for the great comments and for working on these improvements. These are great enhancements that we can certainly benefit from; I am thinking of at least two projects we support today.

As far as making it more user-friendly: at some point I looked at Kylo.io, and it was quite an interesting project (not sure if it is still alive), but I liked how they created their own UI/tooling around NiFi.

I am going to toy with this idea of a "dumbed-down" version of NiFi.

On Sun, Jun 28, 2020 at 3:36 PM Mark Payne <[hidden email]> wrote:


Re: NiFi-light for analysts

Mike Thomsen
As far as I can tell, Kylo is dead, based on their public GitHub activity.

Mark,

Would it make sense for us to start modularizing nifi-assembly with more profiles? That way people like Boris could run something like this:

mvn install -Pinclude-grpc,include-graph,!include-kafka,!include-mongodb

On Mon, Jun 29, 2020 at 11:20 AM Boris Tyukin <[hidden email]> wrote:


Re: NiFi-light for analysts

Joe Witt
That would be a fine option for those users who are capable of running Maven builds. I think evolving the NiFi Registry and NiFi integration to source all NARs as needed at runtime from the registry would be the best user experience and deployment answer over time.

Thanks

On Mon, Jun 29, 2020 at 9:57 AM Mike Thomsen <[hidden email]> wrote:


Re: NiFi-light for analysts

Juan Pablo Gardella
I actually do it manually in a Dockerfile:

# Create the backup directory first, or the mv of multiple NARs will fail
RUN mkdir -p /opt/nifi/nifi-current/lib.original
RUN mv /opt/nifi/nifi-current/lib/*.nar /opt/nifi/nifi-current/lib.original/
RUN cp /opt/nifi/nifi-current/lib.original/nifi-avro-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-update-attribute-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-kafka-2-0-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-standard-services-api-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-dbcp-service-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-ldap-iaa-providers-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-framework-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-provenance-repository-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-standard-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-jetty-bundle-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-record-serialization-services-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
RUN cp /opt/nifi/nifi-current/lib.original/nifi-registry-nar-$NIFI_VER.nar /opt/nifi/nifi-current/lib
# Custom one
COPY --chown=nifi:nifi  processors/*.nar /opt/nifi/nifi-current/lib/

That allows faster starts.

Juan
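The long run of cp lines in Juan's Dockerfile can also be collapsed into a keep-list loop. A sketch of the same idea, runnable on its own; the demo layout and the names in `KEEP` are just examples, and in the real image `NIFI_HOME` would point at /opt/nifi/nifi-current (the loop body could sit in a single RUN step):

```shell
#!/bin/sh
set -eu
NIFI_VER="${NIFI_VER:-1.11.4}"
# Demo environment: fabricate a fake lib/ layout when NIFI_HOME is unset,
# so the sketch can be run outside a container.
if [ -z "${NIFI_HOME:-}" ]; then
  NIFI_HOME=$(mktemp -d)
  mkdir -p "$NIFI_HOME/lib"
  for n in nifi-standard-nar nifi-avro-nar nifi-hadoop-nar; do
    touch "$NIFI_HOME/lib/$n-$NIFI_VER.nar"
  done
fi

KEEP="nifi-standard-nar nifi-avro-nar"   # example keep list

# Move everything aside, then copy back only the bundles we want.
mkdir -p "$NIFI_HOME/lib.original"
mv "$NIFI_HOME"/lib/*.nar "$NIFI_HOME/lib.original/"
for name in $KEEP; do
  cp "$NIFI_HOME/lib.original/$name-$NIFI_VER.nar" "$NIFI_HOME/lib/"
done
ls "$NIFI_HOME/lib"
```

Editing one KEEP variable per upgrade beats maintaining a dozen cp lines, and the lib.original backup still lets any bundle be restored.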

On Mon, 29 Jun 2020 at 14:16, Joe Witt <[hidden email]> wrote: