Connecting Controller Services Automatically

Eric Secules
Hello everyone,

I am running into an issue with automated deployment using nipyapi. We would like to be able to pull down flows from a registry and have them ready to go once all their controller services have been turned on. But there are a few issues. Sometimes the flows that we download from the registry reference controller service IDs that don't exist on this machine because the flow was developed in a different environment. That's easy enough to fix if there is just one applicable controller service, but not when there are two or more.

We have taken the decision to put all our controller services at the top level and have one of each kind we need, rather than have multiple of the same controller service attached to individual process groups.

We are running into a problem where some processors can connect to either a JsonTreeReader or a CSVReader, and there's no indication in the ProcessorDTO object which type it was originally connected to, just a GUID of a controller service that doesn't exist in this deployment.

Would it be possible to include the type or name of the controller service in the component.config.descriptors section? Are we going about it the wrong way by trying to simplify down to the fewest controller services?

Thanks,
Eric
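
A rough sketch of the situation described above, written against plain Python objects rather than the real nipyapi or NiFi REST types. The `LocalService` shape, the `api` field, and the `resolve_reference` helper are all hypothetical names made up for illustration; the point is only that a dangling ID can be re-pointed automatically when exactly one local service fits, and not when two or more do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalService:
    """A controller service in the target environment (hypothetical shape)."""
    id: str
    name: str
    type: str  # implementation, e.g. "JsonTreeReader"
    api: str   # interface the processor property asks for (assumed field)


def resolve_reference(referenced_id, required_api, local_services):
    """Return (service_id, candidates) for one processor property.

    service_id is None when the reference cannot be resolved unambiguously.
    """
    by_id = {s.id: s for s in local_services}
    if referenced_id in by_id:
        # The referenced ID already exists locally, nothing to fix.
        return referenced_id, [by_id[referenced_id]]
    candidates = [s for s in local_services if s.api == required_api]
    if len(candidates) == 1:
        # Exactly one applicable service: safe to re-point automatically.
        return candidates[0].id, candidates
    # Zero or several applicable services: a human (or extra metadata) must decide.
    return None, candidates


services = [
    LocalService("aaa-111", "JSON Reader", "JsonTreeReader", "RecordReaderFactory"),
    LocalService("bbb-222", "CSV Reader", "CSVReader", "RecordReaderFactory"),
    LocalService("ccc-333", "DB Pool", "DBCPConnectionPool", "DBCPService"),
]

db_id, _ = resolve_reference("id-from-dev", "DBCPService", services)
reader_id, reader_candidates = resolve_reference(
    "id-from-dev", "RecordReaderFactory", services
)
print(db_id)
print(reader_id, [s.type for s in reader_candidates])
```

With only the interface to go on, the database pool resolves cleanly while the record reader stays ambiguous, which is the case the question is about.
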
Re: Connecting Controller Services Automatically

Andy LoPresto
Eric,

I can’t answer all these questions but I would definitely have hesitations around building an expectation that there is only one instance of any given controller service type in an entire canvas. I can think of numerous flows (this may not affect your particular flows, but the concepts still apply) which require multiple instances of the same controller service type to be available: 

* A flow which invokes a mutually-authenticated TLS HTTP API, consumes data, transforms it, and posts it to another mTLS API
* A flow which retrieves objects from one S3 bucket and puts them into an S3 bucket in a different AWS account
* A flow which connects to one database and retrieves data, transforms it, and persists it to another database

If there is only _one_ StandardSSLContextService, AWSCredentialsProviderControllerService, or DBCPConnectionPool available in the entire controller, these flows cannot exist. 

I am not saying the retrieval of new flow versions and the matching of referenced controller services cannot be improved, but I would definitely advise caution before going too far down this path without considering all possible side effects and potential constraints on future flow development.  


Andy LoPresto
[hidden email]
[hidden email]
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

In reply to the post by Eric Secules (May 22, 2020, 3:01 PM)

Re: Connecting Controller Services Automatically

Eric Secules
Hi Andy,

Given that you have a flow which operates on two different S3 accounts for example, how would you do deployment automation? Do you mandate that the controller service with the same ID must exist in both a development and production environment rather than try to connect a processor to a matching controller service?

-Eric

In reply to the post by Andy LoPresto (Fri, May 22, 2020, 3:44 PM)

Re: Connecting Controller Services Automatically

Andy LoPresto
If you want the process to be completely automated, you would have to ensure the controller service IDs are identical across environments. Otherwise, deployment would need manual intervention to reference the specific controller service in the proper component.

In reply to the post by Eric Secules (May 22, 2020, 3:57 PM)

Re: Connecting Controller Services Automatically

Andrew Grande
Aren't those IDs generated? How can one enforce it?

Andrew

In reply to the post by Andy LoPresto (Sat, May 23, 2020, 10:53 AM)

Re: Connecting Controller Services Automatically

Andy LoPresto
My position is that we don’t claim completely automated deployment as a feature, so manually setting the controller service IDs is not exposed. Technically, they are defined in the flow.xml.gz and could be modified by an administrator to be static after generation. This would require frequent manual manipulation of the flow.xml.gz in various environments and frequent restarts of the NiFi service. I do not recommend this. 


In reply to the post by Andrew Grande (May 23, 2020, 11:05 AM)

Re: Connecting Controller Services Automatically

Bryan Bende
If you use NiFi Registry >= 0.5.0 and NiFi >= 1.10.0, then it will auto-select external controller services with the same name, as long as there is only one service of the same type with that name (names are not required to be unique).
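
To make the rule above concrete, here is a minimal sketch of "same name, same type, and only one such service". It is not NiFi's actual implementation; the `auto_select` function and the dictionary keys are hypothetical and exist only to illustrate the rule as stated.

```python
def auto_select(required_name, required_type, available):
    """Return the ID of the only service matching both name and type, else None."""
    matches = [
        s for s in available
        if s["name"] == required_name and s["type"] == required_type
    ]
    # Names are not unique, so more than one match means no automatic choice.
    return matches[0]["id"] if len(matches) == 1 else None


available = [
    {"id": "a1", "name": "Reader", "type": "JsonTreeReader"},
    {"id": "b2", "name": "Reader", "type": "CSVReader"},
    {"id": "c3", "name": "Prod DB", "type": "DBCPConnectionPool"},
    {"id": "d4", "name": "Prod DB", "type": "DBCPConnectionPool"},
]

selected_reader = auto_select("Reader", "CSVReader", available)
selected_db = auto_select("Prod DB", "DBCPConnectionPool", available)
selected_missing = auto_select("Reader", "AvroReader", available)
print(selected_reader, selected_db, selected_missing)
```

Two services may share a name as long as their types differ ("Reader" above), but two services with the same name and the same type ("Prod DB") leave nothing to select automatically.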

In reply to the post by Andy LoPresto (Sat, May 23, 2020, 3:34 PM)

Re: Connecting Controller Services Automatically

Andy LoPresto
Yes, I should have clarified this. Thanks Bryan. This is the solution for the generic use case. The original question was about reducing the controller to only a single instance of a specific controller service implementation, which is how the tangent got started. 


In reply to the post by Bryan Bende (May 23, 2020, 3:27 PM)

Re: Connecting Controller Services Automatically

Eric Secules
In reply to this post by Bryan Bende
Hi Bryan,

I have noticed this behaviour sometimes, but not all the time. I am running the latest Registry and NiFi versions. I haven't found a conclusive pattern, but I have a hunch that it has to do with having versioned process groups within versioned process groups. My deployment strategy is this:
  • Have an outer process group which only contains controller services, called the "Controller Container".
    • For now I just have one controller service per type of controller service.
  • When deploying, download all production flows inside the Controller Container.
  • I noticed that some of the controller services find their match, but others don't, leaving me with roughly 70 invalid processors out of 800.
If you could point me to the code which is supposed to do the matching, I might be able to debug better.

Thanks,
Eric
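
One way to look for a pattern among the unmatched references is to classify why each one fails the name-and-type rule quoted earlier in the thread. The sketch below does that over plain dictionaries; it does not locate or reproduce NiFi's matching code, and the `explain` helper, the labels it returns, and the sample data are all hypothetical.

```python
from collections import Counter


def explain(required_name, required_type, available):
    """Classify one reference against the services in the outer group."""
    same_name = [s for s in available if s["name"] == required_name]
    exact = [s for s in same_name if s["type"] == required_type]
    if len(exact) == 1:
        return "matched"
    if len(exact) > 1:
        return "ambiguous"      # several services share this name and type
    if same_name:
        return "type mismatch"  # the name exists, but with another type
    return "missing"            # no service has this name at all


# Services in the outer "Controller Container" group (sample data).
container_services = [
    {"name": "JSON Reader", "type": "JsonTreeReader"},
    {"name": "CSV Reader", "type": "CSVReader"},
    {"name": "SSL", "type": "StandardSSLContextService"},
    {"name": "SSL", "type": "StandardSSLContextService"},
]

# (name, type) pairs the downloaded flows expect (sample data).
references = [
    ("JSON Reader", "JsonTreeReader"),
    ("CSV Reader", "JsonTreeReader"),
    ("SSL", "StandardSSLContextService"),
    ("Dev DB", "DBCPConnectionPool"),
    ("CSV Reader", "CSVReader"),
]

summary = Counter(explain(n, t, container_services) for n, t in references)
print(dict(summary))
```

If most of the roughly 70 invalid processors fall into one bucket, that narrows down whether the cause is naming, duplicate services, or something else such as the nesting of versioned process groups.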

Re: Connecting Controller Services Automatically

Andrew Grande
Maybe something is going on with specific types or hierarchies. I've noticed DefaultSslContext didn't get assigned, even though it was the only one available. Does autowiring logic apply to this one?

Andrew


Bryan Bende
Do you have any controller services in process groups above the “controller container”?

I can’t remember whether the matching is based on only the immediate parent or on the entire hierarchy.

Re: Connecting Controller Services Automatically

Mark Payne
It should be hierarchical. The Set of Controller Services to match by name is obtained by calling `parentGroup.getControllerServices(true)` where parentGroup is the Process Group that components are to be added to.

@Eric, the code that does the matching can be found at [1].
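
As a rough illustration of the hierarchical lookup described above (this is not the NiFi code referenced at [1]), here is a minimal Python sketch. It assumes that the `true` argument means services from ancestor groups are included, and it applies the rule Bryan gave earlier in the thread: a service is auto-selected only when exactly one candidate has the same name and type. The `ProcessGroup` and `ControllerService` classes, `match_by_name`, and the IDs are hypothetical stand-ins, not NiFi or nipyapi types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControllerService:
    identifier: str
    name: str
    type: str


@dataclass
class ProcessGroup:
    name: str
    parent: Optional["ProcessGroup"] = None
    services: list = field(default_factory=list)

    def get_controller_services(self, include_ancestors: bool) -> list:
        """Return this group's services, optionally with every ancestor's."""
        found = list(self.services)
        if include_ancestors and self.parent is not None:
            found.extend(self.parent.get_controller_services(True))
        return found


def match_by_name(group, wanted_name, wanted_type):
    """Return the single visible service with this name and type, else None."""
    candidates = [
        service
        for service in group.get_controller_services(True)
        if service.name == wanted_name and service.type == wanted_type
    ]
    # Zero or several candidates is ambiguous, so nothing is auto-selected.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical layout: services live in an outer group, flows are nested inside.
root = ProcessGroup("NiFi Flow")
container = ProcessGroup(
    "Controller Container",
    parent=root,
    services=[ControllerService("cs-1", "JSONTreeReader", "JSONTreeReader")],
)
flow = ProcessGroup("Production Flow", parent=container)
match = match_by_name(flow, "JSONTreeReader", "JSONTreeReader")
```

In this toy model the flow group has no services of its own, so the match comes from its parent. Adding a second service with the same name and type anywhere in the ancestor chain would make the lookup return nothing.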

Thanks
Mark





Re: Connecting Controller Services Automatically

Eric Secules
Thanks for the link, Mark.

I have found that my problems are caused by this `continue` [1]. I noticed the following state when I stopped the debugger at [1]. At this point NiFi is importing a versioned flow (PG C) nested inside another versioned flow (NiFi Flow --> PG A --> PG B --> PG C). The issue is that externalControllerServiceReferences is populated with the external controller service references for PG B and does not contain the external references from PG C.
Notice that propertyValue is not in externalControllerServiceReferences.
Screen Shot 2020-05-25 at 10.05.08 AM.png
Screen Shot 2020-05-25 at 10.04.57 AM.png
To get into this state, I did the following:
  • Import PG A
  • Import PG B into PG A (PG B is versioned to contain PG C)
The behaviour is different with the following steps, after which all of the controller services are properly reconnected:
  • Import PG A
  • Import PG B into PG A
  • Manually import PG C into PG B
I believe the problem is that externalControllerServiceReferences is not reassigned when pulling a versioned process group inside a versioned process group.
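
To make the suspected behaviour concrete, here is a toy Python model of it. It is only a sketch of my reading of the debugger state, not NiFi's implementation: `VersionedGroup`, `import_group`, the `reassign_for_nested` flag, the processor names, and all IDs are hypothetical. With the flag off, the nested group is resolved against its parent's external references, so its own reference is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedGroup:
    name: str
    # processor name -> controller service ID recorded in the source environment
    processor_refs: dict = field(default_factory=dict)
    # source-environment service ID -> service name, for services outside the group
    external_refs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def import_group(group, local_services, external_refs=None, reassign_for_nested=True):
    """Return {processor: local service ID or None} for a group and its children.

    local_services maps service name -> service ID in the target environment.
    """
    if external_refs is None or reassign_for_nested:
        # Use the references that belong to the group being imported.
        external_refs = group.external_refs
    resolved = {}
    for processor, old_id in group.processor_refs.items():
        if old_id not in external_refs:
            # Stand-in for the `continue`: the reference is left unresolved.
            resolved[processor] = None
            continue
        resolved[processor] = local_services.get(external_refs[old_id])
    for child in group.children:
        resolved.update(
            import_group(child, local_services, external_refs, reassign_for_nested)
        )
    return resolved


pg_c = VersionedGroup("PG C", {"ConvertRecord": "dev-id-2"}, {"dev-id-2": "CSVReader"})
pg_b = VersionedGroup(
    "PG B", {"QueryRecord": "dev-id-1"}, {"dev-id-1": "JSONTreeReader"}, [pg_c]
)
local = {"JSONTreeReader": "prod-id-1", "CSVReader": "prod-id-2"}

# PG C is resolved against PG B's references, so its processor stays invalid.
stale = import_group(pg_b, local, reassign_for_nested=False)
# PG C is resolved against its own references, so everything reconnects.
fresh = import_group(pg_b, local, reassign_for_nested=True)
```

The second call mirrors what happens when PG C is imported manually into PG B: its own external references are in scope, so its processor finds the local CSVReader.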

Thanks,
Eric



Re: Connecting Controller Services Automatically

Eric Secules
I have filed a bug against NiFi and included steps to reproduce it with a simple template.

-Eric

On Mon, May 25, 2020 at 10:28 AM Eric Secules <[hidden email]> wrote:
Thanks for the link Mark,

I have found out that my problems are caused by this `continue` [1]. I noticed the following state when I stopped the debugger at [1]. At this moment it's importing a versioned flow (PG C) inside a versioned flow (NiFi Flow --> PG A --> PG B --> PG C). The issue is that the externalControllerServiceReferences is populated with external controller references for PG B. I think the issue is that at this point is that externalControllerServiceReferences does not contain the external references from PG C.
Notice that propertyValue is not in externalControllerServiceReferences.
Screen Shot 2020-05-25 at 10.05.08 AM.png
Screen Shot 2020-05-25 at 10.04.57 AM.png
In order to get it into this state, what I have done is:
  • Import PG A
  • Import PG B into PG A (PG B is versioned to contain PG C)
This is different behaviour from these steps. When doing these steps all of the controller services are properly reconnected.
  • Import PG A
  • import PG B into PG A
  • manually import PG C into PG B
I believe the problem is that the externalControllerServiceReferences are not reassigned when pulling a versioned process group inside a versioned process group.

Thanks,
Eric



On Sat, May 23, 2020 at 5:59 PM Mark Payne <[hidden email]> wrote:
It should be hierarchical. The Set of Controller Services to match by name is obtained by calling `parentGroup.getControllerServices(true)` where parentGroup is the Process Group that components are to be added to.

@Eric, the code that does the matching can be found at [1].

Thanks
Mark





On May 23, 2020, at 8:31 PM, Bryan Bende <[hidden email]> wrote:

Do you have any controller services in process groups above the “controller container”?

I can’t remember if it is based on only the immediate parent, or the entire hierarchy.

On Sat, May 23, 2020 at 8:00 PM Andrew Grande <[hidden email]> wrote:
Maybe something is going on with specific types or hierarchies. I've noticed DefaultSslContext didn't get assigned, even though it was the only one available. Does autowiring logic apply to this one?

Andrew

On Sat, May 23, 2020, 3:54 PM Eric Secules <[hidden email]> wrote:
Hi Bryan,

I have noticed this behaviour sometimes, but not all the time I am running the latest registry and NiFi versions. I haven't found a conclusive pattern but I have a hunch that it has to do with having versioned process groups within versioned process groups. My deployment strategy is this:
  • Have an outer process group which only contains controller services, called the "Controller Container"
    • For now I just have one controller service per type of controller service.
  • When deploying, download all production flows inside the Controller Container.
  • I noticed that some of the controller services find their match, but others don't, leaving me with roughly 70 invalid processors out of 800.
If you could point me in the right direction of the code which is supposed to do the matching I might be able to debug better.

Thanks,
Eric

On Sat, May 23, 2020 at 3:27 PM Bryan Bende <[hidden email]> wrote:
If you use Registry >= 0.5.0 and NiFi >= 1.10.0, then it will auto-select external controller services with the same name, as long as there is only one service of the same type with that name (names are not unique).
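
The rule described above can be sketched roughly as follows. This is an illustration of the stated rule only, not the actual NiFi implementation; the function name `auto_select` and the dict keys `id`, `name` and `type` are assumptions made for the example.

```python
from typing import Dict, List, Optional


def auto_select(ref_name: str, ref_type: str,
                candidates: List[Dict[str, str]]) -> Optional[str]:
    """Return the id of the single candidate with the same name and type.

    Returns None when there is no match or when the match is ambiguous,
    since names are not guaranteed to be unique.
    """
    matches = [c for c in candidates
               if c["name"] == ref_name and c["type"] == ref_type]
    return matches[0]["id"] if len(matches) == 1 else None


# Hypothetical services available in the target environment.
available = [
    {"id": "a1", "name": "JsonTreeReader", "type": "JsonTreeReader"},
    {"id": "b2", "name": "CSVReader", "type": "CSVReader"},
    {"id": "c3", "name": "S3 Creds", "type": "AWSCredentialsProviderControllerService"},
    {"id": "d4", "name": "S3 Creds", "type": "AWSCredentialsProviderControllerService"},
]

print(auto_select("CSVReader", "CSVReader", available))
print(auto_select("S3 Creds", "AWSCredentialsProviderControllerService", available))
```

The second lookup is ambiguous because two services share a name and type, so nothing is selected and the reference would still need manual attention.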

On Sat, May 23, 2020 at 3:34 PM Andy LoPresto <[hidden email]> wrote:
My position is that we don’t claim completely automated deployment as a feature, so manually setting the controller service IDs is not exposed. Technically, they are defined in the flow.xml.gz and could be modified by an administrator to be static after generation. This would require frequent manual manipulation of the flow.xml.gz in various environments and frequent restarts of the NiFi service. I do not recommend this. 


Andy LoPresto
[hidden email]
[hidden email]
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On May 23, 2020, at 11:05 AM, Andrew Grande <[hidden email]> wrote:

Aren't those IDs generated? How can one enforce it?

Andrew

On Sat, May 23, 2020, 10:53 AM Andy LoPresto <[hidden email]> wrote:
If you want the process to be completely automated, you would have to enforce the controller service IDs to be identical across environments. Otherwise deployment would need a manual intervention to reference the specific controller service in the proper component. 

Andy LoPresto

On May 22, 2020, at 3:57 PM, Eric Secules <[hidden email]> wrote:

Hi Andy,

Given that you have a flow which operates on two different S3 accounts for example, how would you do deployment automation? Do you mandate that the controller service with the same ID must exist in both a development and production environment rather than try to connect a processor to a matching controller service?

-Eric

On Fri, May 22, 2020 at 3:44 PM Andy LoPresto <[hidden email]> wrote:
Eric,

I can’t answer all these questions but I would definitely have hesitations around building an expectation that there is only one instance of any given controller service type in an entire canvas. I can think of numerous flows (this may not affect your particular flows, but the concepts still apply) which require multiple instances of the same controller service type to be available: 

* A flow which invokes a mutually-authenticated TLS HTTP API, consumes data, transforms it, and posts it to another mTLS API
* A flow which retrieves objects from one S3 bucket and puts them into an S3 bucket in a different AWS account
* A flow which connects to one database and retrieves data, transforms it, and persists it to another database

If there is only _one_ StandardSSLContextService, AWSCredentialsProviderControllerService, or DBCPConnectionPool available in the entire controller, these flows cannot exist. 

I am not saying the retrieval of new flow versions and the matching of referenced controller services cannot be improved, but I would definitely advise caution before going too far down this path without considering all possible side effects and potential constraints on future flow development.  


Andy LoPresto

On May 22, 2020, at 3:01 PM, Eric Secules <[hidden email]> wrote:

Hello everyone,

I am running into an issue with automated deployment using nipyapi. We would like to be able to pull down flows from a registry and have them ready to go once all their controller services have been turned on. But there are a few issues. Sometimes the flows that we download from the registry reference controller service IDs that don't exist on this machine because the flow was developed in a different environment. That's easy enough to fix if there is just one applicable controller service, but not when there are two or more.

We have taken the decision to put all our controller services at the top level and have one of each kind we need, rather than have multiple of the same controller service attached to individual process groups.

We are running into a problem where some processors can either connect to a JSONTreeReader or a CSVReader and there's no indication in the ProcessorDTO object which type it was originally connected to, just a GUID of a controller service that doesn't exist in this deployment.

Would it be possible to include the type or name of the controller service in the component.config.descriptors section? Are we going about it the wrong way trying to simplify down to the least number of controller services?

Thanks,
Eric


