State Management in a Cluster


State Management in a Cluster

nathan.english

Hi All,

Apologies if this is an obvious question, but I've had a search through the Administration Guide and it's left me more confused!

At the moment in our NiFi cluster deployment, we configure the ZooKeeper provider in state-management.xml and leave the local provider as it is (so pointing at ./state/local). My main question is whether we should have both the local and ZooKeeper providers enabled? It seems that should be the case, but I just wanted to clarify.

Secondly, if both local and ZooKeeper state management should be used, what are the differences, if any, in the data stored by the local and ZooKeeper providers?

Kind Regards,

Nathan
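[For reference, a cluster's state-management.xml typically defines both providers side by side. A minimal sketch, following the layout of the default file that ships with NiFi; the provider IDs and the ZooKeeper connect string below are illustrative placeholders:]

```xml
<stateManagement>
    <!-- Per-node state, written to each node's local disk -->
    <local-provider>
        <id>local-provider</id>
        <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
        <property name="Directory">./state/local</property>
    </local-provider>
    <!-- Cluster-wide state, shared between nodes via ZooKeeper -->
    <cluster-provider>
        <id>zk-provider</id>
        <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
        <property name="Connect String">zk1:2181,zk2:2181,zk3:2181</property>
        <property name="Root Node">/nifi</property>
        <property name="Session Timeout">10 seconds</property>
        <property name="Access Control">Open</property>
    </cluster-provider>
</stateManagement>
```

[The `nifi.state.management.provider.local` and `nifi.state.management.provider.cluster` properties in nifi.properties must reference the matching `<id>` values.]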

Re: State Management in a Cluster

Mark Payne
Nathan,

Yes, both are needed. Some processors store local state while others store clustered state; the difference is whether the stored state should be readable by all nodes in the cluster or only by the local node. Some processors can use either, depending on how they are configured.

ListFile is a good example. It creates an output FlowFile for each file in a given directory and keeps state about the files it has already listed. The processor has a property indicating whether the directory it monitors is on a local file system or a network-mounted drive (an NFS mount, for example). If the directory is on the local file system, every node in the cluster wants to monitor the directory and store state about its own file system, so it uses local state management. On the other hand, if it's an NFS mount, the processor should run only on the Primary Node and the state should be shared across the cluster. That way, if the Primary Node is shut down or crashes, a new Primary Node is elected and can read the state that was stored by the previous node. For that to work, the state must be readable by all nodes in the cluster, so it's stored using the cluster state provider.

Does that make sense?

Thanks
-Mark


On May 21, 2020, at 4:58 AM, [hidden email] wrote:


RE: State Management in a Cluster

nathan.english

Hi Mark,

It makes complete sense now, thanks for clearing it up!

Nathan

From: Mark Payne [mailto:[hidden email]]
Sent: 21 May 2020 15:35
To: [hidden email]
Subject: Re: State Management in a Cluster