Indications in the UI of which cluster node hosts a “stuck” thread?

James McMahon
Our production NiFi cluster is exhibiting repeated problems with threads that do not end. It is happening with processors that have complex configurations and dependencies (ConsumeAMQP), and - more troubling - it is also occurring periodically for simple processors like ControlRate. I’ll have a ControlRate processor sitting in a running state with no active thread. I select Stop on that processor, get a thread I presume to be responsible for stopping the processor, and that thread never ends. This leaves my processor in a useless state - not stopped, not really running, and not accessible to reconfigure.

I read a blog by Pierre Villard on using nifi.sh for thread dumps. I’ll dig into that. My questions:

1. In a cluster, is there anything I can use in the UI to tell me which cluster node hosts the bad thread? Digging through thread dumps from multiple cluster nodes seems impractical, and I’m hoping there’s a way to zero in on a node.

2. What NiFi system resources in my configuration influence the management and well-being of these threads?

3. Has anyone debugged such a thread issue in a clustered NiFi environment, and if so can you offer any tips based on your experience?

Thanks in advance for any help.
Jim

Re: Indications in the UI of which cluster node hosts a “stuck” thread?

Matt Gilman
Hi Jim,

If you open the Summary page from the global menu you should see the active threads in parentheses next to the scheduled state. Find the row in question and click the cluster icon from the actions column. This will open a dialog with a node-wise breakdown. I believe that the thread count is one of the metrics that is broken down per node.
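The same per-node breakdown can probably be read from the REST API instead of the dialog. What follows is only a rough sketch: it assumes the node-wise processor status response (I believe this comes from `GET /nifi-api/flow/processors/{id}/status?nodewise=true`) contains a `processorStatus.nodeSnapshots` list whose entries carry `address`, `apiPort`, and `statusSnapshot.activeThreadCount`. Those field names are assumptions to verify against the REST API documentation for your NiFi version, and the sample payload is made up for illustration.

```python
import json


def nodes_with_active_threads(status_json):
    """Return (node, active_thread_count) pairs for nodes that report
    at least one active thread, busiest node first.

    Assumes a node-wise processor status payload shaped like
    {"processorStatus": {"nodeSnapshots": [...]}} (field names unverified).
    """
    payload = json.loads(status_json)
    snapshots = payload.get("processorStatus", {}).get("nodeSnapshots") or []
    busy = []
    for snap in snapshots:
        count = snap.get("statusSnapshot", {}).get("activeThreadCount", 0)
        if count > 0:
            node = f"{snap.get('address', '?')}:{snap.get('apiPort', '?')}"
            busy.append((node, count))
    return sorted(busy, key=lambda pair: pair[1], reverse=True)


# Hypothetical payload for a three-node cluster (host names are invented).
SAMPLE = json.dumps({
    "processorStatus": {
        "nodeSnapshots": [
            {"address": "nifi-node1", "apiPort": 8080,
             "statusSnapshot": {"activeThreadCount": 0}},
            {"address": "nifi-node2", "apiPort": 8080,
             "statusSnapshot": {"activeThreadCount": 1}},
            {"address": "nifi-node3", "apiPort": 8080,
             "statusSnapshot": {"activeThreadCount": 2}},
        ]
    }
})

if __name__ == "__main__":
    for node, count in nodes_with_active_threads(SAMPLE):
        print(f"{node}: {count} active thread(s)")
```

Only the nodes it returns would need a thread dump.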

Hope this helps! Adding this breakdown to the main canvas would be a great addition. Maybe these breakdowns could be offered in a tooltip for each metric.

Matt

> On Jun 24, 2020, at 21:05, James McMahon <[hidden email]> wrote:
> [...]

Re: Indications in the UI of which cluster node hosts a “stuck” thread?

James McMahon
This does help, thank you Matt. And I like your suggestion. It would be more at our fingertips if, as we hover over the thread count on the processor, the distribution across all cluster nodes were presented in a popup. I wonder if the project leads would consider this a helpful improvement?

I can now see that my hanging threads are on just two of my cluster nodes. This is very helpful - thanks again. It reduces the amount of thread dump review I will be doing today.
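For the dumps themselves (taken on each suspect node with `bin/nifi.sh dump <file>`), a small script can cut the reading down further. This is a minimal sketch under assumptions: it supposes each thread entry in the dump starts with a line beginning with the quoted thread name, followed by indented `at ...` stack frames, and that processor work runs on threads whose names contain "Process Thread". Both should be checked against a real dump. The sample text is invented.

```python
import re

# A thread entry is assumed to start with a line like:
#   "Timer-Driven Process Thread-4" Id=71 WAITING on ...
HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def top_frames(dump_text, name_filter="Process Thread"):
    """Return (thread_name, top_stack_frame) for threads whose name
    contains name_filter. top_stack_frame is None if no frame follows."""
    results = []
    current = None  # matching thread still waiting for its first frame
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            if current is not None:
                results.append((current, None))
            name = match.group("name")
            current = name if name_filter in name else None
            continue
        stripped = line.strip()
        if current is not None and stripped.startswith("at "):
            results.append((current, stripped[3:]))
            current = None
    if current is not None:
        results.append((current, None))
    return results


# Invented excerpt in the assumed format.
SAMPLE_DUMP = "\n".join([
    '"Timer-Driven Process Thread-4" Id=71 WAITING',
    "\tat sun.misc.Unsafe.park(Native Method)",
    "\tat java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)",
    "",
    '"Timer-Driven Process Thread-7" Id=74 TIMED_WAITING',
    "\tat java.lang.Thread.sleep(Native Method)",
    "",
    '"NiFi Web Server-22" Id=22 RUNNABLE',
    "\tat java.net.SocketInputStream.socketRead0(Native Method)",
])

if __name__ == "__main__":
    for name, frame in top_frames(SAMPLE_DUMP):
        print(f"{name}: {frame}")
```

Comparing the output of two dumps taken a few minutes apart should show which threads are sitting on the same frame, and those are the likely stuck ones.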

Jim

On Wed, Jun 24, 2020 at 9:53 PM Matt Gilman <[hidden email]> wrote:
> [...]