VolatileContentRepository


VolatileContentRepository

Margus Roo

Hi

I am experimenting with NiFi performance on a single NiFi node.

At the moment I think the bottleneck in my flow is the SplitJson processor, which can handle about 2,000,000 items per 5 minutes (the downstream queues are not full, while the queue before SplitJson is constantly full).

I tried to switch as many repositories as possible to volatile, but when I change the content repository to volatile, throughput drops drastically, from about 2,000,000 items to roughly 5,000.

Before setting the content repository to volatile, I increased the volatile content repository's max size:

nifi.volatile.content.repository.max.size=12GB

Do I need to increase the JVM settings so that the content repository can fit inside the JVM heap?

At the moment I have:

# JVM memory settings
java.arg.2=-Xms2048m
java.arg.3=-Xmx46384m
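
For reference, a minimal sketch of the nifi.properties entries involved (property and class names as documented for NiFi; sizes are illustrative):

# nifi.properties -- select the in-memory content repository implementation
nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository
# Upper bound for content held in memory; the volatile repository is
# allocated from the JVM heap, so this must fit below -Xmx together
# with everything else the flow keeps on the heap
nifi.volatile.content.repository.max.size=12GB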


-- 
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
https://www.facebook.com/allan.tuuring
+372 51 48 780

Re: VolatileContentRepository

Pierre Villard
Hi Margus,

I believe your memory settings are sufficient. Giving the JVM more memory will likely increase the duration of garbage collection pauses and won't improve performance; others on this mailing list can probably give better recommendations on this point. Also, keep in mind that volatile repositories can cause data loss if NiFi shuts down.

Before getting to your question: have you tried increasing the number of concurrent tasks on your SplitJson processor (if you have enough resources, that will likely improve throughput)? And have you increased the processor's run duration? That can make a huge difference in performance if you don't mind the added latency.
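
For example, both settings live on the processor's Configure > Scheduling tab in the UI (the values below are illustrative; the right numbers depend on your hardware and latency tolerance):

SplitJson > Configure > Scheduling
    Concurrent Tasks: 4        (more parallel threads, if cores are available)
    Run Duration:     25 ms    (batches many FlowFiles into one framework transaction; higher throughput at the cost of latency)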

Also, it depends on what you are doing with your data, but have you considered the new record-oriented processors? If your use case fits the record processors, they will certainly improve the overall performance of your workflow.
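
As a sketch of what that could look like here (SplitRecord, JsonTreeReader, and JsonRecordSetWriter are standard NiFi components; the per-split count is illustrative):

SplitRecord
    Record Reader     -> JsonTreeReader (controller service)
    Record Writer     -> JsonRecordSetWriter (controller service)
    Records Per Split -> 10000

Because the reader/writer pair handles records in bulk instead of emitting one FlowFile per JSON element, this typically cuts FlowFile and content-repository churn dramatically compared to SplitJson.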

Thanks!
Pierre



Re: VolatileContentRepository

BD International
Pierre,

Just a clarification on what you said: would the potential data loss on a NiFi shutdown only affect provenance information?

Or would it affect the other repositories as well?

Thanks

Brian


Re: VolatileContentRepository

Pierre Villard
Hi Brian,

The question here is about the volatile content repository. If you are only using the volatile provenance repository (not the content one), then you're right: only provenance data would be lost in case of a NiFi shutdown.
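
Concretely, each repository is selected independently in nifi.properties, so the failure mode depends on which ones you make volatile (class names as documented for NiFi):

nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.VolatileFlowFileRepository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository

A volatile provenance repository risks only the lineage history; a volatile flowfile or content repository risks the in-flight data itself on restart.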

Pierre
