SplitJson:GC Overhead Limit Exceeded


Mike Harding
Hi All,

I have a flowfile containing a JSON array with 30k objects that I am trying to split into separate flowfiles for downstream processing.

The problem is that the processor reports a GC Overhead Limit Exceeded warning and administratively yields.

Is there any way of setting up a back pressure option, or changing the NiFi config, to address this?

Thanks,
Mike
Re: SplitJson:GC Overhead Limit Exceeded

Mike Harding
Just for info, in bootstrap.conf my heap size is set as follows:

java.arg.2=-Xms512m

java.arg.3=-Xmx512m

Would it be a simple case of increasing this? The JSON array in the flowfile is 35 MB.

Mike



On 17 November 2016 at 15:47, Mike Harding <[hidden email]> wrote:
> Hi All,
>
> I have a flowfile containing a JSON array with 30k objects that I am trying to split into separate flowfiles for downstream processing.
>
> The problem is that the processor reports a GC Overhead Limit Exceeded warning and administratively yields.
>
> Is there any way of setting up a back pressure option, or changing the NiFi config, to address this?
>
> Thanks,
> Mike


Re: SplitJson:GC Overhead Limit Exceeded

Mark Payne
Hi Mike,

Certainly, I would recommend trying to change the max heap to, say, 2 GB and see if that gives you what you need.
Looking at the code, it does look like this Processor may not be the most efficient in how it is parsing the JSON.
There are libraries, for example, that provide a "Streaming JSON" interface, but this Processor loads the entire JSON
into heap and then creates an Object Model from it.
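
As a sketch of the heap change, assuming the same java.arg numbering as in the bootstrap.conf lines quoted in this thread, the maximum heap line could become the following. Whether to raise the initial heap (-Xms) as well is a judgment call, and 2 GB is only a starting point to test with, not a measured requirement.

```properties
# conf/bootstrap.conf (sketch; the argument numbers follow the lines posted earlier)
# Initial heap left unchanged
java.arg.2=-Xms512m
# Maximum heap raised from 512 MB to 2 GB
java.arg.3=-Xmx2g
```

NiFi has to be restarted for changes in bootstrap.conf to take effect.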

Also, what do you have set for the Max Concurrent Tasks? If you have multiple threads simultaneously running, you could
have each one using up quite a lot of heap.

Thanks
-Mark


On Nov 17, 2016, at 10:54 AM, Mike Harding <[hidden email]> wrote:

> Just for info, in bootstrap.conf my heap size is set as follows:
>
> java.arg.2=-Xms512m
>
> java.arg.3=-Xmx512m
>
> Would it be a simple case of increasing this? The JSON array in the flowfile is 35 MB.
>
> Mike




Re: SplitJson:GC Overhead Limit Exceeded

Aldrin Piri
The backing library of the JSON processors does indeed require loading the entire document into memory. We should make sure this consideration is documented, if it is not already.

It could be an interesting idea to decouple SplitJson from this library, given that it might not need all the functionality of JsonPath and would likely be a good candidate for streaming.
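
To make the streaming idea concrete, here is a minimal sketch that uses only the JDK. It is not NiFi's SplitJson code and does not use JsonPath. The class name StreamingJsonArraySplitter and the method split are made up for this example, and it assumes well-formed JSON whose top level is an array. It reads one character at a time and buffers only the element currently being read, so memory use follows the largest element rather than the whole document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class StreamingJsonArraySplitter {

    /**
     * Reads a top-level JSON array from the reader and passes each element to
     * the sink as soon as it is complete. Returns the number of elements.
     * Hypothetical helper for illustration; assumes well-formed input.
     */
    public static int split(Reader in, Consumer<String> sink) throws IOException {
        int c;
        // Skip leading whitespace up to the opening bracket.
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) { }
        if (c != '[') {
            throw new IOException("expected a top-level JSON array");
        }

        StringBuilder element = new StringBuilder();
        int depth = 0;            // nesting depth inside the current element
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash in a string
        int count = 0;

        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                // Inside a string, brackets and commas are plain characters.
                element.append(ch);
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            if (ch == '"') {
                inString = true;
                element.append(ch);
            } else if (ch == '{' || ch == '[') {
                depth++;
                element.append(ch);
            } else if (depth > 0 && (ch == '}' || ch == ']')) {
                depth--;
                element.append(ch);
            } else if (depth == 0 && (ch == ',' || ch == ']')) {
                // End of one top-level element, or of the whole array.
                count += emit(element, sink);
                if (ch == ']') {
                    return count;
                }
            } else if (depth > 0 || !Character.isWhitespace(ch)) {
                // Drop whitespace between top-level elements, keep the rest.
                element.append(ch);
            }
        }
        throw new IOException("unexpected end of input inside array");
    }

    private static int emit(StringBuilder element, Consumer<String> sink) {
        if (element.length() == 0) {
            return 0; // empty array
        }
        sink.accept(element.toString());
        element.setLength(0);
        return 1;
    }

    public static void main(String[] args) throws IOException {
        List<String> parts = new ArrayList<>();
        int n = split(new StringReader("[{\"a\":1}, {\"b\":\"x,]y\"}, [2,3]]"), parts::add);
        System.out.println(n + " elements: " + parts);
    }
}
```

In a real processor the sink would write each element to its own flowfile instead of collecting strings in a list. Note that the total number of elements is only known once the closing bracket has been read.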
On Thu, Nov 17, 2016 at 11:23 Mark Payne <[hidden email]> wrote:
> Hi Mike,
>
> Certainly, I would recommend trying to change the max heap to say 2 GB and see if that gives you what you need.
> Looking at the code, it does look like this Processor may not be the most efficient in how it is parsing the JSON.
> There are libraries, for example, that provide a "Streaming JSON" interface, but this Processor loads the entire JSON
> into heap and then creates an Object Model from it.
>
> Also, what do you have set for the Max Concurrent Tasks? If you have multiple threads simultaneously running, you could
> have each one using up quite a lot of heap.
>
> Thanks
> -Mark



Re: SplitJson:GC Overhead Limit Exceeded

Matt Burgess-2
If we consider streaming for SplitJson (or a new version of it), we
wouldn't be able to support the "micro-batch" functionality as it exists
in SplitJson today (the fragment.count attribute, for example).
That might not be a concern, or it might warrant a new processor
(e.g. SplitJsonStreaming).

Regards,
Matt

On Thu, Nov 17, 2016 at 11:36 AM, Aldrin Piri <[hidden email]> wrote:

> The backing library of the Json processors does indeed require loading the
> entire doc into memory. We should make sure this consideration is documented
> if not already.
>
> Could be an interesting idea to not tie SplitJson to this library given that
> it might not need all the functionalities of JsonPath and would likely be a
> good candidate for streaming.