Possible memory leak in PutHive3Streaming?


Hengesbach, Martin

Hi,

 

I’m using the PutHive3Streaming processor to write roughly 30,000 Avro records into a Hive table (roughly 150 GB in total). This works for a while, but the used memory keeps increasing over time, and after a few thousand records the memory is full.

 

When I remove the PutHive3Streaming processor from my flow, everything works fine, so the problem must be the PutHive3Streaming processor.

 

As the Record Reader I’m using AvroReader; the NiFi version is 1.11.4 and Hive is version 3.1.0.

 

The Hive table is created as follows:

create table yyyy(id string, data string) clustered by (id) into 50 buckets stored as orc tblproperties("transactional"="true")

 

Any ideas whether I’m doing something wrong, or is there really a memory leak in the processor?

 

Thanks in advance and best regards

 

Martin

 

 

 

2020-05-08 09:37:22,076 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection

2020-05-08 09:37:22,091 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxxxx:9083

2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 1

2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.

2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0

2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection-heartbeat

2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxx:9083

2020-05-08 09:37:22,094 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 2

2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.

2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0

2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection STREAMING CONNECTION INFO: { metastore-uri: thrift://xxxxxxxx:9083, database: default, table: yyyy, partitioned-table: false, dynamic-partitioning: false, username: nifi, secure-mode: false, record-writer: HiveRecordWriter, agent-info: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] }

2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Starting heartbeat thread with interval: 150000 ms initialDelay: 64376 ms for agentInfo: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44]

2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Created new filesystem instance: 484197097

2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0

2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hadoop.hive.common.HeapMemoryMonitor Setting collection usage threshold to 2147483647

2020-05-08 09:37:22,233 WARN [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is already low. Increase memory to improve performance. Used: 129.54GB Max: 130.00GB

2020-05-08 09:37:22,233 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Opened new transaction batch TxnId/WriteIds=[292449/7336...292449/7336] on connection = { metaStoreUri: thrift://xxxxxxxx:9083, database: default, table: yyyy };  TxnStatus[O] LastUsed txnid:0

2020-05-08 09:37:22,272 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.PhysicalFsWriter ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768

2020-05-08 09:37:22,292 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.WriterImpl ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768

2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]

2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Flushing record updater for partitions: default.yyyy

2020-05-08 09:37:22,578 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after flush: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]

2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before close: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]

2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Closing updater for partitions: default.yyyy

2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]

2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 1

2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 0

2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Closed streaming connection. Agent: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] Stats: {records-written: 1, records-size: 0, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 7 }

 

At the beginning the memory starts with Used: 2.63GB Max: 130.00GB

 



FIZ Karlsruhe - Leibniz-Institut für Informationsinfrastruktur GmbH.
Sitz der Gesellschaft: Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892.
Geschäftsführerin: Sabine Brünger-Weilandt.
Vorsitzende des Aufsichtsrats: MinDirig'in Dr. Angelika Willms-Herget.

FIZ Karlsruhe ist zertifiziert mit dem Siegel "audit berufundfamilie".

Re: Possible memory leak in PutHive3Streaming?

Mark Payne
Martin,

I don’t know a lot about Hive. The log message you’re noting there is definitely coming from Hive’s client, but I can’t say whether or not it’s anything that should actually be concerning in terms of Java memory/heap. Is the NiFi heap size actually set to a max of 130 GB? (That would be a really massive heap for NiFi). Are you seeing any full garbage collections occurring? This information would be available in the Cluster table if you’re running clustered, or else in the Summary table (hamburger menu in the top-right -> Summary -> System Diagnostics in the bottom-right).
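A quick way to watch this outside the UI, as a rough sketch (assuming the JDK tools are on the PATH; <nifi-pid> is a placeholder for the NiFi JVM's process id):

# Print GC counters every 5 seconds.
# O   = old (tenured) gen occupancy as a percentage of its capacity
# FGC = number of full GC events so far, FGCT = time spent in them
jstat -gcutil <nifi-pid> 5000

If O keeps climbing while FGC stays at 0, the old generation simply hasn't been collected yet; only memory that survives full collections points to a real leak.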

Thanks
-Mark



Re: Possible memory leak in PutHive3Streaming?

Matt Burgess-2
Martin,

There were two memory leaks a while back, one in NiFi code and one in Hive client code (brought in as a dependency for the Hive 3 components). NiFi fixed its side in 1.9.0 (via NIFI-5841 [1]) and Hive fixed its side in 3.1.1 (via HIVE-20979 [2]). Until NiFi 1.11.4 we were still using the Hive 3.1.0 dependencies, which still had the leak, but as of 1.11.4 (via NIFI-7239 [3]) we upgraded our Hive 3 dependencies to 3.1.2, which means those memory leaks should be gone. Can you verify the versions of Hive 3 in your NiFi installation with something like:

find work -name "hive*.jar" | grep hive3

That should show you the Hive JARs in the Hive 3 NAR; the version is part of each filename.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-5841
[2] https://issues.apache.org/jira/browse/HIVE-20979
[3] https://issues.apache.org/jira/browse/NIFI-7239


Re: Possible memory leak in PutHive3Streaming?

Hengesbach, Martin

Hi Mark and Matt,

 

Thank you for your answers. I have checked the versions of the Hive client libraries. The output shows version 3.1.2, as expected:

 

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-hcatalog-core-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-upgrade-acid-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-exec-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-streaming-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-serde-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-vector-code-gen-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-tez-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-classification-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-0.23-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-client-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-standalone-metastore-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-rpc-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-scheduler-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-jdbc-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2-tests.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-server-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-metastore-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-cli-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-storage-api-2.7.0.jar

 

I have run a new test: 12,000 XML documents, 55 GB in total. I restarted NiFi before the test.

 

At the beginning I have:

2020-05-12 08:13:07,426 INFO [Timer-Driven Process Thread-232] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 659.08MB/130.00GB]

 

At the end of the test:

 

2020-05-12 10:28:12,911 INFO [Timer-Driven Process Thread-257] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 72.18GB/130.00GB]

 

The increase is roughly the same as the total size of my documents. It seems that memory is allocated somewhere, in about the size of each document, but never freed.
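One way to check whether that memory is really still reachable is to force a full collection and look at what survives; a rough sketch (assuming the JDK tools are available; <nifi-pid> is a placeholder for the NiFi JVM's process id):

# Histogram of live objects; the :live option forces a full GC first.
jmap -histo:live <nifi-pid> | head -n 30

# Optionally capture a dump of live objects for analysis in a tool such as Eclipse MAT.
jmap -dump:live,format=b,file=/tmp/nifi-heap.hprof <nifi-pid>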

Matt, HIVE-20979 has a comment from February saying that the bug is not fixed in 3.1.2, but I’m not sure whether this bug can lead to such a large memory leak.

 

Any ideas?

 

 

Best regards,

Martin

 

 


Re: Possible memory leak in PutHive3Streaming?

Mark Payne
Thanks Martin,

This is not particularly surprising. What you’re seeing here is that you’ve got 70+ GB of heap used, but no full garbage collections. Java does not free up memory until it needs to do so because it’s more efficient this way. If you were to send that same dataset through 10 more times you’d likely see the heap staying steady. If you get to the point where the heap fills up and remains full even after full garbage collections occur, that’s when you need to be concerned.
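A simple way to confirm this, sketched here on the assumption that the JDK's jcmd and jstat tools are available and <nifi-pid> is the NiFi JVM's process id:

# Request a full collection, then look at old-gen occupancy (O) and the full-GC count (FGC).
jcmd <nifi-pid> GC.run
jstat -gcutil <nifi-pid>

If old-gen usage drops back down after the forced collection, the memory was collectable garbage rather than a leak.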

That being said, a 130 GB heap is enormous for NiFi. Typically I wouldn’t recommend running higher than 8-16 GB unless you’ve got a particularly specific use case. Is there a reason that you’ve allocated such a large heap?

Thanks
-Mark



Re: Possible memory leak in PutHive3Streaming?

Matt Burgess-2
Martin,

I confirmed the comment you found in the Jira that the fix is indeed not in Apache Hive 3.1.1 or 3.1.2, which is very unfortunate (especially given the incorrect Fix Version). Wearing my "vendor hat" I know there is a solution as I work heavily on NiFi / Hive integration. Wearing my "Apache hat" I imagine we'd need a 3.1.3 release of Hive with the fix actually in it, and an update to NiFi to use that version. I have reached out to the Hive community asking if there will be a 3.1.3 release, and if so if it can include this fix. If it does, I will write up the NiFi Jira and upgrade the Hive 3 library for the subsequent release.

Regards,
Matt
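Until such a release exists, one possible workaround would be to rebuild the Hive 3 NAR against a Hive build that contains the fix. A rough sketch only, from a NiFi source checkout; the module path and the hive3.version property are assumptions, so verify them in nifi-nar-bundles/nifi-hive-bundle/pom.xml first:

# Rebuild the Hive 3 bundle against a patched Hive version (placeholder value).
mvn -pl nifi-nar-bundles/nifi-hive-bundle -am clean install -DskipTests -Dhive3.version=<patched-hive-version>

# Then replace the nifi-hive3-nar in NiFi's lib/ (or extensions/) directory with the rebuilt NAR and restart NiFi.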


On Tue, May 12, 2020 at 9:04 AM Hengesbach, Martin <[hidden email]> wrote:

Hi Mark and Matt,

 

Thank you for your answers. I have checked the versions of the Hive client libraries. The output is as expected version 3.1.2:

 

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-hcatalog-core-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-upgrade-acid-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-exec-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-streaming-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-serde-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-vector-code-gen-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-tez-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-classification-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-0.23-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-client-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-standalone-metastore-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-rpc-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-scheduler-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-jdbc-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2-tests.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-server-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-metastore-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-cli-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-storage-api-2.7.0.jar

 

I have run a new test: 12,000 XML documents, 55 GB in total. I restarted NiFi before the test.

 

At the beginning I have:

2020-05-12 08:13:07,426 INFO [Timer-Driven Process Thread-232] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 659.08MB/130.00GB]

 

At the end of the test:

 

2020-05-12 10:28:12,911 INFO [Timer-Driven Process Thread-257] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 72.18GB/130.00GB]

 

The increase is roughly the same as the total size of my documents. It looks like memory is allocated somewhere in proportion to the document size but never freed.

Matt, HIVE-20979 has a comment from February saying that the bug is not fixed in 3.1.2. But I'm not sure whether that bug alone can cause such a large memory leak.
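
So far I have only looked at the tenured-memory-usage numbers from the AbstractRecordWriter log; a heap histogram would probably show more precisely what is growing. A sketch of what I would run (not executed yet; assumes a single NiFi JVM on the host):

# Compare two live-object histograms of the NiFi JVM a few thousand records apart.
NIFI_PID=$(pgrep -f org.apache.nifi.NiFi | head -n 1)
jmap -histo:live "$NIFI_PID" | head -n 40   # top classes by instance count and shallow size
# Whatever class keeps climbing between the two snapshots (byte[] buffers, ORC
# writer state, Hive streaming objects, ...) is where the retained memory sits.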

 

Any ideas?

 

 

Best regards,

Martin

 

 

-----Original Message-----
From: Matt Burgess [mailto:[hidden email]]
Sent: Monday, May 11, 2020 23:00
To: [hidden email]
Subject: Re: Possible memory leak in PutHive3Streaming?

 

Martin,

 

There were two memory leaks a while back, one in NiFi code and one in Hive client code (brought in as a dependency for the Hive 3 components). NiFi fixed its side in 1.9.0 (via NIFI-5841 [1]) and Hive fixed its side in 3.1.1 (via HIVE-20979 [2]). Until NiFi 1.11.4 we were still using Hive 3.1.0 dependencies, which still had the leak. But as of 1.11.4 (via NIFI-7239 [3]) we upgraded our Hive 3 dependencies to 3.1.2, which means those memory leaks should be gone. Can you verify the versions of Hive 3 in your NiFi installation with something like:

 

find work -name "hive*.jar" | grep hive3

 

That should show you the Hive JARs in the Hive 3 NAR, and the version is part of the filename of each.
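
If the listing is long, something like this (untested, just a convenience on top of the find above) collapses it to the distinct version strings:

find work -name "hive*.jar" -path "*hive3*" \
  | sed -E 's/.*-([0-9]+(\.[0-9]+)+)(-tests)?\.jar$/\1/' \
  | sort -u
# Only a couple of distinct versions should come back: the Hive jars themselves,
# plus hive-storage-api, which is versioned separately.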

 

Regards,

Matt

 

[1] https://issues.apache.org/jira/browse/NIFI-5841

[2] https://issues.apache.org/jira/browse/HIVE-20979

[3] https://issues.apache.org/jira/browse/NIFI-7239

 

On Mon, May 11, 2020 at 4:42 PM Mark Payne <[hidden email]> wrote:

> 

> Martin,

> 

> I don’t know a lot about Hive. The log message you’re noting there is definitely coming from Hive’s client, but I can’t say whether or not it’s anything that should actually be concerning in terms of Java memory/heap. Is the NiFi heap size actually set to a max of 130 GB? (That would be a really massive heap for NiFi). Are you seeing any full garbage collections occurring? This information would be available in the Cluster table if you’re running clustered, or else in the Summary table (hamburger menu in the top-right -> Summary -> System Diagnostics in the bottom-right).

> 

> Thanks

> -Mark
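
For reference, the full-GC question can also be checked from the command line with jstat against the NiFi JVM; a sketch, where <nifi-pid> stands for the NiFi process id:

jstat -gcutil <nifi-pid> 5000
# O    = old (tenured) generation occupancy in percent
# FGC  = number of full GC events, FGCT = total full GC time in seconds
# If O stays near 100% while FGC keeps increasing, the old generation really is
# full of live objects rather than just waiting to be collected.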


Re: Possible memory leak in PutHive3Streaming?

Hengesbach, Martin

Hi Matt and Mark,

 

@Matt: thank you, I think this will be the solution.

 

@Mark: The memory isn't freed by the garbage collector. My NiFi stopped 2 hours ago and isn't reachable anymore. The last log message is:

 

2020-05-13 04:28:34,963 WARN [Timer-Driven Process Thread-232] o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is already low. Increase memory to improve performance. Used: 128.79GB Max: 130.00GB

 

Conclusion: only a fixed Hive client will solve the problem.

 

Thank you both

Martin

 
