PutFile set Last Modified Time without file.creationTime

Valentina Ivanova
Hello!

I need to set Last Modified Time in PutFile; however, I cannot use file.creationTime, as that attribute is only provided by ListFile or GetFile.

I am retrieving files from a folder in the middle of my flow using FetchFile, passing it the absolute path to each file (ListFile and GetFile have no input connections).
After FetchFile I retrieve the file metadata with ls -l --time-style=full-iso, which outputs something like this:

-rw-r--r--   1 nifi nifi 60 2020-02-14 14:14:07.000000000 +0000 file.txt

From this I retrieve all components of the date and time that are needed and merge them together with the following:

fileMetadata.6   value:2020-02-14
fileMetadata.7  value:14:14:07.000000000
fileMetadata.8  value:+0000

dateMetadata  value:${fileMetadata.6:append(' '):append(${fileMetadata.7:substringBefore('.')})}
Last Modified Time  value:${dateMetadata:replace(' ', 'T'):append(' '):append(${fileMetadata.8}):toDate("yyyy-MM-dd'T'HH:mm:ssZ")}
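
For readers following along outside NiFi, the field extraction and concatenation described above can be sketched in Python. This is only an illustration, not part of the flow: the helper names split_ls_fields and build_timestamp_string are hypothetical, and the 1-based field numbering is an assumption chosen to match the fileMetadata.6, fileMetadata.7 and fileMetadata.8 attributes listed above.

```python
# Sample line copied from the ls -l --time-style=full-iso output above.
LS_LINE = "-rw-r--r--   1 nifi nifi 60 2020-02-14 14:14:07.000000000 +0000 file.txt"


def split_ls_fields(line):
    """Map 1-based field numbers to the whitespace-separated fields of the line."""
    return {index: field for index, field in enumerate(line.split(), start=1)}


def build_timestamp_string(fields):
    """Join date, time (fractional seconds dropped) and UTC offset into one string."""
    date_part = fields[6]                    # 2020-02-14
    time_part = fields[7].split(".", 1)[0]   # same idea as substringBefore('.')
    offset = fields[8]                       # +0000
    return f"{date_part}T{time_part} {offset}"


fields = split_ls_fields(LS_LINE)
timestamp_string = build_timestamp_string(fields)
print(timestamp_string)  # 2020-02-14T14:14:07 +0000
```

Note that the assembled string contains a space before the offset, so any pattern used to parse it has to account for that space.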

After this I expect the value 2020-02-14T14:14:07 +0000 for Last Modified Time, which should correspond to the format required by the PutFile processor (yyyy-MM-dd'T'HH:mm:ssZ).
Instead, I obtain Fri Feb 14 15:10:56 CET 2020, which makes me think that some other transformation is taking place that I am not aware of.
When the value Fri Feb 14 15:10:56 CET 2020 is used in Last Modified Time, I get the following error message:

Could not set file lastModifiedTime to Fri Feb 14 15:10:56 CET 2020 because unparsable date:Fri Feb 14 15:10:56 CET 2020

So I am wondering what I could do to address this issue and if there is another transformation taking place.

Thanks in advance & all the best

Valentina

Re: PutFile set Last Modified Time without file.creationTime

Andy LoPresto
To fix the date-formatting error, you are correct that you need the Expression Language functions toDate() [1] and format() [2] to convert between plain strings and date objects. You are currently concatenating the two date values (the year-month-day segment and the hour:minute:second segment), then changing the delimiter from a space to a 'T' (you can do this directly in the first step), then appending the timezone offset and trying to convert the result to a timestamp via a prescribed format. That format doesn't match your input: the string has a space before the offset, but the pattern ends in ssZ with no space.

Please use the values below (I tested these against the current main branch build, but nothing should have changed since prior releases):

To concatenate the string attributes into a parseable format and convert it to a date object (internally represented as the number of milliseconds since the epoch began at Jan 1, 1970 00:00:00 UTC): 

${fileMetadata.6:append('T'):append(${fileMetadata.7:substringBefore('.')}):append(' '):append(${fileMetadata.8}):toDate("yyyy-MM-dd'T'HH:mm:ss Z")}
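
As a rough analogue outside NiFi, the parsing step can be sketched with Python's datetime.strptime. This is an assumption-laden illustration: the Python patterns below only mirror the Expression Language patterns, and the failure shown for the pattern without a space is Python's behaviour, not a statement about how NiFi reacts to the same mismatch.

```python
from datetime import datetime

TIMESTAMP_STRING = "2020-02-14T14:14:07 +0000"

# Mirrors "yyyy-MM-dd'T'HH:mm:ss Z" (space before the offset).
WITH_SPACE = "%Y-%m-%dT%H:%M:%S %z"
# Mirrors "yyyy-MM-dd'T'HH:mm:ssZ" (no space before the offset).
WITHOUT_SPACE = "%Y-%m-%dT%H:%M:%S%z"

# Parsing succeeds when the pattern matches the input exactly.
parsed = datetime.strptime(TIMESTAMP_STRING, WITH_SPACE)

# The instant expressed as milliseconds since 1970-01-01 00:00:00 UTC.
epoch_millis = int(parsed.timestamp() * 1000)
print(epoch_millis)  # 1581689647000

# In Python, the pattern without the space is rejected for this input.
try:
    datetime.strptime(TIMESTAMP_STRING, WITHOUT_SPACE)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```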

To format the result of the above (stored here in an attribute named parsedTimestamp) as a string in various timezones:

Local timezone: ${parsedTimestamp:format("yyyy-MM-dd'T'HH:mm:ss Z")}
UTC timezone: ${parsedTimestamp:format("yyyy-MM-dd'T'HH:mm:ss Z", "UTC")}

If you set the PutFile Last Modified Time to ${timestampUTCString} (or whatever you name the attribute holding the UTC-formatted string above), it will successfully set the file's timestamp when writing it out (06:14 in February in my timezone is equal to 14:14 UTC):

 /tmp  ll timestamptest               15:28:55
total 0
drwxrwxrwx  14 alopresto  wheel   448B Jul  2 15:29 ./
drwxrwxrwt   7 root       wheel   224B Jul  2 15:28 ../
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 0eec229c-5658-4a86-b6ba-3fe507672bd4
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 113fb95e-5a10-48e4-ba9b-616909b68684
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 13fd2b13-fc8e-455d-8ca9-4afa2886a8e8
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 3228111c-476d-4cf6-a141-587270d821e2
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 397e7a21-944b-4a0c-a0d7-6150e10b385e
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 400313d8-9511-451a-ba40-6a37e7649906
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 46c587f6-06ee-463e-8e91-b432073aa98d
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 4a783b61-2304-44c6-9820-045e0cfaac52
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 a30a3e6c-e3ed-4180-9486-3de274116652
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 d4cdafc4-b5f3-4a18-9548-c7a5a2a3ea68
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 e6b94e07-9bd1-4fbc-aee2-27b687681849
-rw-r--r--   1 alopresto  wheel     0B Feb 14 06:14 f6802781-d820-4e18-b803-ceeaf5abee11
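
To illustrate why 06:14 in the listing and 14:14 UTC are the same instant, and why an explicit formatting step matters, here is a small Python sketch. The fixed UTC-08:00 offset is an assumption inferred only from the eight-hour difference mentioned above, and the Python pattern merely mirrors the Expression Language one.

```python
from datetime import datetime, timedelta, timezone

PATTERN = "%Y-%m-%dT%H:%M:%S %z"
instant = datetime.strptime("2020-02-14T14:14:07 +0000", PATTERN)

# The default string form of a date object does not follow the pattern,
# which is why it has to be formatted explicitly before being used as a string.
default_string = str(instant)

# The same instant rendered in UTC.
utc_string = instant.astimezone(timezone.utc).strftime(PATTERN)

# The same instant rendered at a fixed UTC-08:00 offset (assumed for illustration).
local_zone = timezone(timedelta(hours=-8))
local_string = instant.astimezone(local_zone).strftime(PATTERN)

print(utc_string)    # 2020-02-14T14:14:07 +0000
print(local_string)  # 2020-02-14T06:14:07 -0800
```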

I'm not sure I understand your other concerns. ListFile and GetFile do not accept incoming connections because they are designed to retrieve either the list of files (ListFile) or the files themselves (GetFile) from a particular file system location (e.g. you want to list all the files that appear in /some/location/where/another/process/puts/them/over/time as they appear). If some other initial process determines an absolute file path, you can pass it to FetchFile, as you're doing.

You can also file a Jira feature request to have the file metadata read and made available as named attributes on the flowfile after the file is read, as this seems like useful behavior for you and others going forward.


Andy LoPresto
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

In reply to Valentina Ivanova's message of Jul 2, 2020, 6:32 AM.


Re: PutFile set Last Modified Time without file.creationTime

Valentina Ivanova
Hi Andy!

Many thanks for your reply!

Compared to your solution, I had missed the format function and the space before Z in the format string. Now it works as expected, thanks again.

(As for my other concern, ListFile & GetFile: I was just explaining why I was doing things in a certain way, so that anyone aware of a better approach could advise me.)

Many thanks & have a great weekend,

Valentina

In reply to Andy LoPresto's message sent Friday, 3 July 2020 00:34.