Need help converting xml data

jmkofoed
Hi there
I'm new to processing records in NiFi and I'm stuck.
One of my issues is that I don't know the complete schema for the data I have to process.
Therefore I have configured an XMLReader to use Infer Schema. The other issue is that I have problems converting sub-records. My records look something like this:
<RootLabel>
    <Part1>
        <name>John Doe</name>
        <adress>some there</adress>
    </Part1>
    <Part2>
        <Job>workingman</Job>
    </Part2>
    <Part3>
        <Details>
            <additionalInfo name="Location">New York</additionalInfo>
            <additionalInfo name="Company">A Company</additionalInfo>
        </Details>
    </Part3>
</RootLabel>

The issues are with the sub-records in Part3. I have set the XMLReader property "Field Name for Content" to value.

When the data is converted via an XMLWriter, the output for the additionalInfo fields looks like this:
<Part3>
    <Details>
        <additionalInfo>MapRecord[{name=Location, value=New York}]</additionalInfo>
        <additionalInfo>MapRecord[{name=Company, value=A Company}]</additionalInfo>
    </Details>
</Part3>

If I use a JSONWriter I get this:
"Part3": {
    "Details": {
        "additionalInfo": [ "MapRecord[{name=Location, value=New York}]", "MapRecord[{name=Company, value=A Company}]" ]
    }
}

How do I get the same XML output as the original input?
How can I convert the input to JSON so it looks something like this:
"Part3": {
    "Details": {
        "additionalInfo": {
            "Location": "New York",
            "Company": "A Company"
        }
    }
}
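
For reference, the target shape above (each additionalInfo element's name attribute becomes a key and its text becomes the value) can be pinned down outside NiFi with a small Python sketch. This is only an illustration of the mapping, not a NiFi configuration; the function names details_to_map and part3_to_json are made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# The sample record from this post.
SAMPLE = """<RootLabel>
    <Part1>
        <name>John Doe</name>
        <adress>some there</adress>
    </Part1>
    <Part2>
        <Job>workingman</Job>
    </Part2>
    <Part3>
        <Details>
            <additionalInfo name="Location">New York</additionalInfo>
            <additionalInfo name="Company">A Company</additionalInfo>
        </Details>
    </Part3>
</RootLabel>"""


def details_to_map(details):
    """Turn <additionalInfo name="K">V</additionalInfo> children into {K: V}."""
    return {
        info.get("name"): (info.text or "").strip()
        for info in details.findall("additionalInfo")
    }


def part3_to_json(xml_text):
    """Build only the Part3 fragment of the desired JSON output."""
    root = ET.fromstring(xml_text)
    details = root.find("./Part3/Details")
    result = {"Part3": {"Details": {"additionalInfo": details_to_map(details)}}}
    return json.dumps(result, indent=4)


if __name__ == "__main__":
    print(part3_to_json(SAMPLE))
```

Note that this mapping assumes the name attributes are unique within Details; a repeated name would silently overwrite the earlier value.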

Please help...

Kind regards
Jens M. Kofoed

Re: Need help converting xml data

Mark Payne
Hi Jens,

Unfortunately, this looks like a bug in the schema inference for XML. The schema inference appears to be inferring a type of String for the additionalInfo elements inside Details, but the XML Reader is actually returning a Record for each of them. As a result, the record is turned into a String, which gives you the odd output like "MapRecord[{name=Location, value=New York}]".
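
The mismatch can be pictured with a toy model in Python. This is not NiFi code and does not reproduce NiFi's internals; the coerce function and the type labels are invented purely to illustrate how a structured value collapses into text when the declared type is a string.

```python
def coerce(value, declared_type):
    """Toy stand-in for a writer that trusts the declared schema type."""
    if declared_type == "string":
        # The structure is lost: the whole record becomes one piece of text.
        return str(value)
    return value


# What the reader yields for one additionalInfo element: attribute plus content.
record = {"name": "Location", "value": "New York"}

as_string = coerce(record, "string")  # schema (wrongly) says string
as_record = coerce(record, "record")  # schema says record

if __name__ == "__main__":
    print(type(as_string).__name__, as_string)
    print(type(as_record).__name__, as_record)
```

With the string declaration the fields are no longer addressable, which is why both writers can only emit the flattened text.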

I have filed a Jira [1] to address this.

Until that is addressed, you may end up needing to provide an explicit schema. The good news is that in the Record Writer, you can configure it to add the schema to the 'avro.schema' attribute. This inferred schema should be *almost* what you need, though obviously not entirely correct because the additionalInfo element here needs to be a Record. But it may get you 90% of the way there.
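
As a rough sketch of what a corrected explicit schema might look like for the sample record, the Python below builds an Avro schema and prints it as JSON. The field names name and value come from the MapRecord output in the question; the record type names (Part1Type, DetailsType, AdditionalInfoType, and so on) and the choice to make every leaf a nullable string are assumptions, so treat this as a starting point to compare against the inferred schema rather than a definitive answer.

```python
import json


def nullable(avro_type):
    """Avro union that allows the field to be missing or null."""
    return ["null", avro_type]


# additionalInfo must be a record: the "name" attribute plus the element
# content, stored under "value" ("Field Name for Content" = value).
ADDITIONAL_INFO = {
    "type": "record",
    "name": "AdditionalInfoType",  # hypothetical type name
    "fields": [
        {"name": "name", "type": nullable("string")},
        {"name": "value", "type": nullable("string")},
    ],
}

SCHEMA = {
    "type": "record",
    "name": "RootLabel",
    "fields": [
        {"name": "Part1", "type": {
            "type": "record", "name": "Part1Type", "fields": [
                {"name": "name", "type": nullable("string")},
                {"name": "adress", "type": nullable("string")},
            ]}},
        {"name": "Part2", "type": {
            "type": "record", "name": "Part2Type", "fields": [
                {"name": "Job", "type": nullable("string")},
            ]}},
        {"name": "Part3", "type": {
            "type": "record", "name": "Part3Type", "fields": [
                {"name": "Details", "type": {
                    "type": "record", "name": "DetailsType", "fields": [
                        # Repeated elements become an array of records.
                        {"name": "additionalInfo",
                         "type": {"type": "array", "items": ADDITIONAL_INFO}},
                    ]}},
            ]}},
    ],
}

if __name__ == "__main__":
    print(json.dumps(SCHEMA, indent=2))
```

The printed JSON could then be supplied to the reader as schema text in place of inference, after checking it against the schema the writer put in the attribute.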

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7493

On May 28, 2020, at 2:22 AM, Jens M. Kofoed <[hidden email]> wrote:
[original message, quoted in full above]