How to deal with schema evolution in the Dataset API

Jorge Machado
Hello everyone,

I have a question for the community.

Imagine I have this:

        case class Person(age: Int)

        spark.read.parquet("inputPath").as[Person]


After a few weeks of development, I changed the class to:
        case class Person(age: Int, name: Option[String] = None)


When I run the new code on the same input, it fails, saying that it cannot find the column "name" in the schema of the Parquet files.
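For context: `as[Person]` matches case class fields to columns by name, and Spark's encoders do not use Scala default parameter values such as `= None`, so a column that is absent from the files cannot be resolved. One way to guard against this is to compare the schema the case class expects with the schema of the data before calling `as`. The sketch below assumes Spark 2.x on the classpath; `SchemaGuard` and `missingFields` are hypothetical names, and the hand-written `oldFileSchema` stands in for what `spark.read.parquet(path).schema` would return for the old files.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// New version of the class; the old files only have `age`.
case class Person(age: Int, name: Option[String] = None)

object SchemaGuard {
  // Top-level field names the target schema needs but the data does not provide.
  // Names are compared in lower case to mirror Spark's default case-insensitive
  // resolution (spark.sql.caseSensitive=false). Nested fields are not checked.
  def missingFields(dataSchema: StructType, expected: StructType): Seq[String] = {
    val present = dataSchema.fieldNames.map(_.toLowerCase).toSet
    expected.fieldNames.filterNot(f => present.contains(f.toLowerCase)).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the schema of the Parquet files written by the old code.
    val oldFileSchema = StructType(Seq(StructField("age", IntegerType)))
    // Schema Spark derives from the case class; the `= None` default plays no part.
    val expected = Encoders.product[Person].schema
    val missing = missingFields(oldFileSchema, expected)
    println(s"Fields missing from the input: ${missing.mkString(", ")}")
  }
}
```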

Spark version 2.3.3

What is the best way to guard against or fix this? Regenerating all the data does not seem to be an option for us.
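One common fix, sketched here under the assumption of Spark 2.x with a local session, is to give the Parquet reader the schema derived from the new case class instead of letting it infer one from the files. Columns that are missing from a file come back as null, which the encoder decodes as `None`, so old and new files can be read with the same code and nothing has to be regenerated. `EvolvedRead`, `readPeople`, and `demo` are hypothetical names; `demo` writes a temporary Parquet directory that mimics the old data.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// New version of the class; data written by the old code only has `age`.
case class Person(age: Int, name: Option[String] = None)

object EvolvedRead {
  // Read with the schema derived from Person rather than the files' own schema,
  // so columns absent from older files are returned as null (decoded as None).
  def readPeople(spark: SparkSession, path: String): Dataset[Person] = {
    import spark.implicits._
    spark.read
      .schema(Encoders.product[Person].schema)
      .parquet(path)
      .as[Person]
  }

  def demo(): Seq[Person] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("schema-evolution-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      // Simulate data written by the old code: a single `age` column.
      val path = Files.createTempDirectory("people").resolve("v1").toString
      Seq(30, 41).toDF("age").write.mode("overwrite").parquet(path)
      readPeople(spark, path).collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    demo().foreach(println)
}
```

An alternative is to add each missing column yourself before calling `as`, for example `df.withColumn("name", lit(null).cast(StringType))` when `df.columns` does not contain `name`. If you keep relying on schema inference and only some files contain the new column, the Parquet reader's `mergeSchema` option combines the file schemas.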

Thx


Re: How to deal with schema evolution in the Dataset API

Mike Thomsen
This should be posted on the Spark user list, not the NiFi one.

On Sat, May 9, 2020 at 3:07 AM Jorge Machado <[hidden email]> wrote: