Issue with GenerateTableFetch bulk load when using MS SQL Server database type

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Issue with GenerateTableFetch bulk load when using MS SQL Server database type

Bryan Quinn

Hi,

I’m trying to use the GenerateTableFetch component for ‘MS SQL 2012+’ database – for a full table export each time its run.

However I’m forced to set a column name in the ‘Maximum-value Columns’ field when the component is run (error only shows during execution if I omit this setting).

I’m seeing the following error:

IllegalArgumentException: Order by clause cannot be null or empty when using row paging

 

So it looks like it only works with incremental mode – which I don’t want.

 

Can someone see why this is the case or is it a bug?

I’m not getting this error if I set the database type to ‘Generic’ – but unfortunately the sql it generates isn’t valid for MS SQL server.

I’m using Nifi v1.3.0

 

Thanks,

Bryan

 

---

Bryan Quinn | Asavie 

Principal Analytics Architect

E: [hidden email] 

W: www.asavie.com | T: @asavie

 

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Issue with GenerateTableFetch bulk load when using MS SQL Server database type

Matt Burgess
Bryan,

This is likely a bug, I will investigate and write it up if so. In the meantime, are you planning on doing the bulk fetch in parallel (with a Remote Process Group into ExecuteSQL across a NiFi cluster)? If not, you may find that QueryDatabaseTable is a good alternative, it can be configured to fetch N rows at a time. The trade off is that it is not parallel or concurrent (each N rows will be fetched one batch at a time), but then each batch can be processed concurrently or in parallel.

Another alternative, if you know the approximate size of the database, is to use GenerateFlowFile to write the SQL statements yourself, then SplitText to get one statement per flow file, then ExecuteSQL (concurrently or in parallel). This solution does not scale with your database size but with some copy and paste (and updating as the DB grows), can emulate GenerateTableFetch in the short term.

Regards,
Matt


On Jul 24, 2017, at 8:55 AM, Bryan Quinn <[hidden email]> wrote:

Hi,

I’m trying to use the GenerateTableFetch component for ‘MS SQL 2012+’ database – for a full table export each time its run.

However I’m forced to set a column name in the ‘Maximum-value Columns’ field when the component is run (error only shows during execution if I omit this setting).

I’m seeing the following error:

IllegalArgumentException: Order by clause cannot be null or empty when using row paging

 

So it looks like it only works with incremental mode – which I don’t want.

 

Can someone see why this is the case or is it a bug?

I’m not getting this error if I set the database type to ‘Generic’ – but unfortunately the sql it generates isn’t valid for MS SQL server.

I’m using Nifi v1.3.0

 

Thanks,

Bryan

 

---

Bryan Quinn | Asavie 

Principal Analytics Architect

E: [hidden email] 

W: www.asavie.com | T: @asavie

 

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Issue with GenerateTableFetch bulk load when using MS SQL Server database type

Bryan Quinn

Hi Matt.

I’ll have a look at option a. What’s the timeframe for a fix and prod release for this assuming it’s a bug? Ballpark.

 

Thanks,

Bryan

 

From: Matt Burgess [mailto:[hidden email]]
Sent: Monday 24 July 2017 14:40
To: [hidden email]
Subject: Re: Issue with GenerateTableFetch bulk load when using MS SQL Server database type

 

Bryan,

 

This is likely a bug, I will investigate and write it up if so. In the meantime, are you planning on doing the bulk fetch in parallel (with a Remote Process Group into ExecuteSQL across a NiFi cluster)? If not, you may find that QueryDatabaseTable is a good alternative, it can be configured to fetch N rows at a time. The trade off is that it is not parallel or concurrent (each N rows will be fetched one batch at a time), but then each batch can be processed concurrently or in parallel.

 

Another alternative, if you know the approximate size of the database, is to use GenerateFlowFile to write the SQL statements yourself, then SplitText to get one statement per flow file, then ExecuteSQL (concurrently or in parallel). This solution does not scale with your database size but with some copy and paste (and updating as the DB grows), can emulate GenerateTableFetch in the short term.

 

Regards,

Matt


On Jul 24, 2017, at 8:55 AM, Bryan Quinn <[hidden email]> wrote:

Hi,

I’m trying to use the GenerateTableFetch component for ‘MS SQL 2012+’ database – for a full table export each time its run.

However I’m forced to set a column name in the ‘Maximum-value Columns’ field when the component is run (error only shows during execution if I omit this setting).

I’m seeing the following error:

IllegalArgumentException: Order by clause cannot be null or empty when using row paging

 

So it looks like it only works with incremental mode – which I don’t want.

 

Can someone see why this is the case or is it a bug?

I’m not getting this error if I set the database type to ‘Generic’ – but unfortunately the sql it generates isn’t valid for MS SQL server.

I’m using Nifi v1.3.0

 

Thanks,

Bryan

 

---

Bryan Quinn | Asavie 

Principal Analytics Architect

E: [hidden email] 

W: www.asavie.com | T: @asavie

 

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Issue with GenerateTableFetch bulk load when using MS SQL Server database type

Matt Burgess-2
This can be considered a bug, at least in the sense that it is not
consistent with how the Generic DB Adapter handles it (i.e. no
requirement for an order by clause). Not to say that it's a good idea
to not have an ORDER BY clause when you are getting pages of rows, but
in this case, the incremental functionality requires a max-value
column, and if not using incremental, it could be presumed that the
user is getting all rows (certainly GenerateTableFetch would generate
the SQL to get all rows, no matter the order). So if we say that the
requirement for "ordering when using limits/offsets" is on the
processor and not the DB adapter, then we can remove the requirement
from the DB adapter(s), and let it be enforced at a processor level if
it makes sense to do so.

How does that sound to everyone?  A Jira case (bug or improvement,
whatever) to remove the requirement on ordering being necessary for
pagination, in the MSSQL 2012+ adapter (and any others where it is
imposed)?

Thanks,
Matt


On Mon, Jul 24, 2017 at 9:55 AM, Bryan Quinn <[hidden email]> wrote:

> Hi Matt.
>
> I’ll have a look at option a. What’s the timeframe for a fix and prod
> release for this assuming it’s a bug? Ballpark.
>
>
>
> Thanks,
>
> Bryan
>
>
>
> From: Matt Burgess [mailto:[hidden email]]
> Sent: Monday 24 July 2017 14:40
> To: [hidden email]
> Subject: Re: Issue with GenerateTableFetch bulk load when using MS SQL
> Server database type
>
>
>
> Bryan,
>
>
>
> This is likely a bug, I will investigate and write it up if so. In the
> meantime, are you planning on doing the bulk fetch in parallel (with a
> Remote Process Group into ExecuteSQL across a NiFi cluster)? If not, you may
> find that QueryDatabaseTable is a good alternative, it can be configured to
> fetch N rows at a time. The trade off is that it is not parallel or
> concurrent (each N rows will be fetched one batch at a time), but then each
> batch can be processed concurrently or in parallel.
>
>
>
> Another alternative, if you know the approximate size of the database, is to
> use GenerateFlowFile to write the SQL statements yourself, then SplitText to
> get one statement per flow file, then ExecuteSQL (concurrently or in
> parallel). This solution does not scale with your database size but with
> some copy and paste (and updating as the DB grows), can emulate
> GenerateTableFetch in the short term.
>
>
>
> Regards,
>
> Matt
>
>
> On Jul 24, 2017, at 8:55 AM, Bryan Quinn <[hidden email]> wrote:
>
> Hi,
>
> I’m trying to use the GenerateTableFetch component for ‘MS SQL 2012+’
> database – for a full table export each time its run.
>
> However I’m forced to set a column name in the ‘Maximum-value Columns’ field
> when the component is run (error only shows during execution if I omit this
> setting).
>
> I’m seeing the following error:
>
> IllegalArgumentException: Order by clause cannot be null or empty when using
> row paging
>
>
>
> So it looks like it only works with incremental mode – which I don’t want.
>
>
>
> Can someone see why this is the case or is it a bug?
>
> I’m not getting this error if I set the database type to ‘Generic’ – but
> unfortunately the sql it generates isn’t valid for MS SQL server.
>
> I’m using Nifi v1.3.0
>
>
>
> Thanks,
>
> Bryan
>
>
>
> ---
>
> Bryan Quinn | Asavie
>
> Principal Analytics Architect
>
> E: [hidden email]
>
> W: www.asavie.com | T: @asavie
>
>
Loading...