Partly unpredictable query result using ADO.NET driver

joergschaber · December 2023

Hi,
we are experiencing partly unpredictable query results using the ADO.NET vertica driver with a query including
a 'LIMIT 1 OVER (PARTITION BY X ORDER BY Y DESC NULLS LAST' clause.
The query results were different between different query source nodes (Vertica.Data.VerticaClient.VerticaConnection.DataSource).
However, this could not be reproduced using the JDBC driver.

Morever, the behaviour changed with the ADO.NET driver, when we re-formlated the LIMIT-1-clause using the classical 'row_number() OVER(PARTITION BY X ORDER BY Y DESC NULLS LAST) AS Rank'-clause.
Then qery result were the same independent from the query source node (Vertica.Data.VerticaClient.VerticaConnection.DataSource)

Might that be a bug in the ADO.NET driver?

All drivers are version 23.4.

VValdar · December 2023

Hi Jörg,

Is it the same LIMIT 1 OVER(PARTITION BY EventId, EventTimeStamp, DeviceId ORDER BY EventId) from the other topic?
As I said - but not clearly enough, my bad - I would challenge this code because it doesn't seem deterministic to me.

Let's say you have true duplicates in your table, like this:

EventId  EventTimeStamp    DeviceId  ColumnVal
-------  ----------------  --------  ---------
      1  2013-12-11 08:00        45        128
      1  2013-12-11 08:00        45        128

Your code will work fine.

But if you have "semi-duplicates" like this:

EventId  EventTimeStamp    DeviceId  ColumnVal
-------  ----------------  --------  ---------
      1  2013-12-11 08:00        45        128
      1  2013-12-11 08:00        45        130

There is nothing in the LIMIT 1 (nor the ROW_NUMBER) to specify what row to pick up.
The database will just pickup one randomly, and different runs of the same query can produce different results.

In this example, you'll need to add ColumnVal in the ORDER BY clause to make your query deterministic.

joergschaber · December 2023

Hi Valdar,

you are completly right and we are aware of this shortcoming. And no, it is not the same code as in the last question. Usually, we use this LIMIT 1 clause to remove key duplicates and we do not care too much about "semi-duplicates" in non-key fields.

In any case, we will challenge the code again having "semi-duplicates" in mind That might indeed by a problem.
Still, can that explain the differences between drivers and source nodes?

joergschaber · December 2023

Hi Valdar,

I ran a query, grouping by X and Y checking if there was a 'having count(*) > 1'. There was none. So in this case, there seem to be no "semi-duplicates" .

VValdar · December 2023

Thanks for checking, bug is always a possibility of course.
Could you open a case so our support check more stuff and if needed we'll forward to R&D?

joergschaber · December 2023

I did already.

We're Moving!

Create My New Community Account Now

Partly unpredictable query result using ADO.NET driver

Answers

Leave a Comment