Partly unpredictable query result using ADO.NET driver

Hi,
we are experiencing partly unpredictable query results using the ADO.NET vertica driver with a query including
a 'LIMIT 1 OVER (PARTITION BY X ORDER BY Y DESC NULLS LAST' clause.
The query results were different between different query source nodes (Vertica.Data.VerticaClient.VerticaConnection.DataSource).
However, this could not be reproduced using the JDBC driver.

Morever, the behaviour changed with the ADO.NET driver, when we re-formlated the LIMIT-1-clause using the classical 'row_number() OVER(PARTITION BY X ORDER BY Y DESC NULLS LAST) AS Rank'-clause.
Then qery result were the same independent from the query source node (Vertica.Data.VerticaClient.VerticaConnection.DataSource)

Might that be a bug in the ADO.NET driver?

All drivers are version 23.4.

Answers

  • VValdarVValdar Vertica Employee Employee
    edited December 2023

    Hi Jörg,

    Is it the same LIMIT 1 OVER(PARTITION BY EventId, EventTimeStamp, DeviceId ORDER BY EventId) from the other topic?
    As I said - but not clearly enough, my bad - I would challenge this code because it doesn't seem deterministic to me.

    Let's say you have true duplicates in your table, like this:

    EventId  EventTimeStamp    DeviceId  ColumnVal
    -------  ----------------  --------  ---------
          1  2013-12-11 08:00        45        128
          1  2013-12-11 08:00        45        128
    

    Your code will work fine.

    But if you have "semi-duplicates" like this:

    EventId  EventTimeStamp    DeviceId  ColumnVal
    -------  ----------------  --------  ---------
          1  2013-12-11 08:00        45        128
          1  2013-12-11 08:00        45        130
    

    There is nothing in the LIMIT 1 (nor the ROW_NUMBER) to specify what row to pick up.
    The database will just pickup one randomly, and different runs of the same query can produce different results.

    In this example, you'll need to add ColumnVal in the ORDER BY clause to make your query deterministic.

  • Hi Valdar,

    you are completly right and we are aware of this shortcoming. And no, it is not the same code as in the last question. Usually, we use this LIMIT 1 clause to remove key duplicates and we do not care too much about "semi-duplicates" in non-key fields.

    In any case, we will challenge the code again having "semi-duplicates" in mind That might indeed by a problem.
    Still, can that explain the differences between drivers and source nodes?

  • Hi Valdar,

    I ran a query, grouping by X and Y checking if there was a 'having count(*) > 1'. There was none. So in this case, there seem to be no "semi-duplicates" .

  • VValdarVValdar Vertica Employee Employee

    Thanks for checking, bug is always a possibility of course.
    Could you open a case so our support check more stuff and if needed we'll forward to R&D?

  • I did already.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file