Bug: Query returns non-existent result

Hi,

I believe I have found a bug in Vertica 6.1.2 when using a combination of:

  1. RLE encoding on a sorted column with duplicate values
  2. UNSEGMENTED table
  3. COPY DIRECT
  4. Use of ROW_NUMBER() function

The script below demonstrates the problem. It creates a table, inserts some data and runs some queries. The first query returns a non-existent date from the test table.

I believe there may be a problem with decoding a run-length encoded value under certain circumstances.

CREATE TABLE bug (name VARCHAR, forecast_date DATE ENCODING RLE) ORDER BY forecast_date UNSEGMENTED ALL NODES; /* NOTE: BUG ONLY OCCURS WITH DIRECT COPY! */ COPY bug FROM STDIN DELIMITER ',' DIRECT; A1,2008-01-29 A1,2008-01-29 A2,2008-01-29 A2,2008-01-29 A3,2008-01-29 A3,2008-01-29 \. SELECT name, forecast_date, row_number() OVER() AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date, row_number() OVER(ORDER BY 1) AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date, row_number() OVER(PARTITION BY 1) AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date FROM bug WHERE name = 'A2';

This gives the following output:

         CREATE TABLE           name | forecast_date | rownum          ------+---------------+--------          A2   | 2008-01-29    |      1          A2   | 2000-01-01    |      2         (2 rows)           name | forecast_date | rownum          ------+---------------+--------          A2   | 2008-01-29    |      1          A2   | 2008-01-29    |      2         (2 rows)           name | forecast_date | rownum          ------+---------------+--------          A2   | 2008-01-29    |      1          A2   | 2008-01-29    |      2         (2 rows)           name | forecast_date          ------+---------------          A2   | 2008-01-29          A2   | 2008-01-29         (2 rows) 

Note the invalid value for forecast_date in the first query. The value 2001-01-01 does not even exist in the table!!

Adding ORDER BY 1 or PARTITION BY 1 to the OVER clause solves the problem

The problem is not observed when doing either of the following:

  • Remove duplicate rows
  • Make the table segmented
  • Remove the DIRECT directive from the COPY statement

Regards, Rob

Comments

  • Fyi - I just tried your reproducer in 6.1.3 and it reproduces there too.  
  • Is the only time you see this with an empty OVER()?  While the results are incorrect, using row_number() without an ORDER BY is non-deterministic and doesn't seem very useful.

  • Yes. Only with an empty OVER(). I only use row_number in this query to generate a temporary unique identifier.
  • I am also able to reproduce the issue and will log a defect with Engineering to address the issue.
    Thanks, Evan
    HP Vertica Technical Support

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file