We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Bug: Query returns non-existent result — Vertica Forum

Bug: Query returns non-existent result

Hi,

I believe I have found a bug in Vertica 6.1.2 when using a combination of:

  1. RLE encoding on a sorted column with duplicate values
  2. UNSEGMENTED table
  3. COPY DIRECT
  4. Use of ROW_NUMBER() function

The script below demonstrates the problem. It creates a table, inserts some data and runs some queries. The first query returns a non-existent date from the test table.

I believe there may be a problem with decoding a run-length encoded value under certain circumstances.

CREATE TABLE bug (name VARCHAR, forecast_date DATE ENCODING RLE) ORDER BY forecast_date UNSEGMENTED ALL NODES; /* NOTE: BUG ONLY OCCURS WITH DIRECT COPY! */ COPY bug FROM STDIN DELIMITER ',' DIRECT; A1,2008-01-29 A1,2008-01-29 A2,2008-01-29 A2,2008-01-29 A3,2008-01-29 A3,2008-01-29 \. SELECT name, forecast_date, row_number() OVER() AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date, row_number() OVER(ORDER BY 1) AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date, row_number() OVER(PARTITION BY 1) AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date FROM bug WHERE name = 'A2';

This gives the following output:

         CREATE TABLE           name | forecast_date | rownum          ------+---------------+--------          A2   | 2008-01-29    |      1          A2   | 2000-01-01    |      2         (2 rows)           name | forecast_date | rownum          ------+---------------+--------          A2   | 2008-01-29    |      1          A2   | 2008-01-29    |      2         (2 rows)           name | forecast_date | rownum          ------+---------------+--------          A2   | 2008-01-29    |      1          A2   | 2008-01-29    |      2         (2 rows)           name | forecast_date          ------+---------------          A2   | 2008-01-29          A2   | 2008-01-29         (2 rows) 

Note the invalid value for forecast_date in the first query. The value 2001-01-01 does not even exist in the table!!

Adding ORDER BY 1 or PARTITION BY 1 to the OVER clause solves the problem

The problem is not observed when doing either of the following:

  • Remove duplicate rows
  • Make the table segmented
  • Remove the DIRECT directive from the COPY statement

Regards, Rob

Comments

  • Fyi - I just tried your reproducer in 6.1.3 and it reproduces there too.  
  • Is the only time you see this with an empty OVER()?  While the results are incorrect, using row_number() without an ORDER BY is non-deterministic and doesn't seem very useful.

  • Yes. Only with an empty OVER(). I only use row_number in this query to generate a temporary unique identifier.
  • I am also able to reproduce the issue and will log a defect with Engineering to address the issue.
    Thanks, Evan
    HP Vertica Technical Support

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file