Bug: Query returns non-existent result
Hi,
I believe I have found a bug in Vertica 6.1.2 when using a combination of:
- RLE encoding on a sorted column with duplicate values
- UNSEGMENTED table
- COPY DIRECT
- Use of ROW_NUMBER() function
The script below demonstrates the problem. It creates a table, inserts some data and runs some queries. The first query returns a non-existent date from the test table.
I believe there may be a problem with decoding a run-length encoded value under certain circumstances.
CREATE TABLE bug (name VARCHAR, forecast_date DATE ENCODING RLE) ORDER BY forecast_date UNSEGMENTED ALL NODES; /* NOTE: BUG ONLY OCCURS WITH DIRECT COPY! */ COPY bug FROM STDIN DELIMITER ',' DIRECT; A1,2008-01-29 A1,2008-01-29 A2,2008-01-29 A2,2008-01-29 A3,2008-01-29 A3,2008-01-29 \. SELECT name, forecast_date, row_number() OVER() AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date, row_number() OVER(ORDER BY 1) AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date, row_number() OVER(PARTITION BY 1) AS rownum FROM bug WHERE name = 'A2' ; SELECT name, forecast_date FROM bug WHERE name = 'A2';This gives the following output:
CREATE TABLE name | forecast_date | rownum ------+---------------+-------- A2 | 2008-01-29 | 1 A2 | 2000-01-01 | 2 (2 rows) name | forecast_date | rownum ------+---------------+-------- A2 | 2008-01-29 | 1 A2 | 2008-01-29 | 2 (2 rows) name | forecast_date | rownum ------+---------------+-------- A2 | 2008-01-29 | 1 A2 | 2008-01-29 | 2 (2 rows) name | forecast_date ------+--------------- A2 | 2008-01-29 A2 | 2008-01-29 (2 rows)
Note the invalid value for forecast_date in the first query. The value 2001-01-01 does not even exist in the table!!
Adding ORDER BY 1 or PARTITION BY 1 to the OVER clause solves the problem
The problem is not observed when doing either of the following:
- Remove duplicate rows
- Make the table segmented
- Remove the DIRECT directive from the COPY statement
Regards, Rob
0
Comments
Thanks, Evan
HP Vertica Technical Support