Hm... Well, is the table in use? Are people or scripts creating/updating/deleting rows in between queries? Also, what's the full query? (Wondering if there's anything nondeterministic or time-variant in it.)
Has this been fixed yet? I see this is from 2013 but I just downloaded the latest community edition 7.2.x for Debian Linux 7 and Ubuntu 14.04 LTS (I am using Ubuntu) and am running into the same issue.
Here are the steps I followed.
Installed Vertica community edition and set up the VMart database.
I have connected through SquirrelSql and my own Java application using vertica-jdbc-7.2.3-0.jar.
I am running the query select * from "online_sales"."online_sales_fact"
The following are my results, limited to one row (picture is cut off at customer_key).
As we can see the two queries, which are identical, produce entirely different results. I am not altering anything, this is running on my local machine that no one else is accessing. I have not been able to find anything to suggest that this is expected behaviour.
Looks like SquirrelSql is adding a LIMIT 1 to the end of the query? You don't have an ORDER BY clause, so you're getting any one random row - the first row that arrives at the initiator node ready to be output. If you care about which row you get, then add an ORDER BY clause.
So I am new to Vertica, this may just be ignorance on how the product works, but why should I need to add an ORDERBY statement just to get a consistent response? I honestly do not want to order any of the columns, I just want the natural order that they are stored in. If this is not how the product works could you point me to some documentation that explains why this is the case, as I have not been able to find any myself.
I cancelled the limit that Squirrel put on the query and just added my own to ensure the offset is correct. I guess I just dont understand how each one these rows can be returned when I am specifying the same offset?
The SQL standard requires that order be preserved only when ORDER BY is specificed. The fact that other databases may return data in a predictable order is simply a by-product of how they happen to work.
Vertica runs queries with multiple threads, reading data from multiple ROS containers and from multiple nodes. That data gets combined both on individual nodes and back on the initiator node for the query in an unpredictable ordering unless you use ORDER BY. I don't think this is specifically covered in any documentation, it's just how it works.
This is a common "surprise" when people start using a distributed database if they are used to not needing ORDER BY.
Comments
Has this been fixed yet? I see this is from 2013 but I just downloaded the latest community edition 7.2.x for Debian Linux 7 and Ubuntu 14.04 LTS (I am using Ubuntu) and am running into the same issue.
Here are the steps I followed.
The following are my results, limited to one row (picture is cut off at customer_key).
As we can see the two queries, which are identical, produce entirely different results. I am not altering anything, this is running on my local machine that no one else is accessing. I have not been able to find anything to suggest that this is expected behaviour.
Looks like SquirrelSql is adding a LIMIT 1 to the end of the query? You don't have an ORDER BY clause, so you're getting any one random row - the first row that arrives at the initiator node ready to be output. If you care about which row you get, then add an ORDER BY clause.
So I am new to Vertica, this may just be ignorance on how the product works, but why should I need to add an ORDERBY statement just to get a consistent response? I honestly do not want to order any of the columns, I just want the natural order that they are stored in. If this is not how the product works could you point me to some documentation that explains why this is the case, as I have not been able to find any myself.
I cancelled the limit that Squirrel put on the query and just added my own to ensure the offset is correct. I guess I just dont understand how each one these rows can be returned when I am specifying the same offset?
The SQL standard requires that order be preserved only when ORDER BY is specificed. The fact that other databases may return data in a predictable order is simply a by-product of how they happen to work.
Vertica runs queries with multiple threads, reading data from multiple ROS containers and from multiple nodes. That data gets combined both on individual nodes and back on the initiator node for the query in an unpredictable ordering unless you use ORDER BY. I don't think this is specifically covered in any documentation, it's just how it works.
This is a common "surprise" when people start using a distributed database if they are used to not needing ORDER BY.
--Sharon
Ok, learned something new today , thanks for the explanation.