Order by clause in Views
I created a view against a table using the following format "select col1, count(col2), count(distinct col3) from table group by col1, order by col1 desc". But when I do a select * from view_name, it seems that the results are not sorted according to my view definition. They should be sorted in descending order by the first column. Does Vertica not respect the order by clause in views? If no then why are my results not coming out correctly? and If yes, why does it have an example of a query in the SQL Documentation that has an order by clause? Edit: I am running on version 6.1.3.
0
Comments
Take a look at the plan to see the order by step
Thanks for the prompt reply. I do see the order by clause in my explain plan. I should have added this yesterday in my question yesterday. My results set is kinda odd. It gives the first few records in the right descending order but then it starts over again. It seems like its doing the order by in stages. ex. posted below:
Please ignore the outer select * from. I was trying something different.
create view prodrush.v_session_count_Test as select * from (select dim_Date_gid, count(*), count(distinct dim_user_Gid) session_Actual from prodrush.core_Session_STart group by dim_Date_Gid ) a order by a.dim_Date_gid desc;
explain select * from prodrush.v_session_count_Test;
Access Path:
+-SORT [Cost: 75K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| Order: a.dim_Date_gid DESC
| Execute on: All Nodes
| Execute on: All Nodes
| +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 75K, Rows: 10K (NO STATISTICS)] (PATH ID: 4)
| | Aggregates: count(DISTINCT CORE_SESSION_START.DIM_USER_GID), sum_of_count(*)
| | Group By: CORE_SESSION_START.DIM_DATE_GID
| | Execute on: All Nodes
| | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 75K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
| | | Aggregates: count(*)
| | | Group By: CORE_SESSION_START.DIM_DATE_GID, CORE_SESSION_START.DIM_USER_GID
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for CORE_SESSION_START [Cost: 10K, Rows: 47M (NO STATISTICS)] (PATH ID: 6)
| | | | Projection: prodrush.CORE_SESSION_START_b0
| | | | Materialize: CORE_SESSION_START.DIM_USER_GID, CORE_SESSION_START.DIM_DATE_GID
| | | | Execute on: All Nodes
dim_Date_gid | count | session_Actual
--------------+--------+----------------
20140506 | 430248 | 136415
20140428 | 442677 | 148317
20140425 | 443542 | 150815
20140421 | 474549 | 158368
20140411 | 447423 | 142000
20140408 | 477893 | 148368
20140407 | 484159 | 149017
20140326 | 553793 | 179314
20140313 | 355971 | 102434
20140306 | 363223 | 101124
20140302 | 346030 | 100294
20140227 | 340832 | 98949
20140226 | 345988 | 98747
20140223 | 358341 | 100646
20140212 | 357566 | 101414
20140202 | 512550 | 206070
20140201 | 512621 | 238803
20140131 | 318714 | 160801
20140122 | 118070 | 36597
20140114 | 112691 | 35164
20140108 | 109384 | 36227
20140104 | 107475 | 35681
20140103 | 103139 | 34757
20131231 | 81115 | 26936
20131226 | 78721 | 26897
20131129 | 35680 | 12101
20131123 | 35266 | 11737
20131119 | 35381 | 11012
20131115 | 33234 | 10068
20131110 | 26475 | 7741
20131107 | 21335 | 5960
20131102 | 20599 | 5655
20131015 | 9 | 2
| 2 | 0
20140508 | 424302 | 135379
20140426 | 456872 | 157898
20140409 | 467122 | 146348
20140403 | 498568 | 155057
20140330 | 528140 | 165537
20140323 | 390879 | 129574
20140317 | 336300 | 99432
20140315 | 335889 | 98972
20140314 | 336840 | 98915
20140303 | 343591 | 99907
20140224 | 351225 | 99135
20140214 | 342582 | 100042
20140210 | 359154 | 102985
20140125 | 131294 | 40865
20140102 | 99228 | 33244
20131204 | 37285 | 12617
20131201 | 36773 | 12397
20131127 | 37755 | 12612
20131125 | 35714 | 11826
20131124 | 36710 | 11830
20131118 | 34857 | 10492
20131031 | 171 | 37
One thing - NO STATISTICS - run refresh on the used projections.
True that that the result set output is kind strange - i have no exp for that !
Hm... No, I don't know offhand. I could hazard a guess. Really, though, I think Vertica assumes that the output of any SELECT statement need not be ordered unless it has an explicit ORDER BY clause on the outermost SQL statement itself. (Not in a subquery, view, etc -- maybe the data was ordered coming out of the subquery, but there are cases where it's advantageous to give up on keeping track of the ordering as part of the next step of the query.)
In general -- you can put an ORDER BY on a table but you can't assume that SELECT will fetch the data in exactly that order (though imposing the order on the data will be very cheap in this case); I would expect the same for views.
Do you have a pointer to the bit of the documentation that suggests putting an ORDER BY in the view?
Thanks,
Adam
Very interesting find. I was able to replicate your behavior. When I compare the explain plan of the view and the underlying SELECT statement used in the view, the (SORT OUTPUT) step is missing in the view's plan. Looks like Vertica is not sorting your data at all!
Querying the VIEWS system table for the view definition still shows the order by clause. Even the example in the Vertica documentation has an Order By clause for Create View.
Adam,
the relevant link, including the Order By example is at https://my.vertica.com/docs/6.1.x/HTML/index.htm#10810.htm
/Sajan
As you can see, there a SORT OUTPUT step in the plan. When creating view with the same select statement, the explain plan doesn't contain the SORT OUTPUT step or anything equivalent.
I am running this in a 3 node cluster and the projection used is segmented. You may not be able to replicate the issue in a single node instance or a replicated projection, although I havent tried that yet.
Folks here are still taking a look. But the existing answer is, as I understand it, the SQL standard doesn't require that views be sorted, even if the SQL inside the view definition contains an ORDER BY clause. And I don't think we intend to promise anything stronger here than is guaranteed by the standard. (Views, regardless of their definition, are like tables: They are a collection of rows; even in Vertica, where we keep our data sorted on disk, the order when SELECT'ing from either a view or a table is unspecified.)
This would seem to imply that the documentation page that you cited is, at best, misleading. It doesn't actually state that the sorted output will always be sorted because an ORDER BY clause is used there. But it would be entirely silly to use an ORDER BY there otherwise...
I would expect (though can't promise here) that this will be addressed, by either a new-feature announcement or a documentation change, in a future release.
Adam