Vertica not picking the right projection
Hi ,
I have a table with 40 columns and and a super projection.
Actually i need only 2 columns for most of myqueries, so I created a query specific projection unsegmented on all nodes(refresed the table)
When I did an explain plan on the two columns from the table, I see the super projection in the plan,
I tried with SET_OPTIMIZER_DIRECTIVES('AvoidUsingProjections=') and avoided super projection and able to see the newly created projection ,
but when I do a select * from that table it throw me exception(not enough projections).
How to optionally overide the super projection?
What is the reason for such behaviour, i know vertica is able to pick the projection with minimum number of columns with which it could answer the query. but why is it unable to pick the right projection in this scenario...
Comments
Hi!
>> How to optionally overide the super projection?
Option 1: you can query a particular projection not just a table (so query your "good" projection)
Option 2: create a view that query a specific projection and query a view.
>> why is it unable to pick the right projection in this scenario...
most important question, but not enough info.
what info is sufficient for the question2
Hi
Without seeing your projection design i can only guess that your two column projection include only 2 columns or only partial columns from the base table on its select list part.
When you execute "select *" you asking the optimizer for columns that do not exists on your "special projection" ( you only have partial set ) , so he must used the supper projection which include all the columns and can answer your "select *" query
I hope this is answering your question.
create table user(c1,c2,c3.....c60);
user_b0, user_b1 are super projections.
I frequently use select c1,c2 from user;
So I created a query specific projection create projection user_tune as select c1,c2 from user unsegmented on all nodes;
When I query select c1,c2; the super projection is used.
I avoided the super projection and tried the same query select c1,c2; here the query specific projection user_tune proj is used.
When I tried select * from user ; we encountered an error which is obvious.
I removed the avoid super projection parameter and tried again with select c1,c2;
Now again super projection is used..
My doubt is when I have a user_tune projection which is sufficient to answer the query select c1,c2; why vertica is not intelligent enough to pick it.
Hi!
Can you post EXPLAIN VERBOSE <select c1, c2 from super projection> and EXPLAIN VERBOSE <select c1, c2 from tuned projection> (use in pastebin)? Vertica chooses the best projection by COST(io/ram/cpu/net usage) and not by execution time.
>> why vertica is not intelligent enough to pick it.
may be a bug, mey be too stupid
Hi,
Your query, do not include any WHERE part , in colum base database like vertica , the cost in many cases to scan two column (without any filter) should be same whenever the projectios include two or 60 colums (assuming they have the same encoding and sort order)
Other option that can impcat the cost /execution plan,is the fact that the tune projection is unsegmented .
So at the end ,you should check the cost of each one of your excution plans.
And last one ,make sure you collection statistics on the table ,this can also impact execution plans
Thanks.
points learned:
for same sort order ,encoding without where clause dont have any impact on cost.
doubts:
You said the projection is unsegmented, but I heard that if a table is unsegmented there wont be any need to wait for other node`s result. it would be fats.
There are lot of parameters in Explain verbose and no proper documentation is provided how to read them in vertica official site.
There is no needs to create projection instead of superprojection for olny one purpoe - select a small number of columns. Its a column store DB. Vertica materialize tuple with your two columns from superprojection as fast as from "tuned" projection.
Unsegmented projection usualy use for prevention of data broadcast o data resegmentation during join operations between small dimension table (unsegmented) and fact segmented table, or beteween two or more unsegmented small tables.
>> ... heard that if a table is unsegmented there wont be any need to wait for other node`s result.
depends on data and query - "Two heads are better than one", with "two heads" you can achive a better parallelism and if request is CPU bounded, so perhaps one node not a best choise. Why Vertica claims for linear sclability? In your case, <select a, b from table> its IO bounded.
Anyway the fact table will be denormalised and the small tables values will be added to fact table.
Now I have another design issue dowe need multiple projections only to solve the order by clause.
I can have two types of queries on this table with 2 different set of where clauses but the columns in the select would be same.
So do I need 2 projections with 2 combination s of order by clauses.
What about the encoding types to follow.
Hello,
I think you shoul try to make user table projection with segmentation based on join keys and ordered with join keys and predicate filters. This projection should have only join fields, lookup fields and where clause fields.
You will have fast merge join instead of hash, and multiple processing of where caluse in all nodes at the same time.
Remember, encoding type sholud be the same on both of joining table for merge join.
Try it.
Hi ,
In addition to the all good replays you get in this thread , one additional point regarding unsegmented :
One of the big advantage with unsegmented queries is with the fact that the parsing engine do not need to execute distributed parsing and coordination around the cluster , this coordination can add some extra overhead to the total execution time , so for small queries (queries that scan small amount of data ) where you need to get 200Ms queries speed , unsegmented can be something to consider .
The other side of the coin is that unsegmented projections required more disk space and resource (your data load will consume more resource ) and you lost you ability to parallel your query , as the query will always run on singel node .
All at all , the projection you define must suit your use case , you need to think about each option and see the advantage / disadvantage before you decide .
I hope you will find it helpful.
Thanks for all your reply`s.
The response is overwhelming for such a small community like vertica.
After reading all the suggestions I got this feeling like i have to learn a lot in Vertica.
Quite right! Vertica is not so simple as sales usaulily tell
What are the chances for future of Vertica..
How are job oppurtunities on HP Vertica.
I know the combination of Reporting tool will help, any carrer suggestions.
I worked for 3 years in Oracle and moved to Vertica 6 months back.
diversification of expertise is always good to raise your chances to get the work of your dream.
PS
9 years in oracle
1,5 years in netezza
2,5 year in vertica
and some other zoo ))