how projection works!
I have some basic knowleage on the projection. It can be use for encoding,sort order,segmentation. But I am quite interesting on how it works. Can I create as many projection as I can, is it cost anything? What is the cost, if I am re-creating the projection everyday? Is there any document regarding how the projection works? Thanks. Martin
0
Comments
Projection is the physical storage of your data. So you do nor recreate them, the data sorted and encoded and segmented in the projection when you load the data.
As the physical storage of the data, a table has to have projections if you load data into it. If you do not create one when created the table, Vertica will create an "auto projection" when you load data the first time. To improve query performance you want to have sort order, segmentation and encoding. When you have samples queries you can run the DBD and Vertica will define the best projections for those queries.
In theory you can have as many as you want. You always have to have a super projection ( 2 if you are k-safe) and query specific projections are optional. The cost is that when you load the data, you will have to load it for each projection what it can make your loads slower. But the true is that you do not need to have too many projections, most customer have the supper projections and maybe 2 or 3 more for fact table but those are query specific, so they just use the columns that they need.
The query specific projections, you can drop them and create new ones, but it means moving the data to the new projection again, so it can take time. You should not need to recreate projections, only if you want to change the sort order or encoding, in that case is a new projection.
The documentation should have details about projections. You should not worry that much, it should be transparent for you. If you want to have better performance, just let the DBD chose the best projections definition for you.
I hope this answer your questions,
Regards,
Eugenia
1. when I load data into the table, will vertica segment the data across nodes? If so, what columns will be used to segment the data?
2. will super projection be dropped by system, once query based projections are created?
Rao
If you create new projections and refresh it, Vertica won't automatically drop the projection, you have to run DROP PROJECTION statement to drop projections that you not longer want.
Eugenia
what is the best practice for creating a logical table? what Clauses(seg key, sort order etc) do you recommend adding to the logical key, so that it gives better ETL/QUERY performance? I believe segmenting and sorting by all keys cause unnecessary overload. At the same time, I would not have information on what queries are going to be executed against the table after the table is loaded.