how projection works!

I have some basic knowleage on the projection. It can be use for encoding,sort order,segmentation. But I am quite interesting on how it works. Can I create as many projection as I can, is it cost anything? What is the cost, if I am re-creating the projection everyday? Is there any document regarding how the projection works? Thanks. Martin


  • Options

    Projection is the physical storage of your data. So you do nor recreate them, the data sorted and encoded and segmented in the projection when you load the data. 

    As the physical storage of the data, a table has to have projections if you load data into it. If you do not create one when created the table, Vertica will create an "auto projection" when you load data the first time. To improve query performance you want to have sort order, segmentation and encoding. When you have samples queries you can run the DBD and Vertica will define the best projections for those queries.

    In theory you can have as many as you want. You always have to have a super projection ( 2 if you are k-safe) and query specific projections are optional. The cost is that when you load the data,  you will have to load it for each projection what it can make your loads slower. But the true is that you do not need to have too many projections, most customer have the supper projections and maybe 2 or 3 more for fact table but those are query specific, so they just use the columns that they need.

    The query specific projections, you can drop them and create new ones, but it means moving the data to the new projection again, so it can take time. You should not need to recreate projections, only if you want to change the sort order or encoding, in that case is a new projection. 

    The documentation should have details about projections. You should not worry that much, it should be transparent for you. If you want to have better performance, just let the DBD chose the best projections definition for you. 

    I hope this answer your questions, 


  • Options
    Can you pls help me understand how Vertica loads data in these scenarios?  Assume that a logical table is created for a large table with no segmentation, no sort order, no encoding.

    1.  when I load data into the table, will vertica segment the data across nodes?  If so, what columns will be used to segment the data?

    2.  will super projection be dropped by system, once query based projections are created?

  • Options
    If you create a table without projections in a big or small table, when you load data the first time Vertica will create super projection segmented by hash key of all the columns in the table and sort order by all the columns in that table. 

    If you create new projections and refresh it, Vertica won't automatically drop the projection, you have to run DROP PROJECTION statement to drop projections that you not longer want. 

  • Options
    Thanks for the quick reply.  In continuation to that,

    what is the best practice for creating a logical table?  what Clauses(seg key, sort order etc) do you recommend adding to the logical key, so that it gives better ETL/QUERY performance?  I believe segmenting and sorting by all keys cause unnecessary overload.  At the same time, I would not have information on what queries are going to be executed against the table after the table is loaded.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file