Options

Generating example queries for dbdesigner with multiple fields in SELECT/WHERE/GROUP BY operators?

Hello,

Let's suppose that we have table with 5 fields.
CREATE TABLE mytable (
money int,
sourceA int,
sourceB int, sourceC int  
);
I want vertica to create projections according to my example queries.
Queries are always containing SUM(money) and different combinations of sourceA,sourceB,sourceC fields in SELECT / WHERE and GROUP BY sections. For example, 
SELECT sourceA,sourceB,SUM(money) FROM mytable WHERE sourceC=15 GROUP BY sourceA,sourceB;  SELECT sourceA,sourceB,sourceC SUM(money) FROM mytable WHERE 1 GROUP BY sourceA,sourceB,sourceC;  SELECT sourceC,sourceB,sourceA SUM(money) FROM mytable WHERE 1 GROUP BY sourceC,sourceB,sourceA;
What is the best practice to provide all possible queries? Should I write a script that will generate them? Or maybe it is not worth doing it?

Sorry for my English.

Thank you!


Comments

  • Options
    Navin_CNavin_C Vertica Customer
    Hi,

    It all depends on the number of queries your are going to come up with. You will need to write a script.

    I think the Database designer will comeup with many different projections for a single table, which is not recommended.

    You need to understand optimal projections are good but too many of them can slower your loading to table as it will have many projections.

    Hope this helps

  • Options
    Hi all,

    Actually, I don't think this case would be so bad.  I would expect the DBD to only come up with a few projections here.  (But if you want to know for sure, try it yourself :-)  The DBD takes the layout of your data into consideration when planning projections, so might do something different for your data.)

    The trick is -- yes, you could have projections optimized for each query, and that might be the fastest option; but Vertica is designed to be clever and perform well even if the projections aren't quite what you might think of as optimal.

    For example, the following single projection should handle all three of those queries pretty efficiently:
    create projection myproj(sourceC encoding RLE, sourceA encoding RLE, sourceB encoding RLE, money)  as select sourceC, sourceA, sourceB, money from mytable  order by  sourceC, sourceA, sourceB  segmented by hash(sourceA, sourceB, sourceC) ALL NODES;
    If you ever wanted to group by "sourceA,sourceB" without a predicate on C, you could add one more projection; some variant of:
    create projection myproj  as select * from mytable  order by  sourceA, sourceB, sourceC;
    Adam
  • Options
    If you want to see how Vertica does it, try creating the table and the projection, then run EXPLAIN on each of the three queries.  You don't need to load any data to get a basic EXPLAIN plan; just create the projections.  We do have to scan the whole data set in all cases; that's how these queries are constructed, they're performing a sum across all rows; but the query is local (no network resegmentation to get the data to the right nodes for computation) and pipelined (no memory-intensive, potentially spilling group-by operations).

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file