after materiaize, are the data process in columns based format or row based format?
The data are originally stored in columns based format, and in the plan we often saw materilize, so after this materilize is done, will vertica process the data in row format afterward?
0
Answers
Just look at this explain plan:
------------------------------ QUERY PLAN DESCRIPTION: ------------------------------ explain SELECT sourceip , sum(adrevenue) AS totalrevenue , avg(pagerank) AS pagerank FROM rankings r JOIN ( SELECT sourceip , desturl , adrevenue FROM uservisits uv WHERE uv.visitdate > '1970-01-01' AND uv.visitdate < '1970-02-01' ) nuv ON (r.pageurl = nuv.desturl) GROUP BY sourceip ORDER BY totalrevenue DESC LIMIT 1; Access Path: +-SELECT LIMIT 1 [Cost: 250K, Rows: 1 (NO STATISTICS)] (PATH ID: 0) | Output Only: 1 tuples | +---> SORT [TOPK] [Cost: 250K, Rows: 0 (NO STATISTICS)] (PATH ID: 1) | | Order: sum(uv.adrevenue) DESC | | Output Only: 1 tuples | | +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 250K, Rows: 0 (NO STATISTICS)] (PATH ID: 2) | | | Aggregates: sum(uv.adrevenue), sum_float(r.pagerank), count(r.pagerank) | | | Group By: uv.sourceip | | | +---> JOIN HASH [Cost: 250K, Rows: 0 (NO STATISTICS)] (PATH ID: 3) | | | | Join Cond: (r.pageurl = uv.desturl) | | | | Materialize at Output: r.pagerank | | | | +-- Outer -> STORAGE ACCESS for r [Cost: 187K, Rows: 22M (NO STATISTICS)] (PATH ID: 4) | | | | | Projection: dbadmin.rankings_super | | | | | Materialize: r.pageurl | | | | | Runtime Filter: (SIP1(HashJoin): r.pageurl) | | | | +-- Inner -> SELECT LIMIT 1 [Cost: 0, Rows: 0] (PATH ID: 5)Each of the operators (those you recognise by the fact that they have a PATH ID and an operator type like STORAGE ACCESS, JOIN HASH, GROUPBY HASH, SORT, etc) can have a "Materialize" line , an "Aggregates" line, a "Group By" line.
All the values described like that are materialised , and will exist in a sort of vector of pointers to values, passed on as some sort of messages from one operator to the next (the higher PATH ID-s send their data to the lower PATH ID-s), and, where possible, all operators work in parallel on other intermediate results of the query. I actually do imagine these values as being tuples or rows, as you put it, of intermediate results.