after materiaize, are the data process in columns based format or row based format?

The data are originally stored in columns based format, and in the plan we often saw materilize, so after this materilize is done, will vertica process the data in row format afterward?

Answers

  • marcothesanemarcothesane - Select Field - Administrator

    Just look at this explain plan:

    ------------------------------ 
    QUERY PLAN DESCRIPTION: 
    ------------------------------
    
    explain
    SELECT 
      sourceip
    , sum(adrevenue) AS totalrevenue
    , avg(pagerank) AS pagerank
    FROM rankings r
    JOIN (
       SELECT sourceip
       , desturl
       , adrevenue
       FROM uservisits uv
       WHERE uv.visitdate > '1970-01-01'
         AND uv.visitdate < '1970-02-01'
    ) nuv ON (r.pageurl = nuv.desturl)
    GROUP BY sourceip
    ORDER BY totalrevenue DESC
    LIMIT 1;
    
    Access Path:
    +-SELECT  LIMIT 1 [Cost: 250K, Rows: 1 (NO STATISTICS)] (PATH ID: 0)
    |  Output Only: 1 tuples
    | +---> SORT [TOPK] [Cost: 250K, Rows: 0 (NO STATISTICS)] (PATH ID: 1)
    | |      Order: sum(uv.adrevenue) DESC
    | |      Output Only: 1 tuples
    | | +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 250K, Rows: 0 (NO STATISTICS)] (PATH ID: 2)
    | | |      Aggregates: sum(uv.adrevenue), sum_float(r.pagerank), count(r.pagerank)
    | | |      Group By: uv.sourceip
    | | | +---> JOIN HASH [Cost: 250K, Rows: 0 (NO STATISTICS)] (PATH ID: 3)
    | | | |      Join Cond: (r.pageurl = uv.desturl)
    | | | |      Materialize at Output: r.pagerank
    | | | | +-- Outer -> STORAGE ACCESS for r [Cost: 187K, Rows: 22M (NO STATISTICS)] (PATH ID: 4)
    | | | | |      Projection: dbadmin.rankings_super
    | | | | |      Materialize: r.pageurl
    | | | | |      Runtime Filter: (SIP1(HashJoin): r.pageurl)
    | | | | +-- Inner -> SELECT  LIMIT 1 [Cost: 0, Rows: 0] (PATH ID: 5)
    

    Each of the operators (those you recognise by the fact that they have a PATH ID and an operator type like STORAGE ACCESS, JOIN HASH, GROUPBY HASH, SORT, etc) can have a "Materialize" line , an "Aggregates" line, a "Group By" line.
    All the values described like that are materialised , and will exist in a sort of vector of pointers to values, passed on as some sort of messages from one operator to the next (the higher PATH ID-s send their data to the lower PATH ID-s), and, where possible, all operators work in parallel on other intermediate results of the query. I actually do imagine these values as being tuples or rows, as you put it, of intermediate results.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file