Query spending a lot of time in operator "ParallelUnion"

Hello All, 

 

I have a complex query with 2 inner joins and 5 left outer joins on multiple dimension tables.

Fact table size: 70 million rows.

 

-  I created a view with the query.

-  The projections all look good for this query.

-  The explain plan looks fine.

 

Environment: Vertica Analytic Database v7.1.2-0

 

 

 

When I execute the query with a LIMIT of 10k rows, the output is generated in 15 seconds. But when I attempt a CTAS (create table as select * from view), it runs forever.

 

When I look into execution_engine_profiles, it reports the highest clock time spent in the operator "ParallelUnion". The strange thing is that a plain SELECT delivers data as expected, but the same query never finishes as a CTAS.
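
For reference, the check I am describing looks roughly like this (a minimal sketch; the transaction_id and statement_id are placeholders for the CTAS run):

SELECT path_id,
       operator_name,
       SUM(counter_value) AS clock_time_us       -- total 'clock time (us)' per operator
FROM v_monitor.execution_engine_profiles
WHERE counter_name = 'clock time (us)'
  AND transaction_id = <ctas_transaction_id>     -- placeholder
  AND statement_id = <ctas_statement_id>         -- placeholder
GROUP BY 1, 2
ORDER BY clock_time_us DESC;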

 

 

Let me know if anyone can shed some light on this operator. 

 

regards,

Comments

  • Hi,

     

    Can you share the plan and tell us in which path_id of the plan the ParallelUnion takes the longest?

    Eugenia 

  • path_id 4 is the one doing the ParallelUnion:

     

    JOIN HASH [LeftOuter] [Cost: 7M, Rows: 76M] (PATH ID: 3) Outer (LOCAL ROUND ROBIN) Inner (BROADCAST)

    | |      Join Cond: (cust.Cust_Id = aggregate.sold_to_cust_id) AND ((customer.sales_org_cd) = aggregate.sales_org_cd)

    | |      Execute on: All Nodes

    | | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 5M, Rows: 76M] (PATH ID: 4) Inner (RESEGMENT)

    | | |      Join Cond: (aggregate.prod_id = prd_attrb.prod_id)

    | | |      Join Filter: (aggregate.region= 'EURO')

    | | |      Execute on: All Nodes

    | | | +-- Outer -> SELECT [Cost: 4M, Rows: 76M] (PATH ID: 5)

    | | | |      Execute on: All Nodes

    | | | | +---> STORAGE ACCESS for laggregate_table [Cost: 4M, Rows: 76M] (PATH ID: 6)

  • You have a re-segmentation there, which means that data is sent over the network while doing the JOIN. There is a lot more information you need to compare; the ParallelUnion may just be stalled waiting for data. If the query prints the 10k rows in 15 seconds, it may be doing so with only the local data rather than the whole set. Have you tried running the complete query with select count(*) from (<query>) a; or using \o /dev/null?

     

    If you saved the whole profile data, have you compared the other operators? Sometimes I use a query like this:

     

    SELECT path_id, operator_name, count(distinct operator_id) num_operators, count(distinct node_name) num_nodes
         , sum(DECODE(counter_name, 'bytes received', counter_value, NULL)) AS 'bytes_received'
         , sum(DECODE(counter_name, 'bytes sent', counter_value, NULL)) AS 'bytes_sent'
         , sum(DECODE(counter_name, 'execution time (us)', counter_value, NULL)) AS 'execution_time_us'
         , sum(DECODE(counter_name, 'rows received', counter_value, NULL)) AS 'rows_received'
         --, sum(DECODE(counter_name, 'rle rows produced', counter_value, NULL)) AS 'rle_rows_produced'
         , sum(DECODE(counter_name, 'rows produced', counter_value, NULL)) AS 'rows_produced'
         , sum(DECODE(counter_name, 'consumer stall (us)', counter_value, NULL)) AS 'consumer_stall'
         , sum(DECODE(counter_name, 'producer stall (us)', counter_value, NULL)) AS 'producer_stall'
         --, sum(DECODE(counter_name, 'cumulative size of raw temp data (bytes)', counter_value, NULL)) AS 'cumulative size of raw temp data'
    FROM dc_execution_engine_profiles
    WHERE transaction_id = 45035996363026792
      AND statement_id = 3
    GROUP BY 1, 2;

     

    That helps me compare the different counters and operators and find the issues.

     

    What the ParallelUnion in that path is doing is combining the segments received from the other nodes. If you want to avoid the re-segmentation, you need to have the projections identically segmented, or replicate one of the projections (see the sketch below). If the bottleneck is there, it has nothing to do with the CTAS, but rather with processing the whole data set. I hope this makes sense.

     
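    For example, replicating a small dimension projection would look roughly like this (the projection name is made up and prd_attrb is just the dimension from your plan; normally you would list the columns and an ORDER BY instead of SELECT *):

    CREATE PROJECTION prd_attrb_unseg        -- hypothetical projection name
    AS SELECT * FROM prd_attrb
    UNSEGMENTED ALL NODES;                   -- replicate the projection on every node

    SELECT REFRESH('prd_attrb');             -- populate the new replicated projection
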

    Eugenia

     

  • Thanks a ton, Eugenia.

     

    I believe it is almost impossible to avoid re-segmentation of the dimension table data. The fact table's 10 attributes are joined with 10 dimension tables. The fact table contains 70 million rows and we are doing a left outer join with all the dimensions.

     

    To avoid re-segmentation of the dimension data, I can make the projections unsegmented (replicated) for all dimension tables with < 100k rows.

     

    I plan to try that out.    

     

    I am also going to use the profile query you posted here to compare the operators (see the sketch below for how I plan to look up the transaction and statement IDs). I think the Vertica documentation does not provide a description of each operator and what it means; it only describes the counter names.
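
    To plug my own run into that profile query, I will look up the transaction_id and statement_id roughly like this (the ILIKE pattern is just a placeholder for the actual CTAS text):

    SELECT transaction_id, statement_id, query_duration_us, query
    FROM v_monitor.query_profiles
    WHERE query ILIKE 'create table%'        -- placeholder filter for the CTAS statement
    ORDER BY query_start DESC
    LIMIT 5;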
