Query planning & performance when using views

Antoine_Niek · June 2013

Hi, In order to test our reporting application backed by vertica, we are using a staging schema composed of views of our production schema. Our production schema is composed of a big fact table and multiple dimension tables to join against. Each of these tables are represented as views in the staging schema. We have noticed for some queries that the performance when hitting the staging schema were up to 6 time worse than when hitting the production schema due to some different query plans. My understanding is that a vertica view is simply a subselect, should I expect the same query plans for the same queries ? Here is an example of query:

select country, region, city from delivery left join tag_info on delivery.tag_id = tag_info.id where tag_info.buyer_id = 2 group by delivery.country, delivery.region, delivery.city order by country, region, city asc limit 10;

Query plan on production env:

  Access Path:   +-SELECT  LIMIT 10 [Cost: 19M, Rows: 10] (PATH ID: 0)   |  Output Only: 10 tuples   |  Execute on: Query Initiator   | +---> GROUPBY HASH (SORT OUTPUT) (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 19M, Rows: 6K] (PATH ID:  2)   | |      Group By: this_.country, this_.region, this_.city   | |      Execute on: All Nodes   | | +---> JOIN HASH [Cost: 409K, Rows: 290M] (PATH ID: 3)   | | |      Join Cond: (this_.tag_id = tag1_.id)   | | |      Materialize at Output: this_.country, this_.region, this_.city   | | |      Execute on: All Nodes   | | | +-- Outer -> STORAGE ACCESS for this_ [Cost: 280K, Rows: 350M] (PATH ID: 4)   | | | |      Projection: rtb_production.delivery_DBD_1_seg_optimized_v5_b0   | | | |      Materialize: this_.tag_id   | | | |      Execute on: All Nodes   | | | |      Runtime Filter: (SIP1(HashJoin): this_.tag_id)   | | | +-- Inner -> STORAGE ACCESS for tag1_ [Cost: 249, Rows: 1K] (PATH ID: 5)   | | | |      Projection: rtb_production.tag_info_DBD_1_rep_optimized_node0001   | | | |      Materialize: tag1_.id   | | | |      Filter: (tag1_.buyer_id = 2)   | | | |      Execute on: All Nodes

Query plan on staging env:

   Access Path:   +-SELECT  LIMIT 10 [Cost: 22M, Rows: 10] (PATH ID: 0)   |  Output Only: 10 tuples   |  Execute on: Query Initiator   | +---> GROUPBY HASH (SORT OUTPUT) (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 22M, Rows: 10K] (PATH ID:   2)   | |      Group By: this_.country, this_.region, this_.city   | |      Execute on: All Nodes   | | +---> JOIN HASH [Cost: 3M, Rows: 290M] (PATH ID: 3)   | | |      Join Cond: (this_.tag_id = tag1_.id)   | | |      Execute on: All Nodes   | | | +-- Outer -> SELECT [Cost: 2M, Rows: 350M] (PUSHED GROUPING) (PATH ID: 4)   | | | |      Execute on: All Nodes   | | | | +---> STORAGE ACCESS for delivery [Cost: 2M, Rows: 350M] (PATH ID: 5)   | | | | |      Projection: rtb_production.delivery_DBD_1_seg_optimized_v5_b0   | | | | |      Materialize: delivery.country, delivery.city, delivery.tag_id, delivery.region   | | | | |      Execute on: All Nodes   | | | | |      Runtime Filter: (SIP1(HashJoin): this_.tag_id)   | | | +-- Inner -> STORAGE ACCESS for tag1_ [Cost: 249, Rows: 1K] (PATH ID: 6)   | | | |      Projection: rtb_production.tag_info_DBD_1_rep_optimized_node0001   | | | |      Materialize: tag1_.id   | | | |      Filter: (tag1_.buyer_id = 2)   | | | |      Execute on: All Nodes

What are we doing wrong ? Thank you for your help. Antoine

We're Moving!

Create My New Community Account Now

Query planning & performance when using views

Leave a Comment

We're Moving!

Create My New Community Account Now

Query planning &amp; performance when using views

Leave a Comment

Query planning & performance when using views