Options

How to speed up multiple join query

HyeontaeJuHyeontaeJu Vertica Customer

The Query is this
SELECT
*
FROM
products a
LEFT OUTER JOIN products_image b ON
a.product_id = b.product_id
LEFT OUTER JOIN ws_product_brand_map c ON
a.product_id = c.product_id
LEFT OUTER JOIN ws_product_category_map d ON
a.product_id = d.product_id
LEFT OUTER JOIN ws_category e ON
d.cate_mst_cd = e.master_code

and query plan is this
Access Path:
+-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 17M, Rows: 44M] (PATH ID: 1)
| Join Cond: (a.product_id = d.product_id)
| +-- Outer -> JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 4M, Rows: 44M] (PATH ID: 2)
| | Join Cond: (a.product_id = b.product_id)
| | Materialize at Input: a.product_id
| | +-- Outer -> JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 3M, Rows: 44M] (PATH ID: 3)
| | | Join Cond: (a.product_id = c.product_id)
| | | +-- Outer -> STORAGE ACCESS for a [Cost: 460K, Rows: 44M] (PATH ID: 4)
| | | | Projection: newbiz_product_dev.products_super
| | | | Materialize: a.product_id
| | | +-- Inner -> STORAGE ACCESS for c [Cost: 2M, Rows: 44M] (PATH ID: 5)
| | | | Projection: newbiz_product_dev.ws_product_brand_map_super
| | | | Materialize: c.product_id, c.mall_id, c.pipe_id, c.brand_id, c.domain_brand, c.brand, c.brand_status, c.brand_ctrl_disp_yn, c.reg_user, c.reg_dt, c.mod_user, c.mod_dt
| | +-- Inner -> STORAGE ACCESS for b [Cost: 474K, Rows: 4M] (PATH ID: 6)
| | | Projection: newbiz_product_dev.products_image_super
| | | Materialize: b.product_id, b.mall_id, b.pipe_id, b.image_link, b.cdn_image_path, b.cdn_image_name, b.cdn_image_name_orig, b.image_status, b.clustered_yn, b.reg_images, b.file_checksum, b.collect_time, b.registerd_time, b.update_time
| +-- Inner -> JOIN HASH [LeftOuter] [Cost: 65K, Rows: 32M] (PATH ID: 7)
| | Join Cond: (d.cate_mst_cd = e.master_code)
| | +-- Outer -> STORAGE ACCESS for d [Cost: 40K, Rows: 32M] (PATH ID: 8)
| | | Projection: newbiz_product_dev.ws_product_category_map_super
| | | Materialize: d.cate_mst_cd
| | +-- Inner -> STORAGE ACCESS for e [Cost: 482, Rows: 5K] (PATH ID: 9)
| | | Projection: newbiz_product_dev.ws_category_super
| | | Materialize: e.master_code, e."1depth_code", e."1depth_name", e."2depth_code", e."2depth_name", e."3depth_code", e."3depth_name", e."4depth_code", e."4depth_name", e."5depth_code", e."5depth_name", e.category_text, e.use_yn, e.mod_date

plz help me

Answers

  • Options
    Navin_CNavin_C Vertica Customer

    Based on the query and plan (along with statistics)
    Some generic optimization rules .
    1. Always reduce the data before any joins as much possible.
    2. When joining, make sure smaller tables are on the left side of join syntax, which makes this data set to be in memory / broadcasted to all the vertica nodes and makes join faster.
    3. Join on INT columns, preferred over any other types, it makes it faster.
    4. Select on the columns which are really needed. -- Reduce materializing of columns
    5. Make sure stats are updated on the table.

    In this case, one of your joins is assuming the huge table in the INNER part of join, which can ask for more memory and reduce the execution speed.
    -- Outer -> JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 3M, Rows: 44M] (PATH ID: 3)
    | | | Join Cond: (a.product_id = c.product_id)
    | | | +-- Outer -> STORAGE ACCESS for a [Cost: 460K, Rows: 44M] (PATH ID: 4)
    | | | | Projection: newbiz_product_dev.products_super
    | | | | Materialize: a.product_id
    | | | +-- Inner -> STORAGE ACCESS for c [Cost: 2M, Rows: 44M] (PATH ID: 5)
    | | | | Projection: newbiz_product_dev.ws_product_brand_map_super
    | | | | Materialize: c.product_id, c.mall_id, c.pipe_id, c.brand_id, c.domain_brand, c.brand, c.brand_status, c.brand_ctrl_disp_yn, c.reg_user, c.reg_dt, c.mod_user, c.mod_dt

  • Options
    HyeontaeJuHyeontaeJu Vertica Customer

    @Navin_C Thank you very much,,, all table's data is very big,,
    products : 40,000,000
    ws_product_brand_map : 40,000,000
    ws_product_category_map : 40,000,000
    ws_category = 4,000
    so,, In this case How can i reduce the response time ??

  • Options
    SankarmnSankarmn Community Edition User ✭✭

    @HyeontaeJu , Optimal performance depends in great part on the projections that are available for a given query. Check best practices for joins, https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/Optimizations/OptimizingJOINQueries.htm?tocpath=Analyzing Data|Query Optimization|JOIN Queries|_____0

  • Options
    marcothesanemarcothesane - Select Field - Administrator

    You have a great deal of MERGEJOIN operators - this is very good and can't be beaten.

    None of the JOIN operators show RESEGMENT or BROADCAST or ROUND-ROBIN, so you have segmented your table's data well.

    But:
    What are you running your query with?
    According to what the statistics say, you are fetching 44 million rows from Vertica to the client.

    To avoid the data fetching process, just try:

    CREATE LOCAL TEMPORARY TABLE dropme
    ON COMMIT PRESERVE ROWS AS
    -- your full query
    SELECT
    *
    FROM
    products a
    LEFT OUTER JOIN products_image b ON
    a.product_id = b.product_id
    LEFT OUTER JOIN ws_product_brand_map c ON
    a.product_id = c.product_id
    LEFT OUTER JOIN ws_product_category_map d ON
    a.product_id = d.product_id
    LEFT OUTER JOIN ws_category e ON
    d.cate_mst_cd = e.master_code
    -- your full query ends here
    KSAFE 0
    ;
    

    .. and measure the time when the data does not have to be prepared for sending to client, and the client has no data to fetch.

    You seriously risk to test the network speed rather than the database.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file