analytical function degrading the performance

I have a query which takes more than 3 hours .. the main fact table is written in a 3 level inner query.

In one sub query we use lag function with more than 20 columns as partition clause and same columns in order by of partiotion.

This is the bottle neck I see in explainplan as well as I query by pieces and stuck here.

How to deal with this scenario. Can a projection with the columns in analytical partition help me?

Comments

  • EXPLAIN PLAN FOR LAG:

     

    Access Path:
    LDISTRIB_UNSEGMENTED
    +-ANALYTICAL [Cost: 8192908.000000, Rows: 6719672.000000 Disk(B): 35627700944.000000 CPU(B): 1321764404451.936768 Memory(B): 112574665016.000000 Netwrk(B): 76946964072.000000 Parallelism: 12.000000 (NO STATISTICS)] [OutRowSz (B): 11451] (PATH ID: 1)
    | Analytic Group
    | Functions: lag()
    | Group Global & Local Resegment: upper(IFF.SNN), upper(IFF.PNN), upper(IFF.CTT), upper(IFF.CNTSS), upper(IFF.CHRM), upper(IFF.feed), upper(IFF.DOC), upper(IFF.IPBIN), upper(IFF.OPBIN), upper(IFF.CONFAC), upper(IFF.MEDSIZ), upper(IFF.JOB), upper(IFF.PLEX), upper(IFF.LTR), upper(IFF.unit), upper(IFF.IMG), upper(IFF.PRQULI), upper(IFF.media_key), upper(IFF."position"), upper(IFF.SEL), upper(IFF.INK), upper(IFF.CATR)
    | Group Sort: upper(IFF.SNN) ASC, upper(IFF.PNN) ASC, upper(IFF.CTT) ASC, upper(IFF.CNTSS) ASC, upper(IFF.CHRM) ASC, upper(IFF.feed) ASC, upper(IFF.DOC) ASC, upper(IFF.IPBIN) ASC, upper(IFF.OPBIN) ASC, upper(IFF.CONFAC) ASC, upper(IFF.MEDSIZ) ASC, upper(IFF.JOB) ASC, upper(IFF.PLEX) ASC, upper(IFF.LTR) ASC, upper(IFF.unit) ASC, upper(IFF.IMG) ASC, upper(IFF.PRQULI) ASC, upper(IFF.media_key) ASC, upper(IFF."position") ASC, upper(IFF.SEL) ASC, upper(IFF.INK) ASC, upper(IFF.CATR) ASC, upper(IFF.SNN) ASC NULLS LAST, upper(IFF.PNN) ASC NULLS LAST, upper(IFF.CTT) ASC NULLS LAST, upper(IFF.CNTSS) ASC NULLS LAST, upper(IFF.CHRM) ASC NULLS LAST, upper(IFF.feed) ASC NULLS LAST, upper(IFF.DOC) ASC NULLS LAST, upper(IFF.IPBIN) ASC NULLS LAST, upper(IFF.OPBIN) ASC NULLS LAST, upper(IFF.CONFAC) ASC NULLS LAST, upper(IFF.MEDSIZ) ASC NULLS LAST, upper(IFF.JOB) ASC NULLS LAST, upper(IFF.PLEX) ASC NULLS LAST, upper(IFF.LTR) ASC NULLS LAST, upper(IFF.unit) ASC NULLS LAST, upper(IFF.IMG) ASC NULLS LAST, upper(IFF.PRQULI) ASC NULLS LAST, upper(IFF.media_key) ASC NULLS LAST, upper(IFF."position") ASC NULLS LAST, upper(IFF.SEL) ASC NULLS LAST, upper(IFF.INK) ASC NULLS LAST, upper(IFF.CATR) ASC NULLS LAST, IFF.capture_dtm ASC NULLS LAST, IFF.value_float DESC NULLS FIRST
    | Execute on: All Nodes
    | LDISTRIB_SEGMENTED
    | Execute on: All Nodes
    | LDISTRIB_UNSEGMENTED
    | +---> UNION ALL [Cost: 46025.000000, Rows: 6719672.000000 Disk(B): 0.000000 CPU(B): 0.000000 Memory(B): 0.000000 Netwrk(B): 0.000000 Parallelism: 1.000000 (NO STATISTICS)] [OutRowSz (B): 2651] (PATH ID: 3)
    | | Execute on: All Nodes
    | | LDISTRIB_UNSEGMENTED

  • Hi ,

    Analytic functions in some cases can be very slow comparing to regular aggregation functions ,  as  many times  Analytic functions run’s a single threads and for large data set is the main bottleneck .

    Projections can improved the sorting part , but will not change the single threads executions . As you do not  attach the query its hard to advice how to advice you the write the query

     

    I hope you will find it helpful

     

    Thanks  

  • SELECT LAG (value_f ,1) OVER (
    PARTITION BY UPPER (sn),
    UPPER (pn),
    UPPER (ct),
    UPPER (csc),
    UPPER (chrm),
    UPPER (feed),
    UPPER (df),
    UPPER (ipbin),
    UPPER (opbin),
    UPPER (cf),
    UPPER (msn),
    UPPER (jb),
    UPPER (pr),
    UPPER (lse),
    UPPER (unit),
    UPPER (image_type),
    UPPER (pq),
    UPPER (mk),
    UPPER (position),
    UPPER (sen),
    UPPER (iu),
    UPPER (cpn)
    ORDER BY
    UPPER (sn),
    UPPER (pn),
    UPPER (ct),
    UPPER (csc),
    UPPER (chrm),
    UPPER (feed),
    UPPER (df),
    UPPER (ipbin),
    UPPER (opbin),
    UPPER (cf),
    UPPER (msn),
    UPPER (jb),
    UPPER (pr),
    UPPER (lse),
    UPPER (unit),
    UPPER (image_type),
    UPPER (pq),
    UPPER (mk),
    UPPER (position),
    UPPER (sen),
    UPPER (iu),
    UPPER (cpn),
    capture_d ASC,
    value_f DESC)pr_f ,gi_fact.*
    FROM (SELECT gi.file_name,
    dev.bornd,
    gi.ticket_type,
    gi.gi_flag,
    gi.ca_status,
    gi.fqi,
    dev.sn,
    dev.pn,
    gi.ct,
    gi.csc,
    gi.chromatic_mode,
    gi.feed,
    gi.df,
    gi.ipbin,
    gi.opbin,
    gi.cf,
    gi.msn,
    gi.jb,
    gi.pr,
    gi.lse,
    gi.unit,
    gi.image_type,
    gi.pq,
    gi.mk,
    gi."position",
    gi.sen,
    TRIM(gi.iu) iu,
    gi.cpn,
    gi.capture_d ,
    gi.value_f
    from STG_DEV_IMP gi, STG_DEV_INFO dev
    WHERE gi.fn = dev.fn
    UNION ALL
    SELECT null fn,
    bornd,
    ticket_type_desc as ticket_type,
    gi_flag,
    ca_status,
    fqi,
    sn,
    pn,
    ct,
    csc,
    chrm,
    feed,
    df,
    ipbin,
    opbin,
    cf,
    msn,
    jb,
    pr,
    lse,
    unit,
    image_type,
    pq,
    mk,
    "position",
    sen,
    TRIM(iu) iu,
    catr AS cpn,
    capture_d,
    GLB_im as value_float
    from FACT_GL_IMP WHERE sn||pn IN (SELECT DISTINCT im.sn || im.pn FROM STG_DEV_INFO im) )gi_fact

  • FACT_GL_IMP table is huge Fact table with morethan 1.5 TB of storage space.

  • How does the grouped clause for creating a query specific projection like below help me.

    CREATE PROJECTION  perf_tem_per2_grp /*+createtype(L)*/
    (grouped(na_id,name))
    AS
    SELECT perf_tem.name,perf_tem.na_id
    FROM PPS_DEVICE_QARI.perf_tem
    ORDER BY perf_tem.na_id,perf_tem.name
    SEGMENTED BY hash(perf_tem.na_id, perf_tem.name) ALL NODES KSAFE 1;

  • If you execute this often and want to optimize for it with a projection, you need to consider a couple of things:

     

    - You are using UPPER() on a number of columns.  You should "fix" your data model so that the data is normalized on the way in so that UPPER() isn't needed.  Such expressions become expensive if they are always needed on billions of rows.  Another option is an expression projection, but you should really start with a clean data model.

     

    - Projections can only optimize data sorted in ascending order.  One of your columns is sorted descending.  If it's possible to rewrite the query so that all data is sorted ascending, and use LEAD() instead of LAG(), then you could optimize using a projection.

     

      --Sharon

     

  • How does  expression projection work?

    If I create a projection with Upper of columns and use the query as it is will the expression projection be picked?

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.