Calculate median on different grouping sets in Vertica

danandanan Vertica Customer
edited March 2023 in General Discussion

I'm trying to calculate median on different grouping sets in Vertica and I get the following error:

[0A000][6467] [Vertica]VJDBC ERROR: User defined aggregate cannot be used in query with MLAs"

this is a template of my query which doesn't work:

"""select field_a, field_b, APPROXIMATE_MEDIAN(field_c)

from my_table

group by cube (field_a, field_b)"""

where in this one, when I group by just one field, it works just fine:

"""select field_a, APPROXIMATE_MEDIAN(field_c)

from my_table

group by field_a"""

Would love to get any idea for why this is happening and how can I rewrite my query so it'll work. My next step is to union the calculations per every grouping, but I hope to find a more elegant solution.

Thanks!

tried to search for this error, tried to run on a smaller case, tried to think how can I get the same result using the analytic version of median, but couldn't find anything that supplies my need and can't find the logic behind why it does not work.

Best Answers

  • SruthiASruthiA Vertica Employee Administrator
    Answer ✓

    When you use cube function, it is a Multi Level aggregation. whereas using just group by is single level aggregation. Using cube with APPROXIMATE_MEDIAN is not a supported feature. Hence it is expected behaviour.

  • marcothesanemarcothesane - Select Field - Administrator
    Answer ✓

    You must remember that GROUP BY CUBE or GROUP BY ROLLUP often are multi-pass query access plans, even if you don't code them as such. And it does not hurt, as Vertica's query engine has pipeline parallelism - many parts of the plan are executed by different threads in parallel. So while it might look awkward to see it in code, it's not so un-elegant after all if you UNION SELECT all levels of aggregation together:

    WITH
    indata(fld_a,fld_b,fld_c) AS (
                SELECT 1,1,  1
      UNION ALL SELECT 1,2,  2
      UNION ALL SELECT 1,3,  3
      UNION ALL SELECT 1,4,  4
      UNION ALL SELECT 1,5,  5
      UNION ALL SELECT 2,1, 10
      UNION ALL SELECT 2,2, 20
      UNION ALL SELECT 2,3, 30
      UNION ALL SELECT 3,4, 40
      UNION ALL SELECT 3,5, 50
      UNION ALL SELECT 3,1,100
      UNION ALL SELECT 3,2,200
      UNION ALL SELECT 3,3,300
      UNION ALL SELECT 3,4,400
      UNION ALL SELECT 3,5,500
    ) 
    SELECT
      'fld_a' AS grain
    , fld_a
    , NULL::int   AS fld_b
    , APPROXIMATE_MEDIAN(fld_c) AS app_med_c
    FROM indata
    GROUP BY fld_a
    UNION ALL
    SELECT
      'fld_b'     AS grain
    , NULL::int   AS fld_a
    , fld_b
    , APPROXIMATE_MEDIAN(fld_c) AS app_med_c
    FROM indata
    GROUP BY fld_b
    UNION ALL
    SELECT
      'both' AS grain
    , fld_a
    , fld_b
    , APPROXIMATE_MEDIAN(fld_c) AS app_med_c
    FROM indata
    GROUP BY fld_a,fld_b
    ORDER BY 1,2,3
    ;
    
    grain fld_a fld_b app_med_c
    both 1 1 1
    both 1 2 2
    both 1 3 3
    both 1 4 4
    both 1 5 5
    both 2 1 10
    both 2 2 20
    both 2 3 30
    both 3 1 100
    both 3 2 200
    both 3 3 300
    both 3 4 220
    both 3 5 275
    fld_a 1 (null) 3
    fld_a 2 (null) 20
    fld_a 3 (null) 200
    fld_b (null) 1 10
    fld_b (null) 2 20
    fld_b (null) 3 30
    fld_b (null) 4 40
    fld_b (null) 5 50
This discussion has been closed.