Calculate median on different grouping sets in Vertica
I'm trying to calculate median on different grouping sets in Vertica and I get the following error:
[0A000][6467] [Vertica]VJDBC ERROR: User defined aggregate cannot be used in query with MLAs"
this is a template of my query which doesn't work:
"""select field_a, field_b, APPROXIMATE_MEDIAN(field_c)
from my_table
group by cube (field_a, field_b)"""
where in this one, when I group by just one field, it works just fine:
"""select field_a, APPROXIMATE_MEDIAN(field_c)
from my_table
group by field_a"""
Would love to get any idea for why this is happening and how can I rewrite my query so it'll work. My next step is to union the calculations per every grouping, but I hope to find a more elegant solution.
Thanks!
tried to search for this error, tried to run on a smaller case, tried to think how can I get the same result using the analytic version of median, but couldn't find anything that supplies my need and can't find the logic behind why it does not work.
Best Answers
-
SruthiA Administrator
When you use cube function, it is a Multi Level aggregation. whereas using just group by is single level aggregation. Using cube with APPROXIMATE_MEDIAN is not a supported feature. Hence it is expected behaviour.
0 -
marcothesane - Select Field - Administrator
You must remember that
GROUP BY CUBE
orGROUP BY ROLLUP
often are multi-pass query access plans, even if you don't code them as such. And it does not hurt, as Vertica's query engine has pipeline parallelism - many parts of the plan are executed by different threads in parallel. So while it might look awkward to see it in code, it's not so un-elegant after all if youUNION SELECT
all levels of aggregation together:WITH indata(fld_a,fld_b,fld_c) AS ( SELECT 1,1, 1 UNION ALL SELECT 1,2, 2 UNION ALL SELECT 1,3, 3 UNION ALL SELECT 1,4, 4 UNION ALL SELECT 1,5, 5 UNION ALL SELECT 2,1, 10 UNION ALL SELECT 2,2, 20 UNION ALL SELECT 2,3, 30 UNION ALL SELECT 3,4, 40 UNION ALL SELECT 3,5, 50 UNION ALL SELECT 3,1,100 UNION ALL SELECT 3,2,200 UNION ALL SELECT 3,3,300 UNION ALL SELECT 3,4,400 UNION ALL SELECT 3,5,500 ) SELECT 'fld_a' AS grain , fld_a , NULL::int AS fld_b , APPROXIMATE_MEDIAN(fld_c) AS app_med_c FROM indata GROUP BY fld_a UNION ALL SELECT 'fld_b' AS grain , NULL::int AS fld_a , fld_b , APPROXIMATE_MEDIAN(fld_c) AS app_med_c FROM indata GROUP BY fld_b UNION ALL SELECT 'both' AS grain , fld_a , fld_b , APPROXIMATE_MEDIAN(fld_c) AS app_med_c FROM indata GROUP BY fld_a,fld_b ORDER BY 1,2,3 ;
grain fld_a fld_b app_med_c both 1 1 1 both 1 2 2 both 1 3 3 both 1 4 4 both 1 5 5 both 2 1 10 both 2 2 20 both 2 3 30 both 3 1 100 both 3 2 200 both 3 3 300 both 3 4 220 both 3 5 275 fld_a 1 (null) 3 fld_a 2 (null) 20 fld_a 3 (null) 200 fld_b (null) 1 10 fld_b (null) 2 20 fld_b (null) 3 30 fld_b (null) 4 40 fld_b (null) 5 50 0