Calculate median on different grouping sets in Vertica

danan · March 2023

I'm trying to calculate median on different grouping sets in Vertica and I get the following error:

[0A000][6467] [Vertica]VJDBC ERROR: User defined aggregate cannot be used in query with MLAs"

this is a template of my query which doesn't work:

"""select field_a, field_b, APPROXIMATE_MEDIAN(field_c)

from my_table

group by cube (field_a, field_b)"""

where in this one, when I group by just one field, it works just fine:

"""select field_a, APPROXIMATE_MEDIAN(field_c)

from my_table

group by field_a"""

Would love to get any idea for why this is happening and how can I rewrite my query so it'll work. My next step is to union the calculations per every grouping, but I hope to find a more elegant solution.

Thanks!

tried to search for this error, tried to run on a smaller case, tried to think how can I get the same result using the analytic version of median, but couldn't find anything that supplies my need and can't find the logic behind why it does not work.

SruthiA · March 2023

When you use cube function, it is a Multi Level aggregation. whereas using just group by is single level aggregation. Using cube with APPROXIMATE_MEDIAN is not a supported feature. Hence it is expected behaviour.

marcothesane · March 2023

You must remember that GROUP BY CUBE or GROUP BY ROLLUP often are multi-pass query access plans, even if you don't code them as such. And it does not hurt, as Vertica's query engine has pipeline parallelism - many parts of the plan are executed by different threads in parallel. So while it might look awkward to see it in code, it's not so un-elegant after all if you UNION SELECT all levels of aggregation together:

WITH
indata(fld_a,fld_b,fld_c) AS (
            SELECT 1,1,  1
  UNION ALL SELECT 1,2,  2
  UNION ALL SELECT 1,3,  3
  UNION ALL SELECT 1,4,  4
  UNION ALL SELECT 1,5,  5
  UNION ALL SELECT 2,1, 10
  UNION ALL SELECT 2,2, 20
  UNION ALL SELECT 2,3, 30
  UNION ALL SELECT 3,4, 40
  UNION ALL SELECT 3,5, 50
  UNION ALL SELECT 3,1,100
  UNION ALL SELECT 3,2,200
  UNION ALL SELECT 3,3,300
  UNION ALL SELECT 3,4,400
  UNION ALL SELECT 3,5,500
) 
SELECT
  'fld_a' AS grain
, fld_a
, NULL::int   AS fld_b
, APPROXIMATE_MEDIAN(fld_c) AS app_med_c
FROM indata
GROUP BY fld_a
UNION ALL
SELECT
  'fld_b'     AS grain
, NULL::int   AS fld_a
, fld_b
, APPROXIMATE_MEDIAN(fld_c) AS app_med_c
FROM indata
GROUP BY fld_b
UNION ALL
SELECT
  'both' AS grain
, fld_a
, fld_b
, APPROXIMATE_MEDIAN(fld_c) AS app_med_c
FROM indata
GROUP BY fld_a,fld_b
ORDER BY 1,2,3
;

grain	fld_a	fld_b	app_med_c
both	1	1	1
both	1	2	2
both	1	3	3
both	1	4	4
both	1	5	5
both	2	1	10
both	2	2	20
both	2	3	30
both	3	1	100
both	3	2	200
both	3	3	300
both	3	4	220
both	3	5	275
fld_a	1	(null)	3
fld_a	2	(null)	20
fld_a	3	(null)	200
fld_b	(null)	1	10
fld_b	(null)	2	20
fld_b	(null)	3	30
fld_b	(null)	4	40
fld_b	(null)	5	50

Calculate median on different grouping sets in Vertica

Best Answers