# SUM, AVERAGE, COUNT, STDEV across columns?

edited November 2022

is there a concise way in Vertica to select metrics across a set of columns or expressions within a row, analogous to the way that GREATEST and LEAST work?

## Goal

Specifically I'd like efficient reusable function to calculate a sum, average, count, variance across an arbitrary list of expressions for each row:

E.g.

```SELECT
userid,
SUM(a1, a2, a3, a4, a5, a6) AS a_sum,
COUNT(a1, a2, a3, a4, a5, a6) AS a_sum,
AVG(a1, a2, a3, a4, a5, a6) AS a_avg,
VARIANCE(a1, a2, a3, a4, a5, a6) AS a_variance,
FROM table
```

The best I can think of is to write this out atomically like in this suggestion. This is ok for sums, but gets very pedantic for higher things like average and variance esp when there can be null values.
```

## Custom function

I suspect this does not exist in standard SQL or vSQL, and may be best implemented as a custom function.

Q1. Can these be implemented in as user-defined SQL functions? It would be nice to avoid C++. But I don't know how to iterate over a list of arguments

Q2. Is there a library of existing community-provided UDFs to search if this may already exist and/or contribute such function later? I see this GitHub repo: https://github.com/vertica/UDx-Examples/tree/master/Java-and-C++

https://stackoverflow.com/questions/15719557/how-to-compute-the-sum-of-multiple-columns-in-postgresql

Tagged:

edited November 2022

Please consider the following example in pure Vertica SQL.

```CREATE TABLE my_table (userid int, a1 int, a2 int, a3 int, a4 int, a5 int, a6 int) order by userid;

COPY my_table FROM STDIN DELIMITER ',' NULL AS 'NULL' ABORT ON ERROR;
1,1,2,3,4,5,6,
2,7,8,9,10,11,NULL,
3,12,13,14,15,NULL,16,
\.

SELECt * FROM my_table ORDER BY 1;
--  userid | a1 | a2 | a3 | a4 | a5 | a6
-- --------+----+----+----+----+----+----
--       1 |  1 |  2 |  3 |  4 |  5 |  6
--       2 |  7 |  8 |  9 | 10 | 11 |
--       3 | 12 | 13 | 14 | 15 |    | 16
-- (3 rows)

SELECT
userid,
APPLY_COUNT(ARRAY[a1,a2,a3,a4,a5,a6]) AS COUNT,
APPLY_AVG(ARRAY[a1,a2,a3,a4,a5,a6]) AS AVG
FROM my_table
ORDER BY 1;
--  userid | COUNT | AVG
-- --------+-------+-----
--       1 |     6 | 3.5
--       2 |     5 |   9
--       3 |     5 |  14
-- (3 rows)

SELECT EXPLODE(userid,ARRAY[a1,a2,a3,a4,a5,a6]) OVER(PARTITION BEST)
FROM my_table
ORDER BY 1;
--  userid | position | value
-- --------+----------+-------
--       1 |        0 |     1
--       1 |        1 |     2
--       1 |        2 |     3
--       1 |        3 |     4
--       1 |        4 |     5
--       1 |        5 |     6
--       2 |        0 |     7
--       2 |        1 |     8
--       2 |        2 |     9
--       2 |        3 |    10
--       2 |        4 |    11
--       2 |        5 |
--       3 |        0 |    12
--       3 |        1 |    13
--       3 |        2 |    14
--       3 |        3 |    15
--       3 |        4 |
--       3 |        5 |    16
-- (18 rows)

WITH
list1 AS (SELECT EXPLODE(userid,ARRAY[a1,a2,a3,a4,a5,a6]) OVER(PARTITION BEST) FROM my_table)
SELECT userid, SUM(value), COUNT(value), AVG(value), VARIANCE(value), STDDEV(value)
FROM list1
GROUP BY userid
ORDER BY 1;
--  userid | SUM | COUNT | AVG | VARIANCE |      STDDEV
-- --------+-----+-------+-----+----------+------------------
--       1 |  21 |     6 | 3.5 |      3.5 | 1.87082869338697
--       2 |  45 |     5 |   9 |      2.5 | 1.58113883008419
--       3 |  70 |     5 |  14 |      2.5 | 1.58113883008419
-- (3 rows)
```
• Aha, the Collection functions are exactly the concept I was seeking. Thank you!
These work perfectly for this purpose.

Do you know if there is an APPLY_VARIANCE or APPLY_STDDEV or equivalent?

• @mosheg The function to iterate over arrays via `EXPLODE(x) OVER (PARTITION BY ..)` is amazing