Summing parts of a column
yaronkoral
Community Edition User ✭
I have a column which stores multiple vertical bar separated values.
For example:
40|30|20
70
100|20|1|1|1|10
I would like to get a sum of all parts of the string:
90
70
133
Is there a way to do it in Vertica?
0
Answers
@yaronkoral : Please find the solution below. You need to use another column from your table to sum up the results of a column of a particular row
dbadmin=> select * from test_f;
a | b
-----------------+---
40|30|20 | 1
70 | 2
100|20|1|1|1|10 | 3
(3 rows)
dbadmin=> select b, sum(a) from (SELECT b,v_txtindex.stringTokenizerDelim(a, '|') OVER(partition by b order by b) as a FROM test_f order by 1,2) g group by 1;
b | sum
---+-----
1 | 90
2 | 70
3 | 133
(3 rows)
dbadmin=>
Thank you @SruthiA, works perfectly!
@yaronkoral - Just for fun, here is another route via a User Defined SQL Function:
verticademos=> SELECT * FROM test_f; a ----------------- 40|30|20 70 100|20|1|1|1|10 (3 rows) verticademos=> CREATE OR REPLACE FUNCTION sumvalue (x VARCHAR) RETURN INT verticademos-> AS verticademos-> BEGIN verticademos-> RETURN verticademos-> DECODE(split_part(x, '|', 1), '', '0', split_part(x, '|', 1)) + verticademos-> DECODE(split_part(x, '|', 2), '', '0', split_part(x, '|', 2)) + verticademos-> DECODE(split_part(x, '|', 3), '', '0', split_part(x, '|', 3)) + verticademos-> DECODE(split_part(x, '|', 4), '', '0', split_part(x, '|', 4)) + verticademos-> DECODE(split_part(x, '|', 5), '', '0', split_part(x, '|', 5)) + verticademos-> DECODE(split_part(x, '|', 6), '', '0', split_part(x, '|', 6)) + verticademos-> DECODE(split_part(x, '|', 7), '', '0', split_part(x, '|', 7)) + verticademos-> DECODE(split_part(x, '|', 8), '', '0', split_part(x, '|', 8)) + verticademos-> DECODE(split_part(x, '|', 9), '', '0', split_part(x, '|', 9)) + verticademos-> DECODE(split_part(x, '|', 10), '', '0', split_part(x, '|', 10)); verticademos-> END; CREATE FUNCTION verticademos=> SELECT a, sumvalue(a) sum FROM test_f; a | sum -----------------+----- 40|30|20 | 90 70 | 70 100|20|1|1|1|10 | 133 (3 rows)If you have to handle more that 10 values to sum, you'd have to add more SPLIT_PART calls
Or yet another two:
a) convert to array and explode:
-- need an identifier to link back to the row ... WITH indata(id,lists) AS ( SELECT 1, '40|30|20' UNION ALL SELECT 2, '100|20|1|1|1|10' ) , pivoted AS ( SELECT EXPLODE( id , lists , STRING_TO_ARRAY( '['||REPLACE(lists,'|',',')||']' ) ) OVER(PARTITION BEST) AS (id,lists,idx,summand) FROM indata ) -- test query ... -- SELECT * FROM pivoted; -- out id | lists | idx | summand -- out ----+-----------------+-----+--------- -- out 2 | 100|20|1|1|1|10 | 0 | 100 -- out 2 | 100|20|1|1|1|10 | 1 | 20 -- out 2 | 100|20|1|1|1|10 | 2 | 1 -- out 2 | 100|20|1|1|1|10 | 3 | 1 -- out 2 | 100|20|1|1|1|10 | 4 | 1 -- out 2 | 100|20|1|1|1|10 | 5 | 10 -- out 1 | 40|30|20 | 0 | 40 -- out 1 | 40|30|20 | 1 | 30 -- out 1 | 40|30|20 | 2 | 20 SELECT id , lists , SUM(summand) As the_sum_per_row FROM pivoted GROUP BY 1,2; -- out id | lists | the_sum_per_row -- out ----+-----------------+----------------- -- out 2 | 100|20|1|1|1|10 | 133 -- out 1 | 40|30|20 | 90b) CROSS JOIN with an in-line index table ...
WITH indata(id,lists) AS ( SELECT 1, '40|30|20' UNION ALL SELECT 2, '100|20|1|1|1|10' ) , -- need an index of integers,here ... i(i) AS ( SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 ) , pivoted AS ( SELECT id , lists , i AS idx , SPLIT_PART(lists,'|',i+1)::INT AS summand FROM indata CROSS JOIN i WHERE SPLIT_PART(lists,'|',i+1) <>'' ORDER BY id,idx ) -- control query ... -- SELECT * FROM pivoted; -- out id | lists | idx | summand -- out ----+-----------------+-----+--------- -- out 1 | 40|30|20 | 0 | 40 -- out 1 | 40|30|20 | 1 | 30 -- out 1 | 40|30|20 | 2 | 20 -- out 2 | 100|20|1|1|1|10 | 0 | 100 -- out 2 | 100|20|1|1|1|10 | 1 | 20 -- out 2 | 100|20|1|1|1|10 | 2 | 1 -- out 2 | 100|20|1|1|1|10 | 3 | 1 -- out 2 | 100|20|1|1|1|10 | 4 | 1 -- out 2 | 100|20|1|1|1|10 | 5 | 10 SELECT id , lists , SUM(summand) As the_sum_per_row FROM pivoted GROUP BY 1,2; -- out id | lists | the_sum_per_row -- out ----+-----------------+----------------- -- out 1 | 40|30|20 | 90 -- out 2 | 100|20|1|1|1|10 | 133