Need to split multiple column which contains pipeline seperated data in to multiple rows
Hello,
My data is stored in the below format:
ID VALUE_1 VALUE_2
1|2 20|30 40|50
Expected format:
ID VALUE_1 VALUE_2
1 20 40
2 30 50
Can anyone help me on this as i am stucked with the above.
Thanks,
Suhrid Ghosh
0
Answers
This might be easiest to solve with a pre-processor, either an external script or a UDx that explodes each row.
Current Vertica supports ARRAY type, so you could load data like this, but need to edit array and record delimiters:
dbadmin=> create table array3 (a1 array[int], a2 array[int], a3 array[int]);
dbadmin=> copy array3 from stdin;
1,2|20,30|40,50
dbadmin=> select * from array3;
a1 | a2 | a3
-------+---------+---------
[1,2] | [20,30] | [40,50]
Then use array functions like position access and EXPLODE to expand each row.
@Suhrid_R__Ghosh - Here is a an example per @Bryan_H 's great suggestion!
dbadmin=> CREATE TABLE t_temp (id ARRAY[INT], value1 ARRAY[INT], value2 ARRAY[INT]); CREATE TABLE dbadmin=> \! cat /home/dbadmin/t.txt ID VALUE_1 VALUE_2 1|2 20|30 40|50 dbadmin=> COPY t_temp FROM '/home/dbadmin/t.txt' DELIMITER ' ' COLLECTIONDELIMITER '|' SKIP 1; Rows Loaded ------------- 1 (1 row) dbadmin=> SELECT * FROM t_temp; id | value1 | value2 -------+---------+--------- [1,2] | [20,30] | [40,50] (1 row) dbadmin=> SELECT id, value1, value2 FROM dbadmin-> (SELECT explode(id) OVER (PARTITION BEST) AS (pos, id) FROM t_temp) a dbadmin-> JOIN dbadmin-> (SELECT explode(value1) OVER (PARTITION BEST) AS (pos, value1) FROM t_temp) b USING (pos) dbadmin-> JOIN dbadmin-> (SELECT explode(value2) OVER (PARTITION BEST) AS (pos, value2) FROM t_temp) c USING (pos); id | value1 | value2 ----+--------+-------- 1 | 20 | 20 2 | 40 | 50 (2 rows)If you don't have Vertica 10, here's another possibility:
dbadmin=> CREATE TABLE t (id INT, value_1 INT, value_2 INT); CREATE TABLE dbadmin=> \! cat /home/dbadmin/t.txt ID VALUE_1 VALUE_2 1|2 20|30 40|50 dbadmin=> COPY t(data_f FILLER VARCHAR, id AS (split_part(split_part(data_f, ' ', 1), '|', 1))::INT, value_1 AS (split_part(split_part(data_f, ' ', 2), '|', 1))::INT, value_2 AS (split_part(split_part(data_f, ' ', 3), '|', 1))::INT) FROM '/home/dbadmin/t.txt' DELIMITER '.' SKIP 1; Rows Loaded ------------- 1 (1 row) dbadmin=> COPY t(data_f FILLER VARCHAR, id AS (split_part(split_part(data_f, ' ', 1), '|', 2))::INT, value_1 AS (split_part(split_part(data_f, ' ', 2), '|', 2))::INT, value_2 AS (split_part(split_part(data_f, ' ', 3), '|', 2))::INT) FROM '/home/dbadmin/t.txt' DELIMITER '.' SKIP 1; Rows Loaded ------------- 1 (1 row) dbadmin=> SELECT * FROM t; id | value_1 | value_2 ----+---------+--------- 1 | 20 | 40 2 | 30 | 50 (2 rows)Keep it simple ... we have
SPLIT_PART()in Vertica ...WITH -- your input .. indata(id , value_1,value_2) AS ( SELECT '1|2','20|30','40|50' ) -- real query starts here, replace "," with "WITH" ... , i(i) AS (-- just need 2 ints ... SELECT 1 UNION ALL SELECT 2 ) SELECT SPLIT_PART(id ,'|',i)::INT AS id , SPLIT_PART(value_1,'|',i)::INT AS value_1 , SPLIT_PART(value_2,'|',i)::INT AS value_1 FROM indata CROSS JOIN i ORDER BY 1; -- out id | value_1 | value_1 -- out ----+---------+--------- -- out 1 | 20 | 40 -- out 2 | 30 | 50