We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Need to split multiple column which contains pipeline seperated data in to multiple rows — Vertica Forum

Need to split multiple column which contains pipeline seperated data in to multiple rows

Hello,
My data is stored in the below format:
ID VALUE_1 VALUE_2
1|2 20|30 40|50
Expected format:
ID VALUE_1 VALUE_2
1 20 40
2 30 50
Can anyone help me on this as i am stucked with the above.
Thanks,
Suhrid Ghosh

Answers

  • Bryan_HBryan_H Vertica Employee Administrator

    This might be easiest to solve with a pre-processor, either an external script or a UDx that explodes each row.
    Current Vertica supports ARRAY type, so you could load data like this, but need to edit array and record delimiters:
    dbadmin=> create table array3 (a1 array[int], a2 array[int], a3 array[int]);
    dbadmin=> copy array3 from stdin;
    1,2|20,30|40,50
    dbadmin=> select * from array3;
    a1 | a2 | a3
    -------+---------+---------
    [1,2] | [20,30] | [40,50]
    Then use array functions like position access and EXPLODE to expand each row.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2020

    @Suhrid_R__Ghosh - Here is a an example per @Bryan_H 's great suggestion!

    dbadmin=> CREATE TABLE t_temp (id ARRAY[INT], value1 ARRAY[INT], value2 ARRAY[INT]);
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/t.txt
    ID VALUE_1 VALUE_2
    1|2 20|30 40|50
    
    dbadmin=> COPY t_temp FROM '/home/dbadmin/t.txt' DELIMITER ' ' COLLECTIONDELIMITER '|' SKIP 1;
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM t_temp;
      id   | value1  | value2
    -------+---------+---------
     [1,2] | [20,30] | [40,50]
    (1 row)
    
    dbadmin=> SELECT id, value1, value2 FROM
    dbadmin->   (SELECT explode(id) OVER (PARTITION BEST) AS (pos, id) FROM t_temp) a
    dbadmin->   JOIN
    dbadmin->   (SELECT explode(value1) OVER (PARTITION BEST) AS (pos, value1) FROM t_temp) b USING (pos)
    dbadmin->   JOIN
    dbadmin->   (SELECT explode(value2) OVER (PARTITION BEST) AS (pos, value2) FROM t_temp) c USING (pos);
     id | value1 | value2
    ----+--------+--------
      1 |     20 |     20
      2 |     40 |     50
    (2 rows)
    

    If you don't have Vertica 10, here's another possibility:

    dbadmin=> CREATE TABLE t (id INT, value_1 INT, value_2 INT);
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/t.txt
    ID VALUE_1 VALUE_2
    1|2 20|30 40|50
    
    dbadmin=> COPY t(data_f FILLER VARCHAR, id AS (split_part(split_part(data_f, ' ', 1), '|', 1))::INT, value_1 AS (split_part(split_part(data_f, ' ', 2), '|', 1))::INT, value_2 AS (split_part(split_part(data_f, ' ', 3), '|', 1))::INT) FROM '/home/dbadmin/t.txt' DELIMITER '.' SKIP 1;
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> COPY t(data_f FILLER VARCHAR, id AS (split_part(split_part(data_f, ' ', 1), '|', 2))::INT, value_1 AS (split_part(split_part(data_f, ' ', 2), '|', 2))::INT, value_2 AS (split_part(split_part(data_f, ' ', 3), '|', 2))::INT) FROM '/home/dbadmin/t.txt' DELIMITER '.' SKIP 1;
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM t;
     id | value_1 | value_2
    ----+---------+---------
      1 |      20 |      40
      2 |      30 |      50
    (2 rows)
    
  • marcothesanemarcothesane - Select Field - Administrator

    Keep it simple ... we have SPLIT_PART() in Vertica ...

    WITH
    -- your input ..
    indata(id , value_1,value_2) AS (
      SELECT '1|2','20|30','40|50'
    )
    -- real query starts here, replace "," with "WITH" ...
    ,
    i(i) AS (-- just need 2 ints ...
                SELECT 1
      UNION ALL SELECT 2
    )
    SELECT
      SPLIT_PART(id     ,'|',i)::INT AS id
    , SPLIT_PART(value_1,'|',i)::INT AS value_1
    , SPLIT_PART(value_2,'|',i)::INT AS value_1
    FROM indata CROSS JOIN i
    ORDER BY 1;
    -- out  id | value_1 | value_1 
    -- out ----+---------+---------
    -- out   1 |      20 |      40
    -- out   2 |      30 |      50
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file