How to Split column into multiple rows

I want to split data in a column into multiple rows. How can do that in Vertica?

SERIALNO  |       TIME_1        |                                                    PATTERN
-----------+---------------------+---------------------------------------------------------------------------------------------------------------
 194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
 
  SERIALNO  |            TIME_1          |      PATTERN
-----------------+---------------------------+----------------------
 194011502   | 2014-03-17 15:09:08 | 15,1392573670,0
 194011502   | 2014-03-17 15:09:08 | 7963,1392573670
 194011502   | 2014-03-17 15:09:08 | 7964,1392573670,0
 194011502   | 2014-03-17 15:09:08 | 17448,1392573670,0
 194011502   | 2014-03-17 15:09:08 | 23048,1392573670,0
 194011502   | 2014-03-17 15:09:08 | 32648,1392573670,0

Comments

  • Navin_CNavin_C Vertica Customer
    Hello Siddarth,

    There is a way , we can do this in Vertica.

    Consider this example.

    Creating a table as mentioned above.

    create table navin.test_column_split(
    serial_number int,
    time_1 timestamp,
    pattern varchar(200)
    )
    Inserting records into this table 
    nnani=>  select * from navin.test_column_split; serial_number |           time_1           |                                                   pattern
    ---------------+----------------------------+--------------------------------------------------------------------------------------------------------------
            102123 | 2014-06-27 08:29:11.077139 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0
            102124 | 2014-06-27 09:03:57.350652 | 15,139257,0;7963,139257,0;7964,1392570,0;17448,1370,0;23048,1393670,0;32648,1373670,0;21365,1245879,0
            102125 | 2014-06-27 09:31:13.343068 | 15,1357,0;7963,1357,0;7964,12570,0;17448,1370,0;23048,13970,0;32648,13770,0
    (3 rows)
    nnani=>

    Then we need to know what is the maximum number of semicolons in the records.
    nnani=>   select max(regexp_count(pattern,';')) from navin.test_column_split; max
    -----
       6
    (1 row)
    nnani=>
    Here we found the maximum semicolons that a table has in single records is 6.
    Remember this number as this is going to be used ahead.

    Now we write our main SQL.
    SELECT serial_number, time_1,split_part(pattern,';',n.digit+1) AS sep_pattern FROM navin.test_column_split 
      JOIN (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
      ON LENGTH(REPLACE(pattern, ';' , '')) <= LENGTH(pattern)-n.digit
    serial_number |           time_1           |    sep_pattern---------------+----------------------------+--------------------
            102123 | 2014-06-27 08:29:11.077139 | 15,1392573670,0
            102123 | 2014-06-27 08:29:11.077139 | 7963,1392573670,0
            102123 | 2014-06-27 08:29:11.077139 | 7964,1392573670,0
            102123 | 2014-06-27 08:29:11.077139 | 17448,1392573670,0
            102123 | 2014-06-27 08:29:11.077139 | 23048,1392573670,0
            102123 | 2014-06-27 08:29:11.077139 | 32648,1392573670,0
            102124 | 2014-06-27 09:03:57.350652 | 15,139257,0
            102124 | 2014-06-27 09:03:57.350652 | 7963,139257,0
            102124 | 2014-06-27 09:03:57.350652 | 7964,1392570,0
            102124 | 2014-06-27 09:03:57.350652 | 17448,1370,0
            102124 | 2014-06-27 09:03:57.350652 | 23048,1393670,0
            102124 | 2014-06-27 09:03:57.350652 | 32648,1373670,0
            102124 | 2014-06-27 09:03:57.350652 | 21365,1245879,0
            102125 | 2014-06-27 09:31:13.343068 | 15,1357,0
            102125 | 2014-06-27 09:31:13.343068 | 7963,1357,0
            102125 | 2014-06-27 09:31:13.343068 | 7964,12570,0
            102125 | 2014-06-27 09:31:13.343068 | 17448,1370,0
            102125 | 2014-06-27 09:31:13.343068 | 23048,13970,0
            102125 | 2014-06-27 09:31:13.343068 | 32648,13770,0
    (19 rows)

    Note - This SELECT....UNION CLAUSE uses the number of maximum semicolons we found before this step, accordingly you need to increment the UNION clause starting from index 0.

    This is how we can get a column split into number of rows according to delimiter.

    Hope this helps.








  • Great solution from Navin. A similar approach, but something that you can do in a single pass below. You should use the maximum number of patterns that can exist in a single input row in the inline view and also do the same number of split parts, the null condition check in the where clause will make sure the right number of rows are returned. This query works even when the input table has variable number of patterns in different rows.
    select a.serialno,a.time_1,decode(b.idx,1,p1,2,p2,3,p3,4,p4,5,p5,6,p6,7,p7) as ptn from (select 
            serialno,
            time_1,
            split_part(pattern,';',1) as p1, 
            split_part(pattern,';',2) as p2, 
            split_part(pattern,';',3) as p3,
            split_part(pattern,';',4) as p4, 
            split_part(pattern,';',5) as p5, 
            split_part(pattern,';',6) as p6, 
            split_part(pattern,';',7) as p7
    from pattern_input) a 
    cross join (select 1 as idx union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) b where decode(b.idx,1,p1,2,p2,3,p3,4,p4,5,p5,6,p6,7,p7) is not null;
  • Thanks both for the solution :)

    I have used a tokenizer UDF for implementing this.

    dbadmin=> SELECT * FROM data_stg;

    SERIALNO  |       TIME_1        |                                                    PATTERN

    -----------+---------------------+---------------------------------------------------------------------------------------------------------------

    194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;

    194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;

    194395874 | 2014-04-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;

    194395876 | 2014-02-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;

    (4 rows)

     

    dbadmin=> SELECT SERIALNO,TIME_1,tokenize(replace(PATTERN,';',' ')) OVER (partition by SERIALNO,TIME_1) as PATTERN FROM data_stg;

    SERIALNO  |       TIME_1        |      PATTERN

    -----------+---------------------+--------------------

    194011502 | 2014-03-17 15:09:08 | 15,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 7963,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 7964,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 17448,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 23048,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 32648,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 15,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 7963,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 7964,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 17448,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 23048,1392573670,0

    194011502 | 2014-03-17 15:09:08 | 32648,1392573670,0

    194395874 | 2014-04-17 15:09:08 | 15,1392573670,0

    194395874 | 2014-04-17 15:09:08 | 7963,1392573670,0

    194395874 | 2014-04-17 15:09:08 | 7964,1392573670,0

    194395874 | 2014-04-17 15:09:08 | 17448,1392573670,0

    194395874 | 2014-04-17 15:09:08 | 23048,1392573670,0

    194395874 | 2014-04-17 15:09:08 | 32648,1392573670,0

    194395876 | 2014-02-17 15:09:08 | 15,1392573670,0

    194395876 | 2014-02-17 15:09:08 | 7963,1392573670,0

    194395876 | 2014-02-17 15:09:08 | 7964,1392573670,0

    194395876 | 2014-02-17 15:09:08 | 17448,1392573670,0

    194395876 | 2014-02-17 15:09:08 | 23048,1392573670,0

    194395876 | 2014-02-17 15:09:08 | 32648,1392573670,0

    (24 rows)

  •  Why do i get the felling you already had this solution before you asked the question ? :) 
    What is great about this is that we can see Vertica can take care of the business in many way ! :) 
     Great solution ! 
  • Hi Adrian, solution was already there but I did not wanted to use UDF for implementing it. We are getting few columns of our cdr data in this format and I was looking for some sql based generic solution. :)

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file