How to Split column into multiple rows
I want to split data in a column into multiple rows. How can do that in Vertica?
SERIALNO | TIME_1 | PATTERN
-----------+---------------------+---------------------------------------------------------------------------------------------------------------
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
SERIALNO | TIME_1 | PATTERN
-----------------+---------------------------+----------------------
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0
194011502 | 2014-03-17 15:09:08 | 7963,1392573670
194011502 | 2014-03-17 15:09:08 | 7964,1392573670,0
194011502 | 2014-03-17 15:09:08 | 17448,1392573670,0
194011502 | 2014-03-17 15:09:08 | 23048,1392573670,0
194011502 | 2014-03-17 15:09:08 | 32648,1392573670,0
SERIALNO | TIME_1 | PATTERN
-----------+---------------------+---------------------------------------------------------------------------------------------------------------
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
SERIALNO | TIME_1 | PATTERN
-----------------+---------------------------+----------------------
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0
194011502 | 2014-03-17 15:09:08 | 7963,1392573670
194011502 | 2014-03-17 15:09:08 | 7964,1392573670,0
194011502 | 2014-03-17 15:09:08 | 17448,1392573670,0
194011502 | 2014-03-17 15:09:08 | 23048,1392573670,0
194011502 | 2014-03-17 15:09:08 | 32648,1392573670,0
0
Comments
There is a way , we can do this in Vertica.
Consider this example.
Creating a table as mentioned above. Inserting records into this table
Then we need to know what is the maximum number of semicolons in the records. Here we found the maximum semicolons that a table has in single records is 6.
Remember this number as this is going to be used ahead.
Now we write our main SQL.
Note - This SELECT....UNION CLAUSE uses the number of maximum semicolons we found before this step, accordingly you need to increment the UNION clause starting from index 0.
This is how we can get a column split into number of rows according to delimiter.
Hope this helps.
I have used a tokenizer UDF for implementing this.
dbadmin=> SELECT * FROM data_stg;
SERIALNO | TIME_1 | PATTERN
-----------+---------------------+---------------------------------------------------------------------------------------------------------------
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
194395874 | 2014-04-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
194395876 | 2014-02-17 15:09:08 | 15,1392573670,0;7963,1392573670,0;7964,1392573670,0;17448,1392573670,0;23048,1392573670,0;32648,1392573670,0;
(4 rows)
dbadmin=> SELECT SERIALNO,TIME_1,tokenize(replace(PATTERN,';',' ')) OVER (partition by SERIALNO,TIME_1) as PATTERN FROM data_stg;
SERIALNO | TIME_1 | PATTERN
-----------+---------------------+--------------------
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0
194011502 | 2014-03-17 15:09:08 | 7963,1392573670,0
194011502 | 2014-03-17 15:09:08 | 7964,1392573670,0
194011502 | 2014-03-17 15:09:08 | 17448,1392573670,0
194011502 | 2014-03-17 15:09:08 | 23048,1392573670,0
194011502 | 2014-03-17 15:09:08 | 32648,1392573670,0
194011502 | 2014-03-17 15:09:08 | 15,1392573670,0
194011502 | 2014-03-17 15:09:08 | 7963,1392573670,0
194011502 | 2014-03-17 15:09:08 | 7964,1392573670,0
194011502 | 2014-03-17 15:09:08 | 17448,1392573670,0
194011502 | 2014-03-17 15:09:08 | 23048,1392573670,0
194011502 | 2014-03-17 15:09:08 | 32648,1392573670,0
194395874 | 2014-04-17 15:09:08 | 15,1392573670,0
194395874 | 2014-04-17 15:09:08 | 7963,1392573670,0
194395874 | 2014-04-17 15:09:08 | 7964,1392573670,0
194395874 | 2014-04-17 15:09:08 | 17448,1392573670,0
194395874 | 2014-04-17 15:09:08 | 23048,1392573670,0
194395874 | 2014-04-17 15:09:08 | 32648,1392573670,0
194395876 | 2014-02-17 15:09:08 | 15,1392573670,0
194395876 | 2014-02-17 15:09:08 | 7963,1392573670,0
194395876 | 2014-02-17 15:09:08 | 7964,1392573670,0
194395876 | 2014-02-17 15:09:08 | 17448,1392573670,0
194395876 | 2014-02-17 15:09:08 | 23048,1392573670,0
194395876 | 2014-02-17 15:09:08 | 32648,1392573670,0
(24 rows)
What is great about this is that we can see Vertica can take care of the business in many way !
Great solution !