The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
Split a string into parts
[Deleted User] Administrator
This blog post was authored by Jim Knicely.
One of my favorite functions in Vertica is named SPLIT_PART. It splits up a string into parts by a given delimiter.
dbadmin=> SELECT split_part(my_text, ',', 1) the_first_part, dbadmin-> split_part(my_text, ',', 2) the_second_part, dbadmin-> split_part(my_text, ',', 3) the_third_part, dbadmin-> split_part(my_text, ',', 4) the_fourth_part dbadmin-> FROM (SELECT 'ONE,TWO,THREE,FOUR' my_text) foo; the_first_part | the_second_part | the_third_part | the_fourth_part ----------------+-----------------+----------------+----------------- ONE | TWO | THREE | FOUR (1 row)
But what if I don’t know how many parts there are in my text? As the following example shows, I can use the ROW NUMBER analytic function to dynamically split my text up!
dbadmin=> SELECT * FROM split_this_up ORDER BY 1; id | some_text ----+---------------------------------------- 1 | Please split this sentence up by word 2 | And do the same with this sentence too (2 rows) dbadmin=> SELECT id, word dbadmin-> FROM (SELECT id, split_part(some_text, ' ', row_number() over (PARTITION BY id)) word dbadmin(> FROM split_this_up dbadmin(> CROSS JOIN columns) foo dbadmin-> WHERE word <> ''; id | word ----+---------- 1 | Please 1 | split 1 | this 1 | sentence 1 | up 1 | by 1 | word 2 | And 2 | do 2 | the 2 | same 2 | with 2 | this 2 | sentence 2 | too (15 rows)
This is neat. What is the table
columnsin the phrase
CROSS JOIN columnsand how does that work?
EDIT Is this just using the system table
v_catalogas an arbitrary table that is likely to have more rows than any string to be split has delimiters? For the user
dbadminthat is likely to be true, but does that also result in millions of blank rows in the to be filtered out by
WHERE word <> ''?
This might equivalently have been joined by something else, such as...
CROSS JOIN ( SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 ) bar
I tend to agree -
ROW_NUMBER()on the v_catalog.columns table , when, in one of the databases I work with, I end up with over 300 million total columns in the database, is a bit like chopping off a tree to get a toothpick ...
There are actually several ways to do the same thing:
CROSS JOINof a series of consecutive INTEGERs;
some_textwith square brackets, do a
String_to_Array()on that one, and applying an
EXPLODE()on the resulting array
StringTokenizerDelim()function from the Text-Index library .
The first is the slowest, the last is the fastest, in a 300,000 row table with 100 tokens each .
Here's a script for you to try, based on Jim's example.
I do like the UNION ALL SELECT of consecutive integers, up to 10, I'd say.
If there are more, I "abuse" a side effect of the TIMESERIES clause to get a series of consecutive integers. Note the two CTE-s in the first of the 3 queries ...