Options

Vertica SQL Data Transformation

Dear Friends,

My source table has a table with 2 columns: id, items

ID        ITEMS
------------------------------
1         X1, X2, X3, ... , Xn

2         Y1, Y2

And I would like to get the following output.

ID     COMBINATION
------------------
1      X1
1      X2
1      X3
...
...
...
1      Xn
2      Y1
2      Y2


Please help. Thank you.

Regards,
Kushal

Comments

  • Options
    You can use Vertica's tokenize UDF for implementing this.

    /opt/vertica/sdk/examples/TransformFunctions/StringTokenizer.cpp
  • Options
    Sid,
    I have not worked on UDF before. Can't this be done using SQLs

    Regards,
    Kushal
  • Options
    Sid,
    I have not worked on UDF before. Can't this be done using SQLs

    Regards,
    Kushal
  • Options
    The most simple method is to use UNION , each query will substr the next string from the line 
  • Options
    There is also a String Tokenizer in the Vertica Marketplace:

    https://vertica.hpwsportal.com/#/Product/Search/{%22qtext%22%3A%22string%22}/Show

    You can install it as a Vertica add-on.  Once installed, it is accessible from SQL.  (The Marketplace package is based on the "strings_package" code on our GitHub site, in case anyone's curious where it comes from.)

    This function is built into Vertica 7.1+ as "TxtIndex.StringTokenizer()".
  • Options
    Hi Kushal,

    If you find yourself working with data like this a lot, I would encourage you to take a look at Vertica's Flex extensions:

    http://my.vertica.com/docs/7.1.x/PDF/HP_Vertica_7.1.x_Flex_Tables.pdf

    In particular, you can parse the comma-separated list into an array using either the "FDelimitedParser()" function (at data-load time, see page 79 in the above doc) or the "MapDelimitedExtractor()" function (after the data has been loaded, see page 67); then use the "MapValues()" function (page 109) to explode the array out to one value per row.

    I think "MapDelimitedExtractor()" might be new in Vertica 7.1?  The others all ship with 7.0.

    (Note that these functions are often used with Vertica Flex Tables, but they don't have to be; you can use them anywhere.)

    What you have here is data that's not initially in traditional row/column form.  Vertica's Flex extensions are (among other things) intended to help you easily get it into a form from which you can manipulate it using more-traditional SQL.

    Adam
  • Options
    Hi!

    I don't remember where (not here), but somewhere I posted UNPIVOT function (UDFTx C++).
  • Options
    Thanks everyone for your quick response. and Thanks Adam..i will check and let everyone know how this is implemented

    Regards,
    Kushal

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file