How to apply a parser (FRegexParser) to a table column? — Vertica Forum

aoropeza Vertica Customer

Hello,
I'm using Vertica 10.1 and I need to apply a parser to a column. The table has a column called "body", and there is continuous ingestion into this table (100M rows daily).
Every day (via a cron task) I need to apply a regex, get all the matches, and insert them into the table_stg table. This is my current solution:

vsql -U dbadmin -w pass -d db -At -c "SELECT text FROM table_in WHERE dt = CURRENT_DATE()" | vsql -U dbadmin -w pass -d db -At -c "COPY table_stg FROM STDIN PARSER FRegexParser(pattern = '...');"

Is this a good approach?
Could you give me some suggestions to accomplish this task?

Answers

  • aoropeza Vertica Customer
    edited June 2021

    Hello Curtis, thanks for your answer.
    I have reviewed those functions, but I feel limited because my regex looks like this:
    (?<hostname>\w.+?):\s\[(?<datetime>|\w.*?)\]\s(?<servername>\w.*?)\s...
    The source column (in table_in) holds 20 values that I have to extract with my regex, and the matches must be inserted into a table (table_stg) with these columns:

    table_in: id, body
    table_stg: hostname, datetime, servername, ... more columns

    With the COPY command this is easy, but in my case the data is already in Vertica.
    Is there a way to do this with the functions you recommend?
    Maybe I just don't see how to do it, but I would appreciate any advice on doing these transformations with those functions.

  • Here are some regex examples:
    select REGEXP_REPLACE(REGEXP_REPLACE(substr(query, 1, 500), '\d'), '\''[\S\s]*?\''', '''?''' )
    , avg(query_duration_us)/1000000, avg(processed_Row_count), count(*)
    from query_profiles where query_start::date > current_date -1
    group by 1
    having count(*) > 3
    order by 4 desc ;

    select regexp_replace(substr(segment_expression, 1, 500), '\w+.'), count(*) from projections where projection_name not ilike '%b1'
    group by 1 order by 2,1 desc ;

    But it sounds like you have 20 values embedded in a single column of the table, and you need to split them out into 20 separate columns in your target table?
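
If the goal is indeed to split "body" into separate columns without leaving the database, one possible approach is an INSERT ... SELECT that calls REGEXP_SUBSTR once per target column and uses its last argument (captured_subexp) to pick a capture group. The following is only a sketch, under several assumptions: it covers just the first three groups of the pattern quoted above (the rest of that pattern is elided in the thread, so it is not reproduced here), the named groups are rewritten as plain numbered groups, and the "dt" filter is taken from the original vsql command even though "dt" is not in the column list given for table_in. It has not been run against the real schema.

```sql
-- Sketch only: the pattern is the first three groups of the regex from the
-- question, with (?<name>...) rewritten as unnamed groups 1, 2 and 3.
-- REGEXP_SUBSTR(string, pattern, position, occurrence, modifiers, captured_subexp)
INSERT INTO table_stg (hostname, "datetime", servername)
SELECT
    REGEXP_SUBSTR(body, '(\w.+?):\s\[(|\w.*?)\]\s(\w.*?)\s', 1, 1, '', 1) AS hostname,
    REGEXP_SUBSTR(body, '(\w.+?):\s\[(|\w.*?)\]\s(\w.*?)\s', 1, 1, '', 2) AS "datetime",
    REGEXP_SUBSTR(body, '(\w.+?):\s\[(|\w.*?)\]\s(\w.*?)\s', 1, 1, '', 3) AS servername
FROM table_in
WHERE dt = CURRENT_DATE                 -- assumed filter, copied from the vsql command
  AND REGEXP_LIKE(body, '(\w.+?):\s\[(|\w.*?)\]\s(\w.*?)\s');  -- skip rows that do not match

COMMIT;
```

As far as I recall, the Vertica documentation limits captured_subexp to groups 1 through 9, so with 20 values the pattern would probably need to be split into several smaller patterns, each anchored on its own part of the line. It is worth checking this against the REGEXP_SUBSTR reference for your version before relying on it.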
