How to apply a parser (FRegexParser) to a table column?

aoropezaaoropeza Vertica Customer

Hello,
I'm using Vertica 10.1 and I need to apply a Parser over a column. The table has a column called "body" and there is a continuos ingestion over this table (100M rows daily).
I need to apply daily (cron task) a Regex to get all matches and insert into table_stg table. This is my current solution:

vsql -U dbadmin -w pass -d db -At -c "SELECT text FROM table_in WHERE dt = CURRENT_DATE()" | vsql -U dbadmin -w pass -d db -At -c "COPY table_stg FROM STDIN PARSER FRegexParser(pattern = '...');"

Is this a good approach?
Could you give me some suggestions to accomplish this task?

Tagged:

Answers

  • aoropezaaoropeza Vertica Customer
    edited June 2021

    Hello Curtis, thanks for your answer.
    I have reviewed that functions but I feel limited because my Regex looks like this:
    (?<hostname>\w.+?):\s\[(?<datetime>|\w.*?)\]\s(?<servername>\w.*?)\s...
    The source column (table_in table) has 20 values I have to retrieve with my Regex and these matches must insert into a table (table_stg table) with this columns.

    table_in: id, body
    table_stg: hostname, datetime, servername, ... more columns

    With COPY command this is easy but in my case the information is already in Vertica.
    Is there a way to do this with the functions you recommend?
    Maybe I don't see how I can do it, but I would appreciate if you could give me some advice to do these transformations using those functions.

  • Here are some regex examples:
    select REGEXP_REPLACE(REGEXP_REPLACE(substr(query, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' )
    , avg(query_duration_us)/1000000, avg(processed_Row_count), count(
    )
    from query_profiles where query_start::date > current_date -1
    group by 1
    having count(*) > 3
    order by 4 desc ;

    select regexp_replace(substr(segment_expression, 1, 500), '\w+.'), count(*) from projections where projection_name not ilike '%b1'
    group by 1 order by 2,1 desc ;

    But it sounds like you have 20 columns embedded in a single column in the table, and you need to split that out into 20 other columns into your target table?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file