How to apply a parser (FRegexParser) to a table column?

aoropeza · May 2021

Hello,
I'm using Vertica 10.1 and I need to apply a Parser over a column. The table has a column called "body" and there is a continuos ingestion over this table (100M rows daily).
I need to apply daily (cron task) a Regex to get all matches and insert into table_stg table. This is my current solution:

vsql -U dbadmin -w pass -d db -At -c "SELECT text FROM table_in WHERE dt = CURRENT_DATE()" | vsql -U dbadmin -w pass -d db -At -c "COPY table_stg FROM STDIN PARSER FRegexParser(pattern = '...');"

Is this a good approach?
Could you give me some suggestions to accomplish this task?

Vertica_Curtis · May 2021

Would this be easier solved by simply using one of Vertica's regular expression functions?
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/RegularExpressionFunctions.htm?tocpath=SQL Reference Manual|SQL Functions|Regular Expression Functions|_____0

aoropeza · June 2021

Hello Curtis, thanks for your answer.
I have reviewed that functions but I feel limited because my Regex looks like this:
(?<hostname>\w.+?):\s\[(?<datetime>|\w.*?)\]\s(?<servername>\w.*?)\s...
The source column (table_in table) has 20 values I have to retrieve with my Regex and these matches must insert into a table (table_stg table) with this columns.

table_in: id, body
table_stg: hostname, datetime, servername, ... more columns

With COPY command this is easy but in my case the information is already in Vertica.
Is there a way to do this with the functions you recommend?
Maybe I don't see how I can do it, but I would appreciate if you could give me some advice to do these transformations using those functions.

Vertica_Curtis · June 2021

Here are some regex examples:
select REGEXP_REPLACE(REGEXP_REPLACE(substr(query, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' )
, avg(query_duration_us)/1000000, avg(processed_Row_count), count()
from query_profiles where query_start::date > current_date -1
group by 1
having count(*) > 3
order by 4 desc ;

select regexp_replace(substr(segment_expression, 1, 500), '\w+.'), count(*) from projections where projection_name not ilike '%b1'
group by 1 order by 2,1 desc ;

But it sounds like you have 20 columns embedded in a single column in the table, and you need to split that out into 20 other columns into your target table?

We're Moving!

Create My New Community Account Now

How to apply a parser (FRegexParser) to a table column?

Answers

Leave a Comment