The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
How to apply a parser (FRegexParser) to a table column?

Hello,
I'm using Vertica 10.1 and I need to apply a Parser over a column. The table has a column called "body" and there is a continuos ingestion over this table (100M rows daily).
I need to apply daily (cron task) a Regex to get all matches and insert into table_stg table. This is my current solution:
vsql -U dbadmin -w pass -d db -At -c "SELECT text FROM table_in WHERE dt = CURRENT_DATE()" | vsql -U dbadmin -w pass -d db -At -c "COPY table_stg FROM STDIN PARSER FRegexParser(pattern = '...');"
Is this a good approach?
Could you give me some suggestions to accomplish this task?
Tagged:
0
Answers
Would this be easier solved by simply using one of Vertica's regular expression functions?
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/RegularExpressionFunctions.htm?tocpath=SQL Reference Manual|SQL Functions|Regular Expression Functions|_____0
Hello Curtis, thanks for your answer.
I have reviewed that functions but I feel limited because my Regex looks like this:
(?<hostname>\w.+?):\s\[(?<datetime>|\w.*?)\]\s(?<servername>\w.*?)\s...
The source column (table_in table) has 20 values I have to retrieve with my Regex and these matches must insert into a table (table_stg table) with this columns.
With COPY command this is easy but in my case the information is already in Vertica.
Is there a way to do this with the functions you recommend?
Maybe I don't see how I can do it, but I would appreciate if you could give me some advice to do these transformations using those functions.
Here are some regex examples:
select REGEXP_REPLACE(REGEXP_REPLACE(substr(query, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' )
, avg(query_duration_us)/1000000, avg(processed_Row_count), count()
from query_profiles where query_start::date > current_date -1
group by 1
having count(*) > 3
order by 4 desc ;
select regexp_replace(substr(segment_expression, 1, 500), '\w+.'), count(*) from projections where projection_name not ilike '%b1'
group by 1 order by 2,1 desc ;
But it sounds like you have 20 columns embedded in a single column in the table, and you need to split that out into 20 other columns into your target table?