How to apply a parser (FRegexParser) to a table column?
aoropeza
Vertica Customer ✭
Hello,
I'm using Vertica 10.1 and I need to apply a Parser over a column. The table has a column called "body" and there is a continuos ingestion over this table (100M rows daily).
I need to apply daily (cron task) a Regex to get all matches and insert into table_stg table. This is my current solution:
vsql -U dbadmin -w pass -d db -At -c "SELECT text FROM table_in WHERE dt = CURRENT_DATE()" | vsql -U dbadmin -w pass -d db -At -c "COPY table_stg FROM STDIN PARSER FRegexParser(pattern = '...');"
Is this a good approach?
Could you give me some suggestions to accomplish this task?
Tagged:
0
Answers
Would this be easier solved by simply using one of Vertica's regular expression functions?
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/RegularExpressionFunctions.htm?tocpath=SQL Reference Manual|SQL Functions|Regular Expression Functions|_____0
Hello Curtis, thanks for your answer.
I have reviewed that functions but I feel limited because my Regex looks like this:
(?<hostname>\w.+?):\s\[(?<datetime>|\w.*?)\]\s(?<servername>\w.*?)\s...
The source column (table_in table) has 20 values I have to retrieve with my Regex and these matches must insert into a table (table_stg table) with this columns.
With COPY command this is easy but in my case the information is already in Vertica.
Is there a way to do this with the functions you recommend?
Maybe I don't see how I can do it, but I would appreciate if you could give me some advice to do these transformations using those functions.
Here are some regex examples:
select REGEXP_REPLACE(REGEXP_REPLACE(substr(query, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' )
, avg(query_duration_us)/1000000, avg(processed_Row_count), count()
from query_profiles where query_start::date > current_date -1
group by 1
having count(*) > 3
order by 4 desc ;
select regexp_replace(substr(segment_expression, 1, 500), '\w+.'), count(*) from projections where projection_name not ilike '%b1'
group by 1 order by 2,1 desc ;
But it sounds like you have 20 columns embedded in a single column in the table, and you need to split that out into 20 other columns into your target table?