Parsing the Vertica SQL dialect
Hello
I have a daily datamart calculation routine. There are 300+ datamarts, and they depend on each other. Right now, to compute the dependencies of a new datamart, I have to run it and then query projection_usage. But the datamarts have become too wide and too interdependent, so I came up with the idea of column-level dependencies to separate some of the calculations. The problem is that there is no column_usage table. So I need to build and maintain a SQL statement model to analyze column-level dependencies. Alternatively, I could split statements into DDL-only and DML-only parts, execute all the DDL, and analyze the DML, which looks like something Vertica already does under the hood.
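For reference, my current table-level approach looks roughly like this (a minimal sketch; the transaction/statement ids are placeholders for the ids of the datamart's run):

```sql
-- Sketch: after running a new datamart, list the projections (and thus
-- the anchor tables) its statement touched. Ids below are placeholders.
SELECT DISTINCT
    pu.anchor_table_schema,
    pu.anchor_table_name,
    pu.projection_name
FROM v_monitor.projection_usage pu
WHERE pu.transaction_id = 45035996273705000
  AND pu.statement_id   = 1;
```

This only gets me to table/projection granularity, hence the column-level idea.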
So can you suggest a projection_usage-like table for columns, or can you provide an SQL grammar for the Vertica dialect? Or maybe there is a parser for Vertica SQL that I did not find.
Comments
What are you looking for? A way to check which columns from a projection were queried?
Yep, I'm looking for a way to check which columns are needed to execute a query. I can do this either by splitting statements like CREATE LOCAL TEMP TABLE ... AS SELECT into DDL and DML parts and explaining the DML part, or by parsing. And with parsing I can do much more: I can derive, for example, how the resulting columns depend on the input columns.
So I'm wondering whether there is a projection_usage-like table that lists the columns used by recent queries, or whether I can somehow get a grammar for Vertica SQL so I can build my own parser and parse queries before running them.
Honestly, I have another task as well: pointing out to BI analysts what is wrong with a query before executing it. So a parser for the Vertica SQL dialect is something that would solve both issues. To illustrate the splitting approach, see the sketch below.
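A sketch of the DDL/DML split (the datamart, table, and column names are made up):

```sql
-- DDL-only part: create the target structure without moving any data.
CREATE LOCAL TEMP TABLE dm_sales ON COMMIT PRESERVE ROWS AS
SELECT s.customer_id, SUM(s.amount) AS total_amount
FROM sales s
GROUP BY s.customer_id
LIMIT 0;

-- DML-only part: EXPLAIN instead of executing, so the plan reveals
-- which input columns the calculation actually needs.
EXPLAIN
INSERT INTO dm_sales
SELECT s.customer_id, SUM(s.amount) AS total_amount
FROM sales s
GROUP BY s.customer_id;
```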
Maybe the data collector table DC_EXPLAIN_PLANS can help?
Quick example:
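Something like this pulls the stored plan text for one query (the ids are placeholders; substitute those of your statement):

```sql
-- Reassemble the stored explain plan for a given statement.
SELECT path_line
FROM dc_explain_plans
WHERE transaction_id = 45035996273705000
  AND statement_id   = 1
ORDER BY path_id, path_line_index;
```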
Well, I can try it. So you're suggesting that I parse explain plans instead of queries, and correlate all the early and late materializations with the corresponding join/storage-access stages to decode the aliases.
Here's what I got:
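Roughly this (a sketch; the regex is my guess at what matters in the plan text, and the ids are placeholders):

```sql
-- The "Materialize:" lines of the plan list the columns read from
-- storage, i.e. the query's input columns.
SELECT REGEXP_SUBSTR(path_line, 'Materialize: (.*)', 1, 1, '', 1)
       AS input_columns
FROM dc_explain_plans
WHERE transaction_id = 45035996273705000
  AND statement_id   = 1
  AND path_line ILIKE '%Materialize:%';
```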
And now I know the input columns. But I still cannot trace the correlation between the input and output columns. Any ideas on how I can do this?