Group by on High Cardinal dimension - Projection Tuning- Vertica
I have a scenario where the group by is done on high cardinal dimension value field which has nearly 30Billion distinct values and the data type of this field is varchar. This query for a long time and I need to tune this query by creating projections in vertica. Could anyone provide me some ideas on it.
0
Comments
Hi,
You can tune the query using database designer. Please find the below URL's for more information
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/FirstStepsForImprovingQueryPerformance.htm?Highlight=optimize query
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/GettingStartedGuide/UsingDatabaseDeisgner/RunningDBDwithAT.htm?Highlight=database designer
Sruthi
I am sorry, I didnt mention that I dont have DBD to design my projection and I am creating custom projections to tune the query. I have those information that were providied as link above, but still it didnt help me.
If there is any commonality in the content of the distinct data, you could create what I would call a "synethic" lower cardinality column from the high cardinality data that will help to do an initial grouping.
For example, if the data is IP addresses for many servers, you could parse out the pieces of the IP address into separate columns, thereby "synthetically" creating lower cardinality columns from the high cardinality data.
Another example would be an office number. If the number is "12345678" and the first two digits are country code, the second are region code, etc. you could use those parts of the data in the high cardinality column to create two "synthetic" lower cardinality columns with which to order your projection.
You could even not change the table columns and use a live aggregate projection to create a new projection that has parses the data from the high cardinality column into lower cardinality columns by which you could sort the columns.
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AnalyzingData/AggregatedData/ExpressionsAggregateData.htm%3FTocPath%3DAnalyzing%2520Data%7CData%2520Aggregation%7CPre-Aggregating%2520Data%2520in%2520Projections%7CAggregating%2520Data%2520Through%2520Expressions%7C_____0
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AnalyzingData/AggregatedData/ExpressionsAggregateDataExample.htm%3FTocPath%3DAnalyzing%2520Data%7CData%2520Aggregation%7CPre-Aggregating%2520Data%2520in%2520Projections%7CAggregating%2520Data%2520Through%2520Expressions%7C_____1
Hopefully this gives you some type of an idea for an approach.
Hi Herb,
Thanks for the solution. I am using vertica 7.1v.
Also the high cardinal column doesnt have any particular pattern in it. It has numbers, characters, and alphanumeric values and the group by is done on this column.
example values
12
3736183
93836718393
NULL_STRING
pa.9d9jd8dddj3k39d9g9s9ec776
d7adf8jd75efc
Please suggest me a solution.
DId you try to create projection segmented by your high cardinality (hc) column. You should use mpp principle to run calculation on every node in your cluster.
Take the columns you need for that query only, segment projection by your hc column, apply encoding and order by hc column.
Hi,
Unfortunately the value in this column cannot be correlated with any other and it is not derived field too. Thanks for your reply.
-Vinoth
Hi,
I have tried segmentation based on HC column in my projection and made it in top of my order my clause, the result was the projection referesh happens for ever like it took more than 20 hours so I stopped it. Thanks