Partition a Table By More Than One Column
Jim_Knicely
Administrator
Vertica partitioning divides one large table into smaller pieces based on the values in one or more columns. Partitions can make data lifecycle management easier and can improve the performance of queries whose predicates reference the partition expression.
To partition a table by more than one column, use the HASH function!
Example:
dbadmin=> CREATE TABLE some_fact_table (c1 INT NOT NULL, c2 INT NOT NULL, c3 VARCHAR(1) NOT NULL, c4 INT, c5 VARCHAR(100)) PARTITION BY (c1, c2);
ERROR 4331: PARTITION BY expression cannot return a tuple

dbadmin=> CREATE TABLE some_fact_table (c1 INT NOT NULL, c2 INT NOT NULL, c3 VARCHAR(1) NOT NULL, c4 INT, c5 VARCHAR(100)) PARTITION BY HASH(c1, c2);
CREATE TABLE
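To see which partition keys the hash expression produces, one could load a couple of rows and compare the computed hash with what Vertica records. This is only a sketch: the sample values are made up, and it assumes the PARTITIONS system table exposes partition_key and projection_name columns and that the rows have already reached ROS storage.

```sql
-- Hypothetical sample rows; the values are illustrative only.
INSERT INTO some_fact_table VALUES (1, 1, 'A', 10, 'first');
INSERT INTO some_fact_table VALUES (1, 2, 'B', 20, 'second');
COMMIT;

-- The partition key each row maps to under PARTITION BY HASH(c1, c2).
SELECT c1, c2, HASH(c1, c2) AS computed_key
FROM some_fact_table
ORDER BY c1, c2;

-- Partition keys Vertica has recorded for the table's projections
-- (assumes default projection names that start with the table name).
SELECT DISTINCT partition_key
FROM partitions
WHERE projection_name ILIKE 'some_fact_table%';
```

Each distinct (c1, c2) pair would be expected to show up as its own key.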
Helpful Link:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/partition-clause.htm
Have fun!
Comments
Hi @Jim_Knicely, in the above case, how can we drop older partitions? If I have a table with a date-only column and an accountid column, how can I drop the partitions that are older than 90 days?
You would need to compute the hash for each combination. This should be possible with a stored procedure: loop over the dates older than 90 days found in the table; for each date, loop over the distinct values of the second column; and inside the inner loop, drop the partition whose key is the hash of that date and second-column value.
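As a rough sketch of that idea, and swapping the stored procedure for a simpler technique, a single query can generate one DROP_PARTITIONS statement per old (date, account) combination. The table name account_events and the column event_date are hypothetical, the table is assumed to be partitioned by HASH(event_date, accountid), and passing the hash value as both the minimum and maximum key of DROP_PARTITIONS is an assumption to verify against your Vertica version.

```sql
-- Hypothetical table, assumed to be defined with:
--   PARTITION BY HASH(event_date, accountid)
-- Generates one DROP_PARTITIONS call per distinct combination
-- whose date is more than 90 days old. Nothing is dropped here;
-- the query only produces statement text.
SELECT DISTINCT
       'SELECT DROP_PARTITIONS(''account_events'', '
       || HASH(event_date, accountid)::VARCHAR || ', '
       || HASH(event_date, accountid)::VARCHAR || ');' AS drop_stmt
FROM account_events
WHERE event_date < CURRENT_DATE - 90;
```

The generated statements would then be reviewed and run as a script. Dropping a partition removes every row stored under that key, so in the unlikely event that a newer combination hashes to the same value, its rows would be dropped as well.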
Hi @Jim_Knicely! HASH(c1, c2) can create billions of distinct values. Is there any way to group them into a reasonable number of partitions so that this can be used in practice?
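One possible way to cap the number of partition keys, not confirmed anywhere in this thread, is to take the hash modulo a fixed bucket count. The sketch below assumes that HASH returns a non-negative integer and that MOD is accepted inside a partition expression; the table name and the bucket count of 100 are arbitrary, hypothetical choices.

```sql
-- Hypothetical variant of the example table: at most 100 partition
-- keys (0 through 99), regardless of how many distinct (c1, c2)
-- pairs are loaded.
CREATE TABLE some_fact_table_bucketed (
    c1 INT NOT NULL,
    c2 INT NOT NULL,
    c3 VARCHAR(1) NOT NULL,
    c4 INT,
    c5 VARCHAR(100)
)
PARTITION BY MOD(HASH(c1, c2), 100);
```

The trade-off is that each partition then holds many unrelated (c1, c2) combinations, so a single combination can no longer be dropped on its own.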
The only reason I can see for hash partitioning is if partition pruning works with it. Does it really work? How should a query be written to allow Vertica to do partition pruning? Can you provide a working example of a table partitioned by hash and a query that allows Vertica to prune partitions?
Are there any other reasons to use hash partitioning besides partition pruning?