The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
partitioning vs many tables
Hi All, We are trying to devise schema in vertica for storing data for our application. There are a large number of publishers for that application and the confusion is whether to store the data in a single table and have partitioning based on publisher id or store each publisher data in a separate table. There are certain operations like delete , recovery et al which create locks on the entire table, hence data push might disrupt for everyone if the data is stored in a single table. However a single table is more easily manageable. Could someone throw some light on the pros and cons of each ? Thanks, Ravi.
I think that you are misunderstanding the concept of partition in Vertica. Partition is a property of a table and specifies how data is organized within individual nodes. Vertica offer partition for 2 reasons,
1- To easily clean historic data, so you can drop a partition and reclaim the space very fast.
2- Partition pruning so when Vertica scan the data filter the partitions.
However it is recommended to keep a low number of partition as when partition Vertica separated the partition data in different files. A common partition key is month/year.
As a pro/con of partition by publisher
Pro, you can easily filter the data when doing queries publisher = xxx
- too many partition ( not sure the number)
- too many active partitions ( you will get ros push back when loading as all the active partition will merge with the starta algorithm.
- delete history data will be with a delete statement, and could take long time and create delete vectors.
Not having the possibility to have partition by publisher, I think having different tables may be your answer, but I would like to know what type of deletes you do? If in fact you do too many deletes, I would recommend you to read the delete performance consideration.
You also need to analyze what type of queries do you do.
Hope this helps,