partitioning vs many tables
Hi All, We are trying to devise schema in vertica for storing data for our application. There are a large number of publishers for that application and the confusion is whether to store the data in a single table and have partitioning based on publisher id or store each publisher data in a separate table. There are certain operations like delete , recovery et al which create locks on the entire table, hence data push might disrupt for everyone if the data is stored in a single table. However a single table is more easily manageable. Could someone throw some light on the pros and cons of each ? Thanks, Ravi.
0
Comments
I think that you are misunderstanding the concept of partition in Vertica. Partition is a property of a table and specifies how data is organized within individual nodes. Vertica offer partition for 2 reasons,
1- To easily clean historic data, so you can drop a partition and reclaim the space very fast.
2- Partition pruning so when Vertica scan the data filter the partitions.
However it is recommended to keep a low number of partition as when partition Vertica separated the partition data in different files. A common partition key is month/year.
As a pro/con of partition by publisher
Pro, you can easily filter the data when doing queries publisher = xxx
Con,
- too many partition ( not sure the number)
- too many active partitions ( you will get ros push back when loading as all the active partition will merge with the starta algorithm.
- delete history data will be with a delete statement, and could take long time and create delete vectors.
Not having the possibility to have partition by publisher, I think having different tables may be your answer, but I would like to know what type of deletes you do? If in fact you do too many deletes, I would recommend you to read the delete performance consideration.
https://my.vertica.com/docs/6.1.x/HTML/index.htm#12704.htm
You also need to analyze what type of queries do you do.
Hope this helps,
Eugenia