Taking advantage of more partitioning?

I have read and understand why there is a maximum to the number of partitions with Vertica but I'm still wondering if there isn't another way I can take advantage of the distribution of my data. Basically my data is distrubted by year but also by customerID. There is about 20 years worth of data in my database for approximately 100 customers. At the moment I'm only partitioning on year which has resulted in 20 partitions. But the data is always accessed for a specific customer so it's very easy to further partition by customerID which would result in 2000 partitions. This obviously won't work with Vertica due to the hard limits but is there another way? Perhaps I can create a projection for each customer/year? Would this be a good idea?

Comments

  • Hi Dennis, First, you are right about that doing a partition by data and customer is not a good idea. But creating a projection with each customer by year is not possible neither. Why you want to take advantage of more partitioning? do your queries are not performing well? If so, not necessarily is a partition problem. Partition is for 2 main reason, 1- Clean the historic data. Vertica does not perform optimally when you do deletes and if you want to clean the history in your case eliminate the data from 11 years ago, with partition you just execute a drop partition and the data is automatically delete. It is fast, does not take locks and works really well. 2- Partition pruning, if you put in your query a where statement that uses the partition field such as where year = 2011. Vertica will just go to the partitions that have data from that year and avoid scan all the records, therefore, query perform much faster. Having too many partitions does not help. As you may note Vertica encode the data as more partition you have less advantage you take of encoding such as RLE and could affect performance negatively. So it is important that you understand your queries and priorities in order to take the best advantages. Does this make sense? Hope this helps. Eugenia
  • Hi Dennis (and hey Eugenia!, hope you don't mind if I jump in briefly), it sounds to me like this is an ideal case for choosing the right sort order on your projections. Specifically: If you sort on customerID, particularly if you also use RLE encoding on the column, Vertica essentially stores the data clustered by customerID within each projection. In that case, looking up the rows for particular customerID value will be very fast, and will be completely independent of how many records you have for each customer. (This is very different from the behavior that you might expect from a traditional row-store database, where all operations get more expensive as you add more rows.) Vertica's Database Designer will, incidentally, automatically set up this sort of optimization for you.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file