Questions on partition limit of 1024
I have a couple of questions on partition limit of 1024:
1. Is the 1024 limit per table or per database?
2. What happens if you exceed the 1024 limit? What impact does it have on the system?
Comments
It is per projection, and you will not be able to exceed it. Once you reach that number you will get ROS pushback, meaning Vertica stops loading more data until the number of ROS containers is reduced. Partition count and ROS container count are closely related.
HTH
Eugenia
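As a minimal sketch of how to see how close each projection is to the limit (assuming the v_monitor.partitions system view is available in your Vertica version), you can count partition keys and ROS containers per projection:

-- Count distinct partition keys and ROS containers per projection;
-- each row in v_monitor.partitions corresponds to a partition/ROS container pairing.
SELECT table_schema,
       projection_name,
       COUNT(DISTINCT partition_key) AS partition_count,
       COUNT(*)                      AS ros_container_count
FROM v_monitor.partitions
GROUP BY table_schema, projection_name
ORDER BY ros_container_count DESC;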
So if you create many partitions, you will have a zillion tiny files. These files aren't sorted, so we potentially have to open all of them to query your data; as a result, query performance is degraded. Also, it's just plain a ton of tiny files. Most hard drives don't perform well when seeking between many tiny files; this further degrades performance.
We often see people coming from other database systems trying to create many partitions in order to improve performance. In Vertica, you instead simply want to set the sort order -- we can query yesterday's data very quickly, partitioned or not, if the data is sorted by day.
Adam
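As a minimal sketch of the sort-order approach, assuming a hypothetical events table: it is the projection's ORDER BY on the date column, not partitioning, that lets Vertica read one day's rows quickly.

-- Hypothetical table; names are illustrative.
CREATE TABLE events (
    event_date DATE NOT NULL,
    user_id    INT,
    payload    VARCHAR(1000)
);

-- Projection sorted by event_date, so "WHERE event_date = ..." reads a
-- contiguous, sorted range instead of opening many small files.
CREATE PROJECTION events_by_day AS
SELECT event_date, user_id, payload
FROM events
ORDER BY event_date, user_id
SEGMENTED BY HASH(user_id) ALL NODES;

-- Querying one day is fast because of the sort, with or without partitioning.
SELECT COUNT(*) FROM events WHERE event_date = '2016-01-05';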
We have many millions of rows per daily partition, so let's say we stop partitioning. We use partitions to make it easy to replace the data for a single day in an atomic operation without requiring transactions. Would you suggest that a large transaction inserting millions of rows is performant enough while the table is being simultaneously queried? We'd delete the whole day, insert millions of rows, and then commit.
Hi
The method you mention is less suitable for Vertica, as DELETE can be very slow and can have a big impact on SELECTs.
Having said that, you can switch to weekly/monthly partitions and load data directly to the table. (E.g., your monthly partition key can be yearMonth, like 201601 for January 2016, so you can easily and very quickly drop partitions when you no longer need the data; a sketch follows this reply.)
General guideline :
I hope you will find it useful
Thanks
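A minimal sketch of the yearMonth partitioning described above, reusing the hypothetical events columns from earlier (the drop function depends on your Vertica version: newer releases have DROP_PARTITIONS with a key range, older ones use DROP_PARTITION with a single key):

-- Partition by a yearMonth key (e.g. 201601 for January 2016).
CREATE TABLE events_monthly (
    event_date DATE NOT NULL,
    user_id    INT,
    payload    VARCHAR(1000)
)
PARTITION BY (EXTRACT(YEAR FROM event_date) * 100 + EXTRACT(MONTH FROM event_date));

-- Dropping a month removes whole containers; it is fast and avoids delete vectors.
SELECT DROP_PARTITIONS('public.events_monthly', '201601', '201601');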
Thanks for the response. We have a slightly different use case.
We're restating data for a given date range every day. We have an unreliable link for getting data into our system, so when data comes in, it contains historical data for the previous 10 days. We recalculate all 10 days' data and replace it in the database. We partition by day, insert the data into a separate table, and then swap the days into the main table. It is convenient that we can drop partitions to dump old data, but I'm more concerned about the inserts. We're inserting many millions of rows per table per day.
If we remove the ease of replacing data via partitions, we'd have to delete all the data for a given day, insert the new data, and commit. Whether we could do all 10 days at once or do each day separately is a separate question. I'm wondering whether it's better to do this than to partition.
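For reference, a minimal sketch of the swap-style workflow described above, assuming daily partitioning and a staging table with the same definition as the main table (SWAP_PARTITIONS_BETWEEN_TABLES availability depends on your Vertica version; MOVE_PARTITIONS_TO_TABLE is an alternative):

-- Both tables share the same columns and a daily partition expression, e.g. PARTITION BY event_date.
-- Load the restated 10 days into the staging table first (path and format are illustrative).
COPY events_staging FROM '/data/restated_last_10_days.csv' DELIMITER ',' DIRECT;

-- Atomically exchange the restated range of days with the live table.
SELECT SWAP_PARTITIONS_BETWEEN_TABLES(
    'public.events_staging',   -- source of the restated partitions
    '2016-01-01',              -- min partition key (oldest restated day)
    '2016-01-10',              -- max partition key (newest restated day)
    'public.events'            -- target table being restated
);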
Hi ,
As I mentioned, if you can stay with the partition approach without having too many partitions at any given time, this should be your preferred approach.
The delete & insert approach is less optimized in column-based databases (it carries a lot of overhead: delete vectors, query response time impact, etc.).
I hope this answers your question.
Thanks
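If it helps, here is a sketch of how to see how much delete-vector overhead a projection is carrying, assuming the v_monitor.delete_vectors system view exists in your version:

-- Deleted-but-not-yet-purged rows per projection; large numbers here slow down
-- SELECTs because the execution engine must filter out deleted rows at query time.
SELECT schema_name,
       projection_name,
       SUM(deleted_row_count) AS deleted_rows,
       SUM(used_bytes)        AS delete_vector_bytes
FROM v_monitor.delete_vectors
GROUP BY schema_name, projection_name
ORDER BY deleted_rows DESC;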
Is there any difference in delete performance if we're deleting a range of rows, especially if they are sorted on the key being used for the delete? For instance, imagine that the date column is the first column and we sort on it. A delete of all the rows matching a given date will remove a set of contiguous rows from the DB. Even though we're deleting many rows, does this type of deletion perform better than deleting non-contiguous rows?
Yes, this will definitely be better than the alternative. Delete performance in general is affected by:
Projection design: a projection sort key that matches the filter (in your case, the date) can improve performance. In addition, if you have a projection that does not include the filter key, your delete will be very, very slow.
A large delete is better than many small deletes in terms of the number of delete vectors that are created. (In Vertica, a delete does not remove the data from the data files; Vertica creates delete vector files, a kind of lookup file containing the row positions of the deleted records. At query time the execution engine has to check each record to make sure it is not a deleted record, which is why SELECT queries are affected.)
A delete takes an X lock; during the delete's execution, all other transactions on the table will be blocked.
Thanks
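As a minimal sketch of the two points above, reusing the hypothetical events table: a projection whose sort key matches the delete filter, and one large delete per day rather than many small ones.

-- Projection whose sort key matches the delete predicate (event_date), so the
-- delete touches a contiguous, sorted range of rows.
CREATE PROJECTION events_delete_friendly AS
SELECT event_date, user_id, payload
FROM events
ORDER BY event_date
SEGMENTED BY HASH(user_id) ALL NODES;

-- One large delete for the whole day (fewer delete vectors than many small deletes).
-- The DIRECT hint writes the delete vectors straight to ROS.
DELETE /*+ DIRECT */ FROM events WHERE event_date = '2016-01-05';
COMMIT;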