Questions on partition limit of 1024

I have a couple of questions on the partition limit of 1024:
1. Is the 1024 limit per table or per database?
2. What happens if you exceed the 1024 limit? What impact does it have on the system?

Comments

  • Hi May,
    It is per projection, and you will not be able to exceed it. Once you reach that number you will get ROS pushback, meaning no more data can be loaded until the number of ROS containers is reduced. Partition count and ROS container count are closely related (see the query sketch below).
    HTH
    Eugenia
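
    To see how close a projection is getting to that limit, a query along these lines can help (a rough sketch only; exact system-table column names vary across Vertica versions):

        -- Partition keys and ROS containers per projection
        -- (assumes the v_monitor.partitions system table)
        SELECT projection_name,
               COUNT(DISTINCT partition_key) AS partition_count,
               COUNT(*)                      AS ros_container_count
        FROM   v_monitor.partitions
        GROUP  BY projection_name
        ORDER  BY ros_container_count DESC;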
  • I'm curious what the requirement is behind using more than 1024 partitions.
  • If I may add to Eugenia's response:  A partition in Vertica is a more-heavyweight construct than in some other databases; we actually create a completely separate file for each partition.  (More precisely, a separate file per partition per column within the partition, since we're a column store and store column data in separate files.)

    So if you create many partitions, you will have a zillion tiny files.  These files aren't sorted, so we potentially have to open all of them to query your data; as a result, query performance is degraded.  Also, it's just plain a ton of tiny files.  Most hard drives don't perform well when seeking between many tiny files; this further degrades performance.

    We often see people coming from other database systems trying to create many partitions in order to improve performance.  In Vertica, you instead simply want to set the sort order -- we can query yesterday's data very quickly, partitioned or not, if the data is sorted by day (see the sketch at the end of this comment).

    Adam
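
    For illustration, a day-sorted table might look like this (hypothetical table and column names, not from the original discussion):

        -- Sorting by the date column keeps each day's rows contiguous on disk,
        -- so single-day queries stay fast even without a partition per day.
        CREATE TABLE web_events (
            event_date  DATE          NOT NULL,
            user_id     INT           NOT NULL,
            url         VARCHAR(2048)
        )
        ORDER BY event_date, user_id;

        -- "Yesterday's data" is then a narrow range scan:
        SELECT COUNT(*)
        FROM   web_events
        WHERE  event_date = CURRENT_DATE - 1;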
  • We have many millions of rows per daily partition, so let's say we stop partitioning. We use partitions to make it easy to replace the data for a single day in an atomic operation without requiring transactions. Would you suggest that a large transaction inserting millions of rows is performant enough while the table is simultaneously being queried? We'd delete the whole day, insert millions of rows, and then commit.

  • Hi,

    The method you mention is less suitable for Vertica, as deletes can be very slow and can have a big impact on SELECT performance.

    My understanding is:

    1. You need to keep one year of data in your table.
    2. Each time, your data is loaded into some kind of staging table, and you use the partition API to replace the data for the relevant date in the main table with the data from staging.

    Having said that, you can switch to weekly or monthly partitions and load data directly into the table. (E.g., your monthly partition key can be yearMonth, like 201601 for January 2016, and you can then drop partitions very quickly when you no longer need the data; see the sketch below.)
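
    A rough sketch of the monthly-partition idea (hypothetical table name; the DROP_PARTITION call syntax can differ slightly between Vertica versions):

        -- Partition on a yearMonth key such as 201601 for January 2016
        CREATE TABLE sales (
            sale_date  DATE           NOT NULL,
            store_id   INT            NOT NULL,
            amount     NUMERIC(12,2)
        )
        ORDER BY sale_date, store_id
        PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);

        -- Dropping a whole month is fast once the data is no longer needed
        SELECT DROP_PARTITION('public.sales', 201601);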

     

    General guidelines:

    1. If you can, make your load transactions big; this lets you use the DIRECT hint and creates fewer ROS files.
    2. Loading directly into ROS minimizes the impact on queries (reading from WOS can in some cases be slower than reading directly from ROS). See the sketch after this list.
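
    A sketch of the DIRECT-load guideline (hypothetical file path and table names):

        -- Bulk load straight into ROS, bypassing WOS, in one big transaction
        COPY sales FROM '/data/sales_201601.csv' DELIMITER ',' DIRECT;

        -- The same hint applies to INSERT ... SELECT style loads
        INSERT /*+ DIRECT */ INTO sales SELECT * FROM sales_staging;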

    I hope you will find it useful.

    Thanks

  • Thanks for the response. We have a slightly different use case.

    We're restating data for a given date range every day. We have an unreliable link for getting data into our system, so when data comes in, it contains historical data for the previous 10 days. We recalculate all 10 days' data and replace it in the database. We partition by day, insert the data into a separate table, and then swap the days into the main table (see the sketch below). It is convenient that we can drop partitions to dump old data, but I'm more concerned about the inserts. We're inserting many millions of rows per table per day.

    If we remove the ease of replacing data via partitions, we'd have to delete all the data for a given day, insert the new data, and commit. Whether we could do all 10 days at once or each day separately is a separate question. I'm wondering whether that's better than partitioning.
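
    For reference, the day-swap step described above might look roughly like this (hypothetical table names; the exact function and signature depend on the Vertica version, e.g. MOVE_PARTITIONS_TO_TABLE on older releases):

        -- Load the restated days into a staging table with the same definition,
        -- then atomically swap those daily partitions into the main table.
        SELECT SWAP_PARTITIONS_BETWEEN_TABLES(
            'public.events_staging',   -- staging table holding the restated days
            '2016-01-01',              -- lowest partition key being replaced
            '2016-01-10',              -- highest partition key being replaced
            'public.events');          -- target (main) table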

  • Hi,

    As I mentioned, if you can stay with the partition approach without having too many partitions at any given time, that should be your preferred approach.

    The delete & insert approach is less optimized in column-based databases (there is a lot of overhead: delete vectors, query response time impact, etc.).

    I hope this answers your question.

    Thanks

  • Is there any difference in delete performance if we're deleting a range of rows, especially if they are sorted on the key being used for the delete? For instance, imagine that the date column is the first column and we sort on it. A delete of all the rows matching a given date will remove a set of contiguous rows from the DB. Even though we're deleting many rows, does this type of deletion perform better than deleting non-contiguous rows?

  • Yes,
    this will definitely be better than the alternative. Delete performance in general is affected by:

    1. Projection design - a projection sort key that matches the delete filter (in your case, the date) improves performance; on the other hand, if you have a projection that does not include the filter key at all, your delete will be very, very slow.
    2. A few large deletes are better than many small deletes in terms of the number of delete vectors created. (In Vertica, a delete does not remove data from the data files; instead, Vertica creates delete vector files, which are lookup files holding the row positions of the deleted records. At query time the execution engine has to check each record against them to make sure it is not a deleted record, which is why SELECT queries are affected; see the query sketch after this list.)
       Also note that a delete takes an X lock, so for the duration of the delete all other transactions on the table are blocked.
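
    To see how much delete-vector overhead has accumulated, something along these lines can be used (a sketch; it assumes the v_monitor.delete_vectors system table, whose columns may differ by version):

        -- Delete vectors and deleted-row counts per projection
        SELECT schema_name,
               projection_name,
               COUNT(*)               AS delete_vector_count,
               SUM(deleted_row_count) AS deleted_rows
        FROM   v_monitor.delete_vectors
        GROUP  BY schema_name, projection_name
        ORDER  BY deleted_rows DESC;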


    Thanks
