Options

Surrogate Key Generation

We have a requirement to generate the surrogate key for huge table in Vertica. We tried with multiple options like IDENTITY , AUTO_INCREMENT and SEQUENCE NUMBER.  These options are not working with our requirement. Its working in first session. If we again try to load in new session, its starting with new value (like session1 - starting with 1, session2 - starting with 500001 and session3 - starting with 1000001. We don't want like this. We want continue the number (like Max(Number)+1) in every load. We also tried with Row_Number() over() + Max(Number) . But INSERT-SELECT is giving performance issues with this option.  Can you please suggest is there any better way in vertica to load the data with surrogate key.

Comments

  • Options

     

    The next value that you would see in another session when it comes to sequences is decided by the cache setting. You should set it to (or close to) the number of records you push in a load. That way you won't see big gaps in your table. The only way to get no gaps without any perf issues AFAIK is to do it programmatically. grab the current val of a sequence. alter it to start at the (Number of records you plan to insert in that session +1). load your data. Repeat for every load. No gaps this way but more work on your side. DO NOT set the cache value to 1 in sequence if are going to be inserting multiple records in a table while using a sequence. Your inserts will be super slow. Post any questions you might have.  

  • Options

     

     

    You have 2 options to load data without gaps with IDENTITY or AUTO_INCREMENT column.

     

    Option 1 

     

    Number of allocated values per session defined in underlying sequence in CACHE property.

    By default it is block of 250K values.

    If you set CACHE value to 1 you will have no gaps, but will loose on performance.

     

     

    Option 2

     

    It may be possible to get contiguous numbers if you can control data loading. A batch of 250K records will use all the numbers from the 250K range assigned to session. If you can batch all of the data inserts to have the same number of records as your cache value you will have no gaps.

     

     

    This blog post describing everything in great detail

    http://www.dbjungle.com/working-with-identity-columns-in-hp-vertica/

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file