Options

Transactionally reload daily table snapshots

edited January 2019 in General Discussion

Hi,

we are dealing with a rather common scenario. There are datasources that give us full snapshots of their state every day. We take them and store them in Vertica. We're using 9.2.

We have certain requirements for this process:
1. When a snapshot is updated to a newer one, we'd like it to happen instantly and without any unnecessary locks from the end users (the clients that actually use the tables) perspective.
2. We'd like the process to be safe in case two uploads start at the same time. We expect one of the snapshots to be applied and the other to fail.
3. We'd like to have a general solution for any such table, so as little manual work for each case as possible.


Current thoughts and issues:

The most elegant way to update the table user is seeing would be using SWAP_PARTITIONS_BETWEEN_TABLES, with some intermediate staging table. This is also mentioned in the documentation several times, as a good general pattern.

Problems

  • Staging table cannot be temporary, because it's impossible to create a temp table LIKE a real table. Therefore, either a static staging table, or a dynamically created real table are the only options.
  • Tables that are not partitioned would require creating a single 'dummy' partition. (I'd guess that unpartitioned table is anyway stored identically as a single partition table, so this feels a bit messy)
  • There is no SWAP_ALL_PARTITIONS or SWAP_TABLES. This breaks our wish to have a generic solution, because then we need to somehow iterate over an arbitrary list of partitions.

We want to remove old data and insert new into the staging table as a single transaction. This is to prevent issues if accidentally someone starts two such processes. For example, if we use truncate, then both concurrent loads would truncate (which commits transaction) and start loading snapshots, ending in a duplicates. Ideal solution would take an X lock on the table, and perform the entire remove and load operation in one go.

Problems:

  • Again, partitions seems to be a good way with DROP PARTITION. But as I understand, it commits just like TRUNCATE. Moreover, again problems with iterating over partitions instead of saying DROP_ALL_PARTITIONS (I guess that's what TRUNCATE does?)
  • The only way to remove data and not commit a transaction is DELETE, which is then slow and doesn't suit some of our loads.

I guess using bulk delete is what we're after (according to https://www.vertica.com/kb/Best-Practices-for-Deleting-Data/Content/BestPractices/Best-Practices-for-Deleting-Data.htm#PerformBulkDelete) but there is a severe lack of documentation on how does it work. Is it different from DELETE ... WHERE employee_id = 15? How would I DELETE all rows from a table?

Any suggestions on how to approach this?

Comments

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator

    Here is one option that might cover requirement 1 and 3:

    dbadmin=> SELECT * FROM main;
     c1 | c2
    ----+----
      1 | A
      2 | B
      3 | D
    (3 rows)
    
    dbadmin=> \! cat /home/dbadmin/main_new.txt
    1|A
    2|B
    3|C
    4|D
    
    dbadmin=> CREATE TABLE main_stg LIKE main INCLUDING PROJECTIONS;
    CREATE TABLE
    
    dbadmin=> COPY main_stg FROM '/home/dbadmin/main_new.txt' DIRECT;
     Rows Loaded
    -------------
               4
    (1 row)
    
    dbadmin=> ALTER TABLE main_stg, main, temp RENAME TO temp, main_stg, main;
    ALTER TABLE
    
    dbadmin=> SELECT * FROM main;
     c1 | c2
    ----+----
      1 | A
      2 | B
      3 | C
      4 | D
    (4 rows)
    
    dbadmin=> DROP TABLE main_stg;
    DROP TABLE
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file