Options

Backfilling data and partitions - looking for best practice/advice

bmurrellbmurrell Community Edition User

Hello,
We have a table partitioned by date and a group partitioned by calendar_hierarchy_day.
We often have a backfilling exercise, which obviously inserts records into old and large partitions which then causes a mergeout.
How can I avoid that ? The date used is not updated, so it's only new (but old) records.
Queries use this date as a filter so on one side it's a good candidate for partitioning.
Also increasing activepartitioncount isn't appropriate since it's common to backdate many years, often up to 30 years, including all years from that point.

Best Answer

  • Options
    marcothesanemarcothesane - Select Field - Administrator
    Answer ✓

    Assume that the table is hierarchically partitioned by day for the last 3 months, month for the past year, and year from then onwards, and the table is sales.f_sales. Assume that the new records are really new, so no updates required. And all new records refer to the year 2021. And the new records sit in a landing-zone table landing.f_sales. And the driving column for the element of time is sales_dt.

    Step 1: create a staging table like the base table, and partitioned the same way:

    CREATE TABLE sales.f_sales_stg LIKE sales.f_sales INCLUDING PROJECTIONS;
    

    Step 2 : Create a year partition for the year affected by the change in one transaction. That should create one ROS container, with all 365 partition keys of 2021 combined.

    INSERT INTO sales.f_sales_stg
    SELECT * FROM landing.f_sales
    UNION ALL
    SELECT * FROM sales.f_sales WHERE sales_dt BETWEEN DATE '2021-01-01' AND DATE '2021-12-31'
    

    Step 3: Swap the right partitions between the staging table and the base table:

    SELECT 
      SWAP_PARTITIONS_BETWEEN_TABLES(
        'sales.f_sales_stg'
      , '2021-01-01'
      , '2021-12-31'
      , 'sales.f_sales’
    );
    

Answers

  • Options
    bmurrellbmurrell Community Edition User

    Thanks marcothesane. I will give that a try. It does imply there's only 1 writer process, to avoid data getting lost between steps 2 and 3.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file