Backfilling data and partitions - looking for best practice/advice
Hello,
We have a table partitioned by date and a group partitioned by calendar_hierarchy_day.
We often have a backfilling exercise, which obviously inserts records into old and large partitions which then causes a mergeout.
How can I avoid that ? The date used is not updated, so it's only new (but old) records.
Queries use this date as a filter so on one side it's a good candidate for partitioning.
Also increasing activepartitioncount isn't appropriate since it's common to backdate many years, often up to 30 years, including all years from that point.
Best Answer
-
marcothesane - Select Field - Administrator
Assume that the table is hierarchically partitioned by day for the last 3 months, month for the past year, and year from then onwards, and the table is
sales.f_sales
. Assume that the new records are really new, so no updates required. And all new records refer to the year 2021. And the new records sit in a landing-zone tablelanding.f_sales
. And the driving column for the element of time issales_dt
.Step 1: create a staging table like the base table, and partitioned the same way:
CREATE TABLE sales.f_sales_stg LIKE sales.f_sales INCLUDING PROJECTIONS;
Step 2 : Create a year partition for the year affected by the change in one transaction. That should create one ROS container, with all 365 partition keys of 2021 combined.
INSERT INTO sales.f_sales_stg SELECT * FROM landing.f_sales UNION ALL SELECT * FROM sales.f_sales WHERE sales_dt BETWEEN DATE '2021-01-01' AND DATE '2021-12-31'
Step 3: Swap the right partitions between the staging table and the base table:
SELECT SWAP_PARTITIONS_BETWEEN_TABLES( 'sales.f_sales_stg' , '2021-01-01' , '2021-12-31' , 'sales.f_sales’ );
1
Answers
Thanks marcothesane. I will give that a try. It does imply there's only 1 writer process, to avoid data getting lost between steps 2 and 3.