A partition is not a database object, but a subdivision of a table's storage containers.
My approach is this one:
CREATE TABLE wrk.target_swp LIKE wrk.target INCLUDING PROJECTIONS;
-- this will also create an identically partitioned table
INSERT INTO wrk.target_swp
SELECT col1,col2, .... , <update_column_val>, <update_column_val_x> ...
FROM wrk.target
WHERE dt >= <somedate> AND dt < <some_other_date>
AND .. -- other filters as you would use in the UPDATE statement.
;
-- if , for example, the two dates above range between now and 1st May
-- and the table is partitioned by year-month ...
SELECT SWAP_PARTITIONS_BETWEEN_TABLES('wrk.target_swp','202305,'202306','wrk.target');
This way, you only touch the affected partitions. And swapping partitions is a very quick atomic operation, which keeps the locks on the two tables just for a split second.
Sorry, I forgot:
You need to add this before the SWAP_PARTITIONS_BETWEEN_TABLES call:
INSERT INTO wrk.target_swp
SELECT * FROM wrk.target
WHERE dt >= <somedate> AND dt < <some_other_date>
AND -- negation of other filters as you would use in the UPDATE statement.
You need the partitions in the %_swp table to contain all data as you want it in the target table at the end; not only the data you want updated.
Thanks for your fast reply but I didn't understand your answer
I have some big ETL jobs I want to run in parallel. They insert into stage table partitions, manipulate the data then swap into the parent table
I'm not concerned with locking during partition swaps, just the manipulations
Lets say I have a partitioned table like this
CREATE TABLE part_stage
(
the_year varchar(10) NOT NULL ENCODING RLE,
the_country varchar(255) NOT NULL ENCODING RLE,
some_column varchar(255)
)
PARTITION BY the_year||the_country;
--Then I insert these two rows in different vsql sessions, no problem I assume with shared insert locks
insert into part_stage values('2023','Holland','some text');
insert into part_stage values('2022','Italy','some more text');
-- now I do an update like this in the first session. This is just a trivial example, in reality there's a ton of data and the update being done can last some time.
update part_stage set some_column='some other text' where the_year='2023' and the_country='Holland';
-- does it block this update in the other ETL session until the update in the other other session commits ?
update part_stage set some_column='something else' where the_year='2022' and the_country='Italy';
The update needs an X lock, is it required on the table or just the partition ?
If the lock is required on the whole table, will the second update wait or give an error ?
Then, yes, I'm afraid. If you're updating a table through two sessions in parallel - never mind the partitions, the lock goes to the whole table.
To avoid that, it may become nasty. One swap-staging table per parallel update. Then, in a second step an INSERT-SELECT * FROM wrk.target_swp1 UNION ALL SELECT * FROM wrk.target_swp2 (.etc...) into a big swapping table - and finally the SWAP_PARTITIONS_BETWEEN_TABLES()
Answers
This has been discussed already here:
https://forum.vertica.com/discussion/239101/is-it-safe-performance-locking-etc-to-update-fields-in-a-monthly-partitioned-table
A partition is not a database object, but a subdivision of a table's storage containers.
My approach is this one:
This way, you only touch the affected partitions. And swapping partitions is a very quick atomic operation, which keeps the locks on the two tables just for a split second.
Sorry, I forgot:
You need to add this before the
SWAP_PARTITIONS_BETWEEN_TABLES
call:You need the partitions in the
%_swp
table to contain all data as you want it in the target table at the end; not only the data you want updated.Thanks for your fast reply but I didn't understand your answer
I have some big ETL jobs I want to run in parallel. They insert into stage table partitions, manipulate the data then swap into the parent table
I'm not concerned with locking during partition swaps, just the manipulations
Lets say I have a partitioned table like this
CREATE TABLE part_stage
(
the_year varchar(10) NOT NULL ENCODING RLE,
the_country varchar(255) NOT NULL ENCODING RLE,
some_column varchar(255)
)
PARTITION BY the_year||the_country;
--Then I insert these two rows in different vsql sessions, no problem I assume with shared insert locks
insert into part_stage values('2023','Holland','some text');
insert into part_stage values('2022','Italy','some more text');
-- now I do an update like this in the first session. This is just a trivial example, in reality there's a ton of data and the update being done can last some time.
update part_stage set some_column='some other text' where the_year='2023' and the_country='Holland';
-- does it block this update in the other ETL session until the update in the other other session commits ?
update part_stage set some_column='something else' where the_year='2022' and the_country='Italy';
The update needs an X lock, is it required on the table or just the partition ?
If the lock is required on the whole table, will the second update wait or give an error ?
Thanks
David
Then, yes, I'm afraid. If you're updating a table through two sessions in parallel - never mind the partitions, the lock goes to the whole table.
To avoid that, it may become nasty. One swap-staging table per parallel update. Then, in a second step an
INSERT-SELECT * FROM wrk.target_swp1 UNION ALL SELECT * FROM wrk.target_swp2 (.etc...)
into a big swapping table - and finally theSWAP_PARTITIONS_BETWEEN_TABLES()