partition deletion request takes a long time
maks
Community Edition User ✭
Hello. Could you please help me with the following issue? I have a vertica cluster of 3 nodes, the size of the database is about 800 GB, it consists of one table, partitioning by day is configured. I delete daily one of the oldest partitions with the commandSELECT DROP_PARTITIONS ('CDB_TYPE_STATUS','day','day','true');
where day is the day for which I delete the partition.
The command works correctly, but takes a very long time - about 2 hours. At the same time, there is no load on hardware resources (load on CPU / RAM is not more than 25%, each host has 16 CPU cores and 96 GB of RAM). The size of the deleted partition is approximately 6-7 GB. Why does it take so long to delete a partition and how can it be accelerated?
Tagged:
0
Answers
Hi Maks,
How many partitions does your table possess?
Maybe using the force‑split parameter here is unnecessary?
Try with only DROP_PARTITIONS ('CDB_TYPE_STATUS','day','day');
@VValdar I tried deleting a partition without the force-split parameter, but I get an error
I also looked at the number of partitions in the table
select node_name,table_schema,projection_name,count(distinct(partition_key)) partitions from partitions group by 1,2,3 order by 3 desc;
node_name | table_schema | projection_name | partitions
-------------------------------+-----------------------+---------------------------------------------------------+------------
node0001 | public | CDB_TYPE_STATUS_b1 | 115
node0002 | public | CDB_TYPE_STATUS_b1 | 112
node0003 | public | CDB_TYPE_STATUS_b1 | 113
node0002 | public | CDB_TYPE_STATUS_b0 | 113
node0001 | public | CDB_TYPE_STATUS_b0 | 112
node0003 | public | CDB_TYPE_STATUS_b0 | 115
That's weird, are you sure your table is partitioned by day?
If it was the case the ForceSplit could be avoided (and that's not the case). 113ish partitions in your table would mean 4 months of history.
@VValdar I split the table by days with the command
How correctly then to split the table by days?
It looks like you are trying to drop partitions in a table with hierarchical partitioning - where at least a range of the partition keys you intend to drop are in grouped partitions.
What is happening is best documented here:
https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/AdministratorsGuide/Partitions/DroppingPartitions.htm?zoom_highlight=Splitting Partition Groups .
To avoid being forced to use
TRUE
asforce_split
, do this:With hierarchical partitioning, for example, your last month might be partitioned by day, the rest of the months of the current year might be partitioned by year-month, and the previous rest will be partitioned by year.
In this case, I would drop the oldest year in one shot - at the right time, and go:
SELECT DROP_PARTITIONS ('CDB_TYPE_STATUS','2019-01-01','2019-12-31');
I really don't expect you to need
TRUE
forforce_split
in this case (I have no chance to test right now); but if you actually should need it, there won't be anything to be split, and you can expect the operation to be quick.Yes your group by trigger the hierarchical partitioning and I'm not sure that's what you want.
If you only want daily partitions (beware of the number of ROS containers, 1 partition = 1 ROS), the following is enough:
Date datatype being already at day granularity.
Just ran a test. Seems to confirm my assumption. Just the partition keys (dates), and an integer as an ID:
Now what happened?
Note that we have three year-grouped ROS containers, for 2018 through 2020; month-grouped ROS containers from 2021-01-01 through to 2022-11-30, and single-key ROS containers for the rest.
If I now drop all partitions for the year 2018:
SELECT DROP_PARTITIONS('hierpart','2018-01-01','2018-12-31' );
-- out DROP_PARTITIONS
-- out -------------------
-- out Partition dropped
It works as intended.
Looking at it again, your hierarchical partitioning clause that you are using is not hierarchical.
Matched with my table, it would be:
A few data examples:
a) one creates a DATE literal, the other a TIMESTAMP literal
b) both expressions generate the same granularity.
I would either just partition by day -
PARTITION BY "time"::DATE
- or following my example, using hierarchical partitioning.