
partition deletion request takes a long time

maks Community Edition User

Hello. Could you please help me with the following issue? I have a Vertica cluster of 3 nodes; the database is about 800 GB and consists of one table, partitioned by day. Every day I delete one of the oldest partitions with the command
SELECT DROP_PARTITIONS ('CDB_TYPE_STATUS','day','day','true');
where day is the date of the partition being dropped.

The command works correctly, but it takes a very long time, about 2 hours. At the same time there is no load on hardware resources (CPU/RAM load is under 25%; each host has 16 CPU cores and 96 GB of RAM). The dropped partition is roughly 6-7 GB. Why does dropping a partition take so long, and how can it be accelerated?
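
For reference, here is one way to see how many ROS containers the drop has to touch (a sketch; the projection names are assumed to follow the default _b0/_b1 pattern):

SELECT node_name, COUNT(*) AS ros_containers
FROM storage_containers
WHERE projection_name ILIKE 'CDB_TYPE_STATUS%'
GROUP BY node_name
ORDER BY node_name;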


Answers

  • VValdar Vertica Employee

    Hi Maks,

    How many partitions does your table have?
    Maybe the force-split parameter is unnecessary here?
    Try with just DROP_PARTITIONS ('CDB_TYPE_STATUS','day','day');

  • maks Community Edition User

    @VValdar I tried deleting a partition without the force-split parameter, but I get an error:

    ERROR 8430:  Some storage containers must be split before the specified partition range can be dropped
    HINT:  Set "forceSplit" to true to implicitly repartition the data
    

    I also looked at the number of partitions in the table:

    select node_name, table_schema, projection_name, count(distinct partition_key) as partitions
    from partitions
    group by 1,2,3
    order by 3 desc;

    node_name | table_schema |  projection_name   | partitions
    ----------+--------------+--------------------+------------
    node0001  | public       | CDB_TYPE_STATUS_b1 |        115
    node0002  | public       | CDB_TYPE_STATUS_b1 |        112
    node0003  | public       | CDB_TYPE_STATUS_b1 |        113
    node0002  | public       | CDB_TYPE_STATUS_b0 |        113
    node0001  | public       | CDB_TYPE_STATUS_b0 |        112
    node0003  | public       | CDB_TYPE_STATUS_b0 |        115

  • VValdar Vertica Employee

    That's weird, are you sure your table is partitioned by day?
    If that were the case, ForceSplit could be avoided (and apparently it can't be). Roughly 113 partitions in your table would mean only about 4 months of history.
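
    One way to check (a sketch against the same PARTITIONS system table): compare distinct partition keys with distinct ROS containers; if the container count is lower, some containers hold several partitions grouped together, and that is exactly what forces the split:

    select projection_name,
           count(distinct partition_key) as keys,
           count(distinct ros_id)        as ros_containers
    from partitions
    where projection_name ilike 'CDB_TYPE_STATUS%'
    group by 1;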

  • maks Community Edition User

    @VValdar I partitioned the table by day with this command:

    ALTER TABLE CDB_TYPE_STATUS
      PARTITION BY ((time)::date) GROUP BY (date_trunc('day', (time)::date)) REORGANIZE;
    

    What is the correct way to partition the table by day, then?

  • marcothesane Administrator

    It looks like you are trying to drop partitions in a table with hierarchical partitioning, where at least some of the partition keys you intend to drop sit in grouped partitions.
    What is happening is best documented here:
    https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/AdministratorsGuide/Partitions/DroppingPartitions.htm?zoom_highlight=Splitting%20Partition%20Groups
    To avoid being forced to pass TRUE as force_split, align your drops with the partition groups. With hierarchical partitioning, for example, your last month might be partitioned by day, the earlier months of the current year by year-month, and everything before that by year.

    In this case, I would drop the oldest year in one shot, at the right time:
    SELECT DROP_PARTITIONS ('CDB_TYPE_STATUS','2019-01-01','2019-12-31');
    I really don't expect you to need TRUE for force_split in this case (I have no way to test right now); and even if you did, there would be nothing left to split, so you can expect the operation to be quick.

  • VValdar Vertica Employee
    edited December 2022

    Yes, your GROUP BY triggers hierarchical partitioning, and I'm not sure that's what you want.
    If you only want daily partitions (beware of the number of ROS containers: 1 partition = 1 ROS), the following is enough:

    ALTER TABLE CDB_TYPE_STATUS
      PARTITION BY ((time)::date);
    

    The DATE data type is already at day granularity.
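
    One caveat (as far as I understand the docs): without REORGANIZE, existing storage containers are not repartitioned right away. If you want the new scheme applied to the data already in the table, a sketch like this should do it:

    ALTER TABLE CDB_TYPE_STATUS
      PARTITION BY ((time)::date) REORGANIZE;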

  • marcothesane Administrator

    Just ran a test. Seems to confirm my assumption. Just the partition keys (dates), and an integer as an ID:

    DROP TABLE IF EXISTS hierpart;                                                                                                                                                            
    -- out DROP TABLE
    CREATE TABLE hierpart (
      id INTEGER NOT NULL PRIMARY KEY
    , dt DATE    NOT NULL
    )
    UNSEGMENTED ALL NODES
    PARTITION BY dt GROUP BY CALENDAR_HIERARCHY_DAY(dt,1,2)
    ;
    -- out CREATE TABLE
    
    INSERT INTO hierpart
    SELECT
      id
    , tsr::DATE AS dt
    FROM (
                SELECT 1, DATE '2018-01-01'
      UNION ALL SELECT 1, DATE '2022-12-31'
      UNION ALL SELECT 2, DATE '2018-01-01'
      UNION ALL SELECT 2, DATE '2022-12-31'
    ) limits(id,dt)
    TIMESERIES tsr AS '1 DAY' OVER(PARTITION BY id ORDER BY dt::TIMESTAMP)
    ;
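
    A quick sanity check on what the TIMESERIES gap-filler generated (a sketch; 1826 days from 2018-01-01 through 2022-12-31, times two ids):

    SELECT COUNT(*), MIN(dt), MAX(dt) FROM hierpart;
    -- expected: 3652 | 2018-01-01 | 2022-12-31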
    

    Now what happened?

    SELECT
      MIN(partition_key) minkey
    , MAX(partition_key) maxkey
    , COUNT(*)           keycount
    , ros_id
    FROM partitions 
    WHERE projection_name ilike 'hierpart%'
    GROUP BY 4
    ORDER BY 1;
    -- out    minkey   |   maxkey   | keycount |      ros_id       
    -- out ------------+------------+----------+-------------------
    -- out  2018-01-01 | 2018-12-31 |      365 | 45035996276613873
    -- out  2019-01-01 | 2019-12-31 |      365 | 45035996276613867
    -- out  2020-01-01 | 2020-12-31 |      366 | 45035996276613861
    -- out  2021-01-01 | 2021-01-31 |       31 | 45035996276613855
    -- out  2021-02-01 | 2021-02-28 |       28 | 45035996276613849                                                                                                                            
    [...]
    -- out  2022-10-01 | 2022-10-31 |       31 | 45035996276613729
    -- out  2022-11-01 | 2022-11-30 |       30 | 45035996276613723
    -- out  2022-12-01 | 2022-12-01 |        1 | 45035996276613717
    [...]
    -- out  2022-12-30 | 2022-12-30 |        1 | 45035996276613543
    -- out  2022-12-31 | 2022-12-31 |        1 | 45035996276613537
    

    Note that we have three year-grouped ROS containers for 2018 through 2020, month-grouped ROS containers from 2021-01-01 through 2022-11-30, and single-key ROS containers for the rest.

    If I now drop all partitions for the year 2018:
    SELECT DROP_PARTITIONS('hierpart','2018-01-01','2018-12-31' );
    -- out DROP_PARTITIONS
    -- out -------------------
    -- out Partition dropped
    It works as intended.
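
    And a follow-up check (a sketch) that the whole 2018 container is gone:

    SELECT MIN(partition_key) AS oldest_key
    FROM partitions
    WHERE projection_name ILIKE 'hierpart%';
    -- expected: 2019-01-01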

  • marcothesane Administrator

    Looking at it again, the "hierarchical" partitioning clause you are using is not actually hierarchical.
    Matched to my table, it would be:

    ALTER TABLE hierpart
      PARTITION BY ((dt)::DATE) GROUP BY (DATE_TRUNC('DAY', (dt)::date)) REORGANIZE;
    

    A few data examples:

    SELECT
     dt::DATE
    ,DATE_TRUNC('DAY', (dt)::DATE)
    FROM hierpart 
    LIMIT 4;                                                                                                                                                                                  
    -- out      dt     |     DATE_TRUNC      
    -- out ------------+---------------------
    -- out  2022-12-31 | 2022-12-31 00:00:00
    -- out  2022-12-31 | 2022-12-31 00:00:00
    -- out  2022-12-30 | 2022-12-30 00:00:00
    -- out  2022-12-30 | 2022-12-30 00:00:00
    

    a) one expression produces a DATE value, the other a TIMESTAMP value;
    b) both expressions generate the same granularity, so the GROUP BY creates one group per day and adds no coarser grouping at all.

    I would either just partition by day (PARTITION BY "time"::DATE) or, following my example, use hierarchical partitioning.
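
    Translated back to your table, the hierarchical variant could look like this sketch (assuming the column is called time, as in your earlier ALTER TABLE; the 1,2 arguments mirror my test table: the most recent month stays daily, the two active years stay monthly, anything older is grouped by year):

    ALTER TABLE CDB_TYPE_STATUS
      PARTITION BY ("time"::DATE)
      GROUP BY CALENDAR_HIERARCHY_DAY("time"::DATE, 1, 2) REORGANIZE;

    With that layout, dropping a whole year at once, as in the DROP_PARTITIONS example above, removes complete grouped containers and should not need force_split.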
