Options

Partitioning Strategey

I have table which has more than a milion records ingested per day . In the table I have a Timestamp_Day column which only holds the date part of day.

Also I need to be able to drop records older than 180 days on any day.

If the table is partitioned by TimeStamp_Day column which is of Type Date, will I exceed the recomended limit of 40 ROS containers per projection?

for the current data I have which is only for 2 different days I see there are already 16 ROS containers

what is the best Partitioning strategey for this kind of a table?

Answers

  • Options
    moshegmosheg Vertica Employee Administrator

    If you partition the table by the Timestamp_Day column using hierarchical partitioning with the CALENDAR_HIERARCHY_DAY function, as described in the Vertica documentation example below, it is less likely that you will exceed 40 ROS containers per projection.

    The CALENDAR_HIERARCHY_DAY function organizes date partitions into a hierarchy of groups, effectively consolidating older partitions into larger time spans (such as years and months). This dynamic grouping helps manage the number of ROS containers efficiently, even when dealing with a large volume of daily ingested records.

    See: https://docs.vertica.com/24.1.x/en/admin/partitioning-tables/hierarchical-partitioning

  • Options
    VValdarVValdar Vertica Employee Employee

    Hi prad33p2301,

    180 partitions would be acceptable with a daily partition.

    You could go with a group by partition, keep daily partition for xx days then move to weekly partition.
    And instead dropping one partition par day after 180 days, drop the whole week.
    For example with 28 daily + 22 weekly you get a 50 ROS table handling up to 182 days. Seems a good direction!

  • Options

    Thank @mosheg , @VValdar Considering that I will have 5M records a day , what will be the impact on the read and write performance when I drop a partition whichis having 5M records and what will be the case when I drop a weeks worth of data which will be 35M records

  • Options
    VValdarVValdar Vertica Employee Employee

    5M rows a day is not that much, maybe weekly partition is good enough in all cases.
    Dropping a partition is more or less instantaneous whatever it's size; it's a catalog operation + removing files, very low load for the database.

  • Options

    I tried the hierarchical partitioning with activemonth as 1 and activeyears as 2 , but the DROP PARTITIONS keeps failing.

    SELECT DROP_Partitions('public.tablename','2023-10-01','2023-10-01',true); runs for 35-40 mins and fails with the below error

  • Options
    marcothesanemarcothesane - Select Field - Administrator

    If you have hierarchical partitions, you should drop the older, grouped, partitions in a way that they don't need splitting. In fact, I would actually omit the last parameter, force-split, or deliberately set it to false, so the function fails if I don't hit the right partition keys. min_range_value must be the smallest key of a group of, for example a month, and max_range_value the greatest.

  • Options

    if I omit the last parameter or set it to false i get the below error

  • Options
    marcothesanemarcothesane - Select Field - Administrator

    That is because you did not drop the right partition keys.

    For example, if you have one active month, and you want to get rid of two months old data, today on 2024-03-27, you should not try to drop with the key of 2024-01-27, but drop all of January, or December, if you want to keep the last four days of January for now.

    If you, for example ...

    SELECT
      DROP_PARTITIONS (
        'public.partitioned_by_hier_date'
      , '2023-12-01'
      , '2023-12-31'
      );
    

    , then the function should run through. But if you don't hit the "border" partition keys of the month of December 2023, you run into the error you pasted as a picture above.

    And, if you do force a split, the transaction is not atomic any more, but will run until the single ROS container that currently holds the keys you specify, is split into ROS containers that don't hold the keys you specify in the function call.

  • Options
    marcothesanemarcothesane - Select Field - Administrator

    Here we have a complete scenario from scratch that you can test:

    DROP TABLE IF EXISTS hier_part;
    CREATE TABLE hier_part (
      dt DATE NOT NULL
    )
    PARTITION BY dt
    GROUP BY CALENDAR_HIERARCHY_DAY(dt,2,2); 
    -- two active months,two active years
    
    -- fill the table with one row per day, from 1st Jan 2022 till today ..
    -- use the TIMESERIES clause
    INSERT INTO hier_part
    WITH limits(ts) AS (
                SELECT TIMESTAMP '2022-01-01'
      UNION ALL SELECT CURRENT_DATE::TIMESTAMP
    )
    SELECT
      dt::DATE
    FROM limits
    TIMESERIES dt AS '1 day' OVER(ORDER BY ts)
    ;
    COMMIT;
    
    -- droopping the day two years ago fails ....
    SELECT DROP_PARTITIONS('hier_part','2022-03-27','2022-03-27');
    -- out ERROR 8430:  Some storage containers must be split before the specified partition range can be dropped
    -- out HINT:  Set "forceSplit" to true to implicitly repartition the data
    
    --- dropping the month two years ago fails, too
    SELECT DROP_PARTITIONS('hier_part','2022-03-01','2022-03-31');
    -- out ERROR 8430:  Some storage containers must be split before the specified partition range can be dropped
    -- out HINT:  Set "forceSplit" to true to implicitly repartition the data
    
    -- dropping the whole year two years ago succeeds ...
    SELECT DROP_PARTITIONS('hier_part','2022-01-01','2022-12-31');
    -- out   DROP_PARTITIONS  
    -- out -------------------
    -- out  Partition dropped
    -- out (1 row)
    -- out 
    -- out Time: First fetch (1 row): 36.700 ms. All rows formatted: 36.796 ms
    
    -- dropping the whole month one year ago succeeds ...
    SELECT DROP_PARTITIONS('hier_part','2023-03-01','2023-03-31');
    -- out   DROP_PARTITIONS  
    -- out -------------------
    -- out  Partition dropped
    -- out (1 row)
    -- out 
    -- out Time: First fetch (1 row): 14.522 ms. All rows formatted: 14.620 ms
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file