ROS per node per partition

Hello,
I am using Vertica 9.1 with KSAFE(1).
I have a large table that I want to partition by date (I am using hierarchical partitioning for timestamptz column):

PARTITION BY ((my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date)
GROUP BY (
CASE WHEN ("datediff"('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2)
THEN (date_trunc('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date
WHEN ("datediff"('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2)
THEN (date_trunc('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date
ELSE (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date END)
REORGANIZE;

For each partition and each projection (my_projection_b0 and my_projection_b1) Vertica creates 4 ROS conatiners per node. As I understand from documentation, it is supposed to be 1 ROS container per node per partition per projection (after tuple mover handles any inserted/updated data). I have the number which is 4 times higher then expected. I check it with the query:

SELECT COUNT (DISTINCT ros_id) NumROS, node_name FROM PARTITIONS
WHERE projection_name ilike '%my_projection%'
GROUP BY node_name
ORDER BY node_name;
In this case I get 8 ROS per partition per node (but I think this is because of KSAFE(1) which gives my_projection_b0 and my_projection_b1), which is fine. But why 4?

Thank you.

Answers

  • DaveTDaveT Vertica Employee Employee

    The number of ROS containers depends on your data. Your 2016 data and prior years will be in ROS containers per year. Your June and July data for 2018 will be in ROS containers per day and everything else in 2017 and 2018 will be in a ROS container per month. Things will continue to shift over time based on the CASE statement used.

    A good way to view your data is with the dump_table_partition_keys metafunction. It will show you how each partition key has been placed within each ROS container.

  • Thank you DaveT,
    I have loaded some test data which is a part of 2007 year, so it is supposed to be stored in one container per node, but I got 8 (4+4), for that. Below is a part of dump_table_partition_keys result (just for one node):

    Partition keys on node v_test_node0001
    Projection 'projection_test_01_b0'
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Projection 'projection_test_01_b1'
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
    Storage [ROS container]
    No of partition keys: 122
    Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...

  • DaveTDaveT Vertica Employee Employee
    It is probably due to size and number of your loads, the active partition, and whether mergeout had run. As you load more data with different dates, months, and years you will see it shift as mentioned, Rerun dump_table_partition_keys over time and you should see data shift as mergeout runs.
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited July 2018

    In the quick example below, I am getting 1 ROS container for each Partition Key by node:

    dbadmin=> CREATE TABLE my_table (pk INT, my_timestamptz_column TIMESTAMPTZ NOT NULL, CONSTRAINT my_table_pk PRIMARY KEY(pk))
    dbadmin-> ORDER BY pk
    dbadmin-> SEGMENTED BY HASH(pk) ALL NODES;
    CREATE TABLE
    
    dbadmin=> ALTER TABLE my_table
    dbadmin-> PARTITION BY ((my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date)
    dbadmin-> GROUP BY (
    dbadmin(> CASE WHEN ("datediff"('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2)
    dbadmin(> THEN (date_trunc('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date
    dbadmin(> WHEN ("datediff"('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2)
    dbadmin(> THEN (date_trunc('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date
    dbadmin(> ELSE (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date END)
    dbadmin-> REORGANIZE;
    NOTICE 4785:  Started background repartition table task
    ALTER TABLE
    
    dbadmin=> INSERT INTO my_table SELECT row_number() over (), to_date(randomint(11)+1::varchar || '/01/2017', 'MM/DD/YYYY') FROM vs_columns LIMIT 25;
     OUTPUT
    --------
         25
    (1 row)
    
    dbadmin=> COMMIT;
    COMMIT
    
    dbadmin=> SELECT do_tm_task('mergeout', 'my_table');
                                                                 do_tm_task
    -------------------------------------------------------------------------------------------------------------------------------------
     Task: mergeout
    (Table: public.my_table) (Projection: public.my_table_b0)
    (Table: public.my_table) (Projection: public.my_table_b1)
    
    (1 row)
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    dbadmin=> SELECT DUMP_TABLE_PARTITION_KEYS('public.my_table');
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          DUMP_TABLE_PARTITION_KEYS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
    ----------------------------------------------------------------------
     Partition keys on node v_test_db_node0001
      Projection 'my_table_b0'
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-09-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-03-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-01-01
      Projection 'my_table_b1'
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-10-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-07-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-06-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-05-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-04-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-02-01
    
     Partition keys on node v_test_db_node0002
      Projection 'my_table_b0'
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-09-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-08-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-07-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-05-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-01-01
      Projection 'my_table_b1'
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-09-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-03-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-01-01
    
     Partition keys on node v_test_db_node0003
      Projection 'my_table_b0'
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-10-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-07-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-06-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-05-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-04-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-02-01
      Projection 'my_table_b1'
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-09-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-08-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-07-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-05-01
       Storage [ROS container]
         No of partition keys: 1
         Partition keys: 2017-01-01
    
    (1 row)
    
  • Thank you for testing.
    As I can see there are different partition keys per different ROS container. Which I think is fine (merged by month). In my case there are 4 containers with the same keys which is confusing.
    Every time I load data to the table it creates 4 new containers with the same keys.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @Sergey_T - I did a larger test of inserting a lot of 2007 data.

    dbadmin=> SELECT COUNT(*) total_row_count, MIN(my_timestamptz_column) date_min, MAX(my_timestamptz_column) date_max, COUNT(DISTINCT my_timestamptz_column) distinct_dates FROM my_table;
     total_row_count |        date_min        |        date_max        | distinct_dates
    -----------------+------------------------+------------------------+----------------
             5067264 | 2007-01-01 00:00:00-05 | 2007-11-27 00:00:00-05 |            297
    (1 row)
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited July 2018
    dbadmin=> SELECT DUMP_TABLE_PARTITION_KEYS('public.my_table');
    ------------------------------------------------------------------------
     Partition keys on node v_test_db_node0001
      Projection 'my_table_b0'
       Storage [ROS container]
         No of partition keys: 297
         Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, ....
      Projection 'my_table_b1'
       Storage [ROS container]
         No of partition keys: 297
         Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, ....
    
     Partition keys on node v_test_db_node0002
      Projection 'my_table_b0'
       Storage [ROS container]
         No of partition keys: 297
         Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, ....
      Projection 'my_table_b1'
       Storage [ROS container]
         No of partition keys: 297
         Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, ....
    
     Partition keys on node v_test_db_node0003
      Projection 'my_table_b0'
       Storage [ROS container]
         No of partition keys: 297
         Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, ....
      Projection 'my_table_b1'
       Storage [ROS container]
         No of partition keys: 297
         Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, ....
    
    (1 row)
    
    dbadmin=> SELECT COUNT (DISTINCT ros_id) NumROS, node_name
    dbadmin->   FROM PARTITIONS
    dbadmin->  WHERE projection_name ILIKE '%my_table%'
    dbadmin->  GROUP BY 2
    dbadmin->  ORDER BY 2;
     NumROS |     node_name
    --------+--------------------
          2 | v_test_db_node0001
          2 | v_test_db_node0002
          2 | v_test_db_node0003
    (3 rows)
    
    dbadmin=> SELECT COUNT (DISTINCT ros_id) NumROS, node_name, projection_name
    dbadmin->   FROM PARTITIONS
    dbadmin->  WHERE projection_name ILIKE '%my_table%'
    dbadmin->  GROUP BY 2, 3
    dbadmin->  ORDER BY 2, 3;
     NumROS |     node_name      | projection_name
    --------+--------------------+-----------------
          1 | v_test_db_node0001 | my_table_b0
          1 | v_test_db_node0001 | my_table_b1
          1 | v_test_db_node0002 | my_table_b0
          1 | v_test_db_node0002 | my_table_b1
          1 | v_test_db_node0003 | my_table_b0
          1 | v_test_db_node0003 | my_table_b1
    (6 rows)
    
  • Thank you Jim_Knicely,
    I was looking to find something like you got, what is supposed to be according to the documentation. Do you have any idea why Vertica can create more containers as happened in my case?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file