ROS per node per partition
Hello,
I am using Vertica 9.1 with KSAFE(1).
I have a large table that I want to partition by date (I am using hierarchical partitioning for timestamptz column):
PARTITION BY ((my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date)
GROUP BY (
CASE WHEN ("datediff"('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2)
THEN (date_trunc('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date
WHEN ("datediff"('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2)
THEN (date_trunc('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date
ELSE (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date END)
REORGANIZE;
For each partition and each projection (my_projection_b0 and my_projection_b1) Vertica creates 4 ROS conatiners per node. As I understand from documentation, it is supposed to be 1 ROS container per node per partition per projection (after tuple mover handles any inserted/updated data). I have the number which is 4 times higher then expected. I check it with the query:
SELECT COUNT (DISTINCT ros_id) NumROS, node_name FROM PARTITIONS
WHERE projection_name ilike '%my_projection%'
GROUP BY node_name
ORDER BY node_name;
In this case I get 8 ROS per partition per node (but I think this is because of KSAFE(1) which gives my_projection_b0 and my_projection_b1), which is fine. But why 4?
Thank you.
Answers
The number of ROS containers depends on your data. Your 2016 data and prior years will be in ROS containers per year. Your June and July data for 2018 will be in ROS containers per day and everything else in 2017 and 2018 will be in a ROS container per month. Things will continue to shift over time based on the CASE statement used.
A good way to view your data is with the dump_table_partition_keys metafunction. It will show you how each partition key has been placed within each ROS container.
Thank you DaveT,
I have loaded some test data which is a part of 2007 year, so it is supposed to be stored in one container per node, but I got 8 (4+4), for that. Below is a part of dump_table_partition_keys result (just for one node):
Partition keys on node v_test_node0001
Projection 'projection_test_01_b0'
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Projection 'projection_test_01_b1'
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
Storage [ROS container]
No of partition keys: 122
Partition keys: 2007-08-01, 2007-08-02, 2007-08-03, 2007-08-04, ...
In the quick example below, I am getting 1 ROS container for each Partition Key by node:
dbadmin=> CREATE TABLE my_table (pk INT, my_timestamptz_column TIMESTAMPTZ NOT NULL, CONSTRAINT my_table_pk PRIMARY KEY(pk)) dbadmin-> ORDER BY pk dbadmin-> SEGMENTED BY HASH(pk) ALL NODES; CREATE TABLE dbadmin=> ALTER TABLE my_table dbadmin-> PARTITION BY ((my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date) dbadmin-> GROUP BY ( dbadmin(> CASE WHEN ("datediff"('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2) dbadmin(> THEN (date_trunc('year', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date dbadmin(> WHEN ("datediff"('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date, ((now() AT TIME ZONE 'UTC')::timestamptz(6))::date) >= 2) dbadmin(> THEN (date_trunc('month', (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date))::date dbadmin(> ELSE (my_table.my_timestamptz_column AT TIME ZONE 'UTC')::date END) dbadmin-> REORGANIZE; NOTICE 4785: Started background repartition table task ALTER TABLE dbadmin=> INSERT INTO my_table SELECT row_number() over (), to_date(randomint(11)+1::varchar || '/01/2017', 'MM/DD/YYYY') FROM vs_columns LIMIT 25; OUTPUT -------- 25 (1 row) dbadmin=> COMMIT; COMMIT dbadmin=> SELECT do_tm_task('mergeout', 'my_table'); do_tm_task ------------------------------------------------------------------------------------------------------------------------------------- Task: mergeout (Table: public.my_table) (Projection: public.my_table_b0) (Table: public.my_table) (Projection: public.my_table_b1) (1 row)dbadmin=> SELECT DUMP_TABLE_PARTITION_KEYS('public.my_table'); DUMP_TABLE_PARTITION_KEYS ---------------------------------------------------------------------- Partition keys on node v_test_db_node0001 Projection 'my_table_b0' Storage [ROS container] No of partition keys: 1 Partition keys: 2017-09-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-03-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-01-01 Projection 'my_table_b1' Storage [ROS container] No of partition keys: 1 Partition keys: 2017-10-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-07-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-06-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-05-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-04-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-02-01 Partition keys on node v_test_db_node0002 Projection 'my_table_b0' Storage [ROS container] No of partition keys: 1 Partition keys: 2017-09-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-08-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-07-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-05-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-01-01 Projection 'my_table_b1' Storage [ROS container] No of partition keys: 1 Partition keys: 2017-09-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-03-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-01-01 Partition keys on node v_test_db_node0003 Projection 'my_table_b0' Storage [ROS container] No of partition keys: 1 Partition keys: 2017-10-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-07-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-06-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-05-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-04-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-02-01 Projection 'my_table_b1' Storage [ROS container] No of partition keys: 1 Partition keys: 2017-09-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-08-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-07-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-05-01 Storage [ROS container] No of partition keys: 1 Partition keys: 2017-01-01 (1 row)Thank you for testing.
As I can see there are different partition keys per different ROS container. Which I think is fine (merged by month). In my case there are 4 containers with the same keys which is confusing.
Every time I load data to the table it creates 4 new containers with the same keys.
@Sergey_T - I did a larger test of inserting a lot of 2007 data.
dbadmin=> SELECT COUNT(*) total_row_count, MIN(my_timestamptz_column) date_min, MAX(my_timestamptz_column) date_max, COUNT(DISTINCT my_timestamptz_column) distinct_dates FROM my_table; total_row_count | date_min | date_max | distinct_dates -----------------+------------------------+------------------------+---------------- 5067264 | 2007-01-01 00:00:00-05 | 2007-11-27 00:00:00-05 | 297 (1 row)dbadmin=> SELECT DUMP_TABLE_PARTITION_KEYS('public.my_table'); ------------------------------------------------------------------------ Partition keys on node v_test_db_node0001 Projection 'my_table_b0' Storage [ROS container] No of partition keys: 297 Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, .... Projection 'my_table_b1' Storage [ROS container] No of partition keys: 297 Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, .... Partition keys on node v_test_db_node0002 Projection 'my_table_b0' Storage [ROS container] No of partition keys: 297 Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, .... Projection 'my_table_b1' Storage [ROS container] No of partition keys: 297 Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, .... Partition keys on node v_test_db_node0003 Projection 'my_table_b0' Storage [ROS container] No of partition keys: 297 Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, .... Projection 'my_table_b1' Storage [ROS container] No of partition keys: 297 Partition keys: 2007-01-01, 2007-01-02, 2007-01-03, 2007-01-04, .... (1 row) dbadmin=> SELECT COUNT (DISTINCT ros_id) NumROS, node_name dbadmin-> FROM PARTITIONS dbadmin-> WHERE projection_name ILIKE '%my_table%' dbadmin-> GROUP BY 2 dbadmin-> ORDER BY 2; NumROS | node_name --------+-------------------- 2 | v_test_db_node0001 2 | v_test_db_node0002 2 | v_test_db_node0003 (3 rows) dbadmin=> SELECT COUNT (DISTINCT ros_id) NumROS, node_name, projection_name dbadmin-> FROM PARTITIONS dbadmin-> WHERE projection_name ILIKE '%my_table%' dbadmin-> GROUP BY 2, 3 dbadmin-> ORDER BY 2, 3; NumROS | node_name | projection_name --------+--------------------+----------------- 1 | v_test_db_node0001 | my_table_b0 1 | v_test_db_node0001 | my_table_b1 1 | v_test_db_node0002 | my_table_b0 1 | v_test_db_node0002 | my_table_b1 1 | v_test_db_node0003 | my_table_b0 1 | v_test_db_node0003 | my_table_b1 (6 rows)Thank you Jim_Knicely,
I was looking to find something like you got, what is supposed to be according to the documentation. Do you have any idea why Vertica can create more containers as happened in my case?