We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Partitioning problem + question — Vertica Forum

Partitioning problem + question

kfruchtmankfruchtman Vertica Customer

Hi Experts!
I have been trying to partition a billion rows table according to time by CALENDAR_HIERARCHY_DAY algorithm.
when I run the statement:
ALTER TABLE public.spins
PARTITION BY time::DATE
GROUP BY CALENDAR_HIERARCHY_DAY(time::DATE, 2, 1) REORGANIZE;

I get the error:
ROLLBACK 2648: Column time in PARTITION BY expression is not allowed, since it is not present in some projections
HINT: Projection "spins_pk" does not contain column "time"

When I try to drop the projection (I don't really need it..) I get:

WARNING 7000: The projection 'spins_pk' was used to enforce the enabled key constraint 'pk', and may be regenerated to validate a DML statement on the base table

how can I workaround that issue?should I add the column time to the pk projection?

And a general partitioning question
If I want my partitions to be as follows: 24 hours - by hours, 7 days , 1 month, 3 month, 1 year..
what is the group by clause needed for the
GROUP BY CALENDAR_HIERARCHY algorithm.

Many thanks!!
Keren

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited April 2021

    This smells llike a bug... But, if you disable the PK you should be able to add the PARTITIONing...

    Example:

    verticademos=> CREATE TABLE spins(c INT NOT NULL, time TIMESTAMP) ORDER BY time;
    CREATE TABLE
    
    verticademos=> ALTER TABLE spins ADD CONSTRAINT spins_pk PRIMARY KEY (c) ENABLED;
    ALTER TABLE
    
    verticademos=> INSERT INTO spins SELECT 1, SYSDATE;
     OUTPUT
    --------
          1
    (1 row)
    
    verticademos=> ALTER TABLE spins PARTITION BY time::DATE GROUP BY CALENDAR_HIERARCHY_DAY(time::DATE, 2, 1) REORGANIZE;
    ROLLBACK 2648:  Column time in PARTITION BY expression is not allowed, since it is not present in some projections
    HINT:  Projection "spins_pk" does not contain column "time"
    
    verticademos=> DROP PROJECTION public.spins_pk;
    WARNING 7000:  The projection 'spins_pk' was used to enforce the enabled key constraint 'spins_pk', and may be regenerated to validate a DML statement on the base table
    DROP PROJECTION
    
    verticademos=> ALTER TABLE spins ALTER CONSTRAINT spins_pk DISABLED;
    ALTER TABLE
    
    verticademos=> ALTER TABLE spins PARTITION BY time::DATE GROUP BY CALENDAR_HIERARCHY_DAY(time::DATE, 2, 1) REORGANIZE;
    NOTICE 8364:  The new partitioning scheme will produce partitions in 1 physical storage containers per projection
    WARNING 9249:  Using PARTITION expression that may result in NULL values
    HINT:  DML statements which produce NULL values for partition expressions will roll back
    NOTICE 4785:  Started background repartition table task
    ALTER TABLE
    
    verticademos=> ALTER TABLE spins ALTER CONSTRAINT spins_pk ENABLED;
    ALTER TABLE
    
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    And for your general partitioning question... You can get pretty creative with the PARTITION GROUP BY clause!

    Example:

    verticademos=> SELECT c FROM t;
                 c
    ----------------------------
     2021-08-14 00:00:00
     2023-04-14 00:00:00
     2024-04-15 00:00:00
     2021-06-14 00:00:00
     2021-04-29 02:46:58.528717
     2021-04-17 02:46:58.516394
     2021-04-14 02:46:58.470838
     2021-04-14 04:46:58.488554
     2021-04-14 04:46:58.500997
     2021-04-15 02:46:58.510026
    (10 rows)
    
    verticademos=> SELECT c,
    verticademos->        CASE
    verticademos->          WHEN DATEDIFF('HOUR', NOW(), c) <= 24
    verticademos->            THEN 1
    verticademos->          WHEN DATEDIFF('DAY', NOW(), c) <= 7
    verticademos->            THEN 2
    verticademos->          WHEN DATEDIFF('MONTH', NOW(), c) <= 1
    verticademos->            THEN 3
    verticademos->          WHEN DATEDIFF('MONTH', NOW(), c) <= 3
    verticademos->            THEN 4
    verticademos->          ELSE 5
    verticademos->        END
    verticademos-> FROM t;
                 c              | case
    ----------------------------+------
     2021-08-14 00:00:00        |    5
     2023-04-14 00:00:00        |    5
     2024-04-15 00:00:00        |    5
     2021-06-14 00:00:00        |    4
     2021-04-29 02:46:58.528717 |    3
     2021-04-17 02:46:58.516394 |    2
     2021-04-14 02:46:58.470838 |    1
     2021-04-14 04:46:58.488554 |    1
     2021-04-14 04:46:58.500997 |    1
     2021-04-15 02:46:58.510026 |    1
    (10 rows)
    
    verticademos=> ALTER TABLE t PARTITION BY (c)
    verticademos->   GROUP BY (
    verticademos(>     CASE WHEN DATEDIFF('HOUR', NOW(), c) <= 24
    verticademos(>            THEN 1
    verticademos(>          WHEN DATEDIFF('DAY', NOW(), c) <= 7
    verticademos(>            THEN 2
    verticademos(>          WHEN DATEDIFF('MONTH', NOW(), c) <= 1
    verticademos(>            THEN 3
    verticademos(>          WHEN DATEDIFF('MONTH', NOW(), c) <= 3
    verticademos(>            THEN 4
    verticademos(>          ELSE 5
    verticademos(>      END) REORGANIZE;
    WARNING 6100:  Using PARTITION expression that returns a Timestamp value
    HINT:  This PARTITION expression may cause too many data partitions.  Use of an expression that returns a more accurate value, such as a regular VARCHAR or INT, is encouraged
    NOTICE 4785:  Started background repartition table task
    ALTER TABLE
    
  • kfruchtmankfruchtman Vertica Customer

    Trying that now.Thank you so much!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file