Options

Projection Deletes

Hello,

I need to understand how data from a projection gets deleted. Consider the below scenario with a table containing 3 columns a,b,c and there are two projections
a) Super Projection
b) User-Defined Projection with columns a and b.

If I delete from the table having the predicate as c, how will the data be removed from the user-defined projection since there is no c column ?

Ex : CREATE TABLE TEMP(A INT,B INT,C INT);

CREATE PROJECTION TEMP_SUPER
(
A,B,C ) AS SELECT A,B,C FROM TEMP;

CREATE PROJECTION TEMP_USER
(
A,B ) AS SELECT A,B FROM TEMP;

INSERT INTO TEMP(a,b,c) SELECT 1,2,3;
INSERT INTO TEMP(a,b,c) SELECT 4,5,6;

DELETE FROM TEMP WHERE C = 6;

How will data from User-defined projection TEMP_USER be removed ??

Comments

  • Options

       Data will be removed, since you  cannot specify what projection to use for delete nor what projection to use for  updates.

      Projection have the same data "consistency" as the super projection.

     

      Using the DELETE FROM TEMP WHERE C = 6; will remove all data from column A,B,C where c= 6.

     

     Run some tests and see the changes in size of the projections using the projection_storage table.

     

     Run do_tm_task('mergeout','prjection name') after deletes so you will get the real values.

  • Options

    Dear Adrian,

     

     I also had similar issue where i had about 10-12 projection of a table. Running a delete operation using predicate C column which is not present in projections other than super projection is very costly affair. It is taking ages to complete. Moreover for me there is no other option but to use the same column for delete.

     

    Do you have any better recommendation than making C column part of all projections? 

  • Options

     

    Hi,

    Ok , first off ! 

     Why would you have so many projections ?  Please review !!! Don`t have more than 2-3 "or at least this is my best practice".

     Find a way to monitor the use of thos projections. 

     

      Second , you need to understand that Vertica is "Read Once", read this here to get a better idea of what i am talking, so every container you touch with your delete will have to be "recreated" alot of IO and very slow.

     

    Also consider optimizing your projecitons for delete, Vertica uses the sort keys for delete as well.

     

    Sudgestion :

    - make the c column the partition key of your table and stop using update & delete , mke more use of drop,move ,switch partition.

     - his operations are limited to a single partition container and are more optimized and fast.

     

     - in most of the cases i prefer recreating the partition as select into a staging table eliminating the "deleted" values inside the select and when the partition is recreated i just swap the partition. 

     

    - there are many ways to do it. 

     

     

  • Options

    Hi Adrian,

     

    Yes , once i performed the delete on the table  I could see the a,b values of column c = 6 got removed from the User-Defined Projection.

     

    I wanted to understand more on how the mapping happens when the User-Def Projection doesn't have the column c but still removes the a,b values.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file