Prevent specific column value-rows in a Vertica table
Hello, is there a way to prevent loading rows with specific column values into a Vertica table? For example I have a table that has data for multiple companies and if the column CompanyID=1 I want that data copied, if it is another value, it should not be.
I know it would be ideal to exclude CompanyID<>1 as part of the batch load, but we don't have that option. On relational databases I would create a trigger that would prevent the unwanted rows, how can it be done in Vertica? CompanyID is not the PK.
0
Answers
I can imagine two ways - both not too effective, but there you are.
This CSV file:
pumped through my d2l:
modified and run with vsql:
Load:
Well, finds the error. But rejects everything:
Next try, taking advantage of the fact that we can partition a table and drop its partitions very quickly:
Loading this table:
dbadmin@gessnerm-HP-ZBook-15-G3:~/data$ vsql -af ../sql/ltb.sql COPY foo FROM LOCAL '/home/dbadmin/data/foo.csv' SKIP 1 DELIMITER ',' EXCEPTIONS '/dev/tty' ; Rows Loaded 6 (1 row)Loads everything, as expected.
But now:
We don't want
orgto contain anything but 1. So we quickly get rid of all other partitions:dbadmin@gessnerm-HP-ZBook-15-G3:~/data$ vsql -c "SELECT DROP_PARTITIONS('foo',2,3)"; vsql -c "SELECT * FROM foo" DROP_PARTITIONS Partition dropped (1 row) id | org | nam ----+-----+---------------- 1 | 1 | arthur 1 | 1 | slartibartfast 1 | 1 | trillian 1 | 1 | zaphod (4 rows)Yes, we load too much. But then, our parser is most efficient and will determine the values for org quicker than other methods. And you get rid of the unwanted stuff very quickly ...
In addition to the second solution of @marcothesane, you can provide your users a view on top of the table with the filter org=1 then do the drop partition whenever you want,
In addition, if the "org" field has a wide range of possible values, consider the following example:
-- This script is responsible for loading data into the target table only if the "org" field input falls within the desired range. -- The desired "org" value in this exampe is 1. -- For optimal performance, Vertica recommends keeping the number of ungrouped partitions between 10 and 20, without exceeding 50. -- If the "org" field has a wide range of possible values, we will not partition the table by that field. -- To determine if a row falls within the desired field range during the load process, we introduce a column called 'is_loadable'. -- If the input falls within the range, the value of 'is_loadable' will be set to true; otherwise, it will be set to false. -- We use a CASE statement during the load to assign the value of 'is_loadable' based on the field range we want to load. -- Please adjust the conditions and values to match your desired range. -- Additional conditions can be added if necessary. -- After loading the data, we can drop the partition that have false values in the 'is_loadable' column. -- This operation will remove the rows that fall outside the desired field range. \set FILE_PATH `pwd`/foo.csv \set FILE_PATH '''':FILE_PATH'''' CREATE TABLE foo ( id SMALLINT NOT NULL, org SMALLINT NOT NULL, name VARCHAR(14) NOT NULL, is_loadable BOOLEAN NOT NULL ) PARTITION BY is_loadable; COPY foo ( id, org, name, is_loadable as case when org = 1 then TRUE else FALSE end ) FROM LOCAL :FILE_PATH SKIP 1 DELIMITER ',' ; SELECT DROP_PARTITIONS('foo',FALSE,FALSE); \echo 'Input file content:' \! cat foo.csv \echo '__________________________________' \echo 'Rows loaded, with the desired org field range:' SELECT * FROM foo ORDER BY id;Output:
Just because we all like playing, and @mosheg has introduced the derived value in a COPY statement, as an
ASclause in the column list:You can also formulate a load script to give
orgthe same "wrong" value if the input is not equal to 1 - thus avoiding an additional column that is just needed to separate what you want from what you don't want.The content of the input file
foo.csvis still:Create table:
Copy command:
Control query:
drop the one partition:
SELECT DROP_PARTITIONS('foo',-1,-1); -- out DROP_PARTITIONS -- out ------------------- -- out Partition dropped -- out (1 row) -- out -- out Time: First fetch (1 row): 214.895 ms. All rows formatted: 215.000 msFinal control query: