COPY data into 2 Tables
Hi,
I am loading 1 CSV file having the below format to 2 different tables.
(This is because of different data retention policy)
CSV File Format: (date, ip, port, data)
2017-01-02 20:11:43 10.11.12.13 1234 3bc548852b806d39c88c7b4a6b
2017-01-02 20:11:44 10.11.11.15 1234 93997f86ddb76fbd423e812dca0ea1
2017-01-02 20:11:45 10.11.12.17 1234 35b73412c2ae80c65723bdd67248558c34b80
Tables structures:
Table_A:
id,date,ip,port,hash(data)
Table_B
id2,date,data
The question is how to copy CSV file into Table_A without store 'data' field, i need only hash(data).
Scenario:
'data' field has up to 65K length, so it use more than 90% of disk space. I want to store it in different tables to keep 'data' for few days and delete expired data by daily partition (Table_B).
so when i need 'data' i'll join Table_A and Table_B on 'date' and hash(data)
Comments
Hi,
Check out the FILLER option of the COPY command.
See:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm
Example:
In this example I added an ID to the CSV file...
dbadmin=> CREATE TABLE table_a(id INT, d date, ip VARCHAR(100), port INT, data_hash INT); CREATE TABLE dbadmin=> CREATE TABLE table_b(id INT, d date, data VARCHAR(1000)); CREATE TABLE dbadmin=> \! cat /home/dbadmin/file.csv 1|2017-01-02 20:11:43|10.11.12.13|1234|3bc548852b806d39c88c7b4a6b 2|2017-01-02 20:11:44|10.11.11.15|1234|93997f86ddb76fbd423e812dca0ea1 3|2017-01-02 20:11:45|10.11.12.17|1234|35b73412c2ae80c65723bdd67248558c34b80 dbadmin=> COPY table_a (id, d, ip, port, data_hash_f FILLER VARCHAR, data_hash AS HASH(data_hash_f)) FROM '/home/dbadmin/file.csv' DIRECT; Rows Loaded ------------- 3 (1 row) ^ dbadmin=> COPY table_b (id, d, ip_f FILLER VARCHAR(100), port_f FILLER INT, data) FROM '/home/dbadmin/file.csv' DIRECT; Rows Loaded ------------- 3 (1 row) dbadmin=> SELECT * FROM table_a; id | d | ip | port | data_hash ----+------------+-------------+------+--------------------- 1 | 2017-01-02 | 10.11.12.13 | 1234 | 1478321945145517140 2 | 2017-01-02 | 10.11.11.15 | 1234 | 3787372996188421577 3 | 2017-01-02 | 10.11.12.17 | 1234 | 1171497616496332802 (3 rows) dbadmin=> SELECT * FROM table_b; id | d | data ----+------------+--------------------------------------- 1 | 2017-01-02 | 3bc548852b806d39c88c7b4a6b 2 | 2017-01-02 | 93997f86ddb76fbd423e812dca0ea1 3 | 2017-01-02 | 35b73412c2ae80c65723bdd67248558c34b80 (3 rows)