Parquet Export and Import when Columns are in Different Order
My source and destination tables have the same columns but with different order.
I know it is possible to specify column names in specific order when exporting the data.
Is it also possible to specify columns in specific order when importing data?
Tagged:
0
Best Answers
-
avi120
✭
Thanks!
I saw specifying the column in copy command "COPY some_parq_data(c3, c2)" works. However, I found out that I'm getting files from different sources and each has a different column order. Is there something that can be put inside the file to indicate the order of columns such as a first row that contains column names in the right order0 -
mosheg
Vertica Employee Administrator
Try this:
cat data.csv f2,f1,f3 2,1,3 5,4,6 8,7,9 CREATE TABLE t1( f1 INT, f2 INT, f3 INT); COPY t1 FROM '/YourFilePath/data.csv' parser FDELIMITEDPARSER() DELIMITER ',' ABORT ON ERROR; SELECT * FROM t1 ORDER BY 1; f1 | f2 | f3 ----+----+---- 1 | 2 | 3 4 | 5 | 6 7 | 8 | 9 (3 rows)
0
Answers
@avi120 : Yes you can export specific columns. However when importing you need to import all those columns
https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/EXPORTTOPARQUET.htm
https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/COPY/PARQUET.htm
Check out the feature Loose Schema Matching introduced in Vertica 10.1.1 !
Example:
dbadmin=> CREATE TABLE some_parq_data(c1 INT, c2 VARCHAR, c3 INT); CREATE TABLE dbadmin=> INSERT INTO some_parq_data SELECT 1, 'A', 2; OUTPUT -------- 1 (1 row) dbadmin=> EXPORT TO PARQUET (directory='/home/dbadmin/mparq') AS SELECT c3, c2 FROM some_parq_data; Rows Exported --------------- 1 (1 row) dbadmin=> COPY some_parq_data(c3, c2) FROM '/home/dbadmin/mparq/*' PARQUET(do_soft_schema_match_by_name='TRUE'); Rows Loaded ------------- 1 (1 row) dbadmin=> SELECT * FROM some_parq_data; c1 | c2 | c3 ----+----+---- 1 | A | 2 | A | 2 (2 rows)@avi120 - Here is a better example of using do_soft_schema_match_by_name:
dbadmin=> SELECT * FROM test1; c1 | c2 | c3 ----+----+---- 1 | 2 | 3 3 | 4 | 5 (2 rows) dbadmin=> SELECT * FROM test2; c3 | c2 | c1 ----+----+---- 10 | 11 | 12 (1 row) dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test1') AS SELECT * FROM test1; Rows Exported --------------- 2 (1 row) dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test2') AS SELECT * FROM test2; Rows Exported --------------- 1 (1 row) dbadmin=> CREATE TABLE test3 (c1 INT, c2 INT, c3 INT); CREATE TABLE dbadmin=> COPY test3 FROM '/home/dbadmin/test1/*' PARQUET(do_soft_schema_match_by_name='TRUE'); Rows Loaded ------------- 2 (1 row) dbadmin=> COPY test3 FROM '/home/dbadmin/test2/*' PARQUET(do_soft_schema_match_by_name='TRUE'); Rows Loaded ------------- 1 (1 row) dbadmin=> SELECT * FROM test3; c1 | c2 | c3 ----+----+---- 1 | 2 | 3 3 | 4 | 5 12 | 11 | 10 (3 rows)Thank you @mosheg and @Jim_Knicely.
Since I'm using version 9.2 I ended up using FDELIMITEDPARSER parser. Once we upgrade to version 10 we will experiment with do_soft_schema_match_by_name