Parquet Export and Import when Columns are in Different Order

avi120 · November 2021

My source and destination tables have the same columns but with different order.

I know it is possible to specify column names in specific order when exporting the data.

Is it also possible to specify columns in specific order when importing data?

avi120 · November 2021

Thanks!
I saw specifying the column in copy command "COPY some_parq_data(c3, c2)" works. However, I found out that I'm getting files from different sources and each has a different column order. Is there something that can be put inside the file to indicate the order of columns such as a first row that contains column names in the right order

mosheg · November 2021

Try this:

cat data.csv
f2,f1,f3
2,1,3
5,4,6
8,7,9

CREATE TABLE t1( f1 INT, f2 INT, f3 INT);

COPY t1 FROM '/YourFilePath/data.csv' parser FDELIMITEDPARSER() DELIMITER ',' ABORT ON ERROR;

SELECT * FROM t1 ORDER BY 1;
 f1 | f2 | f3
----+----+----
  1 |  2 |  3
  4 |  5 |  6
  7 |  8 |  9
(3 rows)

SruthiA · November 2021

@avi120 : Yes you can export specific columns. However when importing you need to import all those columns

https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/EXPORTTOPARQUET.htm

https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/COPY/PARQUET.htm

Jim_Knicely · November 2021

Check out the feature Loose Schema Matching introduced in Vertica 10.1.1 !

Example:

dbadmin=> CREATE TABLE some_parq_data(c1 INT, c2 VARCHAR, c3 INT);
CREATE TABLE

dbadmin=> INSERT INTO some_parq_data SELECT 1, 'A', 2;
 OUTPUT
--------
      1
(1 row)

dbadmin=> EXPORT TO PARQUET (directory='/home/dbadmin/mparq') AS SELECT c3, c2 FROM some_parq_data;
 Rows Exported
---------------
             1
(1 row)

dbadmin=> COPY some_parq_data(c3, c2) FROM '/home/dbadmin/mparq/*' PARQUET(do_soft_schema_match_by_name='TRUE');
 Rows Loaded
-------------
           1
(1 row)

dbadmin=> SELECT * FROM some_parq_data;
 c1 | c2 | c3
----+----+----
  1 | A  |  2
    | A  |  2
(2 rows)

Jim_Knicely · November 2021

@avi120 - Here is a better example of using do_soft_schema_match_by_name:

dbadmin=> SELECT * FROM test1;
 c1 | c2 | c3
----+----+----
  1 |  2 |  3
  3 |  4 |  5
(2 rows)

dbadmin=> SELECT * FROM test2;
 c3 | c2 | c1
----+----+----
 10 | 11 | 12
(1 row)

dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test1') AS SELECT * FROM test1;
 Rows Exported
---------------
             2
(1 row)

dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test2') AS SELECT * FROM test2;
 Rows Exported
---------------
             1
(1 row)

dbadmin=> CREATE TABLE test3 (c1 INT, c2 INT, c3 INT);
CREATE TABLE

dbadmin=> COPY test3 FROM '/home/dbadmin/test1/*' PARQUET(do_soft_schema_match_by_name='TRUE');
 Rows Loaded
-------------
           2
(1 row)

dbadmin=> COPY test3 FROM '/home/dbadmin/test2/*' PARQUET(do_soft_schema_match_by_name='TRUE');
 Rows Loaded
-------------
           1
(1 row)

dbadmin=> SELECT * FROM test3;
 c1 | c2 | c3
----+----+----
  1 |  2 |  3
  3 |  4 |  5
 12 | 11 | 10
(3 rows)

avi120 · November 2021

Thank you @mosheg and @Jim_Knicely.

Since I'm using version 9.2 I ended up using FDELIMITEDPARSER parser. Once we upgrade to version 10 we will experiment with do_soft_schema_match_by_name

We're Moving!

Create My New Community Account Now

Parquet Export and Import when Columns are in Different Order

Best Answers

Answers

Leave a Comment