Parquet Export and Import when Columns are in Different Order

My source and destination tables have the same columns but with different order.

I know it is possible to specify column names in specific order when exporting the data.

Is it also possible to specify columns in specific order when importing data?

Tagged:

Best Answers

  • avi120avi120
    edited November 2021 Answer ✓

    Thanks!
    I saw specifying the column in copy command "COPY some_parq_data(c3, c2)" works. However, I found out that I'm getting files from different sources and each has a different column order. Is there something that can be put inside the file to indicate the order of columns such as a first row that contains column names in the right order

  • moshegmosheg Vertica Employee Administrator
    Answer ✓

    Try this:

    cat data.csv
    f2,f1,f3
    2,1,3
    5,4,6
    8,7,9
    
    CREATE TABLE t1( f1 INT, f2 INT, f3 INT);
    
    COPY t1 FROM '/YourFilePath/data.csv' parser FDELIMITEDPARSER() DELIMITER ',' ABORT ON ERROR;
    
    SELECT * FROM t1 ORDER BY 1;
     f1 | f2 | f3
    ----+----+----
      1 |  2 |  3
      4 |  5 |  6
      7 |  8 |  9
    (3 rows)
    

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Check out the feature Loose Schema Matching introduced in Vertica 10.1.1 !

    Example:

    dbadmin=> CREATE TABLE some_parq_data(c1 INT, c2 VARCHAR, c3 INT);
    CREATE TABLE
    
    dbadmin=> INSERT INTO some_parq_data SELECT 1, 'A', 2;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET (directory='/home/dbadmin/mparq') AS SELECT c3, c2 FROM some_parq_data;
     Rows Exported
    ---------------
                 1
    (1 row)
    
    dbadmin=> COPY some_parq_data(c3, c2) FROM '/home/dbadmin/mparq/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM some_parq_data;
     c1 | c2 | c3
    ----+----+----
      1 | A  |  2
        | A  |  2
    (2 rows)
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited November 2021

    @avi120 - Here is a better example of using do_soft_schema_match_by_name:

    dbadmin=> SELECT * FROM test1;
     c1 | c2 | c3
    ----+----+----
      1 |  2 |  3
      3 |  4 |  5
    (2 rows)
    
    dbadmin=> SELECT * FROM test2;
     c3 | c2 | c1
    ----+----+----
     10 | 11 | 12
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test1') AS SELECT * FROM test1;
     Rows Exported
    ---------------
                 2
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test2') AS SELECT * FROM test2;
     Rows Exported
    ---------------
                 1
    (1 row)
    
    dbadmin=> CREATE TABLE test3 (c1 INT, c2 INT, c3 INT);
    CREATE TABLE
    
    dbadmin=> COPY test3 FROM '/home/dbadmin/test1/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> COPY test3 FROM '/home/dbadmin/test2/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM test3;
     c1 | c2 | c3
    ----+----+----
      1 |  2 |  3
      3 |  4 |  5
     12 | 11 | 10
    (3 rows)
    
  • Thank you @mosheg and @Jim_Knicely.

    Since I'm using version 9.2 I ended up using FDELIMITEDPARSER parser. Once we upgrade to version 10 we will experiment with do_soft_schema_match_by_name

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file