Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Parquet Export and Import when Columns are in Different Order

My source and destination tables have the same columns but with different order.

I know it is possible to specify column names in specific order when exporting the data.

Is it also possible to specify columns in specific order when importing data?

Tagged:

Best Answers

  • avi120avi120
    edited November 7 Accepted Answer

    Thanks!
    I saw specifying the column in copy command "COPY some_parq_data(c3, c2)" works. However, I found out that I'm getting files from different sources and each has a different column order. Is there something that can be put inside the file to indicate the order of columns such as a first row that contains column names in the right order

  • moshegmosheg Employee
    Accepted Answer

    Try this:

    cat data.csv
    f2,f1,f3
    2,1,3
    5,4,6
    8,7,9
    
    CREATE TABLE t1( f1 INT, f2 INT, f3 INT);
    
    COPY t1 FROM '/YourFilePath/data.csv' parser FDELIMITEDPARSER() DELIMITER ',' ABORT ON ERROR;
    
    SELECT * FROM t1 ORDER BY 1;
     f1 | f2 | f3
    ----+----+----
      1 |  2 |  3
      4 |  5 |  6
      7 |  8 |  9
    (3 rows)
    

Answers

  • Jim_KnicelyJim_Knicely Administrator

    Check out the feature Loose Schema Matching introduced in Vertica 10.1.1 !

    Example:

    dbadmin=> CREATE TABLE some_parq_data(c1 INT, c2 VARCHAR, c3 INT);
    CREATE TABLE
    
    dbadmin=> INSERT INTO some_parq_data SELECT 1, 'A', 2;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET (directory='/home/dbadmin/mparq') AS SELECT c3, c2 FROM some_parq_data;
     Rows Exported
    ---------------
                 1
    (1 row)
    
    dbadmin=> COPY some_parq_data(c3, c2) FROM '/home/dbadmin/mparq/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM some_parq_data;
     c1 | c2 | c3
    ----+----+----
      1 | A  |  2
        | A  |  2
    (2 rows)
    
  • Jim_KnicelyJim_Knicely Administrator
    edited November 8

    @avi120 - Here is a better example of using do_soft_schema_match_by_name:

    dbadmin=> SELECT * FROM test1;
     c1 | c2 | c3
    ----+----+----
      1 |  2 |  3
      3 |  4 |  5
    (2 rows)
    
    dbadmin=> SELECT * FROM test2;
     c3 | c2 | c1
    ----+----+----
     10 | 11 | 12
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test1') AS SELECT * FROM test1;
     Rows Exported
    ---------------
                 2
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test2') AS SELECT * FROM test2;
     Rows Exported
    ---------------
                 1
    (1 row)
    
    dbadmin=> CREATE TABLE test3 (c1 INT, c2 INT, c3 INT);
    CREATE TABLE
    
    dbadmin=> COPY test3 FROM '/home/dbadmin/test1/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> COPY test3 FROM '/home/dbadmin/test2/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM test3;
     c1 | c2 | c3
    ----+----+----
      1 |  2 |  3
      3 |  4 |  5
     12 | 11 | 10
    (3 rows)
    
  • Thank you @mosheg and @Jim_Knicely.

    Since I'm using version 9.2 I ended up using FDELIMITEDPARSER parser. Once we upgrade to version 10 we will experiment with do_soft_schema_match_by_name

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.