We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Parquet Export and Import when Columns are in Different Order — Vertica Forum

Parquet Export and Import when Columns are in Different Order

My source and destination tables have the same columns but with different order.

I know it is possible to specify column names in specific order when exporting the data.

Is it also possible to specify columns in specific order when importing data?

Tagged:

Best Answers

  • avi120avi120
    edited November 2021 Answer ✓

    Thanks!
    I saw specifying the column in copy command "COPY some_parq_data(c3, c2)" works. However, I found out that I'm getting files from different sources and each has a different column order. Is there something that can be put inside the file to indicate the order of columns such as a first row that contains column names in the right order

  • moshegmosheg Vertica Employee Administrator
    Answer ✓

    Try this:

    cat data.csv
    f2,f1,f3
    2,1,3
    5,4,6
    8,7,9
    
    CREATE TABLE t1( f1 INT, f2 INT, f3 INT);
    
    COPY t1 FROM '/YourFilePath/data.csv' parser FDELIMITEDPARSER() DELIMITER ',' ABORT ON ERROR;
    
    SELECT * FROM t1 ORDER BY 1;
     f1 | f2 | f3
    ----+----+----
      1 |  2 |  3
      4 |  5 |  6
      7 |  8 |  9
    (3 rows)
    

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Check out the feature Loose Schema Matching introduced in Vertica 10.1.1 !

    Example:

    dbadmin=> CREATE TABLE some_parq_data(c1 INT, c2 VARCHAR, c3 INT);
    CREATE TABLE
    
    dbadmin=> INSERT INTO some_parq_data SELECT 1, 'A', 2;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET (directory='/home/dbadmin/mparq') AS SELECT c3, c2 FROM some_parq_data;
     Rows Exported
    ---------------
                 1
    (1 row)
    
    dbadmin=> COPY some_parq_data(c3, c2) FROM '/home/dbadmin/mparq/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM some_parq_data;
     c1 | c2 | c3
    ----+----+----
      1 | A  |  2
        | A  |  2
    (2 rows)
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited November 2021

    @avi120 - Here is a better example of using do_soft_schema_match_by_name:

    dbadmin=> SELECT * FROM test1;
     c1 | c2 | c3
    ----+----+----
      1 |  2 |  3
      3 |  4 |  5
    (2 rows)
    
    dbadmin=> SELECT * FROM test2;
     c3 | c2 | c1
    ----+----+----
     10 | 11 | 12
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test1') AS SELECT * FROM test1;
     Rows Exported
    ---------------
                 2
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/test2') AS SELECT * FROM test2;
     Rows Exported
    ---------------
                 1
    (1 row)
    
    dbadmin=> CREATE TABLE test3 (c1 INT, c2 INT, c3 INT);
    CREATE TABLE
    
    dbadmin=> COPY test3 FROM '/home/dbadmin/test1/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> COPY test3 FROM '/home/dbadmin/test2/*' PARQUET(do_soft_schema_match_by_name='TRUE');
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM test3;
     c1 | c2 | c3
    ----+----+----
      1 |  2 |  3
      3 |  4 |  5
     12 | 11 | 10
    (3 rows)
    
  • Thank you @mosheg and @Jim_Knicely.

    Since I'm using version 9.2 I ended up using FDELIMITEDPARSER parser. Once we upgrade to version 10 we will experiment with do_soft_schema_match_by_name

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file