Options

EXPORT TO PARQUET - Specify the Node Data Will be Written To

I have 3 node Vertica cluster. I would like to export data to local folder on specific node from the three node cluster.
How can I do it without creating a mapping on the other nodes?

Currently my command is:

EXPORT TO PARQUET (directory = '/data/export_vertica_tables/a/') AS SELECT b from imp.a;

Tagged:

Answers

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited October 2021

    Try using an empty OVER() clause.

    EXPORT TO PARQUET (directory = '/data/export_vertica_tables/a/') OVER() AS SELECT b from imp.a;

    See: Partitioning and Sorting Data

    It says:

    If you are exporting data to a local file system, you might want to force a single node to write all of the files. To do so, use an empty OVER clause.

  • Options

    Thanks for the reply. To better understand, I would appreciate getting input on the following:

    1) When I used the empty OVER() I saw that the directory was also created on the other nodes but was empty. Why this empty directory is created?

    2) I ran the command on node1, but the data was exported to node2. Can I control to which node the data will be written?

    3) Is OVER() also the default if I omit the OVER all together.

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    Which version of Vertica are you running? Note that Vertica 11.0 is the first version to officially support exporting to the Linux File System.

    See: Export to Linux File System

    It says:

    When exporting data, you can now write to the node-local Linux file system. Previously, exports to Linux were only supported for an NFS mount.

    Anyway, there doesn't appear to be a way to just write to one node...

    Below are some quick tests using the DB view as described here Monitoring Exports to show the rows written on each node...

    dbadmin=> SELECT sysdate;
              sysdate
    ----------------------------
     2021-10-25 11:15:54.592678
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/big_empty_over100') OVER () AS SELECT * FROM big;
     Rows Exported
    ---------------
             65536
    (1 row)
    
    dbadmin=> SELECT node_name, SUM(rows) rows FROM parquet_export_events WHERE report_time > '10/25/2021 11:15:54' GROUP BY node_name ORDER BY node_name;
         node_name      | rows
    --------------------+-------
     v_test_db_node0001 | 21722
     v_test_db_node0002 | 21929
     v_test_db_node0003 | 21885
    (3 rows)
    
    dbadmin=> SELECT sysdate;
              sysdate
    ----------------------------
     2021-10-25 11:17:03.680264
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/big_no_over100') AS SELECT * FROM big;
     Rows Exported
    ---------------
             65536
    (1 row)
    
    dbadmin=> SELECT node_name, SUM(rows) rows FROM parquet_export_events WHERE report_time > '10/25/2021 11:17:03' GROUP BY node_name ORDER BY node_name;
         node_name      | rows
    --------------------+-------
     v_test_db_node0001 | 21722
     v_test_db_node0002 | 21929
     v_test_db_node0003 | 21885
    (3 rows)
    
    dbadmin=> SELECT sysdate;
              sysdate
    ----------------------------
     2021-10-25 11:17:50.409652
    (1 row)
    
    dbadmin=> EXPORT TO PARQUET(directory='/home/dbadmin/big_over100') OVER(PARTITION BY c1) AS SELECT * FROM big;
     Rows Exported
    ---------------
             65536
    (1 row)
    
    dbadmin=> SELECT node_name, SUM(rows) rows FROM parquet_export_events WHERE report_time > '10/25/2021 11:17:50' GROUP BY node_name ORDER BY node_name;
         node_name      | rows
    --------------------+-------
     v_test_db_node0001 | 19865
     v_test_db_node0002 | 15592
     v_test_db_node0003 | 18346
    (3 rows)
    
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file