Export To S3 And External table

relireli Vertica Customer

I export data to s3 with portion :
SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test1/bystate.date') OVER (PARTITION BY name,date) FROM (SELECT name,TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T ) foo;
Now I want to create the external table with partition (date & name) base on the data in s3 I export before.
I can do that?
what kind of file the export create?it looks like CSV with '|' delimiter,
in this case, file CSV can be a part of the external table with a partition on name and date?
Thank you all !!

Best Answer

  • Sudhakar_BSudhakar_B Vertica Customer ✭✭
    Answer ✓

    @reli
    Yes, you are correct. parquet format file can be partitioned and external tables defined in Vertica on parquet or ORC can use "hive_partition" in the external table definition. It will improve performance for queries on such tables.

    In your original post you were exporting using s3EXPORT_PARTITION to export to a ((CSV format** file on S3. Not a Parquet format file.
    For External tables in Vertica using CSV files (or other delimited TEXT files) you can just use the "wildcards (globs)". Partitioning is not required or possible.

    Of course performance of queries on external tables can be slow based on data volume and other factors.
    Best regards.

Answers

  • Sudhakar_BSudhakar_B Vertica Customer ✭✭

    @reli ,
    AFAIK, you can't and don't need to created partitioned external table in Vertica.
    You can create normal external table with copy. Your COPY definition statement can use wildcards (globs) to make parsing the stored COPY statement definition more dynamic.
    If you s3EXPORT_PARTITION created files in s3 bucket like below...

    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://db001/bystate.date')
       OVER (PARTITION by st, yr) from T;
     rows   | url
    --------+------------------------------------------------------------------------------
     184647 | s3://db001/st=MA/yr=2005/bystate.77fcab9836b93a04.dat
     282633 | s3://db001/st=VA/yr=2007/bystate.77fcab9836b93a05.dat
     282633 | s3://db001/st=VA/yr=2009/bystate.77fcab9836b93a05.dat
    (3 rows)
    

    Then you can create an external table in Vertica like below, and Vertica parses the COPY statement, all stored data in the top-level directory is accessible to the query.

    CREATE EXTERNAL TABLE archive (...) AS COPY FROM 's3://db001/*'
    

    Also, by default Vertica exports data in PIPE delimiter, though you can specify alternate characters.
    https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/S3EXPORT_PARTITION.htm
    https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/ExternalTables/CreatingExternalTables.htm
    Thanks

  • relireli Vertica Customer

    partition its good for performance in external table ,
    I do this :
    EXPORT TO PARQUET(directory = 's3://db001/bystate.date') OVER(PARTITION BY name, date ORDER BY ts) AS SELECT *,TO_CHAR(ts, 'YYYYMMDD') AS date FROM T order by ts
    external table:
    CREATE EXTERNAL TABLE tst.T_EXTERNAL ( --All the columns.. name varchar(20), date varchar(20) ) AS COPY FROM 's3://db001/bystate.date/*/*/*.parquet' PARQUET (hive_partition_cols='name,date');
    and it's what I need,tank for the help!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file