Options

s3EXPORT_PARTITION

relireli Vertica Customer

hiן all, can I do this in s3EXPORT_PARTITION, with a calculated value?

SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://db001/bystate.date')
   OVER (PARTITION by to_char(ts, 'YYYYMMDD')) from T;

when ts is a date-time value.

Tagged:

Answers

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited January 2021

    Sure!

    Quick example:

    dbadmin=> \d T
                                      List of Fields by Tables
     Schema | Table | Column |   Type    | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+--------+-----------+------+---------+----------+-------------+-------------
     public | T     | ts     | timestamp |    8 |         | f        | f           |
     public | T     | c1     | int       |    8 |         | f        | f           |
    (2 rows)
    
    dbadmin=> SELECT * FROM T;
                 ts             | c1
    ----------------------------+----
     2021-01-03 18:44:00.795916 |  1
     2021-01-04 18:44:05.374714 |  2
     2021-01-05 18:44:08.886589 |  3
    (3 rows)
    
    dbadmin=> SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test/bystate.date') OVER (PARTITION by to_char(ts, 'YYYYMMDD')) from T;
     rows |                         url
    ------+------------------------------------------------------
        1 | https://bucket_name/test/=20210103/bystate.39b4312b.date
        1 | https://bucket_name/test/=20210104/bystate.9e02d3ac.date
        1 | https://bucket_name/test/=20210105/bystate.e7c2c4c4.date
    (3 rows)
    
    dbadmin=> CREATE EXTERNAL TABLE t_ext (partition_column INT, ts TIMESTAMP, c1 INT) AS COPY FROM 's3://bucket_name/test/*/*';
    CREATE TABLE
    
    dbadmin=> SELECT * FROM t_ext ORDER BY 1;
     partition_column |             ts             | c1
    ------------------+----------------------------+----
             20210103 | 2021-01-03 18:44:00.795916 |  1
             20210104 | 2021-01-04 18:44:05.374714 |  2
             20210105 | 2021-01-05 18:44:08.886589 |  3
    (3 rows)
    
  • Options
    relireli Vertica Customer

    I run this query
    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test/bystate.date') OVER (PARTITION by to_char(ts, 'YYYYMMDD')) from T;
    and I get this error..

    my vertica version 10.0.1-6.

  • Options
    relireli Vertica Customer

    @reli said:
    I run this query
    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test/bystate.date') OVER (PARTITION by to_char(ts, 'YYYYMMDD')) from T;
    and I get this error..

    my vertica version 10.0.1-6.

    it parse a part of the data but not all the data.
    and in s3 the folder:

    how to do it like this ? date=20201205

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator

    Not sure what you mean by "it parse a part of the data but not all the data"...

    Can you run this query without error?

    SELECT to_char(ts, 'YYYYMMDD')) from T;

    For your question "how to do it like this ? date=20201205", you can use a sub-query and name the derived column that you are using as the partition...

    Example:

    dbadmin=> SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test1/bystate.date') OVER (PARTITION BY date) FROM (SELECT TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T) foo;
     rows |                            url
    ------+-----------------------------------------------------------
        1 | https://bucket_name/test1/date=20210103/bystate.8d999fa5.date
        1 | https://bucket_name/test1/date=20210104/bystate.60394485.date
        1 | https://bucket_name/test1/date=20210105/bystate.7deec715.date
    (3 rows)
    
  • Options
    relireli Vertica Customer

    I will try to explain better,
    I run this query:
    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test/bystate.date') OVER (PARTITION by to_char(ts, 'YYYYMMDD'))
    i get this error

    the distinct value from the source table is

    in s3 I get these folders only:

    and there are files in the folders, but not all the folders that need to be.

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator

    Can you try running the export command from vsql?

  • Options
    relireli Vertica Customer

    I export command from vsql what you send,

    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test1/bystate.date') OVER (PARTITION BY date) FROM (SELECT TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T) foo;
    
    

    this is the result

    but I get the same error, and not all the data in the folder: :neutral:

    ERROR 5861: Error calling processPartition() in User Function s3export_partition at [/data/qb_workspaces/jenkins2/ReleaseBuilds/Hammermill/REL-10_0_1-x_hammermill/build/udx/supported/AWS/S3.cpp:799], error code: 0, message: Unknown exception: No response body. Response code: 4294967295

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    Hmm. Let's do some trouble shooting. First, can you post the DDL for the table?

    SELECT export_objects('','T');

    Once I have the DDL I will install your version of Vertica to test. What is the exact version you are running?

    SELECT version();

    Next, can you try running these?

    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test1/bystate.date') OVER (PARTITION BY date) FROM (SELECT TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T WHERE TO_CHAR(ts, 'YYYYMMDD') = '20201209') foo;
    
    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test1/bystate.date') OVER (PARTITION BY date) FROM (SELECT TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T WHERE TO_CHAR(ts, 'YYYYMMDD') = '20201210') foo;
    
    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url='s3://bucket_name/test1/bystate.date') OVER (PARTITION BY date) FROM (SELECT TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T WHERE TO_CHAR(ts, 'YYYYMMDD') = '20201211) foo;
    
  • Options
    relireli Vertica Customer
    1. Attach file -DDL table.
    2. Version: Vertica Analytic Database v10.0.1-6
    3. s3EXPORT_PARTITION:
      • 20201209 : fails
      • 20201210:fails
      • 20201211 : ok
        I have no explanation for why some succeed and some do not. :neutral:
  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator

    The DDL in your attached file is not Vertica DDL :) For example, it's referenceing a data type "numerivalue".

  • Options
    relireli Vertica Customer
    edited January 2021

    I add the new script in DDL Table, it is ok now?
    Thank you so much for all the help. o:)

  • Options
    relireli Vertica Customer

    I change the parameter chunksize and it's work!!
    SELECT s3EXPORT_PARTITION(* USING PARAMETERS url=''s3://bucket_name/test1/bystate.date',chunksize=20485760) OVER (PARTITION BY date) FROM (SELECT TO_CHAR(ts, 'YYYYMMDD') AS date, * FROM T WHERE TO_CHAR(ts, 'YYYYMMDD') = '20201211') foo;
    Thank you @Jim_Knicely :smiley:

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file