Loading parquet files with complex data type

Hello,

Is it possible to load a parquet file with the schema, that has complex data type like structs or lists?
I am trying to load a file with the shema below, however vertica gives me "complex types are not supported" message, even though per documentation structs are supported

Id: BYTE_ARRAY UTF8
capturedTimestamp: INT64
location.latitude: DOUBLE
location.longitude: DOUBLE
location.postalCode: BYTE_ARRAY UTF8
event.metadata.triptype: BYTE_ARRAY UTF8

Thanks for any help/suggestions.

Comments

  • SruthiASruthiA Administrator

    What is your external table definition? What is the data type you are specifying in Vertica for BYTE_ARRAY type in the schema?

  • LenoyJLenoyJ - Select Field - Employee
    edited August 2019

    Which version of Vertica are you on? Structs were introduced in 9.2.

    I'm certain that the documentation says that structs are supported in the context of External Tables. But you can attempt to load the data with structs using a COPY.

    Use this guide as reference: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Tables/ExternalTables/Structs.htm

    So a table in Hive with nested structs like the following:

    create external table atomix
    (
    Id int,
    capturedTimestamp timestamp,
    location struct<
        latitude:double,
        longitude:double,
        postalCode:int >,
    event struct<
        metadata:struct<
            triptype:string,
            tripcode:int> >
    )
    stored as parquet location '/user/data/atomix';
    

    Can be written as an External Table in Vertica as:

    create external table atomix
    (
        Id int ,
        capturedTimestamp timestamptz ,
        "location.latitude" float,
        "location.longitude" float,
        "location.postalCode" int,
        "event.metadata.triptype" varchar,
        "event.metadata.tripcode" int
    )
    as copy from '/home/dbadmin/atomix.parquet' PARQUET;
    

    Select:

    dbadmin=> select * from atomix ;
     Id |   capturedTimestamp    | location.latitude | location.longitude | location.postalCode | event.metadata.triptype | event.metadata.tripcode
    ----+------------------------+-------------------+--------------------+---------------------+-------------------------+-------------------------
      1 | 2019-08-08 03:00:00-04 |             12.12 |              13.14 |              123456 | funtrip                 |                     321
    (1 row)
    

    You can create a regular table like below and even do a COPY.

    CREATE TABLE atomix_regular
    (
        Id int ,
        capturedTimestamp timestamptz ,
        "location.latitude" float,
        "location.longitude" float,
        "location.postalCode" int,
        "event.metadata.triptype" varchar,
        "event.metadata.tripcode" int
    );
    

    -

    dbadmin=> COPY atomix_regular FROM '/home/dbadmin/atomix.parquet' PARQUET;
     Rows Loaded
    -------------
               1
    (1 row)
    

    Attached atomix.parquet that I used.

  • Ah, that make sense, thank you. Using 9.1.0-1. OK, going to upgrade to 9.2 and will give it a try, thank you!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file