The Vertica Forum is getting a makeover! The site will be undergoing maintenance from Tuesday 8/13 to Friday 8/16. We appreciate your patience and cooperation during this time.

Loading parquet files with complex data type

Hello,

Is it possible to load a parquet file with the schema, that has complex data type like structs or lists?
I am trying to load a file with the shema below, however vertica gives me "complex types are not supported" message, even though per documentation structs are supported

Id: BYTE_ARRAY UTF8
capturedTimestamp: INT64
location.latitude: DOUBLE
location.longitude: DOUBLE
location.postalCode: BYTE_ARRAY UTF8
event.metadata.triptype: BYTE_ARRAY UTF8

Thanks for any help/suggestions.

Comments

  • SruthiASruthiA Employee

    What is your external table definition? What is the data type you are specifying in Vertica for BYTE_ARRAY type in the schema?

  • LenoyJLenoyJ Employee
    edited August 11

    Which version of Vertica are you on? Structs were introduced in 9.2.

    I'm certain that the documentation says that structs are supported in the context of External Tables. But you can attempt to load the data with structs using a COPY.

    Use this guide as reference: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Tables/ExternalTables/Structs.htm

    So a table in Hive with nested structs like the following:

    create external table atomix
    (
    Id int,
    capturedTimestamp timestamp,
    location struct<
        latitude:double,
        longitude:double,
        postalCode:int >,
    event struct<
        metadata:struct<
            triptype:string,
            tripcode:int> >
    )
    stored as parquet location '/user/data/atomix';
    

    Can be written as an External Table in Vertica as:

    create external table atomix
    (
        Id int ,
        capturedTimestamp timestamptz ,
        "location.latitude" float,
        "location.longitude" float,
        "location.postalCode" int,
        "event.metadata.triptype" varchar,
        "event.metadata.tripcode" int
    )
    as copy from '/home/dbadmin/atomix.parquet' PARQUET;
    

    Select:

    dbadmin=> select * from atomix ;
     Id |   capturedTimestamp    | location.latitude | location.longitude | location.postalCode | event.metadata.triptype | event.metadata.tripcode
    ----+------------------------+-------------------+--------------------+---------------------+-------------------------+-------------------------
      1 | 2019-08-08 03:00:00-04 |             12.12 |              13.14 |              123456 | funtrip                 |                     321
    (1 row)
    

    You can create a regular table like below and even do a COPY.

    CREATE TABLE atomix_regular
    (
        Id int ,
        capturedTimestamp timestamptz ,
        "location.latitude" float,
        "location.longitude" float,
        "location.postalCode" int,
        "event.metadata.triptype" varchar,
        "event.metadata.tripcode" int
    );
    

    -

    dbadmin=> COPY atomix_regular FROM '/home/dbadmin/atomix.parquet' PARQUET;
     Rows Loaded
    -------------
               1
    (1 row)
    

    Attached atomix.parquet that I used.

  • Ah, that make sense, thank you. Using 9.1.0-1. OK, going to upgrade to 9.2 and will give it a try, thank you!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.