Processing Arrays with FjsonParser/KafkaJsonParser

rytchrytch Vertica Customer

Hi all,
Thank you all in advance just for reading this and any help, idea or recommendation will be highly appreciated.
Im trying to ingest data using vertica copy commands with KafkaSource. (https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/KafkaIntegrationGuide/KafkaFunctions/KafkaSource.htm?zoom_highlight=kafkasource)
My question is what is the most optimal way of processing arrays within kafka message. I am mostly interested to save data into regular table (not flex). I've tried to set VARCHAR data type but Im mostly recieving weird characters 'SOH' (Start of header ?), 'NUL' around real values. Then I got recommendation to use directly ARRAY[VARCHAR(x)] data type, but once data are saved, I can see lots of unicode codes within values.
My question is, whether there is a way how to save data properly, and possibly use EXPLODE function to multiply array elements to rows and process data further ?
Im not interested in flex tables and option to 'flatten_arrays=true' since it is now feasible from the point of design.
Thanks,
Richard

Best Answer

  • SergeBSergeB - Select Field - Employee
    Answer ✓

    What version of Vertica are you using? I just tested this with v10.1.1-0

    /opt/vertica/packages/kafka/bin/kafkacat -b kafkahost:9092 -C -t json_array

    {"type":"survey","event_id":"12345678-1234-1234-1234-12345678901","some_key":["some_value"]}
    % Reached end of topic json_array [1] at offset 0
    % Reached end of topic json_array [2] at offset 1
    % Reached end of topic json_array [0] at offset 0

    create table regular_table(event_id varchar(100), some_key ARRAY[varchar(500)]);
    create flex table flex_table(event_id varchar(100), some_key ARRAY[varchar(500)]);
    copy regular_table(event_id, some_key) source kafkasource (stream='json_array|2|-2',brokers='kafkahost:9092',stop_on_eof=true) PARSER KafkaJsonParser();
    copy flex_table(event_id, some_key,raw) source kafkasource (stream='json_array|2|-2',brokers='kafkahost:9092',stop_on_eof=true) PARSER KafkaJsonParser();

    select * from regular_table;
    some_key | event_id
    ----------------+-------------------------------------
    ["some_value"] | 12345678-1234-1234-1234-12345678901

    select some_key,event_id from flex_table;
    some_key | event_id
    ----------------+-------------------------------------
    ["some_value"] | 12345678-1234-1234-1234-12345678901

Answers

  • SergeBSergeB - Select Field - Employee

    Could you open a support ticket? And then upload some sample records ?

  • rytchrytch Vertica Customer

    Sure I can post some of my work, imagine following message from kafka topic:
    {"type":"survey","event_id":"12345678-1234-1234-1234-12345678901","some_key":["some_value"]}
    See attached 2 text files. In the first one 'regular_table' I was trying to directly save an array as VARCHAR, but unfortunately some additional characters appeared.
    Regarding flex_with_array, this is almost my target (even if it is flex) , array data type looks almost as I expect it, only some unicode characters appeared exacly as it can be found in result.
    Data have been copied out from DB tool and anonymised.
    Support ticket is already open, I just wanted to give a try this forum, whether there is more experience than I have :)
    Thanks

  • rytchrytch Vertica Customer

    I am using version 10.0.1, where the behaviour is really different

  • rytchrytch Vertica Customer

    Thank you Serge, seems upgrade of version of our Vertica should do the job, will post there an update with our result.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file