Processing Arrays with FjsonParser/KafkaJsonParser
Hi all,
Thank you all in advance just for reading this and any help, idea or recommendation will be highly appreciated.
Im trying to ingest data using vertica copy commands with KafkaSource. (https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/KafkaIntegrationGuide/KafkaFunctions/KafkaSource.htm?zoom_highlight=kafkasource)
My question is what is the most optimal way of processing arrays within kafka message. I am mostly interested to save data into regular table (not flex). I've tried to set VARCHAR data type but Im mostly recieving weird characters 'SOH' (Start of header ?), 'NUL' around real values. Then I got recommendation to use directly ARRAY[VARCHAR(x)] data type, but once data are saved, I can see lots of unicode codes within values.
My question is, whether there is a way how to save data properly, and possibly use EXPLODE function to multiply array elements to rows and process data further ?
Im not interested in flex tables and option to 'flatten_arrays=true' since it is now feasible from the point of design.
Thanks,
Richard
Best Answer
-
SergeB - Select Field - Employee
What version of Vertica are you using? I just tested this with v10.1.1-0
/opt/vertica/packages/kafka/bin/kafkacat -b kafkahost:9092 -C -t json_array
{"type":"survey","event_id":"12345678-1234-1234-1234-12345678901","some_key":["some_value"]}
% Reached end of topic json_array [1] at offset 0
% Reached end of topic json_array [2] at offset 1
% Reached end of topic json_array [0] at offset 0create table regular_table(event_id varchar(100), some_key ARRAY[varchar(500)]);
create flex table flex_table(event_id varchar(100), some_key ARRAY[varchar(500)]);
copy regular_table(event_id, some_key) source kafkasource (stream='json_array|2|-2',brokers='kafkahost:9092',stop_on_eof=true) PARSER KafkaJsonParser();
copy flex_table(event_id, some_key,raw) source kafkasource (stream='json_array|2|-2',brokers='kafkahost:9092',stop_on_eof=true) PARSER KafkaJsonParser();select * from regular_table;
some_key | event_id
----------------+-------------------------------------
["some_value"] | 12345678-1234-1234-1234-12345678901select some_key,event_id from flex_table;
some_key | event_id
----------------+-------------------------------------
["some_value"] | 12345678-1234-1234-1234-123456789011
Answers
Could you open a support ticket? And then upload some sample records ?
Sure I can post some of my work, imagine following message from kafka topic:
{"type":"survey","event_id":"12345678-1234-1234-1234-12345678901","some_key":["some_value"]}
See attached 2 text files. In the first one 'regular_table' I was trying to directly save an array as VARCHAR, but unfortunately some additional characters appeared.
Regarding flex_with_array, this is almost my target (even if it is flex) , array data type looks almost as I expect it, only some unicode characters appeared exacly as it can be found in result.
Data have been copied out from DB tool and anonymised.
Support ticket is already open, I just wanted to give a try this forum, whether there is more experience than I have
Thanks
I am using version 10.0.1, where the behaviour is really different
Thank you Serge, seems upgrade of version of our Vertica should do the job, will post there an update with our result.