Delimited file to parse with Kafka scheduler

Hi Team,

I need to parse a delimited file into Vertica with the Kafka scheduler, but I am getting an error. I am using the below command to create the load-spec for the delimited file.

Sample data

ABC123,1234#
XYZ12,1212#
PPPPP,121#
RRRR,7676#

Syntax I am using

/opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema vert123 --load-spec kafka_delim_test --parser fdelimitedpairparser (delimiter=',',recordTerminator = '#')

But it throws a syntax error...

Comments

  • Jim_Knicely - Administrator
    edited November 2017

    See if the --parser-parameters option works for you:

    /opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema vert123 --load-spec kafka_delim_test --parser fdelimitedpairparser --parser-parameters delimiter=',',recordTerminator='#';

    See:
    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/KafkaIntegrationGuide/UtilityOptions/LoadSpecUtilityOptions.htm
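
    If the shell complains about the commas or quotes, wrapping the whole --parser-parameters value in double quotes usually keeps bash from splitting it up; a rough sketch (untested against your schema):

    /opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema vert123 --load-spec kafka_delim_test --parser fdelimitedpairparser --parser-parameters "delimiter=',',recordTerminator='#'"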

  • Jim_Knicely - Administrator
    edited November 2017

    @opper20 - There is an open JIRA ticket with a request to add the standard Vertica parsers to Kafka. Currently they are not available so you need to use the --filters option.

    See:

    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/KafkaIntegrationGuide/UtilityOptions/LoadSpecUtilityOptions.htm

    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/KafkaIntegrationGuide/ParsingCustomFormats.htm
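
    Note that the --filters value contains parentheses and quotes, so it has to be quoted on the shell command line; roughly something like this (untested), using the KafkaInsertDelimiters filter with the built-in DELIMITED parser:

    /opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema vert123 --load-spec kafka_delim_test --parser DELIMITED --filters "FILTER KafkaInsertDelimiters(delimiter=E'\n') DELIMITER ','"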

  • Hi Jim,

    Thanks for the reply... I tried to run it based on the suggestions from the Vertica documentation, but I am not getting the appropriate result.

    /opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema Vert123 --load-spec kafka_delim_test --filters KafkaInsertDelimiters(delimiter = ',',recordTerminator = '#') --username test1234 --password '####'
    -bash: syntax error near unexpected token `('

    Can you tell me if I am passing those values wrongly?

    Thanks

  • skeswani - Employee

    Is your data a key value pair?

    hadoop@skeswani:~$ cat data.txt
    ABC123,33#XYZ234,12#

    and is this how you want it to look, i.e. each pair is a record? I am not sure fdelimitedpairparser is correct for the above data.

    skeswani=> select * from tgt;
    key | value
    --------+------
    ABC123 | 33
    XYZ234 | 12
    (2 rows)

    How does your raw data look? Can you send me a sample using kafka-console-consumer.sh?

    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic vertica_topic --from-beginning
    ABC123,33#XYZ234,12#

  • skeswani - Employee

    Based on the data provided, the row delimiter is not #, it's E'#\n'.
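
    For a plain file load you can express that two-character terminator directly with the built-in COPY options; a sketch (untested), assuming the sample file from above:

    skeswani=> copy tgt from '/tmp/test.csv' delimiter ',' record terminator E'#\n';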

  • skeswani - Employee

    [skeswani@skeswani examples] [@:/]$ head /tmp/test.csv
    value1,1#
    value2,2#
    value3,3#
    value4,4#
    value5,5#
    value6,6#
    value7,7#
    value8,8#
    value9,9#
    value10,10#

    [skeswani@skeswani examples] [@:/]$ pwd
    /opt/vertica/sdk/examples
    [skeswani@skeswani examples] [@:/]$ sudo mkdir build
    [skeswani@skeswani examples] [@:/]$ sudo chmod a+w build/
    [skeswani@skeswani examples] [@:/]$ make
    ...
    ...

    [skeswani@skeswani examples] [@:/]$ vsql

    Welcome to vsql, the Vertica Analytic Database interactive terminal.

    Type: \h or \? for help with vsql commands
    \g or terminate with semicolon to execute query
    \q to quit

    skeswani=> CREATE LIBRARY ExampleDelimitedParserLib AS '/opt/vertica/sdk/examples/build/ExampleDelimitedParser.so';
    CREATE LIBRARY
    skeswani=> CREATE PARSER ExampleDelimitedParser AS LANGUAGE 'C++' NAME 'DelimitedParserExampleFactory' LIBRARY ExampleDelimitedParserLib;
    CREATE PARSER FUNCTION
    skeswani=> drop table tgt;
    DROP TABLE
    skeswani=> create table tgt (key varchar, values varchar);
    CREATE TABLE
    skeswani=> copy tgt from '/tmp/test.csv' Parser ExampleDelimitedParser(delimiter=',' , record_terminator = E'\n');

     Rows Loaded
    -------------
         1000000
    (1 row)

    -- not ideal, but the \n should be replaced with #
    skeswani=> select * from tgt limit 1;
    key | values
    --------+--------
    value1 | 1#
    (1 row)

    skeswani=> copy tgt from '/tmp/test.csv' Parser ExampleDelimitedParser(delimiter=',' , record_terminator = '#');

     Rows Loaded
    -------------
         1000000
    (1 row)

    skeswani=> select * from tgt limit 1;
    key     | values
    --------+--------
    value10 | 10
    (1 row)

    skeswani=> truncate table tgt;
    TRUNCATE TABLE

    --- Now for the scheduler...

    cat /tmp/test.csv | kafka-console-producer.sh --broker-list localhost:9092 --topic vertica_topic

    /opt/vertica/packages/kafka/bin/vkconfig scheduler --drop --config-schema delim_test
    /opt/vertica/packages/kafka/bin/vkconfig scheduler --add --config-schema delim_test --frame-duration '10 seconds'
    /opt/vertica/packages/kafka/bin/vkconfig cluster --create --cluster StreamCluster1 --hosts localhost:9092 --config-schema delim_test
    /opt/vertica/packages/kafka/bin/vkconfig source --create --config-schema delim_test --cluster StreamCluster1 --source vertica_topic --partitions 4
    /opt/vertica/packages/kafka/bin/vkconfig target --create --config-schema delim_test --target-schema public --target-table tgt

    /opt/vertica/packages/kafka/bin/vkconfig load-spec --create --config-schema delim_test --load-spec kafka --parser ExampleDelimitedParser --parser-parameters "delimiter=',',record_Terminator='#'"

    /opt/vertica/packages/kafka/bin/vkconfig microbatch --create --config-schema delim_test --microbatch conversion --target-schema public --target-table tgt --rejection-schema public --rejection-table rej --load-spec kafka --add-source vertica_topic --add-source-cluster StreamCluster1

    [skeswani@skeswani examples] [@:/]$ vsql -X -a -c "select * from tgt limit 1;"
    select * from tgt limit 1;
    key | values
    -----+--------
    (0 rows)

    /opt/vertica/packages/kafka/bin/vkconfig launch --config-schema delim_test

    [skeswani@skeswani examples] [@:/]$ vsql -X -a -c "select * from tgt limit 10;"
    select * from tgt limit 10;
    key | values
    --------------+---------
    value1 | 1
    value10 | 10
    value100 | 100
    value1000 | 1000
    value10000 | 10000
    value100000 | 100000
    value1000000 | 1000000
    value100001 | 100001
    value100002 | 100002
    value100003 | 100003
    (10 rows)
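
    If a microbatch comes back empty, the scheduler's config schema keeps a per-batch history you can inspect; a quick check (assuming the usual stream_microbatch_history table in the delim_test schema):

    [skeswani@skeswani examples] [@:/]$ vsql -X -a -c "select * from delim_test.stream_microbatch_history;"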

  • Thanks, Sumeet... This really helps.

  • Hi Sumeet,

    Can you tell me, if I want to skip the header from the message, how can I skip it using the scheduler?

  • SergeB - Employee

    FYI:
    If you want to use the default DELIMITED parser with the scheduler, you need to specify DELIMITED as the load-spec parser, AND you need to use the --filters argument to pass non-standard delimiters, escape characters, and so on.

    For instance:
    /opt/vertica/packages/kafka/bin/vkconfig load-spec --conf kafka.props --create --load-spec spec_1 --filters "FILTER KafkaInsertDelimiters(delimiter=E'\n') DELIMITER ',' " --load-method DIRECT --parser DELIMITED
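
    For the sample data in this thread (comma-separated fields, records ending in '#'), an untested adaptation might look like the line below; it assumes a RECORD TERMINATOR clause can be passed through --filters the same way DELIMITER is, and spec_2 is just an illustrative name:

    /opt/vertica/packages/kafka/bin/vkconfig load-spec --conf kafka.props --create --load-spec spec_2 --parser DELIMITED --load-method DIRECT --filters "FILTER KafkaInsertDelimiters(delimiter=E'\n') DELIMITER ',' RECORD TERMINATOR E'#\n'"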
