Sessions exceeding the limit when loading from Hadoop into Vertica

I have a Hadoop ecosystem that receives files as input; after processing, Hadoop loads the results into a Vertica database that has a 200-session limit.

Hadoop creates a Vertica session for each input file.

Now, if the number of input files exceeds 200, the sessions exceed the 200 limit and cause a 'sessions exceeded' error.

Is there any connection pooling concept that would help?

Comments

  •  Why don't you join the files so you can avoid high concurrency against Vertica? Vertica was not built for high-concurrency workloads. This way you will also keep your TM (Tuple Mover) from going crazy (a quick check for that is sketched below).
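
    As an illustration only, assuming the standard v_monitor.storage_containers system table (this sketch is not from the original thread): counting ROS containers per projection shows how hard all those small loads are hitting the Tuple Mover.

    SELECT projection_name,
           node_name,
           COUNT(*) AS ros_container_count
    FROM   storage_containers
    GROUP  BY projection_name, node_name
    ORDER  BY ros_container_count DESC
    LIMIT  10;

    A high, steadily growing container count on the load target usually means many tiny COPY statements are outrunning mergeout.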

     

  • Thanks a lot for the reply... Yes, I have made that suggestion to the Hadoop team, but they are even crazier than the TM at the moment.

    Is there a way in Vertica to accept more sessions, or any other option?

  •  Adding more sessions won't solve anything!

    What is your resource pool's PLANNEDCONCURRENCY? How many CPUs do you have per node?

     

    You can open more sessions, but all of them will just end up queueing! (The queries below show how to check this.)
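
    For illustration, assuming the standard resource_pools and resource_queues system tables (this sketch is not from the original thread), you can check the pool settings and see whether requests are already queueing:

    -- Planned and maximum concurrency per resource pool
    SELECT name, plannedconcurrency, maxconcurrency, maxmemorysize
    FROM   resource_pools;

    -- Requests currently waiting in a resource pool queue
    SELECT pool_name, node_name, COUNT(*) AS queued_requests
    FROM   resource_queues
    GROUP  BY pool_name, node_name;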

     

  • Adam answered a similar question here:

    https://community.dev.hpe.com/t5/Vertica-Forum/Data-Load-limitation/td-p/218471

     

    It is not Hadoop-related, but it covers the whole session-and-loading topic.

  • I have a 12-node cluster with MAXCONCURRENCY 24, PLANNEDCONCURRENCY 10, and a maximum of 200 sessions.

    We are using the native Vertica-Hadoop connector in MapReduce, which generates a COPY statement.

    I believe it is creating a new session for every file (more than 200 files in total).

    I am planning to increase the session limit to 500.
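
    For illustration only, assuming the MaxClientSessions configuration parameter (this sketch is not from the original thread, and 500 is just the number mentioned above): you can inspect and raise the session limit as shown below, but as noted in the previous comment, the extra sessions will simply queue on the resource pool.

    -- Inspect the current session limit (applied per node)
    SELECT parameter_name, current_value, default_value
    FROM   configuration_parameters
    WHERE  parameter_name = 'MaxClientSessions';

    -- Raise it (not recommended here; the sessions will only queue)
    SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 500);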

     

  • Hi SRA,

     

      You said you have Hadoop feeding data to you so you can load it into Vertica. I assume it is aggregated data (this point is less important :)).

      Now I guess you have this data in a certain format and the files come with similar names?

     

    Your load design should not create a load session for each file that is generated.

     

    It is not a good approach to run a COPY ('load') every time Hadoop generates a file. A smarter method would be to have your load process run every, let's say, 10 minutes and load all files present in the location where Hadoop generates them.

     

    Example:

    COPY table_bla_bla FROM '/hadoop_path/file*.csv' ON node01, '/hadoop_path/file*.csv' ON node02, '/hadoop_path/file*.csv' ON node03 DELIMITER '|' DIRECT;

    This way you run only one COPY command per batch; a slightly fuller sketch of the same idea follows below.
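
    As a sketch only (the table name sales_fact, the node names, and the reject/exception paths are placeholders, not from the thread), a batched load run every few minutes could look like this, keeping rejected rows for inspection and confirming that only one load session is open:

    COPY sales_fact
    FROM '/hadoop_path/file*.csv' ON node01,
         '/hadoop_path/file*.csv' ON node02,
         '/hadoop_path/file*.csv' ON node03
    DELIMITER '|'
    REJECTED DATA '/tmp/sales_fact.rejected' ON node01
    EXCEPTIONS '/tmp/sales_fact.exceptions' ON node01
    DIRECT;

    -- One load session instead of one per file
    SELECT node_name, user_name, COUNT(*) AS open_sessions
    FROM   sessions
    GROUP  BY node_name, user_name;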
