Sessions exceeding the limit when loading from Hadoop into Vertica

I have a Hadoop ecosystem that receives files as input; after processing, Hadoop loads the results into a Vertica database that has a 200-session limit.

Hadoop creates a Vertica session for each input file.

Now, if the number of input files exceeds 200, the sessions exceed the 200 limit and cause a 'sessions exceeded' error.

Is there any connection pooling concept that would help?

Comments

  •  Why don't you join the files so you can avoid high concurrency against Vertica? Vertica was not built for high-concurrency workloads. This way you will also keep your TM (Tuple Mover) from going crazy (a quick check for that is sketched below).
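
    As an illustration only, assuming the standard v_monitor.storage_containers system table (this sketch is not from the original thread): counting ROS containers per projection shows how hard all those small loads are hitting the Tuple Mover.

    SELECT projection_name,
           node_name,
           COUNT(*) AS ros_container_count
    FROM   storage_containers
    GROUP  BY projection_name, node_name
    ORDER  BY ros_container_count DESC
    LIMIT  10;

    A high, steadily growing container count on the load target usually means many tiny COPY statements are outrunning mergeout.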

     

  • Thanks a lot for the reply... Yes, I have made that suggestion to the Hadoop team, but they are even crazier than the TM at the moment.

    Is there a way in Vertica to accept more sessions, or any other option?

  •  Adding more sessions won't solve anything!

    What is your resource pool's PLANNEDCONCURRENCY? How many CPUs do you have per node?

     

    You can open more sessions, but all of them will just end up queueing! (The queries below show how to check this.)
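
    For illustration, assuming the standard resource_pools and resource_queues system tables (this sketch is not from the original thread), you can check the pool settings and see whether requests are already queueing:

    -- Planned and maximum concurrency per resource pool
    SELECT name, plannedconcurrency, maxconcurrency, maxmemorysize
    FROM   resource_pools;

    -- Requests currently waiting in a resource pool queue
    SELECT pool_name, node_name, COUNT(*) AS queued_requests
    FROM   resource_queues
    GROUP  BY pool_name, node_name;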

     

  • Adam answered a similar question here:

    https://community.dev.hpe.com/t5/Vertica-Forum/Data-Load-limitation/td-p/218471

     

    It is not Hadoop-related, but it covers the whole session-and-loading topic.

  • I have a 12-node cluster with MAXCONCURRENCY 24, PLANNEDCONCURRENCY 10, and a maximum of 200 sessions.

    We are using the native Vertica-Hadoop connector in MapReduce, which generates a COPY statement.

    I believe it is creating a new session for every file (more than 200 files in total).

    I am planning to increase the session limit to 500.
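
    For illustration only, assuming the MaxClientSessions configuration parameter (this sketch is not from the original thread, and 500 is just the number mentioned above): you can inspect and raise the session limit as shown below, but as noted in the previous comment, the extra sessions will simply queue on the resource pool.

    -- Inspect the current session limit (applied per node)
    SELECT parameter_name, current_value, default_value
    FROM   configuration_parameters
    WHERE  parameter_name = 'MaxClientSessions';

    -- Raise it (not recommended here; the sessions will only queue)
    SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 500);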

     

  • Hi SRA,

     

      You said you have Hadoop feeding data to you so you can load it into Vertica. I assume it is aggregated data (this point is less important :)).

      Now I guess you have this data in a certain format and the files come with similar names?

     

    Your load design should not create a load session for each file that is generated.

     

    It is not a good approach to run a COPY ('load') every time Hadoop generates a file. A smarter method would be to have your load process run every, let's say, 10 minutes and load all files present in the location where Hadoop generates them.

     

    Example:

    COPY table_bla_bla FROM '/hadoop_path/file*.csv' ON node01, '/hadoop_path/file*.csv' ON node02, '/hadoop_path/file*.csv' ON node03 DELIMITER '|' DIRECT;

    This way you run only one COPY command per batch; a slightly fuller sketch of the same idea follows below.
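
    As a sketch only (the table name sales_fact, the node names, and the reject/exception paths are placeholders, not from the thread), a batched load run every few minutes could look like this, keeping rejected rows for inspection and confirming that only one load session is open:

    COPY sales_fact
    FROM '/hadoop_path/file*.csv' ON node01,
         '/hadoop_path/file*.csv' ON node02,
         '/hadoop_path/file*.csv' ON node03
    DELIMITER '|'
    REJECTED DATA '/tmp/sales_fact.rejected' ON node01
    EXCEPTIONS '/tmp/sales_fact.exceptions' ON node01
    DIRECT;

    -- One load session instead of one per file
    SELECT node_name, user_name, COUNT(*) AS open_sessions
    FROM   sessions
    GROUP  BY node_name, user_name;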
