performance of loading data to Vertica

eychih · March 2015

Hi,

I used the Vertica connector to load data into Vertica via parallel COPY DIRECT. The first time we loaded the data, it included some long varchar fields (over 1000 characters), and the load was very slow. After we removed those long varchar fields, loading became very fast. Any idea why long varchar fields slow down the load? Thanks.

Ey-Chih Chow

Navin_C · April 2015

Hi Ey-Chih Chow,

Which Vertica connector are you using?

Where are you loading from?

NC

eychih · April 2015

I am using the version of the Vertica connector that loads with COPY DIRECT, not the newer version that loads with INSERT. I am loading the data from AWS EC2 instances over a VPC.

Thanks.
Ey-Chih Chow

DerrickR · April 2015

Hello Ey-Chih Chow,

I'm unsure if you are experiencing a client/server performance issue or a server-only performance issue. I don't know what you mean specifically by "vertica connector". What client is that?

In general, declaring fields as potentially large can have some drawbacks.

I won't get into implementation details, but the gist of the issue is that operators need to be prepared to see fields as wide as 1000 characters if you declare a varchar(1000). Setting up the internals to handle fields that wide may have a performance impact.

If you don't ever need 1000 characters, declare your varchar fields to be smaller.

Certainly if you were comparing the performance of (int, int, varchar(1000)) to (int, int), you would see a difference. The internals can work much faster with a pair of ints than with a pair of ints plus a variable-length character field that might be almost a KB in size per row.
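
One rough way to act on the sizing advice above is to measure how wide the values really are before choosing a declared width. The following is only a sketch in Python, not part of any Vertica tooling: the function name max_field_bytes, the pipe delimiter, and the sample rows are all hypothetical. It counts UTF-8 bytes rather than characters, on the assumption that the declared varchar length is a byte limit.

```python
import csv
import io


def max_field_bytes(rows, encoding="utf-8"):
    """Return the largest encoded size, in bytes, seen at each column position."""
    widths = []
    for row in rows:
        for i, value in enumerate(row):
            size = len(value.encode(encoding))
            if i >= len(widths):
                # Grow the list the first time a new column position appears.
                widths.extend([0] * (i + 1 - len(widths)))
            widths[i] = max(widths[i], size)
    return widths


# Hypothetical pipe-delimited sample standing in for the real load file.
sample = io.StringIO("1|2|short note\n3|4|a somewhat longer comment\n")
reader = csv.reader(sample, delimiter="|")
print(max_field_bytes(reader))  # [1, 1, 25]
```

Running this over a representative sample of the real data, and then adding some headroom, gives a starting point for a declared width well below 1000 if the data allows it.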

- Derrick

eychih · April 2015

Hi Derrick,

Thanks for the information. By "Vertica connector" I mean the Vertica Hadoop connector, so the client is a Hadoop map-only job.

Best regards,

Ey-Chih Chow
