Performance of loading data into Vertica
Hi,
I used the Vertica connector to load data into Vertica via parallel COPY DIRECT. The first time we loaded data, it included some long varchar fields (over 1000 characters), and the load was very slow. After we removed those long varchar fields, loading became very fast. Any idea why long varchar fields slow down the load? Thanks.
Ey-Chih Chow
Hi Ey-Chih Chow,
Which Vertica connector are you using?
Where are you loading from?
NC
Thanks.
Ey-Chih Chow
Hello Ey-Chih Chow,
I'm unsure if you are experiencing a client/server performance issue or a server-only performance issue. I don't know what you mean specifically by "vertica connector". What client is that?
In general, declaring fields as potentially large can have some drawbacks.
I won't get into implementation details, but the gist of the issue is that operators need to be prepared to see fields as wide as 1000 characters if you declare a varchar(1000). Setting up the internals to handle fields that wide may have a performance impact.
If you don't ever need 1000 characters, declare your varchar fields to be smaller.
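One way to act on that advice is to measure the data you actually load and size the column from it. The sketch below is a hypothetical helper (`suggest_varchar_width` is not part of Vertica or the Hadoop connector). It assumes, as the Vertica documentation states, that VARCHAR(n) is measured in bytes of UTF-8 rather than characters and that plain VARCHAR tops out at 65000 bytes. The 20% headroom default is an arbitrary illustrative choice.

```python
from typing import Iterable, Optional

# Vertica's maximum length for a plain VARCHAR, in bytes (octets).
VERTICA_VARCHAR_MAX = 65000


def suggest_varchar_width(
    values: Iterable[Optional[str]],
    headroom_pct: int = 20,
    floor: int = 1,
) -> int:
    """Suggest n for VARCHAR(n) from sample values (hypothetical helper).

    Vertica counts VARCHAR length in UTF-8 bytes, so we measure the
    encoded length, not len(str). NULLs are skipped.
    """
    longest = 0
    for value in values:
        if value is None:
            continue
        longest = max(longest, len(value.encode("utf-8")))
    # Integer ceiling of longest * (1 + headroom_pct / 100), avoiding float rounding.
    width = longest + (longest * headroom_pct + 99) // 100
    return min(max(width, floor), VERTICA_VARCHAR_MAX)


if __name__ == "__main__":
    sample = ["abc", "hello", None, "héllo"]  # "héllo" is 6 bytes in UTF-8
    print(f"VARCHAR({suggest_varchar_width(sample)})")
```

In practice you would run this over a representative sample of the source records before creating the target table, rather than defaulting every text column to a large width.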
Certainly if you were comparing the performance of (int, int, varchar(1000)) to (int, int) you would see a difference. The internals can work much faster with a pair of ints than with a pair of ints plus a variable length character field that might be almost a KB in size per row.
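The size gap in that comparison can be made concrete with a little arithmetic over declared maximum widths. This is a deliberately simplified model, not a description of Vertica internals: it assumes INTEGER is 8 bytes (Vertica integers are 64-bit) and that a VARCHAR(n) value holds at most n bytes. It ignores per-value length headers, encoding, and compression.

```python
from typing import List, Optional, Tuple

INTEGER_BYTES = 8  # Vertica INTEGER is a 64-bit value

# Each column is (type_name, declared_length); length is None for fixed-width types.
Schema = List[Tuple[str, Optional[int]]]


def max_row_bytes(schema: Schema) -> int:
    """Worst-case bytes per row implied by the declared column types."""
    total = 0
    for type_name, length in schema:
        if type_name == "int":
            total += INTEGER_BYTES
        elif type_name == "varchar":
            if length is None:
                raise ValueError("varchar needs a declared length")
            total += length
        else:
            raise ValueError(f"unsupported type in this sketch: {type_name}")
    return total


narrow: Schema = [("int", None), ("int", None)]
wide: Schema = [("int", None), ("int", None), ("varchar", 1000)]

print(max_row_bytes(narrow))  # 16
print(max_row_bytes(wide))    # 1016
```

Under these assumptions the wide schema's worst-case row is more than 60 times larger than the two-integer row, which is the "almost a KB per row" difference described above.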
- Derrick
Thanks for the information. By "Vertica connector" I mean the Vertica-Hadoop connector, so the client is a map-only Hadoop job.
Best regards,
Ey-Chih Chow