Performance impact of increasing the number of columns
I would like to know what the performance impact is of increasing the number of columns.
We have a Vertica cluster (6 nodes) with two big fact tables (360 columns each), both partitioned by date_id (365 days). We need to add around 200 columns to each of the fact tables, roughly as sketched below.
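For illustration only, the intended change would look something like the following; fact_sales and the new_metric_* columns are placeholder names (not our real schema), and the statements would be generated by a script for all ~200 additions:

-- Placeholder table and column names, one statement per new column
ALTER TABLE public.fact_sales ADD COLUMN new_metric_001 NUMERIC(18,4) DEFAULT NULL;
ALTER TABLE public.fact_sales ADD COLUMN new_metric_002 NUMERIC(18,4) DEFAULT NULL;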
Questions:
1. We know that, in theory, Vertica can support a maximum of 1600 columns per table.
2. Will insertion performance (time) degrade dramatically, or will it decrease roughly in proportion to the number of columns added?
3. Is it fair to say that the performance drop-off (in all aspects, including Tuple Mover operations) scales linearly with the number of columns?
4. Is there a performance breaking point as the number of columns increases?
Comments
The Vertica column limit applies at the database platform level, but there is also a dependency on OS-level configuration (for example, limits on the number of open files per process).
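As a quick sanity check against that limit, you can count columns per table from the catalog. A minimal sketch, assuming the standard v_catalog.columns system view (system tables will also appear in the output):

-- Widest tables first; compare column_count against the 1600-column limit
SELECT table_schema, table_name, COUNT(*) AS column_count
FROM v_catalog.columns
GROUP BY table_schema, table_name
ORDER BY column_count DESC
LIMIT 20;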
1. Within the capacity available in the cluster, 1600 columns will not by themselves hurt anything. However, 1600 columns effectively means a minimum of 1600 column files to handle per node, and during data writes there can be a multiple of that number before the Tuple Mover merges them into fewer, better-performing ROS containers (see the first sketch after this list for a way to watch the container counts).
2. With regard to INSERT performance, whether you have 1600 columns in one table or 100 columns in 16 tables, performance will depend on the underlying I/O subsystem; the number of RAID disks determines the available IOPS and throughput (see the second sketch after this list). But in my humble opinion, keep your column count well below the maximum limit; a single very wide table can also become a single point of contention for an application with a complicated workload.
3. Any performance drop may not be caused solely by the increased number of columns.
4. I do not think there is a specific performance breaking point, but this would be best answered by the Vertica engineering folks.
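For point 1, here is a minimal monitoring sketch, assuming the standard v_monitor.storage_containers system view; the schema_name filter is a placeholder, so use your fact tables' schema:

-- Count ROS containers and bytes per projection per node
SELECT node_name, projection_name,
       COUNT(*) AS ros_containers,
       SUM(used_bytes) AS bytes_on_disk
FROM v_monitor.storage_containers
WHERE schema_name = 'public'   -- placeholder schema
GROUP BY node_name, projection_name
ORDER BY ros_containers DESC;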
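For point 2, a rough way to look at disk performance per storage location, assuming your version's v_monitor.disk_storage view exposes the throughput and latency columns:

-- Disk throughput (MB/s) and latency (seeks/s) as measured by Vertica
SELECT node_name, storage_path, throughput, latency
FROM v_monitor.disk_storage
ORDER BY node_name, storage_path;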
best regards,
yogi