Performance impact of increasing the number of columns
I would like to know what the performance impact is of increasing the number of columns.
We have a Vertica cluster (6 nodes) with two big fact tables (360 columns each), both partitioned by date_id (365 days). We need to add around 200 columns to each of the fact tables, roughly as sketched below.
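For illustration only, the intended change would look something like the following; fact_sales and the new_metric_* columns are placeholder names (not our real schema), and the statements would be generated by a script for all ~200 additions:

-- Placeholder table and column names, one statement per new column
ALTER TABLE public.fact_sales ADD COLUMN new_metric_001 NUMERIC(18,4) DEFAULT NULL;
ALTER TABLE public.fact_sales ADD COLUMN new_metric_002 NUMERIC(18,4) DEFAULT NULL;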
Questions:
1. We know that, in theory, Vertica can support a maximum of 1600 columns per table.
2. Will insertion performance (time) degrade dramatically, or will it decrease roughly in proportion to the number of columns added?
3. Is it fair to say that the performance drop-off (in all aspects, including Tuple Mover operations) scales linearly with the number of columns?
4. Is there a performance breaking point as the number of columns increases?
Comments
The Vertica column limit applies at the database platform level, but there is also a dependency on OS-level configuration (for example, limits on the number of open files per process).
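As a quick sanity check against that limit, you can count columns per table from the catalog. A minimal sketch, assuming the standard v_catalog.columns system view (system tables will also appear in the output):

-- Widest tables first; compare column_count against the 1600-column limit
SELECT table_schema, table_name, COUNT(*) AS column_count
FROM v_catalog.columns
GROUP BY table_schema, table_name
ORDER BY column_count DESC
LIMIT 20;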
1. Within the capacity available in the cluster, 1600 columns will not by themselves hurt anything. However, 1600 columns effectively means a minimum of 1600 column files to handle per node, and during data writes there can be a multiple of that number before the Tuple Mover merges them into fewer, better-performing ROS containers (see the first sketch after this list for a way to watch the container counts).
2. With regard to INSERT performance, whether you have 1600 columns in one table or 100 columns in 16 tables, performance will depend on the underlying I/O subsystem; the number of RAID disks determines the available IOPS and throughput (see the second sketch after this list). But in my humble opinion, keep your column count well below the maximum limit; a single very wide table can also become a single point of contention for an application with a complicated workload.
3. Any performance drop may not be caused solely by the increased number of columns.
4. I do not think there is a specific performance breaking point, but this would be best answered by the Vertica engineering folks.
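For point 1, here is a minimal monitoring sketch, assuming the standard v_monitor.storage_containers system view; the schema_name filter is a placeholder, so use your fact tables' schema:

-- Count ROS containers and bytes per projection per node
SELECT node_name, projection_name,
       COUNT(*) AS ros_containers,
       SUM(used_bytes) AS bytes_on_disk
FROM v_monitor.storage_containers
WHERE schema_name = 'public'   -- placeholder schema
GROUP BY node_name, projection_name
ORDER BY ros_containers DESC;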
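For point 2, a rough way to look at disk performance per storage location, assuming your version's v_monitor.disk_storage view exposes the throughput and latency columns:

-- Disk throughput (MB/s) and latency (seeks/s) as measured by Vertica
SELECT node_name, storage_path, throughput, latency
FROM v_monitor.disk_storage
ORDER BY node_name, storage_path;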
best regards,
yogi