Wide Tables With Vertica
Hi Team,
I need a suggestion on an approach for performing a load (COPY) operation into a wide table.
I have a wide table with 1,200 columns, of which around 20 are VARCHAR(500). Each day it has to load about 1 billion rows. Please let me know which of the following would be better in terms of performance.
Approach 1:
I load into the wide table with COPY and choose the ORDER BY/segmentation based on user queries. This avoids any re-segmentation/broadcast over the network. But will the COPY itself take a hit, since it has to maintain all 1,200 columns on each load and keep them in the catalog?
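To make Approach 1 concrete, here is a minimal sketch of the kind of projection design being described. The table and column names (`wide_fact`, `id`, `event_ts`) are hypothetical; the point is that the ORDER BY and SEGMENTED BY clauses are chosen to match the predicates and join keys users actually query on, so no re-segmentation is needed at query time:

```sql
-- Hypothetical wide table (only a few of the ~1200 columns shown)
CREATE TABLE wide_fact (
    id       INT NOT NULL,
    event_ts TIMESTAMP,
    col_3    VARCHAR(500)
    -- ... remaining columns ...
);

-- Superprojection tuned for the expected user queries:
-- ORDER BY matches the common filter/sort columns,
-- SEGMENTED BY HASH matches the common join key.
CREATE PROJECTION wide_fact_super AS
SELECT * FROM wide_fact
ORDER BY id, event_ts
SEGMENTED BY HASH(id) ALL NODES;
```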
Approach 2:
I divide the wide table into 4 tables of 300 columns each and join them when a user queries. But the performance impact will come from joining the 4 tables: the join may cause re-segmentation/broadcast, and if a user's query misses the segmentation clause it will re-segment yet again.
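For comparison, a sketch of Approach 2 (table and column names hypothetical). The only way the 4-table join avoids re-segmentation is if every table carries the same key and all projections are segmented identically on it, so the join can be resolved locally on each node:

```sql
-- Hypothetical split: each part repeats the key plus ~300 columns
CREATE TABLE wide_part1 (id INT NOT NULL /* , columns 1..300 */);
CREATE TABLE wide_part2 (id INT NOT NULL /* , columns 301..600 */);

-- Identical segmentation on the join key keeps the join local
CREATE PROJECTION wide_part1_p AS SELECT * FROM wide_part1
ORDER BY id SEGMENTED BY HASH(id) ALL NODES;
CREATE PROJECTION wide_part2_p AS SELECT * FROM wide_part2
ORDER BY id SEGMENTED BY HASH(id) ALL NODES;

-- Query-time reassembly of the wide row
SELECT * FROM wide_part1 p1 JOIN wide_part2 p2 USING (id);
```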
Can you tell me which will give better performance with less CPU/IO impact?
Comments
Hi,
I think that approach #1 is best. Make sure to look into Vertica's "Load Parallelism" features!
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/ExtendingVertica/UDx/UDL/ParallelLoad.htm
I've had clients attempt to join tables having billions of records, only to later flatten them into a single wide table to improve query performance.
Note that Vertica also has a "Flattened Table" feature that you might find helpful, although it is more geared toward facts and dimensions...
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/FlattenedTables/FlattenedTables.htm
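As a rough illustration of the flattened-table idea (table and column names are made up here): a column on the fact table takes its value from a dimension table via a DEFAULT query at load time, so queries read the denormalized value without joining:

```sql
-- Hypothetical dimension table
CREATE TABLE dim (
    dim_key INT PRIMARY KEY,
    name    VARCHAR(64)
);

-- Flattened fact table: dim_name is filled in from dim at load time,
-- so user queries avoid the join entirely
CREATE TABLE fact (
    fact_key INT,
    dim_key  INT,
    dim_name VARCHAR(64)
        DEFAULT (SELECT name FROM dim WHERE dim.dim_key = fact.dim_key)
);
```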
Thanks Jim. Can you expand on the Load Parallelism features, or share an example? Currently we are doing COPY with ON ANY NODE, meaning the file is present on all nodes; that is our distributed approach. Do you mean the same thing, or are there other parallel techniques you have in mind?
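For reference, a minimal sketch of the ON ANY NODE pattern being described (file path and table name are placeholders). With a glob pattern, Vertica can divide the matching files across nodes and load them in parallel:

```sql
-- Hypothetical load: files staged on a path visible to every node;
-- ON ANY NODE lets Vertica distribute the matching files across nodes,
-- and DIRECT loads straight into ROS storage for large batches.
COPY wide_fact
FROM '/data/staging/part_*.csv' ON ANY NODE
DELIMITER '|'
DIRECT;
```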