We receive a zipped file of about 20 GB and load it with the COPY command. Could you help me improve the load time for this process?
Number of nodes in the cluster: 62
Loading the data in parallel will speed up the operation.
Distribute your files across several nodes so they load in parallel instead of on a single node. You can use a wildcard or glob (such as *.gz) to load multiple input files, combined with the ON ANY NODE clause. COPY will then distribute the list of files to all cluster nodes to spread the workload.
If you have only a single file, Vertica can divide it and load the portions on several nodes in parallel (apportioned load). However, you cannot use apportioned load for a compressed file.
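Since apportioned load is not available for a compressed file, one option is to split the single archive into several smaller compressed files before loading, then load them with a glob and ON ANY NODE. The following is only a rough sketch, assuming the file is gzip-compressed, holds one record per line (no embedded newlines), and that row order does not matter. The function names, the table name, and the paths are hypothetical, and the exact COPY options (delimiter, parser, placement of the GZIP keyword) should be checked against the documentation for your Vertica version.

```python
import gzip
import os


def split_gzip(src_path, out_dir, parts, prefix="part"):
    """Split one gzip file into `parts` gzip files, one line at a time.

    Lines are distributed round-robin, so the output files end up roughly
    equal in size. Assumes one record per line. Returns the output paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"{prefix}_{i:03d}.gz") for i in range(parts)]
    # A moderate compression level keeps the split itself from becoming the bottleneck.
    outputs = [gzip.open(p, "wb", compresslevel=6) for p in paths]
    try:
        with gzip.open(src_path, "rb") as src:
            for n, line in enumerate(src):
                outputs[n % parts].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths


def copy_statement(table, glob_path):
    """Build a COPY statement that loads every file matching the glob.

    Hypothetical helper: it only formats the statement text and adds no
    delimiter or parser options.
    """
    return f"COPY {table} FROM '{glob_path}' ON ANY NODE GZIP;"
```

For example, calling split_gzip with the 20 GB file, a target directory that every node can read, and a part count equal to (or a multiple of) the 62 nodes would produce the pieces, and copy_statement("my_table", "/shared/load/*.gz") would give the statement to run. The split is a single-threaded pass over the whole file, so it is worth measuring whether the time it takes is repaid by the faster parallel load.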