May I ask what exactly the costs in EXPLAIN output are? Are they combined, summed (or something else) from several metrics such as I/O, RAM, etc., or are they one specific metric?
e.g. +-SELECT LIMIT 10 [Cost: 282K, Rows: 10]
With many cost-based optimizers (CBOs), of which Vertica's is one, "cost" is only a heuristic value, an estimate, intended to give a rough idea of how "heavy" a query plan is.
The first link shows factors involved in costing, but no algorithm is provided:
"Cost: The optimizer calculates the cost for each operator using algorithms that estimate resources for CPU, memory, and network. The estimate of usage of resources is based on statistics. Some examples of the statistics are as follows:
• Number of rows in the table.
• Number of distinct values of each column.
• Minimum or maximum values of each column.
• Histogram of the distribution of values in each column.
• Disk space of each column."
"Cost is an estimate of the resources that the query plan will use for its execution strategy, such as data distribution statistics, CPU, disk, memory, network, data segmentation across cluster nodes, and so on. Although such resources correlate to query run time, they are not an estimate of run time. For example, if Plan1 costs more than Plan2, the optimizer estimates that Plan2 will take less time to run. Cost does not mean that if Plan1 costs two times more than Plan2, the optimizer estimates that Plan2 will take half the time to run."
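
If you want to compare two plans programmatically, here is a minimal Python sketch that pulls the cost and row estimates out of an EXPLAIN line like the one in the question and compares them as an ordering only, in line with the quote above. The function names `parse_cost_rows` and `cheaper_plan` are made up for illustration, and the suffix table is an assumption: the example output only shows "K", so "M" and "B" are guesses at how larger values would be abbreviated.

```python
import re

# Assumed suffix multipliers; only "K" appears in the example output,
# "M" and "B" are guesses.
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# Matches the "[Cost: 282K, Rows: 10" part of a plan line.
_COST_ROWS = re.compile(
    r"\[Cost:\s*([\d.]+)([KMB]?),\s*Rows:\s*([\d.]+)([KMB]?)"
)


def parse_cost_rows(line):
    """Return (cost, rows) as floats from one EXPLAIN line, or None if absent."""
    m = _COST_ROWS.search(line)
    if m is None:
        return None
    cost = float(m.group(1)) * _SUFFIX[m.group(2)]
    rows = float(m.group(3)) * _SUFFIX[m.group(4)]
    return cost, rows


def cheaper_plan(line_a, line_b):
    """Say which plan the optimizer ranks as cheaper: 'a', 'b' or 'tie'.

    Only the ordering is meaningful; the ratio of the two costs says
    nothing about the ratio of run times.
    """
    cost_a, _ = parse_cost_rows(line_a)
    cost_b, _ = parse_cost_rows(line_b)
    if cost_a < cost_b:
        return "a"
    if cost_b < cost_a:
        return "b"
    return "tie"


plan1 = "+-SELECT LIMIT 10 [Cost: 282K, Rows: 10]"
plan2 = "+-SELECT LIMIT 10 [Cost: 141K, Rows: 10]"  # hypothetical second plan
print(parse_cost_rows(plan1))
print(cheaper_plan(plan1, plan2))
```

The helper deliberately returns only "a", "b" or "tie" rather than a ratio, since a plan with twice the cost is not predicted to take twice as long.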