What is the unit of parallelism on a node in a Vertica cluster? If each node has multiple CPUs, how does Vertica implement parallelism internally?
Take a look at the HP Vertica Concurrency and Workload Management technical white paper which explains Vertica's approach to concurrency.
thank you for the reply.
My question is more about how a complex join leverages the multiple cores on a single host in a cluster of machines.
To put it simply: how many processes run on a single machine in the cluster for a specific query?
Good question! One with a complex answer, of course. The trade-off is between parallelism and resource usage. Parallelism does not come free: it requires additional memory buffers and incurs a complexity cost. Maximum parallelism is not always best, particularly on systems with many concurrent sessions or active queries.
Having said that, Vertica attempts to leverage parallelism wherever possible. This includes within joins, of which there are several types internally.
Administrators who want to have control over parallelism will need to understand and carefully configure their Resource Pools. See the documentation on "Guidelines for Setting Pool Parameters", specifically EXECUTIONPARALLELISM, PLANNEDCONCURRENCY, and the memory sizes (which affect the query budget).
Generally we recommend keeping EXECUTIONPARALLELISM at AUTO and setting PLANNEDCONCURRENCY to the number of active queries you want to have running on a resource pool at any given time. By allocating more memory resources for the resource pool, queries will have a larger budget and therefore be able to pay the price of greater parallelism.
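To make the budget trade-off concrete, here is a rough sketch in Python. It assumes a simplified rule: the per-query budget is the pool's memory divided by PLANNEDCONCURRENCY. The real calculation depends on how MEMORYSIZE and MAXMEMORYSIZE are set and on the GENERAL pool, so treat this as an illustration only. The function name and the 16 GB pool size are hypothetical.

```python
def query_budget_mb(pool_memory_mb: float, planned_concurrency: int) -> float:
    """Approximate per-query memory budget for a resource pool.

    Simplifying assumption (not the exact Vertica formula): the pool's
    memory is split evenly across the planned number of concurrent queries.
    """
    if planned_concurrency < 1:
        raise ValueError("planned_concurrency must be at least 1")
    return pool_memory_mb / planned_concurrency


if __name__ == "__main__":
    pool_memory_mb = 16 * 1024  # hypothetical pool with 16 GB of memory
    for planned_concurrency in (4, 8, 16):
        budget = query_budget_mb(pool_memory_mb, planned_concurrency)
        print(f"PLANNEDCONCURRENCY={planned_concurrency}: ~{budget:.0f} MB per query")
```

Under this assumption, halving PLANNEDCONCURRENCY or doubling the pool's memory doubles each query's budget, which is what leaves room for the extra buffers that more parallelism needs.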