What is the ideal hardware spec needed to join big fat fact tables of >1B records, >100 columns in Vertica?
Where in a query you'll find:
multiple fact on fact table joins
joins using high cardinality columns or pk columns
We have plenty of customers doing complex workloads using the general hardware requirements set out at https://www.vertica.com/kb/GenericHWGuide/Content/Hardware/GenericHWGuide.htm
In general, you'll want the high end of CPU and memory recommendation, up to 32 cores (2 sockets at 12-16 cores each) and 512 GB memory. If you're on AWS, this corresponds to the high memory instance r4.16xlarge.
In case or complex queries, you'll want to tune a resource pool to allow maximum query budget - low ExecutionParallelism and MaxConcurrency and high initial memory. Also run DBD to ensure there are optimized projections. Tuning the resource pool and building optimized projections will likely be critical to good performance.
I've gone through projections. It doesn't seem to give any performance gain for multiple billion-record fact on fact table joins. It's pretty frustrating, to be honest. It's like squeezing juice from an old orange fruit.
But I'll read through that recommended hardware specs to see if we're just lacking horsepower
Regarding projections, there are three general recommendations: ensure that all tables have optimized compression to reduce I/O and network latency; create projections sorted by JOIN ON clause because it's faster to merge join on sorted columns; create projections segmented by the same key(s) so similar data ranges are on the same node, avoiding network transfer between nodes. Some of this appears on EXPLAIN, e.g. avoid any steps with RESEGMENT, or any GLOBAL activity since global means across all nodes.
However, where more than one input table is greater than 1B rows, increased memory might help, since it's faster to process in memory than to spill to disk. PROFILE query should show how much total memory and disk are used and give some estimate how much memory might be needed. Another idea is to change settings such as initial memory and planned concurrency to increase query budget in a resource pool specific to the large query - this allows Vertica to pre-allocate a large buffer space and might run a bit faster by avoiding malloc and rebalancing other pools dynamically.