Vertica 8.1.1 OOM (constantly getting killed)
We have been plagued with OOM issues. Today two nodes failed within 15 minutes of each other, and every month at least one node fails because of OOM. Here are the details:
DB Version: 8.1.1-6 (3 nodes)
AWS AMI: Vertica 8.1.1 CentOS 7.3 - 1498566984-38d06046-9fbd-4e9e-8f59-cfdb7b6de752-ami-751f2e63.4 (ami-85ffe3fc)
OS: CentOS 7.3, kernel 3.10.0-514.6.2.el7.x86_64
RAM:
              total        used        free      shared  buff/cache   available
Mem:            62G        4.4G         40G        491M         17G         57G
Swap:           15G        727M         15G
Resource pools:
sysquery: 64M, sysdata: 100M, wosdata: 2G, tm: 2G, p_dashboard (custom pool): 8G (cascades to general)
general: 48G, sysdata: 1G, wosdata: 2G, jvm: 2G, monitoring: 2G, blobdata: 10% (not used; we don't run any machine learning).
OOM dmesg logs:
[ pid ]   uid   tgid    total_vm   rss        nr_ptes  swapents  oom_score_adj  name
          1001  331310  25760521   15801473   42609    3996082   0              vertica
This shows vertica with about 60.2 GB RSS and about 15 GB in swap (the OOM dump reports rss and swapents in 4 KiB pages). No other process comes even close to 1 GB RSS.
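For reference, here is a minimal Python sketch of the page-to-GiB conversion behind those figures. It assumes the 4 KiB page size that is standard on x86_64. The column list omits the "[ pid ]" column because the pasted row does not include it. parse_row and pages_to_gib are illustrative helpers, not part of any tool.

```python
# Convert page-based counters from a Linux OOM-killer task dump into GiB.
# Assumption: x86_64 with 4 KiB pages.
PAGE_SIZE = 4096          # bytes per page
GIB = 1024 ** 3           # bytes per GiB

# Columns as printed in the dmesg header, minus "[ pid ]",
# which is missing from the pasted row.
COLUMNS = ["uid", "tgid", "total_vm", "rss", "nr_ptes",
           "swapents", "oom_score_adj", "name"]


def parse_row(line: str) -> dict:
    """Split one OOM task-dump row into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB."""
    return pages * PAGE_SIZE / GIB


row = parse_row("1001 331310 25760521 15801473 42609 3996082 0 vertica")
for key in ("total_vm", "rss", "swapents"):
    print(f"{key:>9}: {pages_to_gib(int(row[key])):6.2f} GiB")

# Resident plus swapped-out memory of the killed process.
footprint = pages_to_gib(int(row["rss"]) + int(row["swapents"]))
print(f"rss + swap: {footprint:.2f} GiB")
```

This gives roughly 98.27 GiB total_vm, 60.28 GiB rss and 15.24 GiB swapents, so about 75.5 GiB resident plus swapped. Compared with the 62G RAM and 15G swap reported by free (rounded values), the vertica process had consumed nearly all RAM and swap when it was killed.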
sysctl changes from the base AMI: vm.swappiness=1
Any help would be greatly appreciated.