Vertica and oom
Hi,

I have a Vertica 6.1.2 cluster of 7 nodes with 128GB of memory each. For about a month now (with no Vertica update in the meantime), the OOM killer has regularly decided that Vertica on a host must be killed. This does not happen on all hosts, but most hosts are affected at one point or another.

I know that I could play with oom_score_adj, but this is per process, so it needs to be adjusted at each restart and is just dirty. I am trying to find a better solution.

Vertica has been restarted on all hosts, so this is not an old memory leak. I set up the general pool to only use 75% of the memory (before the restart), hoping it would alleviate the issue, but it did not help.

The relevant log lines from the OOM killer are below. From these I understand that the swap is full (which hardly matters, as swap is tiny compared to the memory) and that the 'Normal' zone is below its min watermark, which is the reason for the OOM killer kicking in.

I wonder if this has already been seen, and what a solution could be.

Thanks for any help,
warning kernel: active_anon:15562290 inactive_anon:745732 isolated_anon:0
warning kernel: active_file:1348456 inactive_file:1562279 isolated_file:0
warning kernel: unevictable:0 dirty:126 writeback:0 unstable:0
warning kernel: free:13454989 slab_reclaimable:110377 slab_unreclaimable:20001
warning kernel: mapped:1085 shmem:21 pagetables:34068 bounce:0
warning kernel: Node 0 DMA free:15592kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15180kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
warning kernel: lowmem_reserve: 0 1943 64563 64563
warning kernel: Node 0 DMA32 free:251824kB min:1352kB low:1688kB high:2028kB active_anon:266144kB inactive_anon:389756kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1989640kB mlocked:0kB dirty:20kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:40kB slab_unreclaimable:216kB kernel_stack:0kB pagetables:276kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
warning kernel: lowmem_reserve: 0 0 62620 62620
warning kernel: Node 0 Normal free:43652kB min:43668kB low:54584kB high:65500kB active_anon:61808336kB inactive_anon:2575340kB active_file:1908kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:64122880kB mlocked:0kB dirty:276kB writeback:0kB mapped:496kB shmem:16kB slab_reclaimable:18360kB slab_unreclaimable:53884kB kernel_stack:4712kB pagetables:122172kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:6184 all_unreclaimable? no
warning kernel: lowmem_reserve: 0 0 0 0
warning kernel: Node 0 DMA: 2*4kB 0*8kB 2*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15592kB
warning kernel: Node 0 DMA32: 37*4kB 16*8kB 22*16kB 16*32kB 9*64kB 158*128kB 246*256kB 156*512kB 81*1024kB 2*2048kB 0*4096kB = 251828kB
warning kernel: Node 0 Normal: 2164*4kB 881*8kB 396*16kB 206*32kB 117*64kB 39*128kB 14*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44696kB
warning kernel: 2912367 total pagecache pages
warning kernel: 1629 pages in swap cache
warning kernel: Swap cache stats: add 210179, delete 208550, find 444184/451180
warning kernel: Free swap = 0kB
warning kernel: Total swap = 524280kB
info kernel: 33554416 pages RAM
info kernel: 523103 pages reserved
info kernel: 2908628 pages shared
info kernel: 16663531 pages non-shared
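
To double-check the reading of the log above (which zone is under its min watermark, and how the page counts translate to memory), something like the following Python sketch could be run over the OOM report. It is only an illustration: the function names `zones_below_min` and `pages_to_gib` are made up here, the regular expression only covers the per-zone summary lines in the format shown above, and a 4 kB page size is assumed.

```python
import re

# Matches the per-zone summary lines of the OOM report, e.g.
# "Node 0 Normal free:43652kB min:43668kB low:54584kB ..."
# The buddy-allocator lines ("Node 0 Normal: 2164*4kB ...") do not match,
# because the zone name there is followed by a colon.
ZONE_RE = re.compile(
    r"Node (?P<node>\d+) (?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB"
)

PAGE_KB = 4  # assumption: 4 kB pages (the usual x86_64 default)


def zones_below_min(log_text):
    """Return (node, zone, free_kB, min_kB) for each zone with free < min."""
    hits = []
    for m in ZONE_RE.finditer(log_text):
        free_kb = int(m.group("free"))
        min_kb = int(m.group("min"))
        if free_kb < min_kb:
            hits.append((int(m.group("node")), m.group("zone"), free_kb, min_kb))
    return hits


def pages_to_gib(pages, page_kb=PAGE_KB):
    """Convert a page count from the report into GiB."""
    return pages * page_kb / (1024 * 1024)


# Shortened copies of the three zone lines from the log above.
sample = (
    "warning kernel: Node 0 DMA free:15592kB min:8kB low:8kB high:12kB\n"
    "warning kernel: Node 0 DMA32 free:251824kB min:1352kB low:1688kB high:2028kB\n"
    "warning kernel: Node 0 Normal free:43652kB min:43668kB low:54584kB high:65500kB\n"
)

if __name__ == "__main__":
    for node, zone, free_kb, min_kb in zones_below_min(sample):
        print(f"Node {node} {zone}: free {free_kb} kB < min {min_kb} kB")
    print(f"RAM:  {pages_to_gib(33554416):.2f} GiB")
    print(f"free: {pages_to_gib(13454989):.2f} GiB (global 'free:' counter)")
```

With the figures from this log, only the Node 0 Normal zone is reported as below its min watermark, "pages RAM" comes out at about 128 GiB, and the global free counter at roughly 51 GiB.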