What's the relationship between Vertica and NUMA?

Neil_1 · February 2014

Does Vertica support NUMA architecture? Is there a best practice for configuring a server with regard to its NUMA? Does the NUMA configuration have any impact on best practice configuration of any aspects of a Vertica database?

Any advice on these topics, and/or links to detailed information on Vertica running on SMP vs NUMA hardware appreciated.

Emanuel_Pordes · February 2014

FYI, when running Vertica 6.1.x on CentOS 6.2, we were bitten by the Linux NUMA support issues described here:

http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/

http://frosty-postgres.blogspot.ca/2012/08/postgresql-numa-and-zone-reclaim-mode.html

Neil_1 · February 2014

Thanks. I had seen that MySQL article before; it's interesting to know the same problem bit you in the context of Vertica.

Neil_1 · February 2014

OK, I've been investigating this further and think I am seeing this happen as well.

For example:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65501 MB
node 0 free: 747 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65535 MB
node 1 free: 39341 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
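
A rough way to turn the output above into a per-node "fraction free" figure, sketched in Python. The function names are made up for illustration, and the parser only assumes the `node N size: X MB` / `node N free: Y MB` line format shown above.

```python
import re

# Matches lines such as "node 0 size: 65501 MB" or "node 1 free: 39341 MB".
NODE_LINE = re.compile(r"^node (\d+) (size|free): (\d+) MB[ \t]*$", re.MULTILINE)


def parse_numactl_hardware(text):
    """Return {node: {"size": MB, "free": MB}} from `numactl --hardware` output."""
    nodes = {}
    for match in NODE_LINE.finditer(text):
        node = int(match.group(1))
        nodes.setdefault(node, {})[match.group(2)] = int(match.group(3))
    return nodes


def free_fraction(nodes):
    """Return {node: free/size} for every node that reports both values."""
    return {
        node: info["free"] / info["size"]
        for node, info in nodes.items()
        if "free" in info and info.get("size")
    }

# Usage (hypothetical): feed it the captured command output, e.g.
#   text = subprocess.run(["numactl", "--hardware"],
#                         capture_output=True, text=True).stdout
#   print(free_fraction(parse_numactl_hardware(text)))
```

On the numbers above this gives roughly 1% free on node 0 against about 60% free on node 1.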

$ perl numa-maps-summary.pl < /proc/49519/numa_maps
N0 : 12961762 ( 49.45 GB)
N1 : 5625963 ( 21.46 GB)
active : 17309379 ( 66.03 GB)
anon : 18581686 ( 70.88 GB)
dirty : 18581686 ( 70.88 GB)
mapmax : 361 ( 0.00 GB)
mapped : 6354 ( 0.02 GB)

As you can see above, free memory on node 0 drops to 747 MB as the query progresses, and the allocation is significantly skewed toward N0 (49.45 GB versus 21.46 GB on N1). Shortly afterwards, memory usage dropped to 5% and stayed there for the remainder of the query. Is Vertica switching to not using memory? That would fit with the greater-than-linear increase in execution time I see once the (theoretical) intermediate relation Vertica is working with grows past a certain size.
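
For anyone without the Perl script to hand, the per-node tally can be approximated with a short Python sketch. This is not the Perl script itself, just an assumed equivalent: it sums the `N<node>=<pages>` fields in `/proc/<pid>/numa_maps` and assumes 4 KiB pages throughout (huge-page mappings would be undercounted). The function names are illustrative.

```python
import re
from collections import Counter

# numa_maps fields such as "N0=12345" give the page count on that node.
NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(lines):
    """Sum the N<node>=<pages> fields over all mappings in numa_maps."""
    totals = Counter()
    for line in lines:
        for field in line.split():
            match = NODE_FIELD.match(field)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


def pages_to_gib(pages, page_kib=4):
    """Convert a page count to GiB, assuming a fixed page size in KiB."""
    return pages * page_kib / (1024 * 1024)

# Usage (hypothetical pid):
#   with open("/proc/49519/numa_maps") as f:
#       for node, pages in sorted(pages_per_node(f).items()):
#           print(f"N{node}: {pages} pages, {pages_to_gib(pages):.2f} GiB")
```

With 4 KiB pages, 12961762 pages works out to about 49.45 GiB, which matches the N0 line above.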

Am I reading this wrong? Is NUMA a non-issue here?

Advice on NUMA configuration, and on avoiding this issue, would be very much appreciated.
Are these issues resolved in later versions of Vertica? (I am running 6.1.2.)

The queries I am running should all fit in main memory of one node, let alone 4, and are in the 'trivially parallelisable' category.

$ dmesg | grep -i numa
NUMA: Allocated memnodemap from c000 - c140
NUMA: Using 30 for the hash shift.
pci_bus 0000:00: on NUMA node 0 (pxm 0)
pci_bus 0000:20: on NUMA node 1 (pxm 1)

Running: Vertica 6.1.2
Running on: CentOS release 6.3 (Final)

Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012

HP ProLiant DL380p Gen8, BIOS P70 03/01/2013

Emanuel_Pordes · February 2014

From our understanding, the following settings should have the biggest impact:

1) Add 'vm.zone_reclaim_mode = 0' to /etc/sysctl.conf, save it and execute sysctl -p to load the new settings into the kernel.
We applied this change and saw some improvements.

2) Start Vertica with numactl --interleave=all
Unfortunately, adding this is out of our control. There is already an open feature request with HP/Vertica to "optimize" for NUMA, but so far there is no word on any progress.

There are also other suggestions/optimizations outlined here:
http://blog.jcole.us/2012/04/16/a-brief-update-on-numa-and-mysql/
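
The two settings above can be sketched in Python as a quick sanity check. This is only an illustration: `./my_server` is a placeholder, not the real Vertica start command (which this thread does not give), and the helper names are made up. The only facts relied on are that `/proc/sys/vm/zone_reclaim_mode` holds an integer where 0 means zone reclaim is off, and that `numactl --interleave=all <command>` runs a command with its memory interleaved across all nodes.

```python
from pathlib import Path

ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def parse_zone_reclaim_mode(text):
    """Parse the contents of zone_reclaim_mode (a small integer bitmask)."""
    return int(text.strip())


def zone_reclaim_enabled(mode):
    """Any non-zero value means some form of zone reclaim is switched on."""
    return mode != 0


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Read the current setting from procfs (Linux only)."""
    return parse_zone_reclaim_mode(Path(path).read_text())


def interleaved(command):
    """Prefix a command (list of arguments) with numactl --interleave=all."""
    return ["numactl", "--interleave=all", *command]

# Usage (placeholder command, not the actual Vertica launcher):
#   if zone_reclaim_enabled(read_zone_reclaim_mode()):
#       print("consider vm.zone_reclaim_mode = 0 in /etc/sysctl.conf")
#   subprocess.run(interleaved(["./my_server"]))
```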

Neil_1 · February 2014

@Emanuel - thanks for the tips, much appreciated. I was considering the interleave=all option. I am also considering switching off NUMA in the BIOS and trying to run as SMP (assuming the BIOS allows for this).

I am *really* hoping, however, that someone from HP Vertica who knows about this stuff will chime in with their view on best practices.

Neil_1 · February 2014

Interestingly, it seems that the zone_reclaim_mode is already disabled on the node(s):

$ cat /proc/sys/vm/zone_reclaim_mode
0

[Deleted User] · February 2014

Hi all,

I can't really give official advice here as NUMA isn't my personal expertise. But this is an issue that we are aware of. I think our typical recommendation is to switch off NUMA in the BIOS.

Emanuel, do you have an open support case? Support should be able to walk you through starting Vertica with "numactl --interleave=all" if you're running on machines whose BIOS doesn't provide this functionality.

From our perspective, there are two separate things in play here. "Optimize for NUMA" means "make Vertica take advantage of NUMA to run faster than would be possible on traditional SMP systems." The advantages here tend to be quite small on many types of current modern systems, though they are nonzero and will presumably grow as we see servers with more cores and more-complex memory architectures. Separately, as you've seen, Linux itself tries to optimize for NUMA, and sometimes its attempts backfire and cause programs to run much more slowly than they would have with a simple flat memory model on the same physical hardware. That sort of performance regression is something that we can help with right now, typically with steps like those already discussed on this thread.

The issues here are fundamentally in the Linux kernel, not the hardware. For most Vertica users, this doesn't really matter; we'll provide support and assistance either way and wouldn't expect more than that. But if you happen to be a large software shop with systems programmers who are comfortable working with the Linux kernel (and this does describe a number of Vertica users), this is also something that you could address by poking at the kernel's NUMA code. Sounds like many people would appreciate that patch :-)

Adam
