HP Vertica on Azure Marketplace - Performance Issues?
I've been trying to use the HP Vertica installation from the Azure Marketplace:
https://azuremarketplace.microsoft.com/en-us/marketplace/apps/hpe.verica?tab=Overview
The performance results are quite inconsistent though.
I was able to get better results on a physical machine with just 6 CPUs at 2.2 GHz and 10 GB of RAM!
The weirdest thing is that on DS14_v2 machines, which have twice the CPU and memory of DS13_v2, my test times were about 10% slower, even though I used the same data, scenarios, statistics, projections, etc.
I did some verification of how the cluster is configured and deployed; it seems fine in theory, but there are some glitches.
[root@VertNode0 ~]# VERT_DBA_DATA_DIR=/data /opt/vertica/oss/python/bin/python -m vertica.local_verify
CheckNodeDisk.check_io_scheduler_none (0.076s)
HINT (S0151): These disks do not have known IO schedulers: '/dev/md126' ('md126') = 'none'
https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0151
CheckNodeDisk.check_readahead (0.075s)
FAIL (S0020): Readahead size of md126 (/dev/md126) is too high for typical systems: 12288 > 8192
https://my.vertica.com/docs/8.0.x/HTML/index.htm#cshid=S0020
Summary:
fail: 2
pass: 48
Shouldn't this verification pass? Could the failures be caused by the software RAID?
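For what it's worth, the readahead value the check complains about can be inspected directly with blockdev (I'm using the md126 name from the output above; device names will differ on other deployments):
blockdev --getra /dev/md126      # readahead of the RAID device, reported in 512-byte sectors
blockdev --report                # the same information for every block device at once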
The network parameters don't seem perfect either.
test | date | node | index | rtt latency (us) | clock skew (us)
latency | 2017-03-13_07:29:39,610 | 10.0.0.21 | 0 | 269 | -657331
latency | 2017-03-13_07:29:39,610 | 10.0.0.22 | 1 | 523 | -211663
date | test | rate limit (MB/s) | node | MB/s (sent)
2017-03-13_07:29:42,621 | udp-throughput | 256 | 10.0.0.21 | 222.969
2017-03-13_07:29:42,621 | udp-throughput | 256 | 10.0.0.22 | 215.714
2017-03-13_07:29:42,621 | udp-throughput | 256 | average | 219.341
2017-03-13_07:29:43,623 | udp-throughput | 512 | 10.0.0.21 | 230.82
2017-03-13_07:29:43,623 | udp-throughput | 512 | 10.0.0.22 | 229.864
2017-03-13_07:29:43,623 | udp-throughput | 512 | average | 230.342
date | test | rate limit (MB/s) | node | MB/s (sent)
2017-03-13_07:29:54,650 | tcp-throughput | 256 | 10.0.0.21 | 244.17
2017-03-13_07:29:54,650 | tcp-throughput | 256 | 10.0.0.22 | 244.17
2017-03-13_07:29:54,650 | tcp-throughput | 256 | average | 244.17
2017-03-13_07:29:56,652 | tcp-throughput | 512 | 10.0.0.21 | 488.302
2017-03-13_07:29:56,652 | tcp-throughput | 512 | 10.0.0.22 | 456.768
2017-03-13_07:29:56,652 | tcp-throughput | 512 | average | 472.535
Comments
The issue you are encountering is a problem with the Linux 2.6 kernel. Under some circumstances the kernel causes a spin-lock during a memory operation when running on a virtualized server. The more memory on the VM, the longer the spin-lock. This issue was identified two weeks ago and is being resolved as part of a new version of the marketplace solution that will be available in a few weeks (the target is 3/30). The resolution is to upgrade to the CentOS 7.3 distribution with the 3.10 kernel, which does not exhibit the same spin-lock issue.
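You can confirm which kernel a node is actually running with the usual commands (the exact version strings will vary by image):
cat /etc/redhat-release    # distribution release, e.g. CentOS 6.x vs 7.3
uname -r                   # running kernel; 2.6.x is affected, 3.10.x is not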
As for the software RAID: when the disks are assembled via mdadm, there is no separate IO scheduler choice for the RAID device; it relies on the scheduler set on the underlying disks (i.e. /dev/sdc, /dev/sdd, etc.). Additionally, the readahead is intentionally set a little higher to optimize for the storage paradigm in the cloud. The issue with the verify script is basically that it was designed to be run against a bare-metal machine, not a virtualized one, but I will file a change request with ENG on it.
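If you want to see where the scheduler actually lives, check the member disks behind the md device rather than the md device itself (sdc/sdd below are just examples; /proc/mdstat will show the real member names on your nodes):
cat /proc/mdstat                       # lists the md arrays and their member disks
cat /sys/block/sdc/queue/scheduler     # scheduler of an underlying disk, e.g. noop [deadline] cfq
cat /sys/block/sdd/queue/scheduler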
The network results you are getting are completely in line with the maximum single-thread connection speeds available in Azure, ~4GB.
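If you want to double-check the single-connection ceiling outside of the Vertica tooling, a quick iperf3 run between two nodes will show it (assuming iperf3 is installed; the IPs are the ones from your output):
iperf3 -s                    # on 10.0.0.21: start the server
iperf3 -c 10.0.0.21 -t 10    # on 10.0.0.22: a single TCP stream for 10 seconds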
If you need to resolve the spin-lock issue before the 3/30 target for the new marketplace solution, please reach out to me directly, and I can offer assistance.
-Chris
Is this problem with spin-locks something specific to Azure, or can it also cause issues on deployments running CentOS 6.8 under VMware?
As for Azure... we've switched to AWS for now, so it's nothing urgent any more. Thanks for your help.