Unexpected drop of performance under low concurrency

At my company we have our own analytical web service that reads requests from frontend web applications and translates them into SQL queries. We have been testing Vertica as a possible underlying analytical database.

For that we took an example use case with two cubes, each composed of a fact table with all the dimensions degenerated and a joined date dimension.

The fact tables have around 5 million rows and 20 columns (dimensions). For our test we took 30 frontend requests that generate about 300 SQL queries. Each thread executes the 300 queries ten times.

On AWS we set up a three-node cluster of r3.4xlarge instances with the default Vertica configuration:
  • 16 vCores
  • 122 GB RAM
  • 200 GB SSD (4 × 50 GB RAID 0 + 15 GB system)
We chose this instance type to best meet Vertica's recommended requirements.

Prior to this cluster we tested our use case on a single r3.4xlarge instance.

We were expecting a major improvement in the single-thread test execution. That was not the case. "Maybe our queries are very simple and 5M rows is not that much, so everything is happening in RAM, hence the unchanged results. Let's go multithreaded," we thought.

After running five threads executing the same queries in random order, we registered a 30% improvement on the three-node cluster compared with the single instance.
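For readers who want to reproduce this kind of test, the setup described above (each thread running the full query set ten times, in random order) can be sketched roughly as follows. This is only an illustration, not the harness we actually used: `run_query` is a hypothetical stand-in that sleeps instead of calling a database, and the query texts are placeholders.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_query(sql):
    """Hypothetical stand-in: replace with a real database call."""
    time.sleep(0.0001)


def worker(queries, repeats, seed):
    """Run every query `repeats` times in random order; return latencies in seconds."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(repeats):
        batch = list(queries)
        rng.shuffle(batch)  # each pass uses a fresh random order
        for sql in batch:
            start = time.perf_counter()
            run_query(sql)
            latencies.append(time.perf_counter() - start)
    return latencies


def benchmark(queries, threads, repeats):
    """Run `threads` workers concurrently and return all recorded latencies."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker, queries, repeats, seed) for seed in range(threads)]
        return [lat for f in futures for lat in f.result()]


if __name__ == "__main__":
    queries = [f"SELECT {i}" for i in range(300)]  # placeholder query texts
    for threads in (1, 5, 10):
        latencies = benchmark(queries, threads, repeats=10)
        print(threads, "threads: mean latency", round(mean(latencies) * 1000, 2), "ms")
```

Recording per-query latency rather than only the total wall time makes it possible to compare response times across thread counts directly.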

Not bad, but wait... comparing a single thread with 5 threads, we saw a 50% degradation in response times on the three-node cluster.

With 10 threads, performance dropped by 250% relative to a single thread.
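To put these percentages in perspective, they can be converted into aggregate throughput relative to the single-thread run. The sketch below rests on two assumptions that are not stated explicitly in the measurements: that an "N% drop" means each query takes (1 + N/100) times as long, and that every thread issues its queries back to back.

```python
def relative_throughput(threads, response_time_increase_pct):
    """Aggregate throughput relative to one thread.

    ASSUMPTION: an "N% drop" means each query takes (1 + N/100) times as long,
    and each thread issues queries back to back with no think time.
    """
    slowdown = 1 + response_time_increase_pct / 100
    return threads / slowdown


# (threads, reported degradation in percent) for the three-node cluster
for threads, pct in [(1, 0), (5, 50), (10, 250)]:
    print(f"{threads:>2} threads: {relative_throughput(threads, pct):.2f}x single-thread throughput")
```

Under those assumptions, 5 threads deliver about 3.33x the single-thread throughput and 10 threads about 2.86x, so total throughput would already be falling between 5 and 10 threads.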

I find these results disappointing, since we have a three-node system with 16 cores per node and at most 10 concurrent threads (I will not even mention RAM, since usage never exceeded 6 GB).

Questions:
  1. Considering this scenario, should Vertica's performance be affected this significantly as we increase concurrency?
  2. Can the drop in performance be related to all the queries hitting the same tables at the same time?


Comments

  • Is using AWS a requirement?
  • Sadly, yes. Do you think it could be related to that?
  • Have you run validation tests in this environment?
  • I followed Vertica's installation guidelines, including the ones for AWS. Besides that, I haven't done anything special.
  • I'd recommend running the validation tests (vnetperf, vcpuperf, vioperf) if you haven't already. Very few clients use AWS. I'd highly encourage using physical local hardware or reach out to Vertica if you need some hardware.
  • OK Norbert, I ran the tests. Please find the results at http://pastebin.com/HBf6KQVp; your opinion would be highly appreciated.

    I didn't find anything wrong, though.
  • I reviewed the results, and my observations are below. I don't think there's much going on there.

    The vcpuperf results look fine to me.

    The vnetperf results seem to show some bytes aren't received in udp-throughput above 256MB/s. For a 2048 MB/s rate limit in udp-throughput, there's only an average of 91138730 bytes received out of 843644928 bytes sent. Vertica uses UDP for spread transmission. Have you looked at your retransmit rates in dc_spread_monitor? The tcp-throughput looked acceptable.

    The vioperf results are only reporting node0001.

    What does your test look like? Are the tests being load balanced, and are they just all SELECT statements?

    I'd recommend taking a glance at the scalability and concurrency documentation.
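As a quick check on the vnetperf figures quoted in the comment above, the share of UDP bytes actually received at the 2048 MB/s rate limit can be computed directly. The two byte counts are taken from that comment; nothing else is assumed.

```python
# Byte counts quoted from the vnetperf udp-throughput result at the 2048 MB/s rate limit
bytes_sent = 843_644_928
bytes_received = 91_138_730

received_ratio = bytes_received / bytes_sent
print(f"received: {received_ratio:.1%}, lost: {1 - received_ratio:.1%}")
```

That works out to roughly 10.8% of the bytes received and 89.2% lost at that rate limit.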
  • Also, if you did not build your AWS cluster using the pre-configured Vertica AMI, I would recommend that approach.
  • We used the Vertica AMI. The tests are not load balanced; all queries hit the same node. We ran a single test using a load balancer, but the results were unchanged, so we removed that element from the equation for the moment.

    Let's simplify the use case. We did register a drop in performance in a single-node environment when we increased concurrency from one user to five users (the response time increased by 100%). There are no spread/UDP/network-related issues there.

    Is this drop expected given the previous data?
