What storage configuration do you recommend in AWS for Vertica without breaking the bank?

The most cost-effective option is EBS st1, which delivers good throughput but less-than-ideal IOPS since it is not SSD-based. Any recommendations for increasing IOPS so that I/O performance (a blend of sequential and random reads) is top-notch? This has a high impact on query response time. I am looking at data volumes of about 10-20 TB per node.
Thanks.

Answers

  • bryanw (Employee)

    Colin -
    As soon as you say AWS, I immediately think Eon Mode - scaling compute and storage independently lets you dial in what you need, and dial back what you don't. Eon also makes it easier to change instance types if you decide you want more (or less) CPU and memory per node. If you don't want to manage those options, Enterprise Mode in AWS is still a great choice, and with a bit of work you can reattach EBS volumes to different instance types - not as automated as Eon, but still viable.

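    Vertica has a long history of delivering great results with spinning disks (so long as there are enough devices to keep the CPUs utilized). A single EBS st1 volume tops out at 16TB and 500 MiB/s, so 10-20TB per node usually means striping a few st1 volumes; the 2375 MB/s figure is an instance-level EBS bandwidth limit on some of the larger instance types, not a single-volume number. With enough volumes, that is plenty to utilize the CPUs of most EC2 instance types. Filled 60% per Best Practices, a 16TB volume holds 9.6TB of ROS files per instance, which is easily 10-20TB of raw data with conservative ROS compression.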

    Any "EBS Only" instance type can run Enterprise Mode, but my two favorites are:
    1st choice: i3 instances, being fairly new, have strong CPU throughput per dollar and lots of memory for general-purpose queries, ETL, enrichment, etc.
    2nd choice: c5 instances, being newer, have more CPU throughput per dollar (but not much memory) - best for simple dashboard queries, e.g. Eon subclusters dedicated to simple queries.

    Both of those instance types are available in sizes from 2 vCPUs up to "bare metal" in US East (N. Virginia and Ohio).
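The capacity arithmetic in the st1 paragraph above (16TB at 60% fill giving 9.6TB of ROS files) can be sketched as a quick calculation. This is only a rough sizing aid: the compression ratios are hypothetical placeholders, the per-volume throughput constants reflect AWS's published st1 limits (40 MiB/s per TiB baseline, 500 MiB/s per-volume cap) as I understand them, and the TB/TiB difference is ignored.

```python
# Rough sizing sketch for one st1 data volume per node (illustrative only).

ST1_BASELINE_MIBPS_PER_TIB = 40   # assumed from AWS's published st1 baseline
ST1_MAX_MIBPS_PER_VOLUME = 500    # assumed from AWS's published per-volume cap


def usable_ros_tb(volume_tb: float, fill_fraction: float = 0.6) -> float:
    """Space available for ROS files when the volume is filled to fill_fraction."""
    return volume_tb * fill_fraction


def raw_data_tb(ros_tb: float, compression_ratio: float) -> float:
    """Uncompressed data that fits, given an assumed compression ratio."""
    return ros_tb * compression_ratio


def st1_baseline_mibps(volume_tib: float) -> float:
    """Baseline throughput of one st1 volume, capped at the per-volume maximum."""
    return min(volume_tib * ST1_BASELINE_MIBPS_PER_TIB, ST1_MAX_MIBPS_PER_VOLUME)


if __name__ == "__main__":
    ros = usable_ros_tb(16)
    print(f"ROS capacity at 60% fill: {ros:.1f} TB")
    for ratio in (1.5, 2.0):  # hypothetical compression ratios
        print(f"  at {ratio}:1 compression: {raw_data_tb(ros, ratio):.1f} TB raw")
    print(f"st1 baseline for one 16 TiB volume: {st1_baseline_mibps(16):.0f} MiB/s")
```

Measure your own compression ratio on a sample load before relying on numbers like these; it varies a lot with data types and projection design.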

  • Colin,
    I am running r5.4xlarge in Enterprise Mode with io1 volumes.
    I switched from st1 last year due to increased traffic.
    More nodes with less storage each might be a solution as well:
    replacing a node is then less expensive in terms of rebalance time.
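To make the "more nodes, less storage each" trade-off concrete, here is a back-of-the-envelope sketch. It assumes data is spread evenly across nodes and that re-copying a replaced node's data runs at a constant rate; the 80 TB total and the 250 MB/s copy rate are hypothetical placeholders, not measurements from any cluster.

```python
# Back-of-the-envelope: per-node data and re-copy time vs. node count.
# All inputs are hypothetical placeholders, not measurements.


def per_node_tb(total_tb: float, nodes: int) -> float:
    """Data held by each node, assuming an even distribution."""
    return total_tb / nodes


def recovery_hours(node_tb: float, copy_mb_per_s: float) -> float:
    """Hours to re-copy one node's data at a constant rate (1 TB = 1,000,000 MB)."""
    megabytes = node_tb * 1_000_000
    return megabytes / copy_mb_per_s / 3600


if __name__ == "__main__":
    total_tb = 80.0     # hypothetical total compressed data
    copy_rate = 250.0   # hypothetical sustained copy rate in MB/s
    for nodes in (4, 8, 16):
        tb = per_node_tb(total_tb, nodes)
        hours = recovery_hours(tb, copy_rate)
        print(f"{nodes:>2} nodes: {tb:5.1f} TB/node, ~{hours:.1f} h to re-copy")
```

Real recovery and rebalance times depend on much more than raw copy speed, so treat this only as a way to compare cluster shapes.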
