How many nodes are enough for 24 terabytes of data?

SK21 (Vertica Customer)

We currently have 21 nodes for 24 terabytes of data. I am wondering: how many nodes are enough for 24 terabytes of data?

I continuously see the alerts below:

Database: bk Lower than threshold Node Disk I/O 10 %

v_bk_node0001: 0.18%;
v_bk_node0002: 1.13%;
v_bk_node0004: 0.18%;
v_bk_node0005: 0.2%;
v_bk_node0006: 0.22%;
v_bk_node0007: 0.2%;
v_bk_node0008: 0.52%;
v_bk_node0010: 0.62%;
v_bk_node0011: 0.24%;
v_bk_node0013: 0.91%;
v_bk_node0014: 0.18%;
v_bk_node0016: 0.2%;
v_bk_node0017: 0.87%;
v_bk_node0018: 1.12%;
v_bk_node0019: 0.52%;
v_bk_node0020: 0.19%;
v_bk_node0021: 1.12%;
v_bk_node0022: 0.18%;
v_bk_node0024: 0.21%;
v_bk_node0003: 0.2%;
v_bk_node0023: 0.22%;

Database: bk Lower than threshold Node Memory 10 %

v_bk_node0021: 4.08%;
v_bk_node0002: 3.92%;
v_bk_node0007: 3.54%;
v_bk_node0005: 3.74%;
v_bk_node0019: 3.39%;
v_bk_node0024: 3.81%;
v_bk_node0011: 3.82%;
v_bk_node0003: 3.74%;
v_bk_node0006: 3.77%;
v_bk_node0016: 4.28%;
v_bk_node0004: 4.09%;
v_bk_node0008: 3.94%;
v_bk_node0017: 3.98%;
v_bk_node0022: 3.87%;
v_bk_node0013: 3.87%;
v_bk_node0018: 3.95%;
v_bk_node0010: 3.49%;
v_bk_node0020: 3.53%;
v_bk_node0014: 3.99%;
v_bk_node0023: 3.83%;
v_bk_node0001: 4.97%;


  • Options
    Jim_Knicely (Administrator)
    edited December 2020

    See how efficient Vertica is with disk I/O :) Either that, or you aren't executing a whole lot of queries or loading much data.

    The general recommendation for Enterprise mode is 4–100 nodes for 40 TB–1 PB of data.


    How much space are you currently using per node?

    SELECT node_name, SUM(used_bytes) / 1024^4 AS used_tbytes
      FROM projection_storage
     GROUP BY node_name
     ORDER BY node_name;

    FYI, here is the query MC uses to help generate the node I/O threshold alerts:

    SELECT t.x,
           ROUND(CASE WHEN t.perc >= 100 THEN 100 ELSE t.perc END, 2.0) AS y,
           t.node_name
      FROM (SELECT node_name,
                   timestamp_trunc(start_time, 'MI') AS x,
                   MAX((total_ios_mills_end_value - total_ios_mills_start_value) /
                       ((EXTRACT('epoch' FROM end_time) - EXTRACT('epoch' FROM start_time)) * 1000)) * 100 AS perc
              FROM v_internal.dc_io_info_by_minute
             WHERE start_time >= sysdate() - interval '5 minute'
             GROUP BY node_name, timestamp_trunc(start_time, 'MI')) AS t
     ORDER BY t.x;
  • Options
    Hibiki (Vertica Employee)

    Per the KB guidance Jim mentioned, you can generally get adequate response times with around 1 TB of data per node. However, please run some tests: mixed queries in parallel, data loads at your actual volume, and your batch processing, to see whether you can meet your business requirements. If you can, and you don't see any bottlenecks, your current number of nodes is enough.
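
    The ~1 TB/node guideline can be sanity-checked against the cluster directly. A minimal sketch, using the same projection_storage system table from the earlier query (24 TB across 21 nodes works out to roughly 1.14 TB/node):

        -- Average compressed storage per node, to compare against the
        -- ~1 TB/node guideline.
        SELECT COUNT(DISTINCT node_name) AS node_count,
               SUM(used_bytes) / 1024^4 AS total_used_tb,
               SUM(used_bytes) / 1024^4 / COUNT(DISTINCT node_name) AS avg_tb_per_node
          FROM projection_storage;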

  • Options

    There is no limit to how much data a single node can handle.
    Direct answer to your question: one node would do.
    Most likely one node will not suit you, but you aren't asking for anything else.

  • Options

    Yes, there are limits to how much a single node can handle. There are only so many slots for physical disks, and physical disk volumes have limits. Is it possible to put 24 TB into a single node? Yes, but I wouldn't recommend it. You're likely to get more bang for your buck with 3 or 4 nodes depending on hardware configurations. In order for one node to happily satisfy 24 TB of data, it would need to be a very expensive machine. A single node is also a single point of failure. With 3 nodes, you get K-safety.
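
    If K-safety matters, one quick check (using the standard v_monitor SYSTEM table) is:

        -- Designed vs. current fault tolerance (K-safety) of the cluster.
        SELECT designed_fault_tolerance, current_fault_tolerance FROM system;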

    A more reasonable option might be Eon mode using S3 storage. You can put as much data as you want into S3, apply as much compute as you need on the fly, or shut the whole thing down when you're not using it. So Eon mode gives you a lot more flexibility here.
