How many nodes in an ideal cluster?

Are even numbers or odd numbers of nodes preferred? Is load distributed evenly regardless of the number of nodes?


  • I ask because we have a 5-node cluster where node 5 consistently runs higher I/O wait time than the other 4 nodes.
  • Hi Jack, The number of nodes depends on the amount of data and the type of queries that you run. If your queries are CPU- or I/O-intensive, the more nodes the better. But if the issue is that one node is more I/O-intensive than the other 4, check whether the data is skewed. You can run a simple query like select node_name, sum(used_bytes) from projection_storage group by 1; and the data should be evenly distributed. What the load balancer does is set the initiator of the transaction, but to answer a query, all the nodes should have the same amount of data to process. Does this make sense? If your data is skewed, your projections either do not have the right segmentation or are not well rebalanced. In that case, I recommend that you open a support ticket so support can help you investigate further. Eugenia
  • vertica=> select node_name, sum(used_bytes) from projection_storage group by 1;
          node_name      |      sum
    ---------------------+---------------
     v_statsdb1_node0001 |  995835534583
     v_statsdb1_node0002 |  996296069086
     v_statsdb1_node0003 | 1035617264812
     v_statsdb1_node0004 | 1035593250429
     v_statsdb1_node0005 | 1041660351041
    (5 rows)
    They look similar, but the troublesome node does have the most storage.
  • Hi Jack (and Eugenia), I think Eugenia's answer is already more thorough than what I had :-) Just one more comment: when I see one node running slower with high I/O wait, my first instinct is to ask, "Is that node's disk working properly?" I assume you have a RAID array? Is it running degraded and/or currently rebuilding a drive? Have you verified that its performance is the same as the other systems'? (Sometimes minor configuration differences can cause big performance issues...) Anyway, just a thought, Adam
  • Hi Jack and Adam :) Adam has a good point too. Vertica has a tool, vioperf, that measures I/O throughput. Search the documentation for details and run it on all 5 nodes to see whether there is any difference. Hope that helps, Eugenia.
  • On the write test conducted on nodes 1 and 3, I'm getting 0 MB/s and %IO Wait from 11 to 16. Read tests are better, ranging from 13 to 21 MB/s, but still nowhere close to the recommendation for a server with 12 physical CPUs. The vioperf tool shouldn't be used on a running database, should it? Details:
    /opt/vertica/bin/vioperf --log-file=/tmp/vioperf.out --condense-log /data/vertica/statsdb1
    Using direct io (buffer size=1048576, alignment=512) for directory "/data/vertica/statsdb1
    $ free
                 total       used       free     shared    buffers     cached
    Mem:      74177420   72261848    1915572          0    9787200   54910868
    -/+ buffers/cache:    7563780   66613640
    Swap:      4194296        292    4194004
  • Adam, we are using hardware RAID 5 on each node (Smart Array P410i). The status is OK on all controllers and drives.
  • Hi Jack, Not even with a database running have I seen such bad I/O performance. Depending on the version of Vertica that you are running, Vertica captures I/O statistics too. Check whether you have these tables:
    select table_name from data_collector where table_name ilike '%dc_io_%' and node_name ilike '%01%';
          table_name
    ----------------------
     dc_io_info
     dc_io_info_by_second
     dc_io_info_by_minute
     dc_io_info_by_hour
     dc_io_info_by_day
    (5 rows)
    Then query them to see what performance you are getting. There is a lot of information in those tables, so you should come up with queries that are useful for you. About RAID 5: Vertica recommends RAID 1+0 for direct-attached DATA storage locations. RAID 5 tolerates one disk failure, but read and write performance suffers because of the way data must be stored to provide that fault tolerance. You need good I/O performance as per Vertica's recommendations. You have only about 1 TB of data per node, which should be OK. If you still have issues, I recommend opening a support ticket; support can follow up better than we can here in the community and can also gather more information to see the bigger picture. Hope this helps. Eugenia
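
The per-node storage totals Jack posted can be turned into a rough skew measure. This is a minimal sketch that uses only the numbers from that query output; "percent deviation from the mean" and the largest/smallest ratio are illustrative ways to judge evenness, not thresholds defined by Vertica.

```python
# Per-node totals of used_bytes, copied from the projection_storage output above.
used_bytes = {
    "v_statsdb1_node0001": 995835534583,
    "v_statsdb1_node0002": 996296069086,
    "v_statsdb1_node0003": 1035617264812,
    "v_statsdb1_node0004": 1035593250429,
    "v_statsdb1_node0005": 1041660351041,
}


def skew_report(sizes):
    """Return each node's percent deviation from the mean and the max/min ratio."""
    mean = sum(sizes.values()) / len(sizes)
    deviation = {node: 100.0 * (size - mean) / mean for node, size in sizes.items()}
    ratio = max(sizes.values()) / min(sizes.values())
    return deviation, ratio


deviation, ratio = skew_report(used_bytes)
for node, pct in sorted(deviation.items()):
    print(f"{node}: {pct:+.2f}% vs. mean")
print(f"largest/smallest = {ratio:.3f}")
```

With these totals, the largest node holds roughly 4.6% more than the smallest, which matches Jack's reading that the nodes look similar; on its own, that spread seems small next to the I/O wait difference he describes.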

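To put Jack's vioperf results (12 physical CPUs, 13 to 21 MB/s on reads, 0 MB/s on writes) in perspective, the comparison is simple arithmetic: scale a per-core throughput figure by the core count and see what fraction of it was measured. The per-core value below (20 MB/s) is an assumption used only as an example; check the vioperf documentation for the figure that applies to your Vertica version.

```python
def required_throughput_mb_s(physical_cores, mb_s_per_core):
    """Target throughput for one node, scaling a per-core figure by core count."""
    return physical_cores * mb_s_per_core


def fraction_of_target(measured_mb_s, target_mb_s):
    """Fraction of the target throughput that the measured value reaches."""
    return measured_mb_s / target_mb_s


# ASSUMPTION: 20 MB/s per physical core is an example target only;
# confirm the actual figure in the vioperf documentation.
PER_CORE_MB_S = 20
CORES = 12  # from the thread: servers with 12 physical CPUs

target = required_throughput_mb_s(CORES, PER_CORE_MB_S)
for label, measured in [("read (low)", 13), ("read (high)", 21), ("write", 0)]:
    pct = 100 * fraction_of_target(measured, target)
    print(f"{label}: {measured} MB/s = {pct:.0f}% of {target} MB/s")
```

Under that example target, even the best read result is below 10% of what the node should sustain, which is consistent with Eugenia's reaction to the numbers.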

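Eugenia's point about RAID 5 can be illustrated with the common rule-of-thumb "write penalty" model: each small random host write costs about 2 back-end I/Os on RAID 1+0 (write both mirrors) and about 4 on RAID 5 (read old data and parity, write new data and parity). This is a generic storage rule of thumb, not a Vertica formula, and the disk count and per-disk IOPS below are hypothetical.

```python
# Rule-of-thumb back-end I/Os per small random host write.
# RAID 1+0: write both mirrors (2). RAID 5: read data + parity, write data + parity (4).
WRITE_PENALTY = {"RAID 1+0": 2, "RAID 5": 4}


def effective_iops(disks, iops_per_disk, read_fraction, raid_level):
    """Approximate host-visible random IOPS for a mixed read/write workload."""
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Reads cost one back-end I/O each; writes cost the RAID level's penalty.
    return raw / (read_fraction + WRITE_PENALTY[raid_level] * write_fraction)


# HYPOTHETICAL example: 8 disks at 150 IOPS each, 50% reads / 50% writes.
for level in WRITE_PENALTY:
    print(f"{level}: ~{effective_iops(8, 150, 0.5, level):.0f} IOPS")
```

The model covers small random writes only; large sequential full-stripe writes can avoid much of the RAID 5 penalty, so treat it as an illustration of why RAID 1+0 is preferred rather than a prediction for a specific Vertica workload.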