how many nodes in an ideal cluster — Vertica Forum

Are even numbers or odd numbers of nodes preferred? Is load distributed evenly regardless of the number of nodes?

Comments

  • I ask because we have a 5 node cluster where node 5 consistently runs higher IO wait time than the other 4 nodes.
  • Hi Jack, The number of nodes depends on the amount of data and the type of queries you run. If your queries are CPU- or I/O-intensive, the more nodes the better. But if the issue is that one node is more I/O-intensive than the other four, check whether the data is skewed. You can run a simple query like
        select node_name, sum(used_bytes) from projection_storage group by 1;
    and the data should be evenly distributed. The load balancer only sets the initiator of the transaction; to answer a query, all the nodes should have the same amount of data to process. Does this make sense? If your data is skewed, your projections either do not have the right segmentation or the data is not well rebalanced. In that case, I recommend opening a support ticket so they can help you investigate further. Eugenia
  • vertica=> select node_name, sum(used_bytes) from projection_storage group by 1;
          node_name      |      sum
    ---------------------+---------------
     v_statsdb1_node0001 |  995835534583
     v_statsdb1_node0002 |  996296069086
     v_statsdb1_node0003 | 1035617264812
     v_statsdb1_node0004 | 1035593250429
     v_statsdb1_node0005 | 1041660351041
    (5 rows)
    They look similar, but the troublesome node does have the most storage.
  • Hi Jack (and Eugenia), I think Eugenia's answer is already more thorough than what I had :-) But, just one more comment: When I see one node going slower with high I/O wait, my first instinct is "is that node's disk working properly?" I assume you have a RAID array? Is it running degraded and/or currently rebuilding a drive? Have you verified that its performance is the same as the other systems? (Sometimes minor configuration differences can cause big performance issues...) Anyway, just a thought, Adam
  • Hi Jack and Adam :) Adam has a good point too. Vertica has a tool, vioperf, that measures I/O throughput. Search the documentation for details and run it on all 5 nodes to see whether there is any difference. Hope that helps, Eugenia.
  • On the write test conducted on nodes 1 and 3, I'm getting 0 MB/s and %IO Wait from 11 to 16. Read tests are better, ranging from 13 to 21 MB/s, but still nowhere close to the recommendation for a 12 physical CPU server. The vioperf tool shouldn't be used on a running database, should it? Details:
        /opt/vertica/bin/vioperf --log-file=/tmp/vioperf.out --condense-log /data/vertica/statsdb1
        Using direct io (buffer size=1048576, alignment=512) for directory "/data/vertica/statsdb1
        $ free
                     total       used       free     shared    buffers     cached
        Mem:      74177420   72261848    1915572          0    9787200   54910868
        -/+ buffers/cache:    7563780   66613640
        Swap:      4194296        292    4194004
  • Adam, We are using hardware RAID 5 on each node: Smart Array P410i. The status is OK on all controllers and drives.
  • Hi Jack, Even with a database running, I have never seen such bad I/O performance. Depending on the version of Vertica that you are running, Vertica captures I/O statistics too. Check whether you have these tables:
        select table_name from data_collector where table_name ilike '%dc_io_%' and node_name ilike '%01%';
              table_name
        ----------------------
         dc_io_info
         dc_io_info_by_second
         dc_io_info_by_minute
         dc_io_info_by_hour
         dc_io_info_by_day
        (5 rows)
    Query them to see what performance you are getting. There is a lot of information in those tables, so you will need to come up with queries that are useful for you. About RAID 5: Vertica recommends RAID 1+0 for a direct-attached DATA storage location. RAID 5 tolerates one disk failure, but read and write performance is affected by the way data has to be stored to provide that fault tolerance. You need to get good I/O performance as per Vertica's recommendations. You have only about 1 TB of data per node, which should be OK. If you still have issues, I recommend opening a support ticket; they can follow up better than we can in the community and can also gather more information to see the bigger picture. Hope this helps. Eugenia
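
To put a number on "they look similar", the per-node totals Jack posted from projection_storage can be compared against the cluster mean. The following is only a minimal sketch in Python: the byte counts are copied from the thread, while the function name `skew_report` and the choice of "percent deviation from the mean" as the skew measure are assumptions made for illustration, not a Vertica-defined metric.

```python
# Per-node totals copied from the projection_storage output in the thread.
used_bytes = {
    "v_statsdb1_node0001": 995835534583,
    "v_statsdb1_node0002": 996296069086,
    "v_statsdb1_node0003": 1035617264812,
    "v_statsdb1_node0004": 1035593250429,
    "v_statsdb1_node0005": 1041660351041,
}


def skew_report(totals):
    """Return each node's percentage deviation from the cluster mean.

    Hypothetical helper: this is one simple way to express skew,
    not an official Vertica calculation.
    """
    mean = sum(totals.values()) / len(totals)
    return {node: (size - mean) / mean * 100 for node, size in totals.items()}


report = skew_report(used_bytes)
for node, pct in sorted(report.items()):
    print(f"{node}: {pct:+.2f}% vs. cluster mean")

# Relative gap between the fullest and the emptiest node.
smallest = min(used_bytes.values())
largest = max(used_bytes.values())
spread = (largest - smallest) / smallest * 100
print(f"largest node holds {spread:.2f}% more than the smallest")
```

With these figures, node 5 sits roughly 2% above the mean and the fullest node holds under 5% more than the emptiest, so the storage difference alone seems unlikely to explain a node that is consistently slower on I/O wait.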

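Eugenia's remark that RAID 5 pays for its fault tolerance can be illustrated with the commonly cited small-write penalty: a small random write costs about four back-end I/Os on RAID 5 (read data, read parity, write data, write parity) and about two on RAID 1+0 (one write per mirror side). The sketch below is a rough rule-of-thumb estimate only. The disk count and per-disk IOPS are hypothetical values, not measurements from Jack's Smart Array P410i, and controller cache or full-stripe writes can change the picture considerably.

```python
# Commonly cited back-end I/Os needed per small random write.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_write_iops(level, disks, iops_per_disk):
    """Rule-of-thumb random-write IOPS for an array.

    Ignores controller cache, full-stripe writes, and rebuild activity.
    """
    raw_iops = disks * iops_per_disk
    return raw_iops / WRITE_PENALTY[level]


# Hypothetical array: 8 spindles at 150 IOPS each (assumed values).
DISKS = 8
IOPS_PER_DISK = 150

for level in ("raid5", "raid10"):
    iops = effective_write_iops(level, DISKS, IOPS_PER_DISK)
    print(f"{level}: about {iops:.0f} random-write IOPS")
```

Under these assumptions the same set of disks delivers about half the random-write IOPS on RAID 5 that it would on RAID 1+0, which is consistent with the recommendation in the thread.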