HP Vertica Hardware Planning Guide

In the HP Vertica Analytics Platform 6.1.x Installation Guide (Doc Revision 3, published Monday, September 23, 2013), page 12 of the PDF refers to the link "Detailed hardware recommendations are available in the HP Vertica Hardware Planning Guide" (http://my.vertica.com/docs/6.1.x/HardwareDocs/HP_Vertica%20Hardware%20Planning%20Guide.pdf), which returns a 404. Does anyone have a link to where this is now? I've searched to no avail.


  • Try here:

    Looks like, for some reason, they flipped the words Planning and Hardware around. *shrug*

    I'll let the doc team know they need to revise the link.

    Thank you
  • Since there are typically two projections per table, why are odd numbers of nodes typically mentioned?
  • The number of projections and the number of nodes have no relation to each other.  A single projection (unless otherwise specified) exists on all nodes.

    That doesn't mean that all data is replicated on all nodes.  You can do that; it's called an unsegmented projection.  But most projections are segmented, meaning that, for example, on a three-node cluster, each node holds 1/3 of your data.

    If you have a second projection, it will also be distributed among all nodes by default; but it will be distributed differently such that, if one node goes down, we have a copy of all of that node's records elsewhere in the cluster.  (The distribution algorithm is clever; clusters can survive multi-node failures in some cases even with only a single extra copy of the data.)
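    To make the segmentation idea concrete, here's a minimal Python sketch of a segmented projection plus a "buddy" copy, assuming a simple modulo placement (the node names and placement function are illustrative; Vertica's actual distribution algorithm is more sophisticated):

```python
# Illustrative sketch only: simple modulo placement, not Vertica's
# real distribution algorithm.
NODES = ["node1", "node2", "node3"]

def primary_node(row_key: int) -> str:
    # The primary projection puts each row on exactly one node,
    # so each node holds roughly 1/3 of the data.
    return NODES[row_key % len(NODES)]

def buddy_node(row_key: int) -> str:
    # The buddy projection shifts every row's placement by one node,
    # so a row's second copy never lands on its primary node.
    return NODES[(row_key + 1) % len(NODES)]

# If node2 fails, every row whose primary copy lived on node2
# still has its buddy copy on node3.
for key in range(9):
    assert primary_node(key) != buddy_node(key)
```

    Because the buddy copy is always offset to a different node, losing any single node still leaves a full copy of that node's rows elsewhere in the cluster.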

    The "odd number of nodes" thing is actually due to a different requirement:  If you create a cluster, all accesses to that cluster (regardless of which node you connect to) must always be fully transactional.  In order to ensure this, all nodes must be able to talk to each other.  So what happens if your nodes are split between two racks, and someone trips over the Ethernet cable between the racks and unplugs it?  Which rack is the real cluster?  It can't be "both"; then you could commit data on one rack that violates a constraint added on the other rack, and so on.  And the two halves by definition can't talk to each other to sort things out; each doesn't even know that the other is still up and running.  So we have a rule:  A viable cluster must always contain *more than half* of its nodes.  If a small set of nodes is separated from the cluster, those nodes shut themselves down; they then recover when connectivity is restored.

    Extending that logic, a 2-node cluster doesn't provide high availability:  If you unplug one node from the other, both realize that they could get out of sync with each other, so both shut down.  The smallest high-availability Vertica cluster is therefore a 3-node cluster.  If you want to be able to lose up to two nodes, you need at least a 5-node cluster; and so on.
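    The majority rule above can be sketched in a few lines of Python (illustrative only, not Vertica's actual implementation):

```python
def is_viable(reachable_nodes: int, cluster_size: int) -> bool:
    # A partition stays up only if it holds MORE than half the nodes.
    return reachable_nodes > cluster_size / 2

# 2-node cluster split down the middle: neither side has a majority,
# so both shut down -- no high availability.
assert not is_viable(1, 2)

# 3-node cluster: losing one node leaves 2 of 3, still a majority.
assert is_viable(2, 3)

# 5-node cluster: can lose two nodes and keep quorum, but not three.
assert is_viable(3, 5)
assert not is_viable(2, 5)
```

    The strict "more than half" comparison is what makes even-sized clusters awkward: an exact 50/50 split leaves no viable side, which is why odd node counts come up so often.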

    This tends to matter most with small clusters.  With two nodes, it's not uncommon for one to go down; but with 20 nodes, if 10 all fail simultaneously, that's probably a sign of a bigger problem...
  • I am unable to retrieve the Hardware Planning Guide from either of the URLs above. Please point me to the current URL.

  • Don't know if anyone is still checking this, but I was able to find the document here: https://my.vertica.com/docs/Hardware/HP_Vertica%20Planning%20Hardware%20Guide.pdf

    Did a Google search for "filetype:pdf HP_Vertica Planning Hardware Guide.pdf"
