What happens to performance when a node crashes?

Hi,

 

I've worked with another MPP product which pairs nodes to achieve fault tolerance against losing a node due to h/w faults etc. This means that in, say, a 12-node system, if 1 node fails, the 2nd node in the pair has to do the job of both nodes, so the overall system throughput falls to (nearly) that of a 6-node system, i.e. the other pairs finish their work and then have to wait for the single node to complete. This can be partly tackled by grouping nodes into 4s rather than pairs, but that introduces disk space drawbacks.
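To make the arithmetic above concrete, here is a rough sketch of the paired-node model. It assumes every node holds one equal unit of work and that a query finishes only when the slowest group finishes; the function names are hypothetical and do not come from any product.

```python
from fractions import Fraction


def completion_time(group_size, failed_in_group=1):
    """Relative query time when one fault-tolerance group loses nodes.

    Assumption: each node normally needs 1 time unit for its share, and
    the survivors in the degraded group split the failed nodes' work evenly.
    """
    survivors = group_size - failed_in_group
    if survivors <= 0:
        raise ValueError("no survivor left in the group to take over the work")
    # The degraded group still owns group_size units of work.
    degraded_time = Fraction(group_size, survivors)
    # Healthy groups finish in 1 unit, but the query waits for the slowest group.
    return max(Fraction(1), degraded_time)


def equivalent_nodes(total_nodes, group_size, failed_in_group=1):
    """Size of a healthy cluster that would deliver the same throughput."""
    return total_nodes / completion_time(group_size, failed_in_group)


print(equivalent_nodes(12, 2))  # pairs: behaves like 6 nodes
print(equivalent_nodes(12, 4))  # groups of 4: behaves like 9 nodes
```

Under these assumptions, pairs halve the throughput, while groups of 4 lose only a quarter of it, which is the trade-off against the extra disk space.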

 

Apologies if I've missed this in the documentation, but can someone explain what happens to Vertica performance if a node drops out (assuming the system is still K-safe)? Do queries access just the main projection (with the buddy projection maintained purely as a backup), or (where K=1) can they access both the main and buddy projections in parallel, meaning that if one node goes down, performance would drop because only one projection is available (just like the other MPP I've worked with)?

 

Thanks,

 

 

Comments

  • As you mention, if you lose a node in the cluster, one of the remaining nodes has to answer queries for 2 nodes, so performance will degrade. However, in a 12-node cluster like the one you mention, losing 5 nodes won't be worse than losing just 1. Note that the load should not change.

     

    How much it degrades will depend on the query and the data distribution.

     

    I hope this answers your question. I think you want more details, but as mentioned, it will depend a lot on the type of queries and the resources available. For example, if the node that now has to answer for 2 has to spill to disk because it has much more data to process, it will be slower. Does that make sense?

     

    Eugenia

  • Hi, again thanks for the quick response.

     

    I think I've got it ... the other MPP product stripes its data across all 12 nodes, meaning that queries are run in parallel against all nodes, so if 1 node in a pair fails, the query finishes on the other node pairs and then has to wait for the last pair (which has 1 failed node) to complete. As this final pair is running at half speed, the system effectively runs at half speed.

     

    Vertica's buddy projection approach means that only queries against data held in a projection stored on the failed node will degrade, whereas other queries run as normal, hence the system as a whole doesn't degrade.

     

    Thanks,
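
The point made in the comments, that losing 5 nodes need not be worse than losing 1, can be sketched with a toy model. It assumes K=1 with each node's buddy copy held on the next node in a ring, one equal unit of work per node, and no spill to disk; `slowest_node_load` is a hypothetical helper, not a Vertica function, and real behavior depends on the projection design and the queries.

```python
def slowest_node_load(num_nodes, failed):
    """Work units on the busiest surviving node (1 means a healthy cluster).

    Assumption: the buddy copy of node n's data lives on node (n + 1) % num_nodes.
    """
    failed = set(failed)
    # Every surviving node starts with its own unit of work.
    load = {n: 1 for n in range(num_nodes) if n not in failed}
    for n in failed:
        buddy = (n + 1) % num_nodes
        if buddy in failed:
            # Both copies of this data are gone, so it cannot be served at all.
            raise ValueError(f"data of node {n} is unavailable")
        # The buddy answers for the failed node as well as for itself.
        load[buddy] += 1
    return max(load.values())


print(slowest_node_load(12, {3}))              # one node down
print(slowest_node_load(12, {0, 2, 4, 6, 8}))  # five non-adjacent nodes down
```

In this model the busiest node carries 2 units in both cases, so a query that touches every node takes about the same time whether 1 or 5 non-adjacent nodes are down.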
