What kind of query performance hit should I expect when a node is down?
We have a 5-node cluster operating on K-safety 1, and we have noticed that query performance drops dramatically when a node is down (as much as 10x slowdown or more). Load performance also drops, but to a much smaller degree (more like 2x). During a recent outage, we generated an EXPLAIN plan for the following query: SELECT pk AS clip_key, SUM(viewDelta) AS view_count, SUM(rateDelta) AS rating_count, SUM(commentDelta) AS comment_count, SUM(ratingDelta) AS rating FROM clip WHERE pk = '86dd59d498a00acf1c0fdecd6db4074b' GROUP BY pk; Access Path: +-GROUPBY PIPELINED [Cost: 107K, Rows: 2K] (PATH ID: 1) | Aggregates: sum(clip.viewDelta), sum(clip.rateDelta), sum(clip.commentDelta), sum(clip.ratingDelta) | Group By: clip.pk | Execute on: v_prod_node0001, v_prod_node0002, v_prod_node0003, v_prod_node0005 | +---> STORAGE ACCESS for clip [Cost: 91K, Rows: 9M] (PATH ID: 2) | | Projection: vid.clip_sp_temp_b1 | | Materialize: clip.pk, clip.viewDelta, clip.commentDelta, clip.rateDelta, clip.ratingDelta | | Filter: (clip.pk = '86dd59d498a00acf1c0fdecd6db4074b') | | Execute on: v_prod_node0001, v_prod_node0002, v_prod_node0003, v_prod_node0005 +-GROUPBY PIPELINED (REPLACEMENT FOR DOWN NODE) [Cost: 142M, Rows: 2K] | Aggregates: sum(clip.viewDelta), sum(clip.rateDelta), sum(clip.commentDelta), sum(clip.ratingDelta) | Group By: clip.pk | Execute on: v_prod_node0003 | +---> STORAGE ACCESS for clip [Cost: 142M, Rows: 9M] | | Projection: vid.clip_sp_temp_b0 | | Materialize: clip.pk, clip.viewDelta, clip.commentDelta, clip.rateDelta, clip.ratingDelta | | Filter: (clip.pk = '86dd59d498a00acf1c0fdecd6db4074b') | | Execute on: v_prod_node0003 I am wondering why the storage access cost for the failover node is 3 orders of magnitude higher than for the original node, and if that is related to the dramatic query slowdown that we experience. I am also wondering what kind of query slowdown is normal for a 5-node cluster with one node in the process of recovering, and whether there are steps we can take to reduce the impact of a node outage.
0
Comments