advance checkpoint epochs on single node

phil2 Registered User

Hello

Recently I figured out that one node is slower than the others due to a different RAID configuration. So I decided to rebuild the RAID on that slow node to match the configuration of the other nodes. I shut the node down, rebuilt the RAID, copied all the data back, and started the node, and it hung in recovery for hours. It turned out there had been deletes on one table during the node shutdown, and it was replaying delete vectors. Then I found this article: https://my.vertica.com/hpe-vertica-troubleshooting-checklists/node-recovery-checklist/
I misunderstood "Data size" as meaning the whole database data size rather than the recovering table's data size. So there I made a huge mistake: I stopped the node, advanced the AHM, and started the node. Now I see my node fully recreating all of its projections from the buddy nodes.

Can anybody tell me if there is any option to advance the AHM for that single node, or to advance the checkpoint epochs of the projections living on that node, so I can stop this disaster? They are all in a valid state except for one single table, as they had all already passed incremental recovery before I advanced the AHM on the other nodes.

Please, any help is appreciated!
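For anyone who lands here with the same problem, a first step is to see how far apart the epochs actually are. This is a hedged sketch, not verified advice: get_ahm_epoch(), get_last_good_epoch(), and get_current_epoch() are standard Vertica meta-functions, and PROJECTION_CHECKPOINT_EPOCHS is a v_monitor system table whose availability and columns may vary by version:

```sql
-- Cluster-wide epoch markers
SELECT get_ahm_epoch()       AS ahm_epoch,
       get_last_good_epoch() AS last_good_epoch,
       get_current_epoch()   AS current_epoch;

-- Per-node, per-projection checkpoint epochs, to spot lagging projections
SELECT node_name, projection_name, checkpoint_epoch
FROM v_monitor.projection_checkpoint_epochs
ORDER BY checkpoint_epoch, node_name;
```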

Comments

  • Sharon_Cutter Registered User

    Has your node already completed recovery, making this less of a "disaster"? A typical database can complete recovery from scratch in a few hours unless each node has a very large amount of data or the disks/network are underperforming.

    You could possibly also have dropped the affected table and recreated it.

    You can set the table recovery priorities for the future so that your most important tables recover first, getting those tables back to using all nodes and normal query execution at highest priority.
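    For reference, setting a recovery priority is a short DDL statement. A hedged sketch (the table name is made up; check the ALTER TABLE reference for your version, as the exact clause may differ):

    ```sql
    -- Higher priority values recover earlier during node recovery
    ALTER TABLE important_facts RECOVER PRIORITY 100;

    -- Verify the current priorities
    SELECT table_name, recover_priority
    FROM v_catalog.tables
    ORDER BY recover_priority DESC;
    ```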

    --Sharon

  • phil2 Registered User

    Yeah, it took 6 hours and ended well after all.

    I'm wondering if there is any way to set the epoch for a specific projection on a single node, or any way to prevent a long-running recovery. For now I've come to the conclusion that I need to run moveout and make_ahm_now() on all nodes before taking a node down, to reduce the delta to replay.
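    The pre-shutdown sequence above can be sketched as follows (do_tm_task() and make_ahm_now() are standard Vertica functions; make_ahm_now() requires all nodes to be up):

    ```sql
    -- Flush WOS data to ROS so the node has less to catch up on
    SELECT do_tm_task('moveout');

    -- Advance the Ancient History Mark to the greatest allowable epoch
    SELECT make_ahm_now();
    ```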

  • Sharon_Cutter Registered User

    You can prevent the long-running replay-delete by optimizing the table's projections for deletes (and if there is more than one pair of projections, optimize all of them). Running make_ahm_now() before taking the node down is good practice, but it won't solve the problem of deletes occurring while the node is down.
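    As a rough illustration of what "optimized for delete" means: replay delete is fast when the projection's sort order lets Vertica locate deleted rows directly, typically by ending the ORDER BY with a high-cardinality (ideally unique) column. A hypothetical sketch, with invented table, column, and projection names:

    ```sql
    -- Sort order ends with the unique id, so deleted rows are found quickly
    CREATE PROJECTION events_delete_opt AS
    SELECT * FROM events
    ORDER BY event_date, event_id
    SEGMENTED BY hash(event_id) ALL NODES;

    SELECT refresh('events');  -- populate the new projection
    ```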

    Curious what version you are running? An alternative algorithm for replay-delete was added to try to avoid these situations, but it sounds like it's not effective in your case.

  • phil2 Registered User

    I'm running v8.1.1-4 with 100 TB of data. Yeah, it looks like a table stalling in long recovery is a waste, and there is no need for the deletes at all. I just have to convince the table owner to get rid of them.

    But worst of all, adding more disks and rebuilding the RAID seems to have had no effect on the node. I started all of this because vioperf showed that one specific node had slower read performance compared to the others. I found out that the RAID on that specific node had 16 disks instead of 24 like all the others, so I added 8 more disks, and now vioperf no longer shows that node as a read outlier. But it looks like it did not change cluster performance at all.

    I see that CPU usage and load average are generally about half as high on that node. Sometimes there are spikes, but they cease quite suddenly compared to the others: in the chart, the yellow line is the outlier and the green line is one of the other nodes.

    That specific node is a Dell with an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and a PowerEdge RAID 50, now with 24 disks, while all the others are HPs with an Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz and a SmartArray RAID 50 with 24 disks. Either that Dell is just faster, so it spends less CPU time in wait, or it suffers from some kind of throttling.

    Is there any way to check which node completes query execution faster than the others?

  • Sharon_Cutter Registered User

    Do these two different hardware configurations have different numbers of cores? How is the EXECUTIONPARALLELISM of the resource pool set - with AUTO or a fixed number?
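    A quick way to answer the second question is to look at the pool definitions (resource_pools is a standard v_catalog table):

    ```sql
    -- EXECUTIONPARALLELISM per resource pool: AUTO vs. a fixed number
    SELECT name, executionparallelism
    FROM v_catalog.resource_pools;
    ```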

    If you really want to dig into the timings on a per-node basis, you can PROFILE a longer-running query and check the profiling counters "start time" and "end time" for longer-running operators in the EXECUTION_ENGINE_PROFILES data.
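    A sketch of that workflow (the query and ids are placeholders; PROFILE prints the transaction and statement ids to use):

    ```sql
    PROFILE SELECT count(*) FROM big_table;  -- big_table is a placeholder

    -- Compare per-node operator timings using the ids PROFILE reported
    SELECT node_name, operator_name, counter_name, counter_value
    FROM v_monitor.execution_engine_profiles
    WHERE transaction_id = 123456789   -- replace with the id from PROFILE
      AND statement_id = 1             -- replace with the id from PROFILE
      AND counter_name IN ('start time', 'end time')
    ORDER BY node_name, operator_name, counter_value;
    ```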
