Transaction: Memory read error: what does it mean, what is the impact, and what is the solution?
Hello Everyone
I see "Transaction: Memory read error" on a Vertica server. The detailed log output is below. What does "Transaction: Memory read error" mean? What impact could it have on the server if it occurs? And what is the solution to get rid of it? Please advise.
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Transaction: Memory read error
STATUS 8c00004000010093 MCGSTATUS 0
MCGCAP 1000819 APICID 2e SOCKETID 1
CPUID Vendor Intel Family 6 Model 62
Hardware event. This is not a software error.
MCE 0
CPU 15 BANK 7
MISC 140225200 ADDR 31303335c0
TIME 1470451921 Sat Aug 6 08:37:01 2016
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Transaction: Memory read error
STATUS 8c00004000010093 MCGSTATUS 0
MCGCAP 1000819 APICID 2e SOCKETID 1
CPUID Vendor Intel Family 6 Model 62
Comments
It looks like your server is having hardware issues.
It looks like a bad memory module.
It complains about "CPU 15 BANK 7".
Work with your system admin to run hardware diagnostics.
If the server is under warranty, open a support case with the server manufacturer.
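As a cross-check on the log, the STATUS value can be decoded by hand. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout from the Intel SDM (bit 63 valid, bit 61 uncorrected, bits 59 and 58 for MISC/ADDR valid, low 16 bits holding the MCA error code, with memory controller errors encoded as 0000 0000 1MMM CCCC). The name decode_mci_status is made up for illustration; mcelog already does this decoding for you.

```python
# Hypothetical helper: decodes a few architectural fields of IA32_MCi_STATUS.
# The bit positions are an assumption taken from the Intel SDM layout.

MEMORY_OPS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    mca_code = status & 0xFFFF  # bits 15:0, architectural MCA error code
    decoded = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "mca_code": mca_code,
    }
    # Memory controller errors have the form 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        decoded["operation"] = MEMORY_OPS.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        # 0xF means the channel was not specified.
        decoded["channel"] = None if channel == 0xF else channel
    return decoded


if __name__ == "__main__":
    # STATUS value copied from the log in the question.
    print(decode_mci_status(0x8C00004000010093))
```

For the value in the question this reports a valid, corrected (not uncorrected) memory read error on channel 3, which agrees with the "Corrected error" and "RD_CHANNEL3_ERR" lines that mcelog printed.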
As you mentioned, this looks like a bad memory module. I wonder what impact this error could have.
Please advise.
There is no easy answer to this question; too many variables are at play.
Some memory modules can recover from some memory errors and other modules cannot.
It all depends on what kind of RAM you have, how the server works with that RAM, etc.
The impact could range from an informational message to a server that is down and not bootable.
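One way to judge where a node sits on that range is to count how often events repeat per CPU and bank. Below is a rough Python sketch, assuming the log records follow the same layout as the one pasted in the question (a "CPU n BANK m" line followed later by "Corrected error" or "Uncorrected error"). The function name tally_mce_events is hypothetical, and the sample text is just a few lines copied from the question.

```python
import re
from collections import Counter

# Matches lines such as "CPU 15 BANK 7".
CPU_BANK = re.compile(r"^CPU (\d+) BANK (\d+)")
SEVERITIES = ("Corrected error", "Uncorrected error")


def tally_mce_events(log_text: str) -> Counter:
    """Count events per (cpu, bank, severity) in mcelog-style text."""
    counts = Counter()
    current = None  # (cpu, bank) of the record being read
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CPU_BANK.match(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)))
        elif current is not None and line in SEVERITIES:
            counts[(current[0], current[1], line)] += 1
            current = None  # wait for the next record
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "Hardware event. This is not a software error.",
        "MCE 0",
        "CPU 15 BANK 7",
        "MISC 140225200 ADDR 31303335c0",
        "MCG status:",
        "MCi status:",
        "Corrected error",
    ])
    for key, n in tally_mce_events(sample).items():
        print(key, n)
```

A handful of corrected errors over a long period is a very different picture from a count that keeps climbing on the same CPU and bank, or from any uncorrected error, so the numbers are worth including when you talk to your system admin or the manufacturer.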
I would recommend the following:
Any data missing on this node will be available on other nodes in the cluster.
The restarted node should RECOVER on its own. During recovery it will pull everything it needs from the other nodes.
Once RECOVERY is complete, your Vertica cluster will be fully redundant again and should have no data loss.