Transaction: Memory read error: what does it mean, what is the impact, and what is the solution?

Hello everyone,

I see "Transaction: Memory read error" on a Vertica server; the detailed log output is below. What does "Transaction: Memory read error" mean? What could the impact on the server be when this error occurs? And what can be done to get rid of this issue? Please advise.

MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Transaction: Memory read error
STATUS 8c00004000010093 MCGSTATUS 0
MCGCAP 1000819 APICID 2e SOCKETID 1
CPUID Vendor Intel Family 6 Model 62

Hardware event. This is not a software error.
MCE 0
CPU 15 BANK 7
MISC 140225200 ADDR 31303335c0
TIME 1470451921 Sat Aug  6 08:37:01 2016
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Transaction: Memory read error
STATUS 8c00004000010093 MCGSTATUS 0
MCGCAP 1000819 APICID 2e SOCKETID 1
CPUID Vendor Intel Family 6 Model 62
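To connect the raw STATUS value to the decoded text in the log, here is a minimal Python sketch that decodes it using the architectural IA32_MCi_STATUS bit layout from the Intel SDM (Vol. 3B, machine-check architecture chapter). It covers only the architectural fields; the model-specific bits and the MISC value are not decoded, and `decode_status` is a hypothetical helper name, not part of mcelog.

```python
# Sketch: decode the architectural fields of an IA32_MCi_STATUS value
# (bit layout per the Intel SDM Vol. 3B). Model-specific bits 31:16 and
# the MCi_MISC register are NOT decoded here.

# Architectural flag bits in the upper part of IA32_MCi_STATUS.
FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC valid)",
    58: "ADDRV (MCi_ADDR valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of the memory-controller compound error code.
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_status(status: int) -> dict:
    """Return the set flags, corrected/uncorrected, and memory details."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # bits 15:0: architectural MCA error code
    result = {
        "flags": flags,
        "corrected": not ((status >> 61) & 1),  # UC bit clear => corrected
        "mca_code": mca_code,
    }
    # Memory-controller errors use the compound form 000F 0000 1MMM CCCC.
    if mca_code & 0xEF80 == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        result["transaction"] = MEM_TRANSACTIONS.get(mmm, "reserved")
        channel = mca_code & 0xF
        result["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return result


info = decode_status(0x8c00004000010093)
print(info["corrected"], info.get("transaction"), info.get("channel"))
# → True memory read error 3
```

This matches what mcelog printed: VAL is set, UC is clear (hence "Corrected error"), MISCV and ADDRV are set, and the compound code 0x0093 is a memory read on channel 3. "Corrected" means the hardware fixed the data via ECC on that read, so this particular event did not hand bad data to the OS; repeated corrected errors on the same channel are still commonly treated as an early warning of a failing DIMM.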

Comments

  • Looks like your server is having hardware issues.

    Looks like a bad memory module.

    It complains about "CPU 15 BANK 7" (a memory controller read error on channel 3 of socket 1).

    Work with your system administrator to run hardware diagnostics.

    If the server is under warranty, open a support case with the server manufacturer.

  • As you mentioned, this looks like a bad memory module. I wonder what impact this error could have.

    Please advise.

  • There is no easy answer to this question; there are too many variables at play.

    Some memory modules, such as ECC modules, can recover from certain memory errors, and others cannot.

    It all depends on what kind of RAM you have, how the server works with that RAM, and so on.

    The impact could range from an informational message to a server that is down and will not boot.

    I would recommend the following:

    1. If Vertica is still running, stop Vertica on this node.
    2. Work with your system administrator to run RAM diagnostics following the hardware manufacturer's instructions.
    3. Open a support case with the hardware manufacturer.
    4. After the hardware issue is fully resolved, start Vertica on this node.

    Provided the database is K-safe (K-safety of 1 or higher), any data missing on this node is available on other nodes in the cluster.

    The restarted node should recover on its own; during recovery it pulls everything it needs from the other nodes.

    Once recovery is complete, your Vertica cluster is fully redundant again and should have no data loss.
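Since the impact ranges from informational to a server that will not boot, one practical check before and after the hardware work is whether memory errors are isolated or keep recurring on the same socket and channel. The following Python sketch tallies memory-controller events from mcelog-style text such as the log in the question (for example, the contents of wherever mcelog writes on your system). It assumes, as in that log, that each event prints a `STATUS <hex>` line followed by a line containing `SOCKETID <n>`; `tally_memory_errors` is a hypothetical name, and this is not a substitute for the manufacturer's diagnostics.

```python
# Sketch: count memory-controller machine-check events per
# (socket, channel, corrected/uncorrected) from mcelog-style text.
# ASSUMPTION: each event has a "STATUS <hex> ..." line followed by a
# line containing "SOCKETID <n>", as in the log from the question.
import re
from collections import Counter

STATUS_RE = re.compile(r"^STATUS ([0-9a-fA-F]+)")
SOCKET_RE = re.compile(r"SOCKETID (\d+)")


def tally_memory_errors(text: str) -> Counter:
    counts = Counter()
    status = None
    for line in text.splitlines():
        line = line.strip()
        if m := STATUS_RE.match(line):
            status = int(m.group(1), 16)  # IA32_MCi_STATUS value
        elif (m := SOCKET_RE.search(line)) and status is not None:
            code = status & 0xFFFF  # architectural MCA error code
            # Only memory-controller compound codes: 000F 0000 1MMM CCCC.
            if code & 0xEF80 == 0x0080:
                kind = "uncorrected" if (status >> 61) & 1 else "corrected"
                counts[(int(m.group(1)), code & 0xF, kind)] += 1
            status = None  # wait for the next event's STATUS line
    return counts
```

Fed the two events from the question, this yields two corrected events on socket 1, channel 3. A single corrected event is usually informational; a count that keeps climbing for one (socket, channel) pair is a reasonable signal to prioritize the DIMM diagnostics recommended above.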
