MCE 1282 Status Bits: Memory Controller Read Error
while under a certain CPU or Memory intensive load - or even at random. Most of the time this happens without a Purple Screen of Death that would at least give you a notion of what went wrong. There is a VMware KB Article 1005184 concerning this issue, and it has been updated significantly since I started to take an interest in these errors.

UPDATE: I have published a new CPU Stress Test & Machine Check Error debugging article - check it out if you'd like to learn more.

If you are "lucky", you can see and decode for yourself what preceded the crash. This is because both AMD and Intel CPUs implement something called the Machine Check Architecture. This architecture enables the CPUs to intelligently determine a fault that happens anywhere on the data transfer path during processor operation. It can capture memory operation errors, CPU bus interconnect errors, cache errors, and much more.

How do you determine what has been causing your system to fail? Read on.

You will need to browse to Intel's website hosting the Intel® 64 and IA-32 Architectures Software Developer Manuals. There, download a manual named "Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 3A, 3B, and 3C: System Programming Guide". I highly recommend printing it, because you will be doing some back-and-forth seeking.

Now, to get a list of possible Machine Check Errors captured by the VMkernel, run the following in your SSH session with superuser privileges:

cd /var/log;grep MCE vmkernel.log

This prints the MCE lines recorded in the log. Most of the time the VMkernel decodes these messages for you; in my example there are plenty of Memory Controller Read Errors. You can see more closely where the problem originates from:

CMCI: This stands for Corrected Machine Check Interrupt - an error was captured but it was corrected and the VMkernel can keep on running. If
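Once the grep output is saved, the MCE lines can be split into fields for sorting or counting. The following is a minimal Python sketch, assuming the lines look like the vmkernel.log samples quoted in this article; the regular expression, the function name `extract_mce_events`, and the field name `msg_id` are illustrative assumptions, and other ESXi builds may format these messages differently.

```python
import re

# Pattern modeled on the vmkernel.log MCE lines quoted in this article (an assumption;
# it is not an official format description).
MCE_LINE = re.compile(
    r"^(?P<time>\S+)\s+cpu(?P<cpu>\d+):\d+\)MCE:\s*(?P<msg_id>\d+):\s*(?P<message>.*)$"
)

def extract_mce_events(log_text):
    """Return one dict per log line that carries an MCE message."""
    events = []
    for line in log_text.splitlines():
        match = MCE_LINE.match(line.strip())
        if match:
            event = match.groupdict()
            event["cpu"] = int(event["cpu"])
            # The number after "MCE:" is kept as an opaque identifier.
            event["msg_id"] = int(event["msg_id"])
            events.append(event)
    return events

sample = """\
2012-12-17T13:07:25.816Z cpu19:8211)MCE: 1278: CMCI on cpu19 bank9: Status:0x900000400800009f Misc:0x0 Addr:0x0: Valid.Err enabled.
2012-12-17T13:07:25.816Z cpu19:8211)MCE: 1282: Status bits: "Memory Controller Read Error."
"""

for event in extract_mce_events(sample):
    print(event["time"], event["cpu"], event["msg_id"], event["message"])
```

Grouping the resulting events by `cpu` is one quick way to see whether the errors cluster on a single socket.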
The Real (and Virtual) Adventures of Nathan the IT Guy
How to troubleshoot a Purple Screen of Death on an ESXi Host
Nathan Simon, Dec 18 2012 2:22PM GMT
So your ESXi host is stuck at a PSOD or the “Purple Screen of Death”, what do you do? Well, one would figure it's hardware, but it could also be software related. I am going to tell you how to download and review the error logs. Mind you, the way I am going to explain it assumes the host can boot up and be connected to either vCenter or the VI Client. I will also show you a command you can run from the service console if you just want the support logs to send to VMware. Onto the information.

First you want to have the host back up and running. It could be unstable at the moment, but you should have enough time to pull the support logs. Highlight the host in question. Click on File (top left of the VI Client), then click on “Export”, then “Export System Logs”. Your next screen will allow you to select the system logs you would like to export; I just select them all. Once you click Next you can select where you want to export them to. Click Next to start the export.

Use a program like 7-Zip to extract the newly created file to a temporary location. Once it is extracted you need to extract again; I know, they doubled up the compression, more so to keep the normal folk out! 🙂 Once everything is extracted you should see the following folders. The most important one is the "Core" folder, which contains the kernel dump: on a PSOD the host dumps what was in memory to a file called vmkernel-zdump.1 or something to that effect and places it in that directory. You will have to use something like Notepad++ to open the vmkernel-zdump file. Once you do, you can pretty much search for “error” or “fail” or “panic” and you should find your issue. In my example, there is a memory bank error, see below.
2012-12-17T13:07:25.816Z cpu19:8211)MCE: 1278: CMCI on cpu19 bank9: Status:0x900000400800009f Misc:0x0 Addr:0x0: Valid.Err enabled.
2012-12-17T13:07:25.816Z cpu19:8211)MCE: 1282: Status bits: "Memory Controller Read Error."
2012-12-17T13:07:26.367Z cpu19:8211)MCE: 1278: CMCI on cpu19 bank9: Status:0x
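The Status value in the log lines above can be unpacked by hand. Below is a small Python sketch that splits a 64-bit status word into its top-level flags, assuming the architectural IA32_MCi_STATUS layout described in the Machine-Check Architecture chapter of the Intel manual (VAL at bit 63 down to PCC at bit 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The function name `decode_status` and the dictionary layout are illustrative choices, not part of any VMware or Intel tool.

```python
# Bit positions of the architectural IA32_MCi_STATUS flags (assumed from the Intel SDM).
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds the error address
    "PCC": 57,    # processor context may be corrupted
}

def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    fields["mca_error_code"] = status & 0xFFFF
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    return fields

decoded = decode_status(0x900000400800009F)
for name, value in decoded.items():
    print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For this value only VAL and EN are set, which agrees with the "Valid.Err enabled." text and the zero Misc and Addr fields in the log line; UC is clear, consistent with a corrected error.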
MCE Analysis Help
Rob, Jun 27, 2012 9:54 AM

Any ideas where I can get some help with the analysis of the MCEs below? Using an "unqualified" OS (CentOS), which my OEM vendor doesn't support and therefore doesn't have the support pack tools that hook into the OS for analysis. They suggested I "ask Intel" to provide an analysis of what part of the subsystem may be having the problem. OEM vendor is suggesting this is potentially not strictly a hardware error despite what the MCE says, and might actually be an interop problem between the OS and the hardware. These are IA64 systems, and I'm seeing them occur regularly on multiple machines.

Thanks in advance,
-Rob

HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 12
CPU 0 BANK 8 MISC 14a6688000011080 ADDR 8e41d65c0
TIME 1340190061 Wed Jun 20 11:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44

HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 0
CPU 1 BANK 8 MISC 4702108000016000
TIME 1340154061 Wed Jun 20 01:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCA: MEMORY CONTROLLER MS_CHANNELunspecified_ERR
Transaction: Memory scrubbing error
STATUS 88000040000200cf MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 1
CPUID Vendor Intel Family 6 Model 44

HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 31
CPU 0 BANK 8 MISC d847010400011287 ADDR 87bc2aac0
TIME 1340215261 Wed Jun 20 18:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44

1. Re: MCE Analysis Help
Adolfo_Intel, Jun 28, 2012 8:47 AM (in response to Rob)
Please let me know what is the K
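The "RD_CHANNELunspecified" and "MS_CHANNELunspecified" labels in the mcelog records above come from the low 16 bits of STATUS. As a rough Python sketch, assuming the memory controller compound error code layout from the Intel manual (bit pattern 000F 0000 1MMM CCCC, where MMM is the transaction type and CCCC the channel, with 1111 meaning the channel is not specified), those bits can be decoded as follows. The function name `decode_memory_controller_code` and the returned tuple shape are illustrative assumptions.

```python
# MMM sub-field encodings for memory controller errors (assumed from the Intel SDM);
# values 5 to 7 are reserved.
TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}

def decode_memory_controller_code(status):
    """Return (transaction, channel) for a memory controller error, else None.

    channel is None when the CCCC field is 1111 (channel not specified).
    """
    code = status & 0xFFFF
    # Ignore bit 12 (the F filtering bit) and require the 0000 0000 1xxx xxxx shape.
    if code & 0xEF80 != 0x0080:
        return None
    mmm = (code >> 4) & 0x7
    cccc = code & 0xF
    transaction = TRANSACTIONS.get(mmm, "reserved")
    channel = None if cccc == 0xF else cccc
    return transaction, channel

for status in (0x8C0000400001009F, 0x88000040000200CF):
    print(hex(status), decode_memory_controller_code(status))
```

The two STATUS values decode to a memory read error and a memory scrubbing error, both with the channel unspecified, matching the "Transaction:" lines mcelog printed.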
styx-condor: drivers/edac/i7core_edac.c (at 607b30fcf20c6e5339591692db6ffa0b15e041a0)
i7core_edac: Better describe the supported devices · 52707f91
Signed-off-by: Mauro Carvalho Chehab