Dell PowerEdge: CPU 1 Machine Check Error Detected
Issue: The error "CPU 1 machine check error detected." occurs, causing a Windows stop error 124 as a result of a CPU failure.

Solution: Diagnose the CPU failure and replace the CPU as necessary.

Additional Information:

Event ID 1604 is logged in the System event log.

    Log Name: System
    Source: Server Administrator
    Date: 3/24/2014 8:08:42 AM
    Event ID: 1604
    Task Category: Instrumentation Service
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: computername.domain.com
    Description:
    Processor sensor detected a failure value
    Sensor location: CPU1
    Chassis location: Main System Chassis
    Previous state was: Unknown
    Processor sensor status: Presence detected, IERR

Event ID 1001 is logged in the System event log.

    Log Name: System
    Source: Microsoft-Windows-WER-SystemErrorReporting
    Date: 3/24/2014 8:49:25 AM
    Event ID: 1001
    Task Category: None
    Level: Error
    Keywords: Classic
    User: N/A
    Computer: computername.domainname.com
    Description:
    The computer has rebooted from a bugcheck. The bugcheck was: 0x00000124 (0x0000000000000007, 0xfffffa8025c38688, 0x0000000000000000, 0x0000000000000000). A dump was saved in: C:\Windows\Minidump\032414-27970-01.dmp. Report Id: 032414-27970-01.

Reviewing the memory dump file with Debugging Tools for Windows produces output similar to the following.

    BugCheck 124, {7, fffffa8025c38688, 0, 0}
    Probably caused by : GenuineIntel
    Followup: MachineOwner
    ---------
    2: kd> !analyze -v
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type
    of error source that reported the error. Parameter 2 holds the address
    of the WHEA_ERROR_RECORD structure that describes the error condition.
    Arguments:
    Arg1: 0000000000000007, BOOT Error
    Arg2: fffffa8025c38688, Address of the WHEA_ERROR_RECORD structure.
    Arg3: 0000000000000000
    Arg4: 00000000000
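The bugcheck line in the Event ID 1001 description can be picked apart mechanically when triaging several servers. The following is a minimal Python sketch, not a Dell or Microsoft tool: the function names `parse_bugcheck` and `describe` are hypothetical, and the lookup table lists only the one error-source value that the debugger output above confirms (7, BOOT Error). Other values are deliberately left unlisted rather than guessed.

```python
import re

# Only the value confirmed by the debugger output above is listed;
# other error-source codes are left out rather than guessed.
KNOWN_ERROR_SOURCES = {0x7: "BOOT Error"}

# Matches e.g. "The bugcheck was: 0x00000124 (0x...07, 0x..., 0x0, 0x0)"
BUGCHECK_RE = re.compile(r"The bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(description):
    """Return (code, [args...]) as integers, or None if no bugcheck is found."""
    match = BUGCHECK_RE.search(description)
    if match is None:
        return None
    code = int(match.group(1), 16)
    parts = [p.strip() for p in match.group(2).split(",") if p.strip()]
    return code, [int(p, 16) for p in parts]


def describe(description):
    """Summarize a bugcheck description in one line (hypothetical helper)."""
    parsed = parse_bugcheck(description)
    if parsed is None:
        return "no bugcheck found"
    code, args = parsed
    if code != 0x124 or len(args) < 2:
        return f"bugcheck 0x{code:X} (not handled by this sketch)"
    source = KNOWN_ERROR_SOURCES.get(args[0], "unlisted error source")
    return (f"WHEA_UNCORRECTABLE_ERROR, source {args[0]:#x} ({source}), "
            f"WHEA_ERROR_RECORD at {args[1]:#x}")


if __name__ == "__main__":
    sample = ("The computer has rebooted from a bugcheck. The bugcheck was: "
              "0x00000124 (0x0000000000000007, 0xfffffa8025c38688, "
              "0x0000000000000000, 0x0000000000000000).")
    print(describe(sample))
```

Parameter 2 is only an address inside the dump, so the record itself still has to be inspected in the debugger. The sketch only helps sort events by error source.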
Dell support article: http://www.dell.com/support/Article/us/en/04/SLN290369/EN

Non Recoverable, SEL:CPU 1 machine check detected.
http://lists.us.dell.com/pipermail/linux-poweredge/2012-March/046048.html

Could you provide more details on these systems? Which processor(s) are you using on the machines giving this error? Are they different from those in the R815's that are not giving this error?

Since this sounds like it's not specific to running Linux but a hardware or firmware issue, you probably also want to contact Dell support. If there is a hardware failure occurring (not out of the question), you would need to be in contact with them anyway.

By the way, the latest BIOS version for this system is 2.3.0, and there are several other firmware updates that you probably want to take (iDRAC, LC, and PSU firmware updates all marked "urgent").

--Jared

-----Original Message-----
From: linux-poweredge-bounces-Lists On Behalf Of G.Bakalarski at icm.edu.pl
Sent: Monday, March 12, 2012 10:41 AM
To: linux-poweredge-Lists
Subject: Non Recoverable, SEL:CPU 1 machine check detected.

Dear All,

The following error randomly appears on the LCDs of our servers during reboot:

Non Recoverable, SEL:CPU 1 machine check detected.

We have noted at least 10 such events. The server then works more or less correctly, but I don't feel comfortable with this (orange blinking LCD). It does not happen on every reboot, nor periodically. It just appears after some later reboot. There are some variants of this message, mainly with a different CPU number. The error log says that a sensor caused a non-recoverable assertion ...

When I clear the event log during boot (CTRL-E) and then run a hardware test, no errors are detected, and all looks fine until the next Nth reboot (N from 1 to 10). It has happened on at least 5 new R815 servers (BIOS 1.5.2) ...

We are running Debian Linux on our servers. Any hints?

GB
_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
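Since the message varies mainly in the CPU number, a quick per-CPU tally can show whether one socket is implicated more than the others. The sketch below is a rough Python illustration that assumes the event log has already been exported to plain text, one entry per line, using the wording quoted in the message above. The export step itself and the name `count_machine_checks` are assumptions, not part of any Dell tool.

```python
import re
from collections import Counter

# Matches the wording quoted in the message above, e.g.
# "Non Recoverable, SEL:CPU 1 machine check detected."
MCE_RE = re.compile(r"CPU\s*(\d+)\s+machine check detected", re.IGNORECASE)


def count_machine_checks(lines):
    """Count machine-check entries per CPU number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MCE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical exported log; only the message wording comes from the thread.
    sample = [
        "Non Recoverable, SEL:CPU 1 machine check detected.",
        "Non Recoverable, SEL:CPU 2 machine check detected.",
        "Non Recoverable, SEL:CPU 1 machine check detected.",
    ]
    for cpu, n in sorted(count_machine_checks(sample).items()):
        print(f"CPU {cpu}: {n} machine check event(s)")
```

A tally concentrated on one CPU number would point toward that socket or processor. An even spread would fit the firmware explanation suggested in the reply.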
https://community.spiceworks.com/topic/390081-poweredge-1950-cpu-errors-in-log

I have two PE1950 II's that are tossing CPU errors. (I just got my hands on them, didn't know they existed before today.)

One has been doing this since February:

    Critical  System Boot  CPU1 Status: processor sensor for CPU1, IERR was asserted
    OK        System Boot  CPU1 Status: processor sensor for CPU1, IERR was deasserted

The other one got this once in June:

    Non-Recoverable  06/13/2013 15:13:50  CPU 1 machine check detected.

BOTH SYSTEMS BOOT NORMALLY. I'm not seeing these codes on the machine check display or on bootup; I'm finding them in the DRAC logs. That being said, they were sent to me because someone said they saw them and couldn't get the machines to boot.

What is typically done in these cases? I know the ideal thing would be to replace the machines. That is not possible. I can, however, replace the motherboards and CPUs if needed.

Thanks

Replies

Eric Hickman, Oct 2, 2013 at 6:36 UTC:
I'd start with reseating the CPU's... Any indication of them overheating or anything?

utsec.net (Power Users, LLC, an IT service provider), Oct 2, 2013 at 6:40 UTC:
Check your memory configurations. Maybe you have it in Mirror mode, and it needs to be in optimized mode. Just going off of this posting with someone having the same errors. Doing a quick search for that error brings up many others having the same issue. Look through these and see if they were able to resolve them.

Dukat, Oct 2, 2013 at 6:44 UTC:
Power Users wrote:
    Check your memory configurations. Maybe you have it in Mirror mode, and it needs to be in optimized mode. Just going off of this posting with someone having the same errors. Doing a quick search for that error brings up many others having the same issue. Look through these and see if they were able to resolve them.
The one with the massive number of CPU errors DID have faulty memory as well, which I have replaced. I didn't see memory as being an issue in the threads that I found, but I see what you are talking about.
Eric Hickman wrote:
    I'd start with reseating the CPU's... Any indication of

Launchpad bug: https://bugs.launchpad.net/bugs/926136
This bug affects 2 people

    Affects                  Status         Importance   Assigned to
    Checkbox                 Fix Released   High         Brendan Donegan
    linux (Ubuntu)           Invalid        Medium       Unassigned
    linux (Ubuntu Precise)   Invalid        Medium       Unassigned

Checkbox: filed by Brendan Donegan on 2012-04-25; confirmed 2012-05-09; assigned 2012-04-25; work started 2012-05-09; completed 2012-05-09.
linux (Ubuntu): filed by Brendan Donegan on 2012-02-03; completed 2012-05-22.
linux (Ubuntu Precise): filed by Ara Pulido on 2012-02-08; completed 2012-05-22.

Bug Description

Running Precise Alpha2 on a Dell PowerEdge M610, I am finding that:

    echo 0 > /sys/devices/system/cpu/cpu1/online

fails because the content of the 'online' file is already 0. This means 1 of the system's 16 cores is offline by default. I have not seen this in Oneiric, so I assume it's not intended.

Tags: precise regression-release kernel-da-key kernel-request-3.2.0-13.22 kernel-request-3.2.0-14.23 kernel-request-3.2.0-15.24

Related branches
lp:~brendan-donegan/checkbox/bug926136_cpu_offlining
Merged into lp:checkbox at revision 1372
Danie
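The condition in the bug description (a core whose 'online' file already reads 0) can be checked across all cores before any offlining is attempted. The following is a small Python sketch under stated assumptions: it reads the same sysfs directory the report uses, the function name `offline_cpus` is hypothetical, and it skips any cpuN directory that has no 'online' file, which on some systems includes cpu0.

```python
import re
from pathlib import Path

CPU_DIR_RE = re.compile(r"cpu(\d+)")


def offline_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return the sorted CPU numbers whose 'online' file contains 0."""
    offline = []
    for entry in Path(sysfs_root).iterdir():
        match = CPU_DIR_RE.fullmatch(entry.name)
        if match is None:
            continue  # skips non-CPU entries such as cpufreq or cpuidle
        online_file = entry / "online"
        if not online_file.is_file():
            continue  # no 'online' file: this core cannot be toggled here
        if online_file.read_text().strip() == "0":
            offline.append(int(match.group(1)))
    return sorted(offline)


if __name__ == "__main__":
    cpus = offline_cpus()
    if cpus:
        print("Offline CPUs:", ", ".join(str(n) for n in cpus))
    else:
        print("No CPUs reported offline")
```

On the M610 described in the report, a check along these lines would be expected to list CPU 1 right after boot, though that depends on the kernel in use.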