Bus Uncorrectable Error
BladeCenter HS23 Blade Server reporting 'Bus Uncorrectable Error' with 16 GB DIMMs, option part number 90Y3157, replacement part number 90Y3159 - IBM BladeCenter HS23

Source
RETAIN tip: H212016

Symptom
IBM BladeCenter HS23 Blade Servers populated with 16 GB VLP RDIMMs, option part number 90Y3157, replacement part number 90Y3159, might intermittently restart while under heavy I/O load (where VLP = very low profile, RDIMM = Registered Dual In-Line Memory Module).

The following errors are logged in the Advanced Management Module (AMM) event log:
  0x806F0813 Group 4, (processor 1-2) (CPUs) bus uncorrectable error
  0x806F010C Group 1, (memory device 1-16) (One of the DIMMs) uncorrectable ECC memory error

Affected System UEFI code levels are:
  UEFI: v1.50, Build ID: TKE136TUS
  UEFI: v1.51, Build ID: TKE136V

Affected configurations
The system can be any of the following IBM servers:
  BladeCenter HS23, type 1929, any model
  BladeCenter HS23, type 7875, any model

The system is configured with one or more of the following IBM options:
  16 GB (Dual-Rank x4) 1.5 V PC3-12800 CL11 ECC DDR3 1600 MHz VLP RDIMM, option part number 90Y3157, any model

This tip is not software specific.

The following system BIOS or UEFI levels are affected:
  UEFI: v1.50, Build ID: TKE136TUS
  UEFI: v1.51, Build ID: TKE136V

The system has the symptom described above.

Solution
This behavior is corrected in Memory Reference Code (MRC) release version 2.0.0.3, which is incorporated in system UEFI firmware Version 1.60, Build ID: TKE140Y. The file is available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:
http://www.ibm.com/support/fixcentral/

Workaround
There are two workaround options available for this issue.

Workaround 1
Change the memory performance mode in the UEFI F1 Setup menu from Max Performance to Balanced Performance:
  1. Select F1 System Configuration and Boot Manager.
  2. Select System Settings -> Operating Modes.
  3. Set Operating Mode to Custom Mode.
  4. Change Memory Speed from Max Performance to Balanced Performance.
  5. Select Save Settings.
  6. Exit UEFI.

Workaround 2
Use UEFI Version 1.42 - Build ID: TKE130D, which can be located using these steps:
  1. Navigate to the IBM Fix Central web page and select the appropriate Product Group, type of System, Product name, Product machine type, and Operating system:
     http://www.ibm.com/support/fixcentral/
  2. After the Select Fixes page is loaded for the IBM HS23 Blade Server, at the top of the page, click on the hyperlink Include supersede
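As a quick aid when auditing several blades, the build IDs named in the tip above can be checked with a few lines of Python. This is only a minimal sketch: the function name `classify_uefi_build` and the category labels are hypothetical, and the only data it relies on are the build IDs quoted in the tip (TKE136TUS and TKE136V affected, TKE140Y fixed, TKE130D as the workaround level). How the build ID is read from a blade is left out, since the tip does not describe it.

```python
# Build IDs taken from RETAIN tip H212016 as quoted above.
AFFECTED_BUILDS = {"TKE136TUS", "TKE136V"}  # UEFI v1.50 and v1.51
FIXED_BUILD = "TKE140Y"                     # UEFI v1.60 (contains MRC 2.0.0.3)
WORKAROUND_BUILD = "TKE130D"                # UEFI v1.42 (Workaround 2)


def classify_uefi_build(build_id: str) -> str:
    """Return 'affected', 'fixed', 'workaround', or 'unknown' for a build ID.

    Hypothetical helper: anything not named in the tip is reported as
    'unknown' rather than guessed at.
    """
    build = build_id.strip().upper()  # build IDs are compared case-insensitively
    if build in AFFECTED_BUILDS:
        return "affected"
    if build == FIXED_BUILD:
        return "fixed"
    if build == WORKAROUND_BUILD:
        return "workaround"
    return "unknown"


if __name__ == "__main__":
    for sample in ("TKE136V", "tke140y", "TKE130D", "ABC123"):
        print(sample, "->", classify_uefi_build(sample))
```

Builds that the tip does not mention are deliberately reported as unknown, because the tip gives no rule for ordering build IDs.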
IMM error messages
Source: https://publib.boulder.ibm.com/infocenter/bladectr/documentation/topic/com.ibm.bladecenter.hs23.doc/IMM_error_messages.html
See also: https://www.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5094667

Error Code = IMM events displayed by AMM (for example, Service Advisor, AMM web interface)
Event ID = IMM events displayed by DSA diagnostic program (for example, in the Chassis Event Log section)

Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved. See Parts listing to determine which components are consumable, structural, or CRU parts. If an action step is preceded by "(Trained technician only)," that step must be performed only by a trained technician.

Error Code: 0x80010000
Event ID: 80010002-0701ffff
Type: Warning
Error Message: System board (CMOS battery) voltage under warning threshold. Reading: X, Threshold: Y
Action:
  1. If the machine has been recently installed, moved, or serviced, ensure the system battery is properly seated, and the system battery polarity is correct.
  2. Replace the system battery (see Removing the battery and Installing the battery).

Error Code: 0x80010200
Event ID: 80010202-0701ffff
Type: Error
Error Message: System board (CMOS battery) voltage under critical threshold. Reading: X, Threshold: Y
Action:
  1. Remove all expansion cards from the blade server (see Removing an I/O expansion card).
  2. Remove all storage drives from the blade server (see Removing a hot-swap storage drive).
  3. If the error still occurs, replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error Code: 0x80010200
Event ID: 80010202-0701ffff
Type: Error
Error Message: System board (SysBrd 5V) voltage under critical threshold. Reading: X, Threshold: Y
Action:
  1. Remove all expansion cards from the blade server (see Removing an I/O expansion card).
  2. Remove all storage drives from the blade server (see Removing a hot-swap storage drive).
  3. If the error still occurs, replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error Code: 0x80010200
Event ID: 80010202-0701ffff
Type: Error
Error Message: System board (SysBrd 12V) voltage under critical threshold. Reading: X, Threshold: Y
Action:
  1. If the under voltage problem is occurring on all blade servers, look for other events in the log related to power and resolve those events (see Event logs).
  2. View the event log provided by the advanced management module for your BladeCenter unit and resolve any power related errors that might be displayed.
  3. If other modules or blades are logging the same issue, check the power supply for the BladeCenter unit.
  4. If the error still occurs, replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error Code: 0x80010200
Event ID: 80010202-0701ffff
Type: Error
Error Message: System board (CMOS b
Experts Exchange > Questions > URGENT! ibm x3650 m3 - vmware hardware alarm: Bus Uncorrectable error
Source: https://www.experts-exchange.com/questions/28036158/URGENT-ibm-x3650-m3-vmware-hardware-alarm-Bus-Uncorrectable-error.html
Solved. Posted on 2013-02-18 by snowdog_2112. Topics: VMware, Server Hardware. 1 Verified Solution, 4 Comments. Last Modified: 2013-03-24

Question by snowdog_2112:
Hardware: x3650 m3 (7945ac1)
ESXi: 5.1.0 799733 (ibm-specific build)

The host just rebooted and now shows hardware alarms (after the reboot):

  Group 2 PCIs: Bus Uncorrectable error

(I don't have host logs prior to the reboot because the tech who installed ESXi 5.1 had not yet set the syslog to persistent storage...)

I can't find much info on the alert - is it a concern? I've tried "Reset Sensors" and Refresh, and the alerts are still present.

Author Comment by snowdog_2112, 2013-02-18:
more info - the IMM event log shows the following at the time of the reboot:

  02/18/2013; 15:05:01 0x816f03131701ffff System "SN# xxxx" has recovered from an NMI
  02/18/2013; 15:03:46 0x806f002125820900 Fault in slot "All PCI Error" on system "SN# xxxx"
  02/18/2013; 15:03:46 0x806f002130010901 Fault in slot "PCI 1" on system "SN# xxxx"
  02/18/2013; 15:03:40 0x806f08132582ffff A Uncorrectable Bus Error has occurred on system "SN# xxxx"
  02/18/2013; 15:03:40 0x806f03131701ffff A software NMI has occurred on system "SN# xxxx"

Accepted Solution by Andrew Hancock (VMware vExpert / EE MVE), 2013-02-18:
Hardware fault on motherboard or backplane; get it escalated to IBM Support for engineer repair. Reseat any PCI devices if applicable.

Author Closing Comment by snowdog_2112, 2013-02-18:
lightpath also indicates a PCI fault. From IMM:

  Fault: orange
  PCI: orange
  PCI1: orange <-- this must be the slot?
  (pci2 - 4): off
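When an IMM log like the one quoted in the thread runs to many entries, it can help to parse the lines and sort them oldest first, so the sequence of uncorrectable bus error, slot faults, and NMI recovery reads in the order it happened. The sketch below is a minimal illustration only: the field layout (MM/DD/YYYY date, semicolon, time, hex event ID, free-text message) is inferred solely from the five lines posted in the thread and is not an official IMM log format, and the names `ImmEvent` and `parse_imm_line` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class ImmEvent(NamedTuple):
    timestamp: datetime
    event_id: str
    message: str


# Layout assumed from the forum post: "MM/DD/YYYY; HH:MM:SS 0x<hex id> <message>"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4});\s+(\d{2}:\d{2}:\d{2})\s+(0x[0-9a-fA-F]+)\s+(.*)$"
)


def parse_imm_line(line: str) -> Optional[ImmEvent]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date_part, time_part, event_id, message = match.groups()
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S")
    return ImmEvent(timestamp, event_id.lower(), message)


# The five lines exactly as posted in the thread (newest first).
SAMPLE = """
02/18/2013; 15:05:01 0x816f03131701ffff System "SN# xxxx" has recovered from an NMI
02/18/2013; 15:03:46 0x806f002125820900 Fault in slot "All PCI Error" on system "SN# xxxx"
02/18/2013; 15:03:46 0x806f002130010901 Fault in slot "PCI 1" on system "SN# xxxx"
02/18/2013; 15:03:40 0x806f08132582ffff A Uncorrectable Bus Error has occurred on system "SN# xxxx"
02/18/2013; 15:03:40 0x806f03131701ffff A software NMI has occurred on system "SN# xxxx"
"""

parsed = (parse_imm_line(line) for line in SAMPLE.splitlines())
# sorted() is stable, so entries with the same timestamp keep their posted order.
events = sorted((e for e in parsed if e is not None), key=lambda e: e.timestamp)

if __name__ == "__main__":
    for event in events:
        print(event.timestamp.isoformat(), event.event_id, event.message)
```

Lines that do not match the assumed layout are skipped rather than raising an error, which keeps the sketch tolerant of blank lines and headers in a pasted log.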
(Austin, TX, US); Mukund Purshottam Khatri (Austin, TX, US); Theodore Stratton Webb, III (Austin, TX, US)
Assignees: DELL PRODUCTS L.P.
IPC8 Class: AG06F1107FI
USPC Class: 714 57
Class name: Reliability and availability error detection or notification error forwarding and presentation (e.g., operator console, error display)
Publication date: 2009-12-24
Patent application number: 20090319836
Source: http://www.patentsencyclopedia.com/app/20090319836

Abstract:
A method for recovery from uncorrectable errors in an information handling system including an operating system (OS) and one or more network interface cards (NICs) is provided. The method may include detecting an uncorrectable error; determining whether the uncorrectable error is isolated to a particular NIC; determining whether the particular NIC is teamed with one or more other NICs; and notifying the OS of a successful recovery from the uncorrectable error if it is determined that (a) the uncorrectable error is isolated to a particular NIC, and (b) the particular NIC is teamed with one or more other NICs.

Claims:
1. A method for recovery from uncorrectable errors in an information handling system including an operating system (OS) and one or more network interface cards (NICs), the method comprising: detecting an uncorrectable error; determining whether the uncorrectable error is isolated to a particular NIC; determining whether the particular NIC is teamed with one or more other NICs; and in response to determining that (a) the uncorrectable error is isolated to a particular NIC, and (b) the particular NIC is teamed with one or more other NICs, notifying the OS of a successful recovery from the uncorrectable error.
2. A method according to claim 1, wherein a driver is used for determining whether the particular NIC is teamed with one or more other NICs.
3. A method according to claim 1, wherein firmware is used for determining if the uncorrectable error is isolated to a particular NIC.
4. A method according to claim 1, wherein the OS is used for determining whether the particular NIC is teamed with one or more other NICs, whether the uncorrectable error is isolated to a particular NIC, or both.
5. A method according to claim 1, wherein the OS is notified by using a hardware error handling system.
6. A method according to claim 1, wherein notifying the OS of a successful recovery from the uncorrectable error is performed using Windows Hardware Error Architecture (WHEA) functions.
7