289 Multiple Bit Error
Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved. See the parts listing for Types 8014, 8028, and 1916 to determine which components are customer replaceable units (CRUs) and which are field replaceable units (FRUs). If an action step is preceded by "(Trained service technician only)," that step must be performed only by a trained service technician.
Error code 062: Three consecutive startup failures
1. Run the Configuration/Setup Utility program (see Using the Configuration/Setup Utility program), select Load Default Settings, make sure that the date and time are correct, and save the settings.
2. Reseat the following components one at a time, in the order shown, restarting the blade server each time:
   a. Battery (see Removing the battery and Installing the battery).
   b. (Trained service technician only) Microprocessor (see Removing a microprocessor and heat sink and Installing a microprocessor and heat sink).
3. Replace the following components one at a time, in the order shown, restarting the blade server each time:
   a. Battery (see Removing the battery and Installing the battery).
   b. (Trained service technician only) Microprocessor (see Removing a microprocessor and heat sink and Installing a microprocessor and heat sink).
   c. (Trained service technician only) System-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error code 101: Timer tick interrupt failure
(Trained service technician only) Replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error code 102: Timer 2 test failure
(Trained service technician only) Replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error code 106: Diskette controller failure
- For blade server types 8028 and 1916, replace the SAS interface card (see Removing a storage interface card and Installing a storage interface card).
- (Trained service technician only) For blade server type 8014, replace the system-board assembly (see Removing the system-board assembly and Installing the system-board assembly).

Error code 151: Real time clock failure
1. Reseat the battery (see Removing the battery and Installing the battery).
2. Replace the following components one at a time, in the order shown, ...
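The error-code table above lends itself to a simple lookup structure. The sketch below is illustrative only. The codes and descriptions come from the table, but the step wording is abbreviated, and the names `POST_ERRORS`, `steps_for`, and `self_service_steps` are hypothetical. It encodes the rule that actions are tried in the order listed. Rather than skipping a "(Trained service technician only)" step, it stops there, because later steps assume the earlier ones were already tried.

```python
from __future__ import annotations

# Hypothetical transcription of part of the POST error-code table.
# Each step is (text, technician_only, machine_types); machine_types=None
# means the step applies to every machine type.
POST_ERRORS = {
    "062": ("Three consecutive startup failures", [
        ("Load Default Settings, check date and time, save", False, None),
        ("Reseat the battery", False, None),
        ("Reseat the microprocessor and heat sink", True, None),
        ("Replace the battery", False, None),
        ("Replace the microprocessor and heat sink", True, None),
        ("Replace the system-board assembly", True, None),
    ]),
    "101": ("Timer tick interrupt failure", [
        ("Replace the system-board assembly", True, None),
    ]),
    "102": ("Timer 2 test failure", [
        ("Replace the system-board assembly", True, None),
    ]),
    "106": ("Diskette controller failure", [
        ("Replace the SAS interface card", False, ("8028", "1916")),
        ("Replace the system-board assembly", True, ("8014",)),
    ]),
    # The source list for 151 is truncated after its first step.
    "151": ("Real time clock failure", [
        ("Reseat the battery", False, None),
    ]),
}


def steps_for(code: str, machine_type: str) -> list[tuple[str, bool]]:
    """All steps for `code` that apply to `machine_type`, in listed order."""
    _description, steps = POST_ERRORS[code]
    return [(text, tech_only) for text, tech_only, types in steps
            if types is None or machine_type in types]


def self_service_steps(code: str, machine_type: str) -> list[str]:
    """Steps a non-technician may try before a technician must take over."""
    result = []
    for text, tech_only in steps_for(code, machine_type):
        if tech_only:
            break  # the remaining steps belong to a trained service technician
        result.append(text)
    return result
```

For example, `self_service_steps("106", "8028")` yields only the SAS interface card replacement, while the same code on type 8014 yields nothing because its only step is technician-only.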
Memory issues

To begin troubleshooting, check the following top issues. If your issue is listed, select the link; otherwise, proceed to step 2.
- BIOS displays smaller memory size than installed
- Hynix DIMMs may cause system boot failures
- Empty DIMM slots disabled by default (xSeries 305)
- Memory is not all seen by the operating system
- Memory ProteXion error in event log (xSeries 445)
- Memory errors may not turn on all appropriate failure indicators and/or PFAs (xSeries 455)
- Memory failure after DIMM install or replacement (xSeries 366)
- System does not boot after replacing memory DIMM

Check the RSA logs for any logged PFA errors, single-bit errors (Memory ProteXion events on xSeries 440 and 445), or excessive or correctable errors. Any of these errors should warrant a service call for FRU replacement.

Check the POST error log for error message 289. If the DIMM was disabled by a system-management interrupt (SMI), replace the DIMM. If the DIMM was disabled by the user or by POST, proceed to step 9.

To enhance reliability, servers use ECC (Error Checking and Correcting) memory and Chipkill. ECC memory is fault tolerant: single-bit errors are corrected without shutting down the server. Chipkill goes further and can correct multi-bit errors, provided they are confined to a single memory chip. If a memory chip fails, Chipkill automatically takes the failed chip offline while the server continues to run. Chipkill provides protection for memory similar to RAID protection for disks: data bits and parity are written across multiple chips on the DIMM, so the controller can reconstruct the bits from a failed chip and continue working.
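The distinction between a corrected single-bit error and a detected multi-bit error comes from how SECDED (single-error-correct, double-error-detect) codes work. Standard ECC DIMMs protect 64 data bits with 8 check bits. The toy sketch below is not any vendor's implementation. It uses an extended Hamming code over 4 data bits to show the same behavior: one flipped bit is located and repaired, while two flipped bits can only be flagged as uncorrectable, which is the kind of event reported as a multi-bit error.

```python
# Toy extended Hamming (8,4) SECDED code. List index = bit position:
# position 0 is the overall parity bit, positions 1, 2, 4 are Hamming
# parity bits, and positions 3, 5, 6, 7 carry the data bits.

def encode(data: list[int]) -> list[int]:
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = sum(word[1:]) % 2  # make the whole 8-bit word even parity
    return word


def decode(word: list[int]):
    w = list(word)
    # XOR of the positions of all set bits: 0 for a valid codeword,
    # otherwise the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    overall = sum(w) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume exactly one bit flipped; syndrome 0 means the parity bit itself.
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips but a nonzero syndrome: two bits are bad.
        return "uncorrectable", None
    return status, [w[3], w[5], w[6], w[7]]


codeword = encode([1, 0, 1, 1])
codeword[5] ^= 1            # single-bit error: repaired transparently
print(decode(codeword))
codeword[6] ^= 1            # second bit error: detected, not repairable
print(decode(codeword))
```

The same principle scales to the 72-bit words used on ECC DIMMs. This is why a server can keep running through single-bit errors but must log, and often halt on, a multi-bit error.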
Chipkill support is provided by the memory controller and is implemented using standard ECC DIMMs.

Check for POST numeric or beep codes at system startup. A startup beep code of 1-3 typically indicates a memory problem. If you receive an error at startup, read the rest of the steps in this document, then refer to Troubleshooting bootup issues.

Make sure your system has the latest BIOS, Systems Management firmware, and diagnostics. These updates ensure that new or added components are recognized and used correctly by the system. Download the latest BIOS and support files for your system. For example:
- xSeries 365: may not boot if DIMM pair J1 and J5 is not installed, or if the BIOS is not at the latest level
- xSeries 365: DIMMs disabled with downlevel BIOS
- xSeries 200: latest BIOS required to recognize 512MB DIMMs

Check the IBM ServerProven compatibility Web site to ensure that the memory has been tested and certified as supported for your system. Most third party me...
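The Chipkill description compares the scheme to RAID. The sketch below illustrates that analogy with RAID-5-style XOR parity: one parity "chip" lets the bytes of any single failed data chip be rebuilt from the survivors. It is an assumption-laden simplification. Chipkill implementations are generally described as using symbol-based ECC that can also locate the bad chip, whereas plain XOR parity only works when the failed chip is already known. The function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def parity_chip(chips: list[bytes]) -> bytes:
    """XOR the same byte position across all chips (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chips))


def rebuild(chips: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild one known-failed chip (marked None) from survivors + parity."""
    missing = [i for i, chip in enumerate(chips) if chip is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one failed chip")
    if not missing:
        return list(chips)
    survivors = [chip for chip in chips if chip is not None]
    # XOR of the surviving chips and the parity equals the missing chip.
    restored = parity_chip(survivors + [parity])
    result = list(chips)
    result[missing[0]] = restored
    return result


chips = [b"\x12\x34", b"\xab\xcd", b"\x00\xff"]
parity = parity_chip(chips)
print(rebuild([chips[0], None, chips[2]], parity) == chips)
```

The limitation to a single failed chip mirrors the text: the protection covers one failed chip at a time, not arbitrary damage spread across several chips.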
Multibit error encountered on Dell Server Memory

Dell OpenManage reported the following:
Memory device status is critical
Memory device location: DIMM_B2
Possible memory module event cause:Multi bit error encountered
What does this mean? How bad is it?
Tags: memory, dell, dell-openmanage (asked Sep 5 '13 at 14:02 by AXE-Labs)

Call Dell Support, send it back as faulty. –Tom O'Connor Sep 5 '13 at 14:06

Answer by AXE-Labs (Sep 5 '13 at 14:02): The event message reference for this was 1404. It indicates a faulty DIMM that should be replaced, but from what I read on blogs, the alert often clears and does not come back after reboots. Since it only tripped once for me, I cleared the memory errors using OMSA (dcicfg32.exe), and so far so good.

This was a good move - replacement typically isn't warranted after a single occurrence, though I'd seriously consider it if the problem ever returns on that particular DIMM. –JimNim Sep 6 '13 at 14:52

Similarly, I was seeing "Single bit warning error rate exceeded" and "Single bit failure error rate exceeded" on a Linux host.
These can be cleared as well, but with omconfig: 'omconfig system alertlog action=clear' and 'omconfig system esmlog action=clear'. Let's hope they don't come back, or it's trash for the DIMMs. –AXE-Labs Mar 6 '14 at 20:18

Make sure you've got the latest firmware/BIOS too -- I have seen cases where these sorts of errors were spurious and "fixed" by firmware. –Wil Cooley May 19 '14 at 8:09

Answer: Cause of error according to Dell: "A memory device correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred. The system continues to function normally (except for a multibit error). Replace the memory module identified in the m...
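The approach described in this thread (clear the logs after a single multi-bit event, but replace the DIMM if the error returns on the same module) can be sketched as follows. This follows JimNim's comment, not Dell's own guidance, which recommends replacing the module. The regular expressions assume the exact alert layout quoted in the question, and real OpenManage output may differ. The two `omconfig` commands are the ones AXE-Labs quoted. `DimmTracker`, `parse_alert`, and `clear_logs` are hypothetical names, and `clear_logs` defaults to a dry run so nothing executes unless asked.

```python
from __future__ import annotations

import re
import subprocess
from collections import Counter

# Patterns assume the alert layout quoted in the question.
LOCATION_RE = re.compile(r"Memory device location:\s*(?P<dimm>\S+)")
CAUSE_RE = re.compile(r"Possible memory module event cause:\s*(?P<cause>.+)")

# The log-clearing commands quoted in the thread.
CLEAR_COMMANDS = [
    ["omconfig", "system", "alertlog", "action=clear"],
    ["omconfig", "system", "esmlog", "action=clear"],
]


def parse_alert(text: str) -> tuple[str, str] | None:
    """Return (dimm, cause) from an alert, or None if it doesn't match."""
    location = LOCATION_RE.search(text)
    cause = CAUSE_RE.search(text)
    if not (location and cause):
        return None
    return location.group("dimm"), cause.group("cause").strip()


class DimmTracker:
    """Counts multi-bit events per DIMM and suggests an action."""

    def __init__(self) -> None:
        self.multibit = Counter()

    def record(self, text: str) -> str:
        parsed = parse_alert(text)
        if parsed is None:
            return "ignore"
        dimm, cause = parsed
        if "multi bit" not in cause.lower():
            return "ignore"
        self.multibit[dimm] += 1
        # First occurrence: clear and watch. Recurrence: replace the DIMM.
        return "replace" if self.multibit[dimm] > 1 else "clear-and-monitor"


def clear_logs(dry_run: bool = True) -> list[list[str]]:
    """Run (or, by default, only return) the omconfig clear commands."""
    if not dry_run:
        for cmd in CLEAR_COMMANDS:
            subprocess.run(cmd, check=True)
    return [list(cmd) for cmd in CLEAR_COMMANDS]
```

Feeding the alert from the question to `record` once returns `"clear-and-monitor"`; a second identical alert for DIMM_B2 returns `"replace"`. As Wil Cooley notes, it is worth updating firmware/BIOS before trusting either outcome.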