Dell PowerEdge 2950: Single-bit Failure Error Rate Exceeded
Server Fault: Multibit error encountered on Dell Server Memory

Dell OpenManage reported the following:

    Memory device status is critical
    Memory device location: DIMM_B2
    Possible memory module event cause: Multi bit error encountered

What does this mean?
How bad is it?

Tags: memory, dell, dell-openmanage. Asked Sep 5 '13 by AXE-Labs.

Comment (Tom O'Connor, Sep 5 '13): Call Dell Support, send it back as faulty.

Answer (AXE-Labs, Sep 5 '13):
The event message reference for this was 1404. It indicates a faulty DIMM that should be replaced, but from what I read on blogs, the alert often clears and does not come back after reboots. Since it only tripped once for me, I cleared the memory errors using OMSA (dcicfg32.exe), and so far so good.

Comment (JimNim, Sep 6 '13): This was a good move; replacement typically isn't warranted after a single occurrence, though I'd seriously consider it if the problem ever returns on that particular DIMM.

Answer:
Similarly, I was seeing "Single bit warning error rate exceeded" and "Single bit failure error rate exceeded" on a Linux host. These can be cleared ...
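The rule of thumb from JimNim's comment (clear the log after a single occurrence, seriously consider replacement if the problem returns on the same DIMM) can be written down as a small triage helper. This is a minimal Python sketch, not a Dell tool: the `triage` function, its input of `(event_id, location)` pairs, and the threshold of two events per DIMM are assumptions made for illustration.

```python
from collections import Counter

# Assumed threshold: a second event on the same DIMM counts as "the problem
# returns". This mirrors the forum advice; it is not a Dell-specified limit.
RECURRENCE_THRESHOLD = 2


def triage(events):
    """Suggest an action per DIMM from a list of (event_id, location) pairs.

    `events` is a hypothetical input format, e.g. memory alerts collected
    since the log was last cleared.
    """
    counts = Counter(location for _event_id, location in events)
    advice = {}
    for location, n in counts.items():
        if n >= RECURRENCE_THRESHOLD:
            advice[location] = "recurring: swap-test or replace this DIMM"
        else:
            advice[location] = "single occurrence: clear the log and monitor"
    return advice


if __name__ == "__main__":
    # Illustrative history only; slot names are borrowed from the posts on this page.
    history = [(1404, "DIMM_B2"), (1403, "DIMM_A4"), (1403, "DIMM_A4")]
    for location, action in sorted(triage(history).items()):
        print(f"{location}: {action}")
```

Run as a script, the demo prints `DIMM_A4: recurring: swap-test or replace this DIMM` followed by `DIMM_B2: single occurrence: clear the log and monitor`. Counting per slot rather than per event ID keeps a one-off 1404 from triggering a replacement while still flagging a DIMM that keeps coming back.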
Dell TechCenter forum (Dell OpenManage Essentials): Single Bit Warning Error Rate Exceeded
http://en.community.dell.com/techcenter/systems-management/f/4494/t/19459637

Posted by john.ross on 31 Jul 2012 10:10:
Hello, I am using OpenManage Essentials to monitor my Dell servers, and recently I have been getting a memory warning on my PowerEdge 2950:

    Severity: Critical
    Message: Memory device status is critical Memory device location: DIMM3 Possible memory module event cause: Single bit warning error rate exceeded, Single bit failure error rate exceeded

I am just wondering if anybody has seen this error and what the solution could be. Any help is greatly appreciated.
Thanks, john.ross

Posted by DELL-Abhijit P on 31 Jul 2012 10:17:
Hi john.ross, the error looks like an issue with the DIMM in slot 3. You might want to call tech support; they can help you diagnose it further.
Regards, Abhijit
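Every memory alert quoted on this page follows the same pattern: a status, a DIMM location, and a comma-separated list of possible causes. A minimal Python sketch for pulling those three pieces out of the description text follows. It assumes the wording is exactly as quoted in these posts; the regular expressions and the `parse_memory_description` name are illustrative, not part of OpenManage.

```python
import re

# Patterns for the alert description as quoted in these threads, e.g.
# "Memory device status is critical Memory device location: DIMM3
#  Possible memory module event cause:Single bit warning error rate exceeded,..."
STATUS_RE = re.compile(r"Memory device status is (?P<status>[\w-]+)")
DETAIL_RE = re.compile(
    r"Memory device location:\s*(?P<location>\S+)\s+"
    r"Possible memory module event cause:\s*(?P<causes>.+)"
)


def parse_memory_description(text):
    """Return {'status', 'location', 'causes'} or None if the text does not match."""
    # Collapse line wraps and repeated spaces so wrapped log lines parse too.
    text = " ".join(text.split())
    status = STATUS_RE.search(text)
    detail = DETAIL_RE.search(text)
    if status is None or detail is None:
        return None
    # Causes are comma-separated, sometimes without a space after the comma.
    causes = [c.strip() for c in detail.group("causes").split(",") if c.strip()]
    return {
        "status": status.group("status"),
        "location": detail.group("location"),
        "causes": causes,
    }
```

For john.ross's alert this yields status `critical`, location `DIMM3`, and the two causes "Single bit warning error rate exceeded" and "Single bit failure error rate exceeded", which is enough to tell which slot to look at.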
linux-poweredge mailing list, October 2008
http://lists.us.dell.com/pipermail/linux-poweredge/2008-October/037484.html

On Thu, 2008-10-09 at 11:20 +0000, Arnar Þórarinsson wrote:
> Hello,
>
> Could somebody please explain these error messages to me? I've been
> trying to find some info on this but have found nothing.
>
> Severity : Critical
> ID : 1404
> Date and Time : Fri Oct 3 19:57:10 2008
> Category : Instrumentation Service
> Description : Memory device status is critical Memory device
> location: DIMM2_B Possible memory module event cause:Single bit
> warning error rate exceeded,Single bit error logging disabled
>
> Severity : Non-Critical
> ID : 1403
> Date and Time : Fri Oct 3 18:01:02 2008
> Category : Instrumentation Service
> Description : Memory device status is non-critical Memory device
> location: DIMM2_B Possible memory module event cause:Single bit
> warning error rate exceeded
>
> /Arnar Thorarinsson

Single bit warning errors by themselves mean very little other than that the memory found an error and corrected it. However, if you see many of these errors, then there is a more serious issue: that would indicate a bad DIMM or a bad DIMM card. To test, swap DIMM2_B with another DIMM and see whether the error follows the DIMM or stays with the slot. If it stays with the slot, you need a new DIMM card/motherboard; if it follows the DIMM, you need a new DIMM.

Again, a few of these warnings mean nothing other than that the ECC for your memory is working as designed. Many of these warnings mean you have bad memory or a bad memory riser/motherboard.

--
Damon L. Chesser
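Damon's swap test reduces to one comparison: after exchanging the suspect DIMM with a module from another slot, check which slot reports errors next. The Python sketch below encodes that decision. The function name and its arguments are hypothetical, and the "inconclusive" branch for errors on a third slot is an added assumption, not something the post covers.

```python
def interpret_swap_test(suspect_slot, partner_slot, slot_with_errors_after_swap):
    """Interpret the DIMM swap test.

    suspect_slot: slot that originally logged errors (e.g. "DIMM2_B").
    partner_slot: slot whose DIMM was exchanged with the suspect one.
    slot_with_errors_after_swap: slot that logs errors after the swap,
        or None if no new errors have been seen yet.
    """
    if slot_with_errors_after_swap is None:
        return "no recurrence yet: keep monitoring"
    if slot_with_errors_after_swap == partner_slot:
        # The suspect module now sits in partner_slot, so the error followed the DIMM.
        return "error followed the DIMM: replace the DIMM"
    if slot_with_errors_after_swap == suspect_slot:
        # A different module now sits in the original slot, yet errors persist there.
        return "error stayed with the slot: suspect the slot, memory riser, or motherboard"
    # Assumption: errors on a third slot do not fit either case of the test.
    return "errors on an unrelated slot: inconclusive, investigate further"
```

For example, with `DIMM1_B` as a hypothetical partner slot, `interpret_swap_test("DIMM2_B", "DIMM1_B", "DIMM1_B")` reports that the error followed the DIMM.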
I guess :D Working on Dells, I would assume that you all have OpenManage installed on your servers. Have you ever received an alert about DIMMs on Dell servers? Looking at the logs you get:

    # omreport chassis
    Health
    Main System Chassis
    SEVERITY : COMPONENT
    Non-Critical : Memory
    Ok : Power Management
    Ok : Processors
    [...]

    # omreport system alertlog
    Alert Log
    Alert Log contains...
    Severity : Non-Critical
    ID : 1403
    Date and Time : Thu Feb 28 20:36:14 2013
    Category : Instrumentation Service
    Description : Memory device status is non-critical Memory device location: DIMM_A4 Possible memory module event cause:Single bit warning error rate exceeded
    [...]

You called the Dell call centre straight away, waited, and received an answer like: "Try this command ... or this command ... have you tried to power it off and on again?" Well, I did a while ago! Just don't fire at the Dell techs! It's their job to ask you all those questions, as much as it is your job to get it resolved quickly.

Guess what: unless this becomes a real issue, you don't need to call the Dell call centre anymore. After all, this is a Non-Critical issue. It basically means that the memory found an error and corrected it. So what I would advise is: calm down, drink some tea, and just clear the memory failures! On Linux, you can clear them with:

    # /opt/dell/srvadmin/sbin/dcicfg command=clearmemfailures

When you print the health of your server again, everything will be OK!

But if this happens again on the same DIMM in a few weeks' time, your DIMM or slot might have a real problem. In this case, open your server and swap the memory in this slot with another slot. For example, swap the DIMMs between slot 4 (in this case) and slot 8. If it happens on slot 8, then you have bad memory. If it happens again on slot 4, then your slot is bad, so you'll need to change your motherboard ... now you can call Dell!

Hope it helps in your day-to-day stress :P
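To check whether a memory alert keeps coming back before reaching for the swap test, it helps to turn `omreport system alertlog` output into records. The Python sketch below assumes the "Key : Value" layout shown in the excerpts on this page, with a new record starting at each `Severity` line and wrapped description lines continuing the previous field. The real output may be padded or laid out differently between OMSA versions, so treat the regular expression as an assumption to adjust. The function names are hypothetical.

```python
import re

# One "Key : Value" field per line, as in the alert log excerpts above.
# Keys are letters and spaces ("Date and Time"); padding before the colon is allowed.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?)\s+:\s+(.*)$")


def parse_alertlog(text):
    """Split `omreport system alertlog` text into a list of dicts.

    A new record starts at each "Severity" field. Lines that are not
    "Key : Value" pairs (e.g. a wrapped Description) are appended to the
    previous field; lines before the first record (headers) are ignored.
    """
    records = []
    current = None
    last_key = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            if key == "Severity":
                current = {}
                records.append(current)
            if current is not None:
                current[key] = value
                last_key = key
        elif current is not None and last_key is not None:
            # Continuation of a wrapped value, e.g. the rest of a Description.
            current[last_key] += " " + line
    return records


def memory_alerts(records):
    """Keep only records whose Description concerns a memory device."""
    return [r for r in records if r.get("Description", "").startswith("Memory device")]
```

One way to feed it, assuming `omreport` is on the PATH, is `parse_alertlog(subprocess.run(["omreport", "system", "alertlog"], capture_output=True, text=True).stdout)`. The resulting IDs (1403, 1404) and DIMM locations can then be counted per slot to see whether the same DIMM reappears after the log was cleared.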