Extended Error Code ECC Chipkill
May 2007 12:22:39 -0600 (MDT)

Hi,

I have a 4-way Opteron 870 system, with 16 2GB DIMMs and a Tyan Thunder K8QS Pro (S4882) motherboard. It has been crashing, and there are entries like this in the messages log:

May 9 22:57:47 monolith kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
May 9 22:57:47 monolith kernel: MC1: CE page 0x240cb0, offset 0x700, grain 8, syndrome 0x11c1, row 2, channel 0, label "": k8_edac
May 9 22:57:47 monolith kernel: MC1: CE - no information available: k8_edac Error Overflow set
May 9 22:57:47 monolith kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error

or (I guess equivalently) in dmesg:

EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
MC1: CE page 0x2557b8, offset 0x0, grain 8, syndrome 0x4c58, row 2, channel 0, label "": k8_edac
MC1: CE - no information available: k8_edac Error Overflow set
EDAC k8 MC0: extended error code: ECC chipkill x4 error

Some Googling suggests that the problem is likely a flaky DIMM. But how to tell which one? Can any kernel or hardware gurus out there let me know if the error messages above allow me to locate the potentially bad memory stick? Note that every set of log entries includes "row 2, channel 0".

Both the mobo and OS have NUMA enabled. It's running the 2.6.9-55.ELsmp kernel.

Thanks for any suggestions,
Peter Ruprecht
U. of Colorado

Follow-Ups:
Re: locating bad memory From: Paul Krizak
Re: locating bad memory From: Kay Diederichs
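One way to act on the observation that every entry reports "row 2, channel 0" is to tally the corrected-error (CE) lines by memory controller, row, and channel. The sketch below is only illustrative: the regular expression is written against the k8_edac lines quoted above, and the names CE_RE, tally_ce, and sample are made up for this example. On K8 systems the MC number generally corresponds to the CPU node whose on-die memory controller reported the error, and the row is a chip-select row rather than a physical slot, so the motherboard manual is still needed to translate the result into a DIMM socket.

```python
import re
from collections import Counter

# Pattern for the CE lines as they appear in the logs quoted above.
# It is an assumption that other kernels/drivers use the same layout.
CE_RE = re.compile(
    r"MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-fA-F]+), "
    r"offset 0x(?P<offset>[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome 0x(?P<syndrome>[0-9a-fA-F]+), "
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)


def tally_ce(lines):
    """Count CE reports per (memory controller, row, channel)."""
    counts = Counter()
    for line in lines:
        m = CE_RE.search(line)
        if m:  # lines such as "CE - no information available" are skipped
            key = (int(m["mc"]), int(m["row"]), int(m["channel"]))
            counts[key] += 1
    return counts


# Lines copied from the post above.
sample = [
    'May 9 22:57:47 monolith kernel: MC1: CE page 0x240cb0, offset 0x700, '
    'grain 8, syndrome 0x11c1, row 2, channel 0, label "": k8_edac',
    'May 9 22:57:47 monolith kernel: MC1: CE - no information available: '
    'k8_edac Error Overflow set',
    'MC1: CE page 0x2557b8, offset 0x0, grain 8, syndrome 0x4c58, '
    'row 2, channel 0, label "": k8_edac',
]

for (mc, row, channel), n in tally_ce(sample).items():
    print(f"MC{mc} row {row} channel {channel}: {n} corrected error(s)")
```

If one (MC, row, channel) combination dominates the tally, swapping the DIMMs behind that chip-select row with known-good ones and watching whether the errors follow the modules is a reasonable next step.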
https://www.redhat.com/archives/nahant-list/2007-May/msg00131.html

OS errors: kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 [duplicate]
http://unix.stackexchange.com/questions/91714/os-errors-kernel-edac-k8-mc0-extended-error-code-ecc-chipkill-x4

This question already has an answer here: Does kernel: EDAC MC0: UE page 0x0 point to bad memory, a driver, or something else?

We noticed the server crashes with the errors below. We are not sure whether they are related to a defective piece of hardware or totally unrelated.

Server detail: Red Hat Enterprise Linux ES release 4 (Nahant Update 6)

[root@athena log]# uname -a
Linux athena.nsdecatur.local 2.6.9-67.0.7.ELsmp #1 SMP Wed Feb 27 04:47:23 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

messages:

Sep 17 15:08:16 athena kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Sep 17 15:08:16 athena kernel: MC0: CE page 0x2c2766, offset 0xb10, grain 8, syndrome 0xac08, row 1, channel 0, label "": k8_edac
Sep 17 15:08:16 athena kernel: MC0: CE - no information available: k8_edac Error Overflow set
Sep 17 15:08:16 athena kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
Sep 17 15:08:17 athena su(pam_unix)[19579]: session opened for user oracle by (uid=0)
Sep 17 15:08:17 athena su(pam_unix)[19579]: session closed for user oracle
Sep 17 15:08:17 athena su(pam_unix)[19634]: session opened for user oracle by (uid=0)
Sep 17 15:08:17 athena su(pam_unix)[19634]: session closed for user oracle
Sep 17 15:08:18 athena kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
Sep 17 15:08:18 athena kernel: MC0: CE page 0x39c857, offset 0xd50, grain 8, syndrome 0x1cc8, row 1, channel 0, label "": k8_edac
Sep 17 15:08:18 athena kernel: MC
EDAC: Is it possible to calculate which piece of memory is bad?
http://www.gossamer-threads.com/lists/linux/kernel/1207815

Mar 29, 2010, 6:40 AM, Post #1 of 5

Hello,

I see the following errors:

EDAC MC0: CE page 0x8abba, offset 0xa10, grain 8, syndrome 0x4758, row 0, channel 0, label "": k8_edac
EDAC MC0: CE - no information available: k8_edac Error Overflow set
EDAC k8 MC0: extended error code: ECC chipkill x4 error
EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)

Is it possible to use the page or offset to calculate which DIMM is having a problem?

Justin.

jkosin at intcomgrp, Mar 29, 2010, 7:08 AM, Post #2 of 5
Re: EDAC: Is it possible to calculate which piece of memory is bad? [In reply to]

On 3/29/2010 9:50 AM, Justin Piszcz wrote:
> Hello,
>
> I see the following errors:
>
> EDAC MC0: CE page 0x8abba, offset 0xa10, grain 8, syndrome 0x4758, row 0, channel 0, label "": k8_edac
> EDAC MC0: CE - no information available: k8_edac Error Overflow set
> EDAC k8 MC0: extended error code: ECC chipkill x4 error
> EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
>
> Is it possible to use the page or offset to calculate which DIMM is having a problem?
>
> Justin.

Theoretically, YES. However, you would have to have some important information:

1) The number and size of each memory stick in the machine.
2) The physical location accessed. With virtual memory being the norm there isn't always a 1-1 mapping here. But, this should be attainable.

James
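The arithmetic behind the reply above can be sketched in a few lines. This is only an illustration under stated assumptions: the page value in the CE line is treated as a page frame number with 4 KiB pages (the usual x86_64 base page size), and the address ranges in hypothetical_layout are invented for the example. The real DRAM address map depends on the BIOS, memory holes, and any node or channel interleaving, so it has to be taken from the actual machine. The function names are likewise made up here.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def ce_physical_address(page, offset, page_shift=PAGE_SHIFT):
    """Combine the 'page' and 'offset' fields of a CE line into one address."""
    if not 0 <= offset < (1 << page_shift):
        raise ValueError("offset does not fit inside one page")
    return (page << page_shift) + offset


def find_range(addr, ranges):
    """Return the label of the first [start, end) range containing addr."""
    for label, start, end in ranges:
        if start <= addr < end:
            return label
    return None


GIB = 1 << 30

# Hypothetical layout, NOT taken from any real board: two contiguous
# 4 GiB regions, each standing in for the memory behind one chip-select row.
hypothetical_layout = [
    ("hypothetical-A", 0 * GIB, 4 * GIB),
    ("hypothetical-B", 4 * GIB, 8 * GIB),
]

# Values from the CE line quoted above: page 0x8abba, offset 0xa10.
addr = ce_physical_address(0x8ABBA, 0xA10)
print(hex(addr), find_range(addr, hypothetical_layout))
```

In practice the "row" and "channel" fields already printed by the driver are usually the more direct hint, since the driver has done this lookup against the controller's own address map; the manual calculation is mainly useful as a cross-check.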
inside /var/log/messages reporting these errors constantly
http://www.linuxquestions.org/questions/linux-hardware-18/inside-var-log-messages-reporting-these-errors-constantly-802254/

04-15-2010, 02:28 PM #1 narayanapalla

Hi, The following error messages will continuously genera