Kernel sbridge: Handling MCE Memory Error
How to find faulty memory module from MCE message?

(Question from Server Fault, a question and answer site for system and network administrators.)

I am trying to understand an MCE message to find out which memory module is bad on a server. This message appears in /var/log/kern.log on a server that froze twice today:

Apr 13 22:39:22 mbox kernel: [36247975.116860] sbridge: HANDLING MCE MEMORY ERROR
Apr 13 22:39:22 mbox kernel: [36247975.116867] CPU 0: Machine Check Exception: 0 Bank 5: 8c00004000010090
Apr 13 22:39:22 mbox kernel: [36247975.116869] TSC 0 ADDR 4a0d75900 MISC 21405cdc86 PROCESSOR 0:206d7 TIME 1428957562 SOCKET 0 APIC 0
Apr 13 22:39:22 mbox kernel: [36247975.951013] EDAC MC0: 1 CE memory read error

I suspect a bad memory module. The server has 2x Xeon E5-2650 with 8x 8 GB memory modules (8 memory slots for each CPU). Here is the memory module population from lshw:

  *-memory:0
       description: System Memory
       physical id: 2d
       slot: System board or motherboard
     *-bank:0
          description: DIMM DDR3 1333 MHz (0,8 ns)
          product: 9965516-197.A
          vendor: Kingston
          physical id: 0
          serial: B83AE5C2
          slot: P1_DIMMA1
          size: 8GiB
          width: 64 bits
          clock: 1333MHz (0.8ns)
     *-bank:1
          description: DIMM Synchronous [empty]
          product: Dimm1_PartNum
          vendor: Dimm1_Manufacturer
          physical id: 1
          serial: Dimm1_SerNum
          slot: P1_DIMMA2
          width: 64 bits
     *-bank:2
          description: DIMM DDR3 1333 MHz (0,8 ns)
          product: 9965516-048.A
          vendor: Kingston
          physical id: 2
          serial: EC309238
          slot: P1_DIMMB1
          size: 8GiB
          width: 64 bits
          clock: 1333MHz (0.8ns)
     *-bank:3
          description: DIMM Synchronous [empty]
          product: Dimm4_PartNum
          vendor: Dimm4_Manufacturer
          physical id: 3
          serial: Dimm4_SerNum
          slot: P1_DIMMB2
          width: 64 bits
     *-bank:4
          description: DIMM DDR3 1333 MHz (0,8 ns)
          product: 9965516-048.A
          vendor: Kingston
          physical id: 4
          serial: E9305438
          slot: P1_DIMMC1
          size: 8GiB
          width: 64 bits
          clock: 1333MHz (0.8ns)
     *-bank:5
          description: DIMM Synchronous [empty]
          product: Dimm7_PartNum
          vendor: Dimm7_Manufacturer
          physical id: 5
          serial: Dimm7_SerNum
          slot: P1_DIMMC2
          width: 64 bits
     *-bank:6
          description: DIMM DDR3 1333 MHz (0,8 ns)
          product: 9965516-048.A
          vendor: Kingston
          phys
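The "Bank 5: 8c00004000010090" value in the log above is a raw machine-check status register, and part of the question can be answered by decoding it. The following is a minimal sketch in Python, assuming the architectural IA32_MCi_STATUS layout and the memory-controller error code format (0000 0000 1MMM CCCC) as described in the Intel SDM. The function names parse_mce_line and decode_status are hypothetical helpers made up for this illustration and are not part of mcelog, EDAC, or any other tool.

```python
import re

# Flag bit positions in IA32_MCi_STATUS (assumed from the Intel SDM, Vol. 3B).
STATUS_FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
                "MISCV": 59, "ADDRV": 58, "PCC": 57}

# MMM field of a memory-controller MCA error code: 0000 0000 1MMM CCCC.
MEMORY_OPS = {0: "generic undefined request", 1: "memory read",
              2: "memory write", 3: "address/command", 4: "memory scrubbing"}

BANK_RE = re.compile(r"Bank (\d+): ([0-9a-fA-F]+)")


def parse_mce_line(line):
    """Return (bank, status) from a 'Machine Check Exception' log line, or None."""
    match = BANK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16)


def decode_status(status):
    """Split a 64-bit status value into flags, counters and error codes."""
    mca_code = status & 0xFFFF
    decoded = {
        "flags": {name: bool((status >> bit) & 1)
                  for name, bit in STATUS_FLAGS.items()},
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_code": (status >> 16) & 0xFFFF,       # bits 31:16
        "mca_code": mca_code,                        # bits 15:0
    }
    # Bits 15:8 clear and bit 7 set marks a memory-controller error.
    if mca_code & 0xFF80 == 0x0080:
        decoded["operation"] = MEMORY_OPS.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        decoded["channel"] = None if channel == 0xF else channel  # 0xF: unspecified
    return decoded


if __name__ == "__main__":
    line = "CPU 0: Machine Check Exception: 0 Bank 5: 8c00004000010090"
    bank, status = parse_mce_line(line)
    print(bank, decode_status(status))
```

For the value in the question, this reports a valid, corrected (UC clear) memory read error with a corrected-error count of 1 on channel 0, which agrees with the "1 CE memory read error" line from EDAC. The channel number alone does not name a slot such as P1_DIMMA1. That mapping depends on the board, so it still has to be confirmed against the motherboard manual or the EDAC DIMM labels.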
Bug 875194 - sbridge: HANDLING MCE MEMORY ERROR

URL: https://bugzilla.redhat.com/show_bug.cgi?id=875194
Status: CLOSED WONTFIX
Product: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
Assigned To: Linda Wang
Reported: 2012-11-09 13:55 EST by joshua
Modified: 2013-08-20 09:17 EDT
Last Closed: 2013-07-31 23:26:22 EDT
Related question: http://serverfault.com/questions/682909/how-to-find-faulty-memory-module-from-mce-message

Description (joshua, 2012-11-09 13:55:21 EST)

Description of problem: Fedora 17 x86_64 on a new server with 192 GB of memory. No actual problems to report, save these disturbing messages:

[ 3063.105128] sbridge: HANDLING MCE MEMORY ERROR
[ 3063.105139] CPU 4: Machine Check Exception: 0 Bank 5: 8c00004000010092
[ 3063.105141] TSC 0 ADDR 2f050eb080 MISC 24403ebe86 PROCESSOR 0:206d7 TIME 1352412000 SOCKET 1 APIC 20
[ 3064.017116] EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Channel#2_DIMM#2 (channel:2 slot:2 page:0x2f050eb offset:0x80 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0092 socket:1 channel_mask:2 rank:8)
[ 5158.195757] sbridge: HANDLING MCE MEMORY ERR
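Unlike the first log, the EDAC line in this report already names the DIMM (CPU_SrcID#1_Channel#2_DIMM#2), and its page and offset fields can be checked against the ADDR value from the MCE line. The sketch below is only an illustration in Python: parse_edac_line and split_address are hypothetical helper names, the regular expression is fitted to the one line quoted above rather than to every EDAC message format, and a 4 KiB page size is assumed.

```python
import re

# Pattern fitted to the sb_edac line quoted in the bug report (an assumption,
# not a general EDAC grammar).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) (?P<message>.+?)"
    r" on (?P<label>\S+) \((?P<detail>[^)]*)\)"
)


def parse_edac_line(line):
    """Return a dict describing an EDAC error line, or None if it has no DIMM label."""
    match = EDAC_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["mc"] = int(info["mc"])
    info["count"] = int(info["count"])
    # The parenthesised part is a run of key:value pairs.
    info["fields"] = dict(re.findall(r"(\w+):(\S+)", info.pop("detail")))
    return info


def split_address(address, page_shift=12):
    """Split a physical address into (page, offset), assuming 4 KiB pages."""
    return address >> page_shift, address & ((1 << page_shift) - 1)


if __name__ == "__main__":
    line = ("EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Channel#2_DIMM#2 "
            "(channel:2 slot:2 page:0x2f050eb offset:0x80 grain:32 syndrome:0x0 "
            "- area:DRAM err_code:0001:0092 socket:1 channel_mask:2 rank:8)")
    info = parse_edac_line(line)
    page, offset = split_address(0x2f050eb080)  # ADDR from the MCE line
    print(info["label"], info["fields"]["channel"], hex(page), hex(offset))
```

Splitting ADDR 2f050eb080 this way gives page 0x2f050eb and offset 0x80, the same values EDAC prints. A shorter line such as "EDAC MC0: 1 CE memory read error", with no "on ..." part, does not match the pattern and yields None, so the DIMM then has to be identified by other means.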
See also:
Red Hat Customer Portal: https://access.redhat.com/solutions/496803
Advanced Clustering Technologies knowledge base: http://www.advancedclustering.com/act-kb/what-are-machine-check-exceptions-or-mce/
What are Machine Check Exceptions (or MCE)?

ACT knowledge base
Last update: August 18, 2014
Categories: Hardware / Troubleshooting

If you are seeing messages in your system logs that state "Machine Check Event logged", this could be an indication of a hardware problem or failure.

A machine check exception is an error detected by your system's processor. There are two major types of MCE: a notice or warning, and a fatal exception. A warning is recorded as a "Machine Check Event logged" notice in your system logs and can be viewed later with some Linux utilities. A fatal MCE causes the machine to stop responding, and the details of the MCE are printed to the system's console.

What causes MCE errors?

The most common causes of MCE events are:
- Memory errors or Error Correction Code (ECC) problems
- Inadequate cooling / processor overheating
- System bus errors
- Cache errors in the processor or hardware

How do I find out what the errors mean?

If you