Red Hat Hardware Error Logs
Linux discussion list
Subject: Does redhat linux log all hardware events/issues/error in /var/log/mcelog?
Date: Mon, 12 Mar 2012 18:28:23 -0400

Hi,

We run Red Hat Linux on Intel hardware (mostly Dell, lately Dell R710s). We want to be able to catch any hardware issues when they occur so that we can act on them as quickly as possible. My understanding is that all hardware events/issues/errors are logged in /var/log/mcelog (the Machine Check Events log). Is this correct? I can't stress this enough: does it log all hardware issues (CPU, memory, disk, Ethernet, fibre/HBA, etc.)?

Thanks,

Follow-Ups: Re: Does redhat linux log all hardware events/issues/error in /var/log/mcelog? From: Paul Tader
https://www.redhat.com/archives/redhat-list/2012-March/msg00001.html
https://access.redhat.com/solutions/298163
http://unix.stackexchange.com/questions/73063/how-to-detect-a-possible-hardware-error
http://www.mcelog.org/

How to detect a possible hardware error?

I'm running Debian Wheezy on a HP Pavilion dv7 laptop and it freezes every now and then, requiring a reboot. One time it didn't even load the operating system, but it wasn't I who was using it, so I can't tell what error was displayed. On a previous Windows 7 install it constantly failed to load Windows, throwing the user at the "attempting repairs" screen, which would do something for a few minutes and then say Windows couldn't fix the problem.

This leads me to think that there is a hardware problem, and I was wondering if there's anything at /var/log or somewhere else that could provide some info on what's going on, or if there's any test I could run, and what I should be looking for.

I issued

    grep -i "error" /var/log

The full output is here. The only line I could understand and that I think might have something to do with the problem was

    /var/log/dmesg.0:[ 11.632723] [drm:radeon_pci_probe] *ERROR* radeon kernel modesetting for R600 or later requires firmware-linux-nonfree.

But

    lspci | grep -i vga

returns

    00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor
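The log search attempted in the question can be sketched as a small shell function. Note that grep only descends into a directory when given -r, so the sketch searches recursively. The function name scan_hw_errors and the pattern list are assumptions chosen for illustration, not an exhaustive set of hardware error messages, and the -I option (skip binary files) assumes GNU grep.

```shell
#!/bin/sh
# scan_hw_errors DIR
# Recursively search DIR (default: /var/log) for lines that look like
# hardware error reports. The patterns are illustrative assumptions.
scan_hw_errors() {
    dir="${1:-/var/log}"
    # -r: recurse into the directory, -I: skip binary files (GNU grep),
    # -i: ignore case, -E: extended regular expression.
    # Unreadable files are silently skipped.
    grep -rIiE 'hardware error|machine check|mce:' "$dir" 2>/dev/null
}
```

Calling scan_hw_errors /var/log prints each matching line prefixed with the file it came from; the function returns a non-zero status when nothing matches.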
mcelog logs and accounts machine checks (in particular memory, IO, and CPU hardware errors) on modern x86 Linux systems. mcelog is required by both 32-bit x86 Linux kernels (since 2.6.30) and 64-bit Linux kernels (since early 2.6 kernel releases) to log machine checks, and should run on all Linux systems that need error handling.

The mcelog daemon accounts memory and some other errors in various ways. mcelog --client can be used to query a running daemon. The daemon can also execute triggers when configurable error thresholds are exceeded. This is used to implement a range of automatic predictive failure analysis algorithms, including bad page offlining and automatic cache error handling. User-defined actions can also be configured. All errors are logged to /var/log/mcelog, syslog, or the journal.

For memory errors it supports modern x86 systems with integrated memory controllers; for CPU errors all modern x86 systems are supported.

Traditionally mcelog was run as a cron job, but this usage is now deprecated. The modern way to run it is to start it at boot time and keep it running as a daemon. In addition, it can be used to decode fatal machine checks on the command line (but this is usually no longer needed on modern kernels, which log those automatically after reboot).

For installation information and how to set up a mcelog package (if you're a distributor), please see the README.
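A minimal sketch of how the log file and the client mode described above might be used from a script. It assumes that each record in the log starts with a line containing "Hardware event", which is how mcelog output commonly begins a record; the function names count_mce_records and query_mcelog_daemon are hypothetical helpers, not part of mcelog.

```shell
#!/bin/sh
# count_mce_records FILE
# Print the number of machine check records in an mcelog-style log
# (default: /var/log/mcelog). Assumes each record begins with a line
# containing "Hardware event". Prints 0 if the file is unreadable.
count_mce_records() {
    logfile="${1:-/var/log/mcelog}"
    if [ ! -r "$logfile" ]; then
        echo 0
        return 0
    fi
    # grep -c prints the match count; it exits non-zero on zero matches,
    # so the status is normalised with "|| true".
    grep -ci 'hardware event' "$logfile" || true
}

# query_mcelog_daemon
# Ask a running mcelog daemon for its accounted errors, if mcelog exists.
query_mcelog_daemon() {
    if command -v mcelog >/dev/null 2>&1; then
        mcelog --client
    else
        echo "mcelog is not installed" >&2
        return 1
    fi
}
```

A monitoring job could call count_mce_records periodically and raise an alert when the number grows. This only covers what mcelog records, that is, machine checks.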