EDAC Error Injection
exported by these drivers in sysfs. With no options, edac-util will report any uncorrected error (UE) or corrected error (CE) information recorded by EDAC, along with any DIMM label information registered with EDAC.

Options

-h, --help
Display a summary of the command-line options.

-q, --quiet
Quiet mode. For some reports, edac-util will report corrected and uncorrected error counts for all MC, csrow, and channel combinations, even if the current count of errors is zero. The --quiet flag suppresses the display of any locations with zero errors, producing a more terse report. No output will be generated if there are zero total errors currently recorded by EDAC. Additionally, --quiet suppresses all informational and debug messages, displaying only fatal errors.

-v, --verbose
Increase verbosity. Multiple -v's may be used.

-s, --status
Display the current status of EDAC drivers. edac-util will report whether it detects that EDAC drivers are loaded, and the number of memory controllers (MCs) found in sysfs. In verbose mode, the MC id and name of each controller will also be printed.

-r, --report=report,...
Specify the report to generate. Currently, the available reports are default, simple, full, ue, and ce. These reports are detailed in the EDAC REPORTS section below. More than one report may be specified in a comma-separated list.

EDAC Reports

default
The default edac-util report is generated when the program is run without any options. If there are no errors logged by EDAC, this report will display "No errors to report." to stdout. Otherwise, error counts for each MC, csrow, channel combination with attributed errors are displayed, along with corresponding DIMM labels, if these labels have been registered in sysfs.
The default report will also display any errors that do not have any DIMM information. These errors occur when errors are reported in the memory controller overflow register, indicating that more than one error occurred during a given EDAC poll cycle. It is usually obvious from which DIMM locations these errors were generated.

simple
The simple report shows the total corrected and uncorrected errors for each MC detected on the system. It also displays a tally of total errors. With the --quiet option, only non-zero error counts are displayed.

full
The full report generates a line of output for every MC, csrow, channel combination found i
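The options and reports above are driven by counters that EDAC exports in sysfs. As a rough illustration, here is a minimal Python sketch of what the -s status check and the simple report (with and without --quiet) compute. It is not edac-util's implementation: the helper names find_mcs and simple_report and the output format are hypothetical, and it assumes each memory controller appears as an mcN directory under /sys/devices/system/edac/mc holding plain-text ce_count and ue_count files.

```python
"""Sketch of edac-util-style "status" and "simple" reports (hypothetical helpers).

Assumption: each memory controller is a directory named mcN under the EDAC
sysfs root, containing plain-text ce_count and ue_count files.
"""
import os
import re
from typing import List

EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def find_mcs(root: str = EDAC_MC_ROOT) -> List[str]:
    """Return mcN directory names sorted by controller id ([] if EDAC is absent)."""
    if not os.path.isdir(root):
        return []
    names = [n for n in os.listdir(root) if re.fullmatch(r"mc\d+", n)]
    return sorted(names, key=lambda n: int(n[2:]))


def simple_report(root: str = EDAC_MC_ROOT, quiet: bool = False) -> List[str]:
    """Per-MC UE/CE counts plus grand totals; quiet=True drops zero counts."""
    lines: List[str] = []
    total_ce = total_ue = 0
    for mc in find_mcs(root):
        ce = _read_int(os.path.join(root, mc, "ce_count"))
        ue = _read_int(os.path.join(root, mc, "ue_count"))
        total_ce += ce
        total_ue += ue
        for kind, count in (("UE", ue), ("CE", ce)):
            if count or not quiet:
                lines.append(f"{mc}: {count} {kind}")
    for kind, count in (("UE", total_ue), ("CE", total_ce)):
        if count or not quiet:
            lines.append(f"Total: {count} {kind}")
    return lines


if __name__ == "__main__":
    mcs = find_mcs()
    # Rough stand-in for -s: "loaded" here just means mcN directories exist.
    print(f"EDAC drivers {'loaded' if mcs else 'not loaded'}; {len(mcs)} MC(s) found")
    for line in simple_report(quiet=True):
        print(line)
```

With quiet=True and no recorded errors, simple_report returns an empty list, which mirrors the man page's statement that --quiet produces no output when the total error count is zero.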
Frequently asked questions

- How do I report bugs in mcelog?
- Here is this machine check output. Please tell me what it means.
- I have this corrected error message. Is my system broken?
- I inject errors, but nothing happens.
- How do I get an overview of what errors happened on the system?
- How do I enable memory error reporting on SLES11-SP1?
- How do I decode fatal machine checks?
- How do I "run through mcelog --ascii"?
- How do I log fatal machine checks to disk?
- On what systems does DMI DIMM decoding work?
- I get "Cannot open /dev/mem for DMI decoding"
- I get "failed to prefill DIMM database from DMI data"
- How do I enable corrected memory error reporting on Intel Xeon 7500,6500,E7 series systems?
- How does mcelog compare to EDAC?
- I get "machine check events logged"?
- I get "kernel hardware error no human readable mce decoding support on this cpu type"
- Can you release mcelog?
- I get an "only decoding architectural errors" message.
- Does mcelog log all errors?
- mcelog does not start on newer AMD systems anymore
- Can I configure mcelog to send an email on each hardware error?
- On SUSE systems I see "mcelog: SMTP server problem" messages
- mcelog on my old Linux distribution (RHEL 4 or similar vintage) reports wrong CPUs?

How do I report bugs in mcelog?
Please send them to the maintainer (see contact). There is currently no mcelog-specific mailing list. This is for bugs in mcelog itself, not for asking what is wrong with your hardware.

Here is this machine check output. Please tell me what it means.
You have to ask your hardware vendor. Linux and mcelog developers cannot provide hardware support for you. A machine check is a hardware problem, not a software problem, and such questions will be ignored. The exception is crashes or problems in the actual error reporting; please report those. If you are overclocking or otherwise running your system out of spec, consider stopping now.

I have this corrected error message. Is my system broken?
A low rate of corrected memory errors is expected and does not require replacing hardware or any other action. Over a long uptime, the total number of corrected errors may also be quite high. That
Multiple definitions:

Sysfs Error Injection facilities
found in drivers/edac/Kconfig
The configuration item CONFIG_EDAC_AMD64_ERROR_INJECTION:
prompt: Sysfs Error Injection facilities
type: bool
depends on: CONFIG_EDAC_AMD64
defined in drivers/edac/Kconfig
found in Linux kernels: 2.6.31–2.6.36
Help text
Recent Opterons (Family 10h and later) provide for Memory Error Injection into the ECC detection circuits. The amd64_edac module allows the operator/user to inject Uncorrectable and Correctable errors into DRAM.
When enabled, in each of the respective memory controller directories (/sys/devices/system/edac/mc/mcX), there are 3 input files:
- inject_section (0..3, 16-byte section of 64-byte cacheline),
- inject_word (0..8, 16-bit word of 16-byte section),
- inject_ecc_vector (hex ecc vector: select bits of inject word)
In addition, there are two control files, inject_read and inject_write, which trigger the DRAM ECC Read and Write respectively.

Sysfs HW Error injection facilities
found in drivers/edac/Kconfig
The configuration item CONFIG_EDAC_AMD64_ERROR_INJECTION:
prompt: Sysfs HW Error injection facilities
type: bool
depends on: CONFIG_EDAC_AMD64
defined in drivers/edac/Kconfig
found in Linux kernels: 2.6.37–2.6.39, 3.0–3.19, 4.0–4.8, 4.8+HEAD
Help text
Recent Opterons (Family 10h and later) provide for Memory Error Injection into the ECC detection circuits. The amd64_edac module allows the operator/user to inject Uncorrectable and Correctable errors into DRAM.
When enabled, in each of the respective memory controller directories (/sys/devices/system/edac/mc/mcX), there are 3 input files:
- inject_section (0..3, 16-byte section of 64-byte cacheline),
- inject_word (0..8, 16-bit word of 16-byte section),
- inject_ecc_vector (hex ecc vector: select bits of inject word)
In addition, there are two control files, inject_read and inject_write, whic
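The injection sequence the help text describes (fill in the three input files, then write to one control file) can be sketched in Python as follows. This is a hypothetical helper, not part of amd64_edac or edac-utils. The 16-bit width of inject_ecc_vector and the value "1" written to the control file are assumptions; the ranges for inject_section and inject_word are copied from the help text as given. On real hardware it would need root and a kernel built with CONFIG_EDAC_AMD64_ERROR_INJECTION, and it should only be run on a test system.

```python
"""Hypothetical helper driving the amd64_edac sysfs injection files.

Assumptions (not a kernel or edac-utils API):
- mc_dir is a memory controller directory such as /sys/devices/system/edac/mc/mc0;
- inject_ecc_vector takes a hex string and is at most 16 bits wide;
- writing "1" to inject_read or inject_write triggers the operation.
"""
import os


def inject_dram_ecc_error(mc_dir: str, section: int, word: int,
                          ecc_vector: int, trigger: str = "write") -> None:
    # Ranges as stated in the Kconfig help text: section 0..3 (16-byte
    # section of a 64-byte cacheline) and word 0..8 (16-bit word of it).
    if not 0 <= section <= 3:
        raise ValueError("inject_section must be in 0..3")
    if not 0 <= word <= 8:
        raise ValueError("inject_word must be in 0..8")
    # Assumed 16-bit mask: each set bit selects a bit of the inject word.
    if not 0 < ecc_vector <= 0xFFFF:
        raise ValueError("inject_ecc_vector must be a nonzero 16-bit mask")
    if trigger not in ("read", "write"):
        raise ValueError("trigger must be 'read' or 'write'")

    def put(name: str, value: str) -> None:
        with open(os.path.join(mc_dir, name), "w") as f:
            f.write(value)

    # Configure the three input files first, then write one control file.
    put("inject_section", str(section))
    put("inject_word", str(word))
    put("inject_ecc_vector", format(ecc_vector, "x"))
    put(f"inject_{trigger}", "1")
```

For example, inject_dram_ecc_error("/sys/devices/system/edac/mc/mc0", 0, 0, 0x1) would select section 0, word 0, and a single bit of the ECC vector, then trigger a write. Whether the injected error is then reported as correctable or uncorrectable depends on the ECC scheme in use; the resulting counts can be checked with edac-util.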