Error: Failed to send a fma event(rc=-1)
another T5220.

SC Alert: [ID 599537 daemon.error] Chassis | major:ERROR: Failed to send a fma event(rc=-1) ereport.cpu.ultraSPARC-T2.dac

Both systems have:

SunOS 5.10 Generic_139555-08 sun4v sparc SUNW,SPARC-Enterprise-T5220
119578-30 - FMA patch
Sun System Firmware 7.2.2.e and 7.2.7.b

Does anyone know about ereports and what they signify? What does rc=-1 signify? Thanks in advance.

Hi,

The FMA errors ereport.cpu.ultraSPARC-T2.dsc and ereport.cpu.ultraSPARC-T2.dac are examples of correctable memory errors. The ILOM on the T5220 (also known as the SP or SC) detected these errors and is reporting them, but at the same time it is unable to communicate them to the Solaris FMA DE. Hence you are getting this error message from the SC:

SC Alert: [ID 599537 daemon.error] Chassis | major:ERROR: Failed to send a fma event(rc=-1) ereport.cpu.ultraSPARC-T2.dac

A simple solution for this problem could be restarting your SC using the resetsc command from the SC, which might fix the broken communication between the SC and Solaris. There could also be a bug in the firmware or in the Solaris FMA DE (FMA Diagnostic Engine). Most FMA errors are addressed in the kernel patch. The kernel patch on your system looks a bit old; you may want to try updating it to the latest, along with the latest firmware for ILOM, which is currently 7.2.9.a as per the handbook.

If these error messages are flooding in, that suggests there could be bad memory. Check showfaults on the SC to see whether there is any faulted memory on the server. Running max POST to verify whether there is any memory issue should be a good idea.

HTH
-Mehul

Thanks for the reply. Do you know what those ereports mean? What do DAC and DSC stand for? I couldn't find any documentation about ereports on the Sun website.

Hi,

DSC = DRAM Scrub Correctable error
DAC = DRAM Access Correctable error

HTH
-Mehul

Thanks so much. Do you have or know of any documentation about these ereports? I'll look into the SC logs and run diagnostics to see if that gives info about the troublesome memory module(s).

I don't have any document, but I learned this from one of the trainings I attended. I'm not sure, but the links below might be useful:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/fm/modules/SUNW,Netra-T5220/etm/etm.conf
http://blogs.sun.com/sdaven/entry/ultrasparc_t2_fma
http://hub.opensolaris.org/bin/view/Community+Group+fm/WebHome#HDocumentation

-Mehul

Hi,

Details on the ereport can be checked using the fmdump -eV command under Solaris, which will give useful information about the error report that is generated by the FMA DE. In the current message that you see from the SC, ereport is not received by Solari
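To make the pieces of the SC alert quoted in this thread easier to see, here is a minimal Python sketch that splits such a line into its message ID, facility, rc value, and ereport class. The regular expression is inferred from the single sample line in the thread, so it is an assumption and not a documented format; the function name parse_sc_alert and the lookup table are hypothetical, and the two suffix meanings are the ones Mehul gave above.

```python
import re

# Suffix meanings as stated in the thread; other suffixes are not covered.
SUFFIX_MEANINGS = {
    "dsc": "DRAM Scrub Correctable error",
    "dac": "DRAM Access Correctable error",
}

# Pattern inferred from the one sample alert in the thread (an assumption,
# not a documented log format).
ALERT_RE = re.compile(
    r"\[ID (?P<msgid>\d+) (?P<facility>\w+)\.(?P<level>\w+)\]\s+"
    r"(?P<source>\w+) \| (?P<severity>\w+):ERROR: "
    r"Failed to send a fma event\(rc=(?P<rc>-?\d+)\)\s+"
    r"(?P<ereport>ereport\.\S+)"
)


def parse_sc_alert(line):
    """Return the alert's fields as a dict, or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["rc"] = int(fields["rc"])
    # The last dot-separated component of the ereport class is the error type.
    suffix = fields["ereport"].rsplit(".", 1)[-1]
    fields["meaning"] = SUFFIX_MEANINGS.get(suffix, "unknown")
    return fields


if __name__ == "__main__":
    sample = (
        "SC Alert: [ID 599537 daemon.error] Chassis | major:ERROR: "
        "Failed to send a fma event(rc=-1) ereport.cpu.ultraSPARC-T2.dac"
    )
    print(parse_sc_alert(sample))
```

Something like this could be run over a saved copy of the system log to count how often each ereport class appears, which may help judge whether the messages are flooding as described above.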
Healing (FMA) for T5140/T5240
By user9148476 on Apr 09, 2008

April 9, 2008: Sun announced the T5140/T5240 platforms centered around the UltraSPARC T2 Plus processor. The T2 Plus extends the capabilities of the UltraSPARC T2 processor, the most obvious being the capability for multiple processors in a system. And I'm happy to report that Solaris' Predictive Self Healing feature (also known as the Fault Management Architecture (FMA)) has been extended to include coverage for T2 Plus.

With respect to fault management, T2 Plus is very similar to the T2. The fault management features of T5140/T5240 are listed below, along with example output for a couple of the new T2 Plus diagnosis features.

Base UltraSPARC T2 features: All of the FMA features present on the T2 processor are also available with the T2 Plus-based systems.

Coherency plane diagnosis: The T2 Plus processors in the T5140/T5240 systems communicate with one another across a coherency plane, similar in nature to a Fully Buffered DIMM (FB-DIMM) channel. Error handling and diagnosis have been enhanced to detect and diagnose errors (single-lane/multi-lane/protocol errors) on the coherency plane.

Local vs. remote errors: With multiple processors in the system, it is possible that one T2 Plus can trigger an error in another T2 Plus (e.g. a remote read of memory/cache). The error handlers have been extended to recognize local vs. remote errors and produce the proper telemetry so diagnosis engines indict the correct T2 Plus.

Automatic FB-DIMM lane failover: The UltraSPARC T2 Plus memory controller seamlessly handles a single lane failover on an FB-DIMM link without a system crash. The fault management subsystem has been updated to differentiate between FB-DIMM errors resulting in lane failovers vs. those that do not. Additional information on FB-DIMM diagnosis is in one of my earlier blogs.

Extended IO controller diagnosis: The embedded IO