Critical Interrupt Sensor PCIe Bus Fatal Error
Frequently Asked Questions

This section explains how to perform tasks related to diagnosing and troubleshooting a remote managed server using the iDRAC facilities. It contains the following subsections:

- Trouble Indications: helps you find messages and other system indications that can lead to a diagnosis of the problem
- Problem-solving tools: describes iDRAC tools that you can use to troubleshoot your system
- Troubleshooting and frequently asked questions: answers to typical situations you may encounter

Safety First: For You and Your System

To perform certain procedures in this section, you must work with the chassis, the PowerEdge server, or other hardware modules. Do not attempt to service the system hardware except as explained in this guide and elsewhere in your system documentation.

CAUTION: Many repairs may only be done by a certified service technician. You should only perform troubleshooting and simple repairs as authorized in your product documentation, or as directed by the online or telephone service and support team. Damage due to servicing that is not authorized by Dell is not covered by your warranty. Read and follow the safety instructions that came with the product.

Trouble Indicators

This section describes indications that there may be a problem with your system.

LED Indicators

The initial indication of system trouble may be the LEDs on the chassis or on components installed in the chassis. The following components and modules have status LEDs:

- Chassis LCD display
- Servers
- Fans
- CMCs
- I/O modules
- Power supplies

The single LED on the chassis LCD summarizes the status of all of the components in the system. A solid blue LED on the LCD indicates that no fault conditions have been detected in the system. A blinking amber LED on the LCD indicates that one or more fault conditions have been detected. If the chassis LCD has a blinking amber LED, you can use the LCD menu to l

Source: https://stuff.mit.edu/afs/athena/dept/cron/documentation/Manuals/dell-server-admin/en/idrac1/chap13.htm
Source: http://www.virtualizetheworld.com/2011/04/purple-screen-of-death-dell-issues.html

we frankly don't have a whole lot of issues. I personally attribute it to using Best Practices and regular maintenance. That said, things will still get lost in the weeds occasionally. We ran into the Purple Screen of Death on one of our ESX 4.1 boxes yesterday. It is a Dell R610, and apparently had a hardware hiccup and kicked out errors stating:

    Tue Apr 19 21:09:15 2011 PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 1 Device 0 Function 1) was asserted 0xA10002FBF9AD4DB1000413186FAA0101h
    Tue Apr 19 21:09:15 2011 Err Reg Pointer: OEM sensor, OEM Diagnostic data event was asserted 0xA00002FBF9AD4DB10004C11A7E011610h
    Tue Apr 19 21:09:15 2011 PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 1 Device 0 Function 0) was asserted 0x9F0002FBF9AD4DB1000413186FAA0001h
    Tue Apr 19 21:09:15 2011 Err Reg Pointer: OEM sensor, OEM Diagnostic data event was asserted 0x9E0002FBF9AD4DB10004C11A7E011610h

We rebooted the box, and it came back online just fine, but we didn't feel comfortable with it, so we stuck it in maintenance mode and had someone contact Dell. Dell reports that we need to update the BIOS on it:

-----
Yes, it appears your system is affected by some of the microcode updates released from Intel on the 5500 and 5600 series processors. That is likely the cause of these PCI errors. The course of action we need to take is:

- Update the BIOS
- Update the iDRAC
- Clear out the old log entries
- Monitor for recurrence.
------

So it's sitting in maintenance mode until someone has some time to love on it. The awesome thing is that we run N+1 (one more box than we need), so we have that luxury. I know plenty of people who refuse to listen to why you should go N+1 and who would be scrambling to make a maintenance window to update it.
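The log entries quoted above all name the failing component by bus, device, and function, so when a log holds many such lines it can help to pull those numbers out mechanically. The following is a minimal sketch in Python, assuming the wording matches the entries quoted in this post; the names `BusFatalError` and `parse_bus_fatal_error` are hypothetical, and other firmware versions may phrase these events differently.

```python
import re
from typing import NamedTuple, Optional


class BusFatalError(NamedTuple):
    """Fields extracted from one 'bus fatal error' log line (hypothetical type)."""
    bus: int
    device: int
    function: int
    raw_code: str  # trailing event code as printed, or "" if absent


# Pattern modelled only on the entries quoted above (an assumption).
# Accepts either (...) or [...] around the Bus/Device/Function triple.
_BUS_FATAL = re.compile(
    r"bus fatal error\s*[\(\[]Bus (?P<bus>\d+) Device (?P<dev>\d+) "
    r"Function (?P<fn>\d+)[\)\]] was asserted"
    r"(?:\s+(?P<code>0x[0-9A-Fa-f]+h))?"
)


def parse_bus_fatal_error(line: str) -> Optional[BusFatalError]:
    """Return the parsed fields, or None if the line is not a bus fatal error."""
    m = _BUS_FATAL.search(line)
    if m is None:
        return None
    # Numbers are read as decimal; that is an assumption about the log format.
    return BusFatalError(
        bus=int(m["bus"]),
        device=int(m["dev"]),
        function=int(m["fn"]),
        raw_code=m["code"] or "",
    )


if __name__ == "__main__":
    sample = [
        "Tue Apr 19 21:09:15 2011 PCIE Fatal Err: Critical Event sensor, "
        "bus fatal error (Bus 1 Device 0 Function 1) was asserted "
        "0xA10002FBF9AD4DB1000413186FAA0101h",
        "Tue Apr 19 21:09:15 2011 Err Reg Pointer: OEM sensor, OEM Diagnostic "
        "data event was asserted 0xA00002FBF9AD4DB10004C11A7E011610h",
    ]
    for entry in sample:
        print(parse_bus_fatal_error(entry))
```

The "Err Reg Pointer" lines are deliberately skipped (the function returns `None` for them), since they carry no bus, device, or function fields in the form quoted here.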
The downside to this whole fiasco was that when it hiccupped, it stayed online (as is the default with ESX) and held onto the storage of its VMs. Therefore, HA couldn't restart them on another box until someone manually SHUT OFF the pretty Purple-VM-Eater. As
Source: https://ubuntuforums.org/archive/index.php/t-2053947.html

02:59 PM
My company has been deploying some telco application software running under Ubuntu on Dell R710s for some time. The R710 has been end-of-lifed, so for the next customer deployment we purchased Dell R720 servers. Is anyone successfully using 12.04.1 64-bit on a Dell R720? My configuration has a single E5-2620 processor and 16 GB RAM. The R720 does not appear on the certified platforms list, but I assume it will at some point since all the previous generation hardware does. Is there a projected release that will support the R720?

The installation went fine, but I am seeing a couple of issues. They could simply be flaky hardware, but the Dell diagnostics pass, and since Dell doesn't support Ubuntu on servers, I doubt I'll get any help from them.

1. For testing, I was connected only to eth0 with 1000BT. After 5-15 minutes the link went down (no link lights) and I was unable to get it back by plugging/unplugging or ifdown/ifup. After a reboot it worked again for a short time. I observed this both using DHCP and a static IP.

2. The BIOS event log and diagnostic display show 2 errors, "Critical CPU1 Status: Processor sensor for CPU1, IERR was asserted" and "Critical PCIE Fatal Err: Critical Event sensor, bus fatal error [Bus 1 Device 0 Function 0] was asserted". These have only shown up one time even though I've seen problem 1 and rebooted several times.

Before I pull a second server out of the box and repeat my testing, does anyone have any experience or advice?

Thanks!
-Mark

sandyd, September 6th, 2012, 03:19 PM
"Critical CPU1 Status: Processor sensor for CPU1, IERR was asserted" indicates that the processor has activated its IERR pin, which leads on to the second error, which is the real problem. The second error refers to a problem in the PCI-E port. Have you checked/reseated any cards plugged into PCI-E ports? Tried the cards in other computers?
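To see which card or onboard device sits behind "Bus 1 Device 0 Function 0" on a running Linux system, the triple can be rewritten in the `domain:bus:device.function` form that `lspci` prints. This is a minimal sketch in Python; the function name `lspci_address` is hypothetical, and it assumes the numbers in the event log are decimal and that the device is in PCI domain 0, neither of which is stated in the thread.

```python
def lspci_address(bus: int, device: int, function: int, domain: int = 0) -> str:
    """Format a bus/device/function triple the way lspci prints addresses.

    Assumes the inputs are plain integers (the log values read as decimal).
    """
    # PCI allows 256 buses, 32 devices per bus, and 8 functions per device.
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of PCI range")
    # lspci shows every field in hexadecimal.
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    # The error reported in the post above: Bus 1 Device 0 Function 0.
    print(lspci_address(1, 0, 0))  # 0000:01:00.0
```

The resulting address can then be passed to `lspci -s 01:00.0 -v` to show the device at that slot, which would indicate whether the error points at the add-in network card or at one of the onboard devices.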
macptcom, September 6th, 2012, 03:51 PM
I think it's more likely a software problem than an actual hardware problem, but that's why I'm asking for feedback from anyone else with an R720. There is only one actual PCI-E card, which is the currently unused second network card. There are also permanently attached devices like the RAID controller and onboard network interfaces, which I suspect are behind the PCI-E controller.

Thanks,
-Mark

norris900, September 12th, 2012, 0