PERC Cache Memory Error
How to troubleshoot memory or battery errors on the PERC controller on Dell PowerEdge servers

This article provides information on how to troubleshoot the "Memory/battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue" error and other memory-related errors that may occur on the Dell PERC controller on Dell PowerEdge servers.
Memory/Battery Problems Were Detected. The Adapter Has Recovered, But Cached Data Was Lost
Table of Contents:
- RAID Controller Error Message During Post
- Troubleshooting Conditions That Lead to Error Message
- Reboot to OS
- Clear Controller Cache
- Check the Physical PERC Controller
- Additional Information
- PERC Battery Maintenance
- Cache Use

1. RAID Controller Error Message During Post

During POST, the RAID controller presents a message:

Memory/battery problems were detected. The adapter has recovered but cache data may be lost. Press any key to continue.

For errors that appear on the LCD or when running diagnostics, refer to the following article: Interpreting LCD and Embedded Diagnostic (ePSA) event messages.

2. Troubleshooting Conditions That Lead to Error Message

This message can occur normally when one of the following conditions occurs. Troubleshooting the associated events will likely also prevent this message from occurring.
- OS indicates abnormal shutdown.
- OS indicates error occurred (blue screen occurred in Windows).
- Spontaneous power loss condition.

Common troubleshooting steps include:

1. Reboot to OS
If the OS boot is successful, rebooting again should result in no message being displayed.

2. Clear Controller Cache
- CTRL-M for SCSI controllers (PERC 3, PERC 4).
- CTRL-R for SAS/SATA controllers (PERC 5, PERC 6 and newer controllers).
- Wait five minutes to allow contents of cache to purge.
- Reboot back to controller BIOS.
Note: If the error persists, the likelihood of a hardware error is increased. Please contact Technical Support for further troubleshooting steps.
- If the error is eliminated, boot to OS.
If the OS boot is still not successful and/or the error persists, this may indicate a problem with the OS. Please contact Technical Support for further troubleshooting steps if you have an active warranty.

3. Check the Physical PERC Controller

Inspect the DIMM and DIMM Socket for Damage.
- Power the system off and remove the power cable(s) from the system.
- Let the system sit for 30 seconds to allow any remaining flea power to drain.
- Remove the PERC controller. For information on removing and replacing parts in this system, refer to the user guide locate
http://www.dell.com/support/article/SLN130018

Safe to Disable cache on a PowerEdge 2800 PERC 4e/Di ?
https://www.experts-exchange.com/questions/24217854/Safe-to-Disable-cache-on-a-PowerEdge-2800-PERC-4e-Di.html
Posted on 2009-03-10. Last Modified: 2013-11-14.

We have an out of warranty Dell PowerEdge 2800 server running Win Server 2003 with the following embedded RAID controller: PERC 4e/Di. Here are a few of its specs:
- RAID BIOS H435 Build April 23, 2008
- PERC/CERC BIOS Configuration utility U827
- RAID 5

At boot we get the following RAID POST error message: "Memory/battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue."

While up and running, we get various blue screens and random OS freezing and crashing.

Blue Screen
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Stop: 0x000000F4 (0x00000003, 0x899A9D08, 0x899A9E6C, 0x8094C6E6)

Through multiple Dell technician phone calls, we arranged for them to send us a replacement battery for the RAID controller. We installed it and let it charge for 24 hours, but we still get the POST error and the random crashing. I have also reseated the RAM DIMM, but the problems persist. We updated the MLB BIOS as well as the PERC firmware BIOS, with the problems still persisting. Now, we believe that the problem is a bad RAM DIMM on the controller.

The documentation for the PERC 4e/Di is hard to find on Dell's website and not very helpful. I found LSI's site a bit more helpful because they give detailed manuals for their controllers, which are the platform upon which Dell's PERC controllers are based. The troubleshooting there and elsewhere on the web all points to the battery or the memory DIMM being the problem. And since we have a brand new battery, we think it is the
https://stuff.mit.edu/afs/athena/dept/cron/documentation/Manuals/dell-server-admin/en/Perc5i_5e/chapterj.htm

To get help with your Dell PowerEdge Expandable RAID Controller (PERC) 5 controller, you can contact your Dell Technical Service representative or access the Dell Support website at support.dell.com.

Virtual Disks Degraded

A redundant virtual disk is in a degraded state when one physical disk has failed or is inaccessible. For example, a RAID 1 virtual disk consisting of two physical disks can sustain one physical disk in a failed or inaccessible state and become a degraded virtual disk. To recover from a degraded virtual disk, rebuild the physical disk in the inaccessible state. Upon successful completion of the rebuild process, the virtual disk state changes from degraded to optimal. For the rebuild procedure, see Performing a Manual Rebuild of an Individual Physical Disk in RAID Configuration and Management.

Memory Errors

Memory errors can corrupt cached data, so the controllers are designed to detect and attempt to recover from these memory errors. Single-bit memory errors can be handled by the firmware and do not disrupt normal operation. A notification will be sent if the number of single-bit errors exceeds a threshold value. Multi-bit errors are more serious, as they result in corrupted data and data loss. The following are the actions that occur in the case of multi-bit errors:
- If an access to data in cache memory causes a multi-bit error when the controller is started with dirty cache, the firmware will discard the cache contents. The firmware will generate a warning message to the system console to indicate that the cache was discarded and will generate an event.
- If a multi-bit error occurs at run-time either in code/data or in the cache, the firmware will stop. The firmware will log an event to the firmware internal event log and will log a message during POST indicating that a multi-bit error has occurred.

NOTE: In case of a multi-bit error, contact Dell Technical Support.

General Problems

Table 6-1 describes general problems you might encounter, along with suggested solutions.

Table 6-1. General Problems

Problem: The device displays in Device Manager but has a yellow bang (exclamation point).
Suggested Solution: Reinstall the driver. See the driver installation procedures in the section Driver Installation.

Problem: The device does not appear in Device Manager.
Suggested Solution: Turn off

https://calomel.org/megacli_lsi_commands.html
in Supermicro, DELL (PERC), ESXi and Intel servers. The program is a text based command line interface (CLI) and consists of a single static binary file. We are not fans of graphical interfaces (GUI) and appreciate the control a command line program gives over a GUI solution. Using some simple shell scripting we can find out the health of the RAID, email ourselves about problems and work with failed drives. There are many MegaCLI command pages which simply rehash the same commands over and over and we wanted to offer something more. For our examples we are using Ubuntu Linux and FreeBSD with the MegaCli64 binary. All of these same scripts and commands work for the 32bit and 64bit binaries.

Installing the MegaCLI binary

In order to communicate with the LSI card you will need the MegaCLI or MegaCLI64 (64bit) program. The install should be quite easy, but LSI makes us jump through a few hoops. This is what we found:
- Go to the LSI Downloads page: LSI Downloads
- Search by keyword "megacli"
- Click on "Management Software and Tools"
- Download the MegaCLI zip file. You will see the same file is for DOS, Windows, Linux and FreeBSD.
- Unzip the file.
- In the Linux directory there is an RPM. If you are using Redhat you can install it. For Ubuntu, go to the next step.
- For Ubuntu run "rpm2cpio MegaCli-*.rpm | cpio -idmv" to expand the directory structure. You may need to "apt-get install rpm2cpio".
- For FreeBSD unzip the file in the FreeBSD directory.

On our Ubuntu Linux 64bit and FreeBSD 64bit servers we simply copied MegaCli64 (64bit) to /usr/local/sbin/ . You can put the binary anywhere you want, but we chose /usr/local/sbin/ because it is in root's path. Make sure to secure the binary. Make the owner root and chmod the binary to 700 (chown root /usr/local/sbin/MegaCli64; chmod 700 /usr/local/sbin/MegaCli64). The install is now done. We would like to see LSI make a Ubuntu PPA or FreeBSD ports entry sometime in the future, but this setup was not too bad.
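
The copy-and-secure step described above can be sketched as a small shell function. This is only an illustration, not part of the original instructions: the function name install_megacli and its optional destination argument are hypothetical, and the sketch skips the chown when it is not run as root so that it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: copy an already extracted MegaCli64 binary into place and lock it down.
# install_megacli is a hypothetical helper name, not something shipped by LSI.
# Usage: install_megacli /path/to/extracted/MegaCli64 [destination_dir]
install_megacli() {
    src=$1
    dest_dir=${2:-/usr/local/sbin}   # default location used in the text above

    if [ ! -f "$src" ]; then
        echo "install_megacli: no such file: $src" >&2
        return 1
    fi

    cp "$src" "$dest_dir/MegaCli64" || return 1

    # Make root the owner, as the text recommends. Skipped when not running
    # as root (an assumption made so the sketch can be tested unprivileged).
    if [ "$(id -u)" -eq 0 ]; then
        chown root "$dest_dir/MegaCli64" || return 1
    fi

    # Owner may read, write and execute; nobody else gets any access.
    chmod 700 "$dest_dir/MegaCli64"
}
```

Running it as root with only the first argument reproduces the chown and chmod 700 commands quoted above against /usr/local/sbin/MegaCli64.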
The lsi.sh MegaCLI interface script

Once you have MegaCLI installed, the following is a script to help in getting information from the RAID card. The shell script does nothing more than execute the commands you normally use on the CLI. The script can show the status of the RAID and drives.
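
The original lsi.sh script is not reproduced here. As a rough sketch of the same idea, the two functions below summarise MegaCLI text output read from standard input: one counts virtual disks that are not in the Optimal state, the other adds up the controller's memory error counters. The function names are hypothetical, and the line formats they match ("State : ..." from -LDInfo, "Memory Correctable Errors : ..." and "Memory Uncorrectable Errors : ..." from -AdpAllInfo) are assumptions about the output that should be checked against your own controller and MegaCLI version.

```shell
#!/bin/sh
# Sketch: summarise RAID health from MegaCLI text output supplied on stdin.
# Assumed line formats (verify against your own MegaCLI output):
#   State               : Optimal
#   Memory Correctable Errors   : 0
#   Memory Uncorrectable Errors : 0

# Print how many "State" lines report anything other than Optimal.
count_not_optimal() {
    awk -F: '/^State[ \t]*:/ {
                 gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim the value
                 if ($2 != "Optimal") n++
             }
             END { print n + 0 }'
}

# Print the sum of the correctable and uncorrectable memory error counters.
sum_memory_errors() {
    awk -F: '/^Memory (Correctable|Uncorrectable) Errors[ \t]*:/ { total += $2 }
             END { print total + 0 }'
}

# Example use on a host with MegaCli64 installed (run as root):
#   MegaCli64 -LDInfo -Lall -aALL | count_not_optimal
#   MegaCli64 -AdpAllInfo -aALL   | sum_memory_errors
```

A non-zero result from either function is only a hint to look at the full MegaCLI output; the sketch does not distinguish a degraded virtual disk from an offline one.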