Buffer I/O Error On Device dm-10 Logical Block
CentOS server freeze/crash on megaraid rebuild, analysis and ...

Post by jamesNJ » 2015/07/24 17:25:21

Hello all,

I have a problem with a large CentOS 7 server hosting an LSI MegaRAID controller with 16x 1 TB SAS drives. The server goes dead at night, requiring a forced reboot or power cycle to restore service. If it matters, this server has one large RAID-6 volume with one global hot spare available.

I believe I have narrowed this issue down to the MegaRAID controller being busy with a RAID rebuild while some automated action occurring at night confuses LVM into oblivion. The issue is difficult to narrow down because this “automated action” seems to result in all file systems being marked read-only. Syslog seems to continue working, but obviously cannot write useful data out to disk. Hence I have only been able to collect data on those rare occasions when I can actually log in while the issue is occurring. I was able to capture two data points that seem to start out with the same error condition.

This only seems to occur when a drive fails and the MegaRAID rebuilds to the global hot spare, or when I force some action on the RAID that causes a drive to fail and rebuild to an alternate disk (a few of my disks reported SMART predictive failures, and I have been working to replace them with new ones). I initially thought this issue was related to smartd warning messages; however, when I replaced the last drive with predictive failures, the rebuild triggered the same behavior.

So the pattern seems to be that I kick off a rebuild (which takes many hours), then sometime around midnight a systemd-udevd process kicks in, and the system eventually ends up unresponsive. From the two times I was able to get on, these messages seem to be in common right at the time the file systems go read-only:

Jul 15 00:45:44 server1 kernel: megaraid_sas: scanning for scsi0...
Jul 15 00:45:44 server1 systemd-udevd: failed to execute '/sbin/mdad
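To line up events like the two quoted above once a log has been captured, a small Python sketch can parse the classic syslog prefix and list everything logged within a time window around a chosen line. It assumes the plain "Mon DD HH:MM:SS host tag: message" layout of the quoted lines and English month abbreviations; syslog omits the year, so the sketch supplies a placeholder. The helpers parse_line and events_near are hypothetical, not part of any tool mentioned in the post.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix, e.g. "Jul 15 00:45:44 server1 kernel: message".
# Assumption: logs use this layout with English month abbreviations.
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$"
)


def parse_line(line, year=2000):
    """Return (timestamp, host, tag, message), or None if the line does not match.

    Syslog omits the year, so a placeholder year is supplied (hypothetical
    default; 2000 is a leap year, so "Feb 29" still parses).
    """
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return ts, m.group("host"), m.group("tag"), m.group("msg")


def events_near(lines, center, seconds=60):
    """Yield parsed lines logged within +/- `seconds` of `center`."""
    window = timedelta(seconds=seconds)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and abs(parsed[0] - center) <= window:
            yield parsed


if __name__ == "__main__":
    # Hypothetical sample: the first line is quoted from the post,
    # the second is an invented earlier message that falls outside the window.
    sample = [
        "Jul 15 00:45:44 server1 kernel: megaraid_sas: scanning for scsi0...",
        "Jul 14 23:10:02 server1 smartd[901]: hypothetical earlier message",
    ]
    anchor = parse_line(sample[0])[0]
    for ts, host, tag, msg in events_near(sample, anchor):
        print(ts.strftime("%b %d %H:%M:%S"), tag, msg)
```

In practice the input would be the lines of a saved log file, read after the reboot; file objects can be passed to events_near directly because the regex ignores the trailing newline.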
(Source: http://www.centos.org/forums/viewtopic.php?t=53481)

Bad disk?
Date: Wed, 10 Nov 2010 09:39:00 -0500
(Source: https://www.redhat.com/archives/linux-lvm/2010-November/msg00011.html)

Yesterday I added a hard drive (to put extra stuff on it) to my Ubuntu 10.10 box and created an LVM on it. Then I copied some files to it and restarted the machine to see if it would mount at the right mountpoint. It didn't. So I decided to see if it was there (the VG in question is export):

raub strangepork:~$ sudo vgscan
Reading all physical volumes. This may take a while...
/dev/dm-0: read failed after 0 of 4096 at 429496664064: Input/output error
/dev/dm-0: read failed after 0 of 4096 at 429496721408: Input/output error
/dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-0: read failed after 0 of 4096 at 4096: Input/output error
/dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
Found volume group "export" using metadata type lvm2
Found volume group "root" using metadata type lvm2
raub strangepork:~$

Those dm-0 messages do not make me happy. dmesg and vgchange make me think the problem is on the new drive:

[ 268.024593] scsi 0:0:0:0: Direct-Access ATA ST3500320NS SN04 PQ: 0 ANSI: 5
[ 268.024900] sd 0:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 268.024918] sd 0:0:0:0: Attached scsi generic sg2 type 0
[ 268.024996] sd 0:0:0:0: [sdc] Write Protect is off
[ 268.025003] sd 0:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 268.025046] sd 0:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 268.025377] sdc: sdc1
[ 268.049853] sd 0:0:0:0: [sdc] Attached SCSI disk
[ 335.467482] quiet_error: 3 callbacks suppressed
[ 335.467492] Buffer I/O error on device dm-0, logical block 104857584
[ 335.467540] Buffer I/O error on device dm-0, logical block 104857584
[ 335.467589] Buffer I/O error on device dm-0, logical block 104857598
[ 335.467615] Buffer I/O error on device dm-0, logical block 104857598
[ 335.
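The numbers in the dmesg and vgscan output above can be cross-checked with a little arithmetic. A minimal Python sketch, assuming the "logical block" in the dm-0 Buffer I/O errors counts 4096-byte blocks (the read size vgscan reports, "read failed after 0 of 4096"): under that assumption, blocks 104857584 and 104857598 land at exactly the byte offsets vgscan failed to read, 429496664064 and 429496721408, so both tools appear to be complaining about the same region of dm-0. The same arithmetic turns sdc's 976773168 512-byte sectors into sizes whose integer parts match the "500 GB/465 GiB" in the dmesg line. The function names are hypothetical.

```python
# Cross-check the numbers quoted in the dmesg and vgscan output.
# Assumption: the "logical block" in the Buffer I/O errors on dm-0 counts
# 4096-byte blocks, the same size vgscan reads ("read failed after 0 of 4096").
DM_BLOCK_SIZE = 4096
SECTOR_SIZE = 512  # sd reports sdc in 512-byte logical blocks


def block_to_offset(block, block_size=DM_BLOCK_SIZE):
    """Byte offset of the start of logical block `block`."""
    return block * block_size


def disk_size(sectors, sector_size=SECTOR_SIZE):
    """Return (decimal GB, binary GiB) for a disk of `sectors` logical blocks."""
    total = sectors * sector_size
    return total / 10**9, total / 2**30


if __name__ == "__main__":
    for block in (104857584, 104857598):
        # Compare with the offsets vgscan failed to read on /dev/dm-0.
        print(block, "->", block_to_offset(block))
    gb, gib = disk_size(976773168)
    print(f"sdc: {gb:.1f} GB / {gib:.1f} GiB")
```

Run as a script, it prints 429496664064 and 429496721408 for the two blocks, followed by "sdc: 500.1 GB / 465.8 GiB".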
(Source: http://kb.eclipseinc.com/kb/can-i-safely-ignore-io-errors-on-dm-devices/)

Can I safely ignore I/O errors on dm devices?

The root user of a system may occasionally receive a message similar to the following in the daily logwatch email:

--------------------- Kernel Begin ------------------------
WARNING: Kernel Errors Present
Buffer I/O error on device dm-7, ...: 11 Time(s)
EXT3-fs error (device dm-7): e ...: 90 Time(s)
lost page write due to I/O error on dm-7 ...: 11 Time(s)

Likewise, you may notice similar error messages in the /var/log/messages file:

May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20

If the device mapper (dm-n) device(s) mentioned in the me...

(Source: http://unix.stackexchange.com/questions/98208/i-o-errors-on-linux-lvm)
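The article is cut off just as it turns to which dm-n device the messages name. As a hedged sketch of that first step, the Python below tallies the two quoted message types per device, much like logwatch's "N Time(s)" summary, and tries to resolve each dm-N to its device-mapper name. It assumes the kernel exposes that name in /sys/block/dm-N/dm/name, which older kernels may not; count_io_errors and dm_name are hypothetical helpers.

```python
import os
import re
from collections import Counter

# Patterns taken from the kernel messages quoted in the article.
BUFFER_RE = re.compile(r"Buffer I/O error on device (\S+), logical block (\d+)")
LOST_RE = re.compile(r"lost page write due to I/O error on (\S+)")


def count_io_errors(lines):
    """Count errors per (device, kind), similar to logwatch's 'N Time(s)' summary."""
    counts = Counter()
    for line in lines:
        m = BUFFER_RE.search(line)
        if m:
            counts[(m.group(1), "buffer")] += 1
            continue
        m = LOST_RE.search(line)
        if m:
            counts[(m.group(1), "lost_write")] += 1
    return counts


def dm_name(dev, sysfs_root="/sys"):
    """Return the device-mapper name for e.g. 'dm-20', or None if unavailable.

    Assumption: the kernel exposes /sys/block/dm-N/dm/name (older kernels may not).
    """
    path = os.path.join(sysfs_root, "block", dev, "dm", "name")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    sample = [
        "May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20",
        "May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0",
    ]
    for (dev, kind), n in sorted(count_io_errors(sample).items()):
        print(dev, dm_name(dev) or "(name unknown)", kind, n, "Time(s)")
```

The sysfs_root parameter exists only so the lookup can be pointed at a test directory; on a live system the default /sys is what would be read.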
I/O errors on Linux LVM

I have a CentOS 6 box with an LVM setup, and one of the PVs is a USB disk (I know). One of them is getting the error:

Oct 30 10:57:07 alpha01 kernel: lost page write due to I/O error on dm-3
Oct 30 10:57:07 alpha01 kernel: Buffer I/O error on device dm-3, logical block 4

This is causing problems with all of the LVs on it. pvs shows the PV as unknown device. I can ls the logical volumes and they show up in lvdisplay, but first I get a bunch of I/O errors. I made sure the cables to the USB drive are secure.

What should I do to get this back up and running in the meantime? Should I unmount each LV and run fsck.ext4 on each one, like fsck.ext4 -y /dev/vg1/lv_logvolname?

Tags: linux lvm fsck. Asked Oct 30 '13 at 15:06 by Gregg Leventhal.

Comment (rickhg12hs, Oct 30 '13 at 15:44): In addition to fsck, if the external drive is SMART capable, checking the drive status/health and running the drive self tests may be useful. Backing up all the data may also be important.

Accepted answer:

I usually don't go the route of running an fsck and assume the disk is failing or has ba...
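Before deciding whether to fsck anything, it may help to see which volume groups are actually affected by the missing PV. A minimal Python sketch, assuming the report comes from pvs --noheadings --separator , -o pv_name,vg_name: it flags rows whose PV name reads "unknown device", as the poster saw, or "[unknown]", which is assumed here to be how newer LVM2 releases spell it. The helper names are hypothetical, and the sketch only reads the report; it repairs nothing.

```python
import subprocess

# PV names that indicate a missing physical volume. "unknown device" is what the
# poster saw; "[unknown]" is an assumed spelling for newer LVM2 releases.
MISSING_NAMES = {"unknown device", "[unknown]"}


def vgs_with_missing_pvs(report):
    """Return sorted VG names that have at least one missing PV.

    `report` is text from: pvs --noheadings --separator , -o pv_name,vg_name
    """
    missing = set()
    for line in report.splitlines():
        line = line.strip()
        if not line:
            continue
        pv, _, vg = line.partition(",")
        if pv.strip() in MISSING_NAMES:
            missing.add(vg.strip())
    return sorted(missing)


def run_pvs():
    """Run pvs (needs root and LVM2 installed) and return its report text."""
    return subprocess.run(
        ["pvs", "--noheadings", "--separator", ",", "-o", "pv_name,vg_name"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

On the affected machine, vgs_with_missing_pvs(run_pvs()) run as root would list the VG names whose LVs sit at least partly on the missing USB disk.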