RAID 5 Read Error Not Correctable
2011, 07:03 AM

TL;DR - I am trying to create a new mdadm RAID 5 device /dev/md0 across three disks where such an array previously existed, but whenever I do it never recovers properly and instead tells me that I have a faulty spare in my array. More specific details below. Advice/help much appreciated!

Hi all, I recently installed
Ubuntu Server 10.10 on a new box with the intent of using it as a NAS sorta-thing. I have 3 HDDs (2 TB each)
and was hoping to use most of the available disk space as a RAID5 mdadm device (which gives me a bit less than 4TB.) I configured /dev/md0 during OS installation across three partitions on the three disks - /dev/sda5,
/dev/sdb5 and /dev/sdc5, which are all identical sizes. The OS, swap partition etc. are all on /dev/sda. Everything worked fine, and I was able to format the device as ext4 and mount it. Good so far. Then I thought I should simulate a failure before I started keeping important stuff on the RAID array - no point having RAID 5 if it doesn't provide some redundancy that I actually know how to use, right? So I unplugged one of my drives, booted up, and was able to mount the device in a degraded state; test data I had put on there was still fine. Great. My trouble began when I plugged the third drive back in and rebooted. I re-added the removed drive to /dev/md0 and recovery began; things would look something like this:

user@guybrush:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc5[3] sdb5[1] sda5[0]
      3779096448 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  3.3% (63301376/1889548224) finish=475.2min speed=64049K/sec

unused devices:
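As an aside, the same failure drill can be run entirely in software instead of pulling a cable. The commands below are a minimal sketch of that workflow, not taken from the post above; they assume the layout described there (/dev/md0 built from /dev/sda5, /dev/sdb5 and /dev/sdc5) and need to be run as root.

# Mark one member as failed and pull it out of the array:
mdadm --manage /dev/md0 --fail /dev/sdc5
mdadm --manage /dev/md0 --remove /dev/sdc5

# The array now runs degraded; confirm before touching anything else:
cat /proc/mdstat
mdadm --detail /dev/md0

# Put the member back; with no write-intent bitmap this starts a full resync:
mdadm --manage /dev/md0 --add /dev/sdc5
watch cat /proc/mdstat

If the array has a write-intent bitmap, --re-add can bring the member back with only the changed blocks resynced instead of a full rebuild.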
RAID 5 with 4 disks fails to operate with one failed disk?
http://unix.stackexchange.com/questions/93976/raid-5-with-4-disks-fails-to-operate-with-one-failed-disk

I found a question about mdadm spare disks which almost answers my question, but it isn't clear to me what is happening. We have a RAID5 set up with 4 disks - and all are labeled in normal operation as active/sync:

Update Time : Sun Sep 29 03:44:01 2013
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
...
Number  Major  Minor  RaidDevice  State
0       202    32     0           active sync   /dev/sdc
1       202    48     1           active sync   /dev/sdd
2       202    64     2           active sync   /dev/sde
4       202    80     3           active sync   /dev/sdf

But then when one of the disks failed, the RAID stopped working:

Update Time : Sun Sep 29 01:00:01 2013
State : clean, FAILED
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
...
Number  Major  Minor  RaidDevice  State
0       202    32     0           active sync   /dev/sdc
1       202    48     1           active sync   /dev/sdd
2       0      0      2           removed
3       0      0      3           removed
2       202    64     -           faulty spare  /dev/sde
4       202    80     -           spare         /dev/sdf

What is really going on here? The fix was to reinstall the RAID - luckily I can do that. Next time it'll probably have some serious data on it. I need to understand this so I can have a RAID that won't fail because of a single disk failure. I realized I didn't list what I expected vs. what happened. I expect that a RAID5 with 3 good disks and 1 bad will operate in a degraded mode - 3 active/sync and 1 faulty. What happened was a spare was created out of thin air and declared faulty - then a new spare was also created out of thin air and declared sound - after which the RAID was declared inoperative. This is the output from blkid:

$ blkid
/dev/xvda1: LABEL="/" UUID="4797c72d-85bd-421a-9c01-52243aa28f6c" TYPE="ext4"
/dev/xvdc: UUID="feb2c515-6003-478b-beb0-089fed71b33f" TYPE="ext3"
/dev/xvdd: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3"
/dev/xvde: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3"
/dev/xvdf: UUID="feb2c515-6003-478b-beb0-089fed71b
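What this output usually indicates is that a second member started returning unrecoverable read errors while the array was already degraded or rebuilding; at that point md has no parity left to reconstruct the data, kicks that member out as well, and the remaining two devices cannot run a 4-disk RAID5. The "spare" labels are just how mdadm --detail reports members that no longer hold a valid slot in the array. The commands below are a sketch only, assuming the member names /dev/sdc through /dev/sdf from the question and an array named /dev/md0; they show how to compare member state and how to scrub the array so latent bad sectors are found while parity can still fix them.

# Compare the member superblocks: the Events counter and Update Time show
# which members md still considers current and roughly when each dropped out:
mdadm --examine /dev/sdc /dev/sdd /dev/sde /dev/sdf | egrep '/dev/|Events|Update Time|Array State'

# Scrub the array regularly while it is still fully redundant, so unreadable
# sectors get rewritten from parity instead of surfacing mid-rebuild:
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt    # inspect once the check has finished

Debian-derived systems normally ship a monthly cron job (/usr/share/mdadm/checkarray) that triggers exactly this kind of scrub.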
What steps should I take to best attempt to recover a failed software raid5 setup?
http://serverfault.com/questions/598955/what-steps-should-i-take-to-best-attempt-to-recover-a-failed-software-raid5-setu

My raid has failed, and I'm not sure what the best steps to take are in order to best attempt to recover it. I've got 4 drives in a raid5 configuration. It seems as if one has failed (sde1), but md can't bring the array up because it says sdd1 is not fresh. Is there anything I can do to recover the array? I've pasted below some excerpts from /var/log/messages and mdadm --examine:

/var/log/messages
$ egrep -w sd[b,c,d,e]\|raid\|md /var/log/messages
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] CDB:
nas kernel: [...] end_request: I/O error, dev sde, sector 937821218
nas kernel: [...] sd 5:0:0:0: [sde] killing request
nas kernel: [...] md/raid:md0: read error not correctable (sector 937821184 on sde1).
nas kernel: [...] md/raid:md0: Disk failure on sde1, disabling device.
nas kernel: [...] md/raid:md0: Operation continuing on 2 devices.
nas kernel: [...] md/raid:md0: read error not correctable (sector 937821256 on sde1).
nas kernel: [...] sd 5:0:0:0: [sde] Unhandled error code
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] CDB:
nas kernel: [...] end_request: I/O error, dev sde, sector 937820194
nas kernel: [...] sd 5:0:0:0: [sde] Synchronizing SCSI cache
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] Stopping disk
nas kernel: [...] sd 5:0:0:0: [sde] START_STOP FAILED
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] md: unbind
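This log shows the same double-failure pattern as the previous question: one member (sdd1) had already fallen out of sync - hence "not fresh" - and then sde1 started returning read errors md could not correct, so the array stopped. A conservative recovery attempt is sketched below; it is not from the question itself, the destination device /dev/sdX1 and the map file name are placeholders, and it assumes the failing disk is cloned with GNU ddrescue before any assembly is attempted.

# 1. Copy the failing member onto a healthy disk first; -n skips the slow
#    scraping pass, and a second run with -r3 retries the remaining bad areas:
ddrescue -f -n /dev/sde1 /dev/sdX1 sde1.map
ddrescue -f -r3 /dev/sde1 /dev/sdX1 sde1.map

# 2. See which superblocks are freshest (highest Events count):
mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdX1 | egrep '/dev/|Events|Update Time'

# 3. Force assembly; --force lets md accept the slightly stale "not fresh"
#    member instead of refusing to start the array:
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdX1

# 4. Mount read-only and copy everything off before any repair attempts:
mount -o ro /dev/md0 /mnt

If the problem is only a handful of pending sectors rather than a dying disk, hdparm --read-sector <LBA> /dev/sde shows whether a given sector is still readable, and hdparm --write-sector <LBA> --yes-i-know-what-i-am-doing /dev/sde overwrites it so the drive can reallocate it, destroying that sector's contents. Note that the LBA must be the device-absolute sector, not the partition-relative number printed in the md error messages.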
[SOLVED] mdadm / RAID trouble
Registered: 2006-03-31  Posts: 575

Hi all, in relation to this post: Migrating data to a "new" setup, questions - I think I have a problem with my new RAID5 array. Everything seemed to work fine; I was transferring stuff to it, which went OK. Then I reduced the old fs on /home (which held everything) to the minimum size, ~2.3TB, and pvmoved a partition off it so I could add it to the RAID array. After unmounting the raid to do an fsck, something happened (though it didn't say what!); it's like the RAID array just disappeared. Here's the dmesg: http://dpaste.org/1oUi/ - snippet below:

md0: detected capacity change from 4000795590656 to 0
md: md0 stopped.
md: unbind
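For what it's worth, "detected capacity change from 4000795590656 to 0" followed by "md: md0 stopped" means the array device was stopped, not that the member data is gone; the RAID superblocks on the members are normally still intact and the array can simply be assembled again. A minimal sketch, with the member names used purely as placeholders:

# See which arrays the member superblocks describe:
mdadm --examine --scan

# Reassemble from mdadm.conf / the scan above...
mdadm --assemble --scan
# ...or by naming the members explicitly (placeholders):
mdadm --assemble /dev/md0 /dev/sdX1 /dev/sdY1 /dev/sdZ1

# Check the filesystem read-only before mounting it read-write again:
fsck -n /dev/md0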