Read Error Not Correctable Sector
and only cause some noise in your syslog. In most cases the disk will automatically reallocate one or two damaged sectors, and you should start planning to buy a new disk while your data is still safe. Sometimes, however, the disk won't automatically reallocate these sectors and you'll have to do it manually yourself. Luckily, this doesn't involve any rocket science.

A few days ago, one of my disks reported problems in my syslog while rebuilding a RAID5 array:

Jan 29 18:19:54 dragon kernel: [66774.973049] end_request: I/O error, dev sdb, sector 1261069669
Jan 29 18:19:54 dragon kernel: [66774.973054] raid5:md3: read error not correctable (sector 405431640 on sdb6).
Jan 29 18:19:54 dragon kernel: [66774.973059] raid5: Disk failure on sdb6, disabling device.
Jan 29 18:20:11 dragon kernel: [66792.180513] sd 3:0:0:0: [sdb] Unhandled sense code
Jan 29 18:20:11 dragon kernel: [66792.180516] sd 3:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 29 18:20:11 dragon kernel: [66792.180521] sd 3:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Jan 29 18:20:11 dragon kernel: [66792.180547] sd 3:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Jan 29 18:20:11 dragon kernel: [66792.180553] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 4b 2a 6c 4c 00 00 c0 00
Jan 29 18:20:11 dragon kernel: [66792.180564] end_request: I/O error, dev sdb, sector 1261071601

Modern hard disk drives are equipped with a small number of spare sectors used to reallocate damaged sectors. However, a sector only gets reallocated when a write operation to it fails; a failing read operation will, in most cases, only throw an I/O error. In the unlikely event that a second read succeeds, some disks perform an auto-reallocation and the data is preserved. In my case, the second read failed miserably ("Unrecovered read error - auto reallocate failed").
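Since the drive will only remap the sector when a write to it fails, the usual trick is to force a write to the exact LBA reported in the log. Below is a minimal sketch of that procedure with hdparm, using the sector number 1261069669 from the log above; overwriting the sector zeroes its contents, so only do this when the data there can be sacrificed or rebuilt (here it can be reconstructed from RAID parity). The exact device and sector are of course specific to this incident.

# Confirm that reading the suspect sector really fails
hdparm --read-sector 1261069669 /dev/sdb

# Check SMART counters before and after: pending sectors are unreadable
# sectors waiting to be remapped on the next write
smartctl -A /dev/sdb | grep -i -e Current_Pending_Sector -e Reallocated_Sector

# Force a write to the sector; this destroys its contents and lets the
# drive remap it to a spare sector if it is truly damaged
hdparm --write-sector 1261069669 --yes-i-know-what-i-am-doing /dev/sdb

# Re-read the sector; it should now succeed (and return zeroes)
hdparm --read-sector 1261069669 /dev/sdb

After the write, Current_Pending_Sector should drop (and Reallocated_Sector_Ct may increase), and the partition can be re-added to the md array so the stripe is rebuilt from parity.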
[SOLVED] mdadm / RAID trouble

Hi all, in relation to this post: Migrating data to a "new" setup, questions. I think I have a problem with my new RAID5 array. Everything seemed to work fine; I was transferring stuff to it, which went OK. Then I reduced the old filesystem on /home (which held everything) to its minimum size, ~2.3 TB, and pvmove'd a partition off it so I could add it to the RAID array. After unmounting the RAID to do an fsck, something happened (though it didn't say what!); it's like the RAID array just disappeared. Here's the dmesg: http://dpaste.org/1oUi/ (snippet below):

md0: detected capacity change from 4000795590656 to 0
md: md0 stopped.
md: unbind
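The thread snippet is cut off here, but when an md array simply vanishes after the underlying device layout changes (as after a pvmove), the standard first diagnostic steps are to check what the kernel still knows about it and to scan the member superblocks before touching anything. A hedged sketch, with /dev/sd?1 standing in for whatever the member partitions actually are:

# See whether the array is merely stopped or its members lost their superblocks
cat /proc/mdstat
mdadm --examine /dev/sd?1        # member partitions are hypothetical here
mdadm --examine --scan           # prints ARRAY lines for anything it can find

# If the superblocks are intact, reassembling is usually enough
mdadm --assemble --scan --verbose

If --examine still shows valid superblocks with matching UUIDs, the data is most likely intact and the array only needs to be reassembled; if the superblocks are gone, stop and take images of the disks before experimenting further.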
RAID 5 with 4 disks fails to operate with one failed disk?
(http://unix.stackexchange.com/questions/93976/raid-5-with-4-disks-fails-to-operate-with-one-failed-disk)

I found a question about mdadm spare disks which almost answers my question, but it isn't clear to me what is happening. We have a RAID5 set up with 4 disks, and all are labeled in normal operation as active/sync:

Update Time : Sun Sep 29 03:44:01 2013
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
...
Number Major Minor RaidDevice State
0 202 32 0 active sync /dev/sdc
1 202 48 1 active sync /dev/sdd
2 202 64 2 active sync /dev/sde
4 202 80 3 active sync /dev/sdf

But then, when one of the disks failed, the RAID stopped working:

Update Time : Sun Sep 29 01:00:01 2013
State : clean, FAILED
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
...
Number Major Minor RaidDevice State
0 202 32 0 active sync /dev/sdc
1 202 48 1 active sync /dev/sdd
2 0 0 2 removed
3 0 0 3 removed
2 202 64 - faulty spare /dev/sde
4 202 80 - spare /dev/sdf

What is really going on here? The fix was to reinstall the RAID - luckily I can do that. Next time it'll probably have some serious data on it. I need to understand this so I can have a RAID that won't fail because of a single disk failure. I realized I didn't list what I expected vs. what happened: I expect that a RAID5 with 3 good disks and 1 bad will operate in a degraded mode - 3 active/sync and 1 faulty. What happened was that a spare was created out of thin air and declared faulty, then a new spare was also created out of thin air and declared sound, after which the RAID was declared inoperative. This is the output from blkid:

$ blkid
/dev/xvda1: LABEL="/" UUID="4797c72d-85bd-421a-9c01-52243aa28f6c" TYPE="ext4"
/dev/xvdc: UUID="feb2c515-6003-478b-beb0-089fed71b33f" TYPE="ext3"
/dev/xvdd: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3"
/dev/xvde: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="ext2" TYPE="ext3"
/dev/xvdf: UUID="feb2c515-6003-478b-beb0-089fed71b33f" SEC_TYPE="e
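A RAID5 only survives the loss of a single member, so the symptom above usually means a second member had already dropped out (or was throwing read errors) before the "first" visible failure. A minimal sketch of how to spot and prevent that, using the device names from the question as an assumption:

# Compare event counts and last-update times across the members; a device
# whose Events count lags far behind was kicked out of the array earlier
mdadm --examine /dev/xvdc /dev/xvdd /dev/xvde /dev/xvdf | egrep 'Events|Update Time|/dev/'

# Watch the array state; a single-disk failure should leave it "clean, degraded"
cat /proc/mdstat
mdadm --detail /dev/md0

# Get notified as soon as a member fails, instead of discovering it only
# when a second failure takes the whole array down
mdadm --monitor --scan --daemonise --mail=root

This does not change how md behaves, but it makes the window between the first and second disk failure visible instead of silent.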
What steps should I take to best attempt to recover a failed software raid5 setup?
(http://serverfault.com/questions/598955/what-steps-should-i-take-to-best-attempt-to-recover-a-failed-software-raid5-setu)

My RAID has failed, and I'm not sure what the best steps are to attempt to recover it. I've got 4 drives in a RAID5 configuration. It seems as if one has failed (sde1), but md can't bring the array up because it says sdd1 is not fresh. Is there anything I can do to recover the array? I've pasted below some excerpts from /var/log/messages and mdadm --examine:

/var/log/messages:

$ egrep -w sd[b,c,d,e]\|raid\|md /var/log/messages
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] CDB:
nas kernel: [...] end_request: I/O error, dev sde, sector 937821218
nas kernel: [...] sd 5:0:0:0: [sde] killing request
nas kernel: [...] md/raid:md0: read error not correctable (sector 937821184 on sde1).
nas kernel: [...] md/raid:md0: Disk failure on sde1, disabling device.
nas kernel: [...] md/raid:md0: Operation continuing on 2 devices.
nas kernel: [...] md/raid:md0: read error not correctable (sector 937821256 on sde1).
nas kernel: [...] sd 5:0:0:0: [sde] Unhandled error code
nas kernel: [...] sd 5:0:0:0: [sde]
nas kernel: [...] sd 5:0:0:0: [sde] CDB:
nas kernel: [...] end_request: I/O error, dev sde, sector 937820194
nas kernel: [...] sd 5:0:0:0: [sde] Synch
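When an array refuses to start because one member is "not fresh" (its event count is behind) while another member has genuinely died, the commonly used approach, offered here as a sketch rather than the thread's accepted answer, is to compare event counters and then attempt a forced assembly from the best surviving members. This assumes the members are sdb1, sdc1, sdd1 and sde1, with sde1 being the drive that is throwing medium errors:

# Note the Events count and Update Time on each member; small differences
# can usually be overridden, large ones mean real data divergence
mdadm --examine /dev/sd[bcde]1 | egrep 'Events|Update Time|/dev/'

# Stop any half-assembled array, then force assembly from the healthy
# members, leaving out the disk with unreadable sectors
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1

# If it comes up degraded, verify and copy the data off before anything else,
# then add a replacement disk so parity can be rebuilt
mdadm --detail /dev/md0

Forcing assembly accepts the slightly stale sdd1, so a small amount of recently written data may be inconsistent; that is usually a far better outcome than losing the array, but it is why copying the data off first is the priority.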