Read Error On /dev/dsk/c1t1d0s0
Chapter 25: Troubleshooting Solaris Volume Manager (Tasks)

This chapter describes how to troubleshoot problems that are related to Solaris Volume Manager. This chapter provides both general troubleshooting guidelines and
specific procedures for resolving some known problems. This chapter includes the following information:

Troubleshooting Solaris Volume Manager (Task Map)
Overview of Troubleshooting the System
Replacing Disks
Recovering From Disk Movement Problems
Device ID Discrepancies After Upgrading to the Solaris 10 Release
Recovering From Boot Problems
Recovering From State Database Replica Failures
Recovering From Soft Partition Problems
Recovering Storage From a Different System
Recovering From Disk Set Problems
Performing Mounted Filesystem Backups Using the ufsdump Command
Performing System Recovery

This chapter describes some Solaris Volume Manager problems
and their appropriate solutions. This chapter is not intended to be all-inclusive, but rather to present common scenarios and recovery procedures.

Troubleshooting Solaris Volume Manager (Task Map)

The following task map identifies some procedures that are needed to troubleshoot Solaris Volume Manager.

Task: Replace a failed disk
Description: Replace a disk, then update state database replicas and logical volumes on the new disk.
For Instructions: How to Replace a Failed Disk

Task: Recover from disk movement problems
Description: Restore disks to original locations or contact product support.
For Instructions: Recovering From Disk Movement Problems

Task: Recover from improper /etc/vfstab entries
Description: Use the fsck command on the mirror, then edit the /etc/vfstab file so that the system boots correctly.
For Instructions: How to Recover From Improper /etc/vfstab Entries

Task: Recover from a boot device failure
Description: Boot from a different submirror.
For Instructions: How to Recover From a Boot Device Failure

Task: Recover from ins…
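One of the tasks above, recovering from improper /etc/vfstab entries, starts with spotting which file systems are mounted from SVM metadevices. A minimal sketch, run against a hypothetical vfstab excerpt (the device names and mount points below are illustrative assumptions, not from a real host):

```shell
# Hypothetical /etc/vfstab excerpt for illustration only
cat > /tmp/vfstab.sample <<'EOF'
/dev/md/dsk/d0	/dev/md/rdsk/d0	/	ufs	1	no	-
/dev/dsk/c1t0d0s1	-	-	swap	-	no	-
/dev/md/dsk/d5	/dev/md/rdsk/d5	/home	ufs	2	yes	-
EOF
# List only the entries that mount from SVM metadevices
grep '^/dev/md/' /tmp/vfstab.sample
```

On a live system the same grep would run against /etc/vfstab itself; any /dev/md entry whose metadevice no longer exists is a candidate for the improper-entry recovery procedure.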
Logical Volume Manager (LVM), and is for the most part transparent (other than the I/O that occurs when the new device is synchronized from parity or a RAID-1 mirror). Solaris Volume Manager (SVM) supports hot sparing with hot spare pools, which are a collection of devices devoted to being hot spares. A pool is associated with one or more metadevices, which are
configured through the metahs(1m) utility:

$ metahs -a hsp001 c1t5d0s0
$ metahs -a hsp001 c1t6d0s0

This example creates a new hot spare pool named hsp001 and assigns two devices to it. We can view the contents of the hot spare pool with the metahs "-i" option (see http://docs.oracle.com/cd/E19253-01/816-4520/6manpiepm/index.html):

$ metahs -i
hsp001: 2 hot spares
        Device     Status      Length           Reloc
        c1t6d0s0   Available   35523720 blocks  Yes
        c1t5d0s0   Available   35523720 blocks  Yes

This displays both devices that are currently assigned to the pool, and includes a status field to indicate if the drive is actively being used to replace a faulted device. Once a hot spare pool is created (see http://prefetch.net/articles/solarishotspares.html), it needs to be attached to a metadevice with the metaparam(1m) utility:

$ metaparam -h hsp001 d5

This attaches the hot spare pool hsp001 to the metadevice d5. To see which hot spare pool is attached to a metadevice, you can run metastat(1m) and look for the "Hot spare pool" attribute:

$ metastat d5
d5: RAID
    State: Okay
    Hot spare pool: hsp001
    Interlace: 128 blocks
    Size: 106085968 blocks (50 GB)
Original device:
    Size: 106086528 blocks (50 GB)
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c1t1d0s0   6002         No     Okay   Yes
        c1t2d0s0   4926         No     Okay   Yes
        c1t3d0s0   4926         No     Okay   Yes
        c1t4d0s0   4926         No     Okay   Yes

When a disk fails, the kernel will usually log errors similar to the following:

Jul 1 22:42:52 tigger scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1/scsi@4/sd@2,0 (sd3):
Jul 1 22:42:52 tigger   Error for Command: read(10)   Error Level: Fatal
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice]   Requested Block: 26672702   Error Block: 26672733
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice]   Vendor: SEAGATE   Serial Number: NM020253
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice]   Sense Key: Media Error
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice]   ASC: 0x11 (unrecovered read error), ASCQ
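Kernel warnings like the excerpt above can be mined for the failing target. A minimal sketch, assuming the messages have been saved to a file; the sample line is copied from the log excerpt above, and the /tmp path is just for illustration:

```shell
# Sample kernel warning, copied from the log excerpt above
cat > /tmp/messages.sample <<'EOF'
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1/scsi@4/sd@2,0 (sd3):
Jul 1 22:42:52 tigger scsi: [ID 107833 kern.notice] Sense Key: Media Error
EOF
# Pull out the sd instance named in WARNING lines, e.g. "sd3"
sed -n 's/.*WARNING:.*(\(sd[0-9]*\)):.*/\1/p' /tmp/messages.sample | sort -u
```

On a real host you would point the sed at /var/adm/messages and then map the sd instance back to its c#t#d# name (for example via iostat -En) before deciding which metadevice component to replace.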
How can we confirm that a disk has failed in Solaris?
(forum thread, posted 07-18-2009 by Naresh Kommina: http://www.unix.com/solaris/114671-how-can-we-confirm-disk-has-failed-solaris.html; see also http://unixservermemo.web.fc2.com/suite/trouble.htm)

Hi All, it seems that one of the disks has failed on my Solaris server. How do I confirm whether the disk has really failed or not? Here are the alert details.

iostat -En output:

c1t3d0 Soft Errors: 1884 Hard Errors: 153 Transport Errors: 54
Vendor: FUJITSU  Product: MAW3073NCSUN72G  Revision: 1703  Serial No: 0618B0DR30
Size: 73.40GB <73400057856 bytes>
Media Error: 144  Device Not Ready: 0  No Device: 9  Recoverable: 1884
Illegal Request: 0  Predictive Failure Analysis: 9

dmesg output:

Jul 17 21:42:40 fiesnolsp03 md_stripe: [ID 641072 kern.warning] WARNING: md: d41: read error on /dev/dsk/c1t3d0s3
Jul 17 21:42:40 fiesnolsp03 md_mirror: [ID 104909 kern.warning] WARNING: md: d41: /dev/dsk/c1t3d0s3 needs maintenance

metastat output:

# metastat
d4: Mirror
    Submirror 0: d40
      State: Okay
    Submirror 1: d41
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 142606464 blocks (68 GB)

d40: Submirror of d4
    State: Okay
    Size: 142606464 blocks (68 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c1t2d0s3   0            No     Okay   Yes

d41: Submirror of d4
    State: Needs maintenance
    Invoke: metareplace d4 c1t3d0s3
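The iostat -En counters above are the first thing to screen: a nonzero, growing Hard Errors or Media Error count is the strongest sign of a dying disk. A minimal sketch that flags such disks, run against a saved sample of the output shown above (the zero-error second line and the threshold of "any hard error" are illustrative assumptions, not an official cutoff):

```shell
# Sample iostat -En lines; the first is copied from the post above,
# the second is a hypothetical healthy disk for contrast
cat > /tmp/iostat.sample <<'EOF'
c1t3d0 Soft Errors: 1884 Hard Errors: 153 Transport Errors: 54
c1t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
EOF
# Print any disk whose Hard Errors counter is above zero
awk '/Hard Errors:/ {
    for (i = 1; i <= NF; i++)
        if ($i == "Hard" && $(i+1) == "Errors:" && $(i+2) > 0) print $1
}' /tmp/iostat.sample
```

This points at c1t3d0, which matches the dmesg and metastat evidence in the post: the same disk is throwing read errors and has put submirror d41 into the "Needs maintenance" state.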
Note: read errors like this seem especially common on partitions with heavy write turnover, such as proxy or mail servers.

metadetach -f d7 d27    # force-detach the submirror
metattach d7 d27        # reattach (resynchronize)

Mirror failure recovery (summary):
1. Identify the failed disk from /var/adm/messages and metastat.
2. metadb                     # check the state database replicas
3. metadb -d c0t1d0s4         # then delete the replicas on the failed slice
4. metadetach d0 d10          # detach the failing submirror
5. boot -s                    # come up in single-user mode
6. Use the format command to partition the replacement disk exactly like the healthy disk.
7. metadb -a -c 3 c0t1d0s4    # add state database replicas back
8. metattach d0 d10           # reattach the mirror; the resync completes the recovery

Mirror failure recovery (details): the content is the same as the summary above.

When power to one HDD was cut (tested as a simulated failure), errors like the following were displayed and kept repeating:

Apr 16 13:35:31 TestHost scsi: WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Apr 16 13:35:31 TestHost   disk not responding to selection

However, the other disk was healthy, so the system kept running. Error details:

Apr 16 13:35:03 TestHost scsi: WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Apr 16 13:35:03 TestHost   disk not responding to selection
SC Alert: DISK @ HDD1 has been removed.
Apr 16 13:35:05 TestHost rmclomv: DISK @ HDD1 has been removed.
Apr 16 13:35:07 TestHost scsi: WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Apr 16 13:35:07 TestHost   disk not responding to selection
Apr 16 13:35:09 TestHost scsi: WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Apr 16 13:35:09 TestHost   disk not responding to selection
Apr 16 13:35:09 TestHost scsi: WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Apr 16 13:35:09 TestHost   disk not responding to selection
Apr 16 13:35:12 TestHost scsi: WARNING: /pci@1c,600000/scsi@2/sd@1,0 (sd1):
Apr 16 13:35:12 TestHost   disk not responding to selection
Apr 16 13:35:16 TestHost s
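The metadb steps in the recovery notes above (check, delete the bad replicas, re-add) hinge on spotting replicas in an error state. A minimal sketch over a hypothetical metadb listing; both the column layout and the convention that capital flag letters (such as W for write errors) mark bad replicas are assumptions based on typical SVM output, not taken from this page:

```shell
# Hypothetical metadb output (layout and error-flag convention assumed;
# last field is the slice, leading fields are per-replica flags)
cat > /tmp/metadb.sample <<'EOF'
a m p luo 16 8192 /dev/dsk/c0t0d0s4
a p luo 8208 8192 /dev/dsk/c0t0d0s4
W p l 16 8192 /dev/dsk/c0t1d0s4
EOF
# Print the slice behind any replica whose flags contain a capital letter
awk '{ flags = ""
       for (i = 1; i <= NF - 3; i++) flags = flags $i
       if (flags ~ /[A-Z]/) print $NF }' /tmp/metadb.sample | sort -u
```

A slice printed here would be the argument to metadb -d in step 3 of the summary, and later to metadb -a -c 3 once the replacement disk is partitioned.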