ESX Reservation Error: SCSI Reservation Conflict
Determining root cause for a SCSI Reservation Conflict Issue
Posted on May 10, 2012 by Rick Blythe

Previous deep dive posts have dealt with performance issues or faulty hardware.
This week Nathan Small (Twitter handle: vSphereStorage) takes us through the determination of root cause for
a SCSI Reservation Conflict issue.

History of issue: The customer performed a firmware upgrade on their IBM SVCs. Upon completing the firmware update and bringing them back online, command failures and SCSI reservation conflicts were observed across all hosts. The issue was resolved by sending a LUN reset (vmkfstools -L lunreset /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxx) to the affected LUNs. Root cause analysis was requested.

We need to start by ensuring that all hosts saw the issue the same way, and that the firmware update sequence logically makes sense. Let's start by looking at the firmware update sequence to determine when it began and finished. We have to locate when the paths to one target dropped:

2012-04-29T13:18:19.684Z cpu10:4768)<3> rport-5:0-4: blocked FC remote port time out: saving binding
2012-04-29T13:18:19.685Z cpu2:4794)<3>lpfc820 0000:04:00.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:30:59:fd NPort x3f010d Data: x0 x7 x0
2012-04-29T13:18:19.713Z cpu12:4759)<3> rport-6:0-4: blocked FC remote port time out: saving binding
2012-04-29T13:18:19.714Z cpu7:4795)<3>lpfc820 0000:04:00.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:10:59:fd NPort x5d0010 Data: x0 x7 x0

From the above we can see there were 2 paths dropped, 1 per HBA. In fact, these Emulex device loss messages even give you the WWPNs of the IBM SVC targets:

WWPN 50:05:07:68:01:30:59:fd
WWPN 50:05:07:68:01:10:59:fd

For our example, we will refer to LUN 42 when comparing WWPNs of targets and tracking path failures:

naa.60050768018900725000000000000810 : IBM Fibre Channel Disk (naa.60050768018900725000000000000810)
   vmhba1:C0:T2:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:87 WWPN: 10:00:00:00:c9:a2:26:87  Target: WWNN: 50:05:07:68:01:00:59:fd WWPN: 50:05:07:68:01:10:59:fd
   vmhba1:C0:T1:L42 LUN:42 state:activ
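When checking many hosts, it helps to pull the dropped target WWPNs out of the logs mechanically rather than by eye. A minimal sketch in Python; the regular expression is an assumption based on the lpfc "Devloss timeout" message format quoted above:

```python
import re

# Matches the Emulex (lpfc) "Devloss timeout" messages and captures the
# WWPN of the target port whose paths dropped (assumed format per the
# log excerpt above).
DEVLOSS_RE = re.compile(r"Devloss timeout on WWPN ((?:[0-9a-f]{2}:){7}[0-9a-f]{2})")

def dropped_targets(log_lines):
    """Return the unique target WWPNs that reported a Devloss timeout."""
    wwpns = []
    for line in log_lines:
        m = DEVLOSS_RE.search(line)
        if m and m.group(1) not in wwpns:
            wwpns.append(m.group(1))
    return wwpns

log = [
    "2012-04-29T13:18:19.684Z cpu10:4768)<3> rport-5:0-4: blocked FC remote port time out: saving binding",
    "2012-04-29T13:18:19.685Z cpu2:4794)<3>lpfc820 0000:04:00.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:30:59:fd NPort x3f010d Data: x0 x7 x0",
    "2012-04-29T13:18:19.714Z cpu7:4795)<3>lpfc820 0000:04:00.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:10:59:fd NPort x5d0010 Data: x0 x7 x0",
]
print(dropped_targets(log))
# -> ['50:05:07:68:01:30:59:fd', '50:05:07:68:01:10:59:fd']
```

Running this per host makes it easy to confirm that every host lost paths to the same two target ports at the same time, which is what we would expect from a rolling SVC firmware update.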
is just to spread knowledge and has nothing to do with credit.

Problem: We had a 5-host ESX cluster running 3.5 U4 on DL380 G5 servers with 32 GB of memory and 8 logical CPUs. Suddenly we realized that 3 of the ESX hosts were disconnected from Virtual Center. I restarted the "mgmt-vmware" service, but no luck. I then disconnected the ESX host from VC and also removed it, but we were still not able to add the host back. Finally I decided to log in to the ESX host using the VC client, and that failed too. That is when we realized there was a serious issue. I then tried to cd into /vmfs/volumes and it gave me an error message. I checked all the ESX hosts and each one had the same issue: I could 'cd' into any directory but not /vmfs. This confirmed that something had happened to the VMFS volumes. I ran the QLogic utility "iscli" and tried to perform HBA-level troubleshooting. The HBA did report a SCSI reservation conflict issue. I checked the vmkernel log, which made me very suspicious: it had lots of SCSI reservation related messages. All 3 ESX hosts had similar issues. Since this was a production scenario, I decided not to waste time and contacted VMware for a possible solution. After a WebEx session, VMware passed along the following recommendations to break-fix this issue:
- Put each ESX host into maintenance mode (only 1 at a time)
- Allow VMs to be migrated off
- Reboot the ESX host
- Repeat the procedure for each ESX host
OR
- Fail over the active storage processor (contact your SAN vendor for assistance with this)

Well, I could not put the ESX hosts into maintenance mode since all of them share the LUN. Though only 3 reported the issue, the other two ESX hosts were also not able to see the LUN. I checked with the storage admin about a possible failover and he declined, since the filer is also used by other applications. Finally I decided to reboot the ESX hosts, which was the shortcut to fix the issue, with maximum outage.
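Before committing to reboots, it is worth confirming quickly whether all hosts in the cluster are seeing the conflicts to a similar degree, since that distinguishes a shared-LUN problem from a single misbehaving host. A small sketch of that check, assuming you have already gathered each host's vmkernel log lines (the host names and sample lines here are hypothetical):

```python
import re
from collections import Counter

# Matches the reservation-conflict messages seen in vmkernel/dmesg output.
CONFLICT_RE = re.compile(r"RESERVATION CONFLICT", re.IGNORECASE)

def conflict_counts(logs_by_host):
    """logs_by_host: dict of host name -> iterable of vmkernel log lines.
    Returns a Counter of conflict-message counts per host."""
    counts = Counter()
    for host, lines in logs_by_host.items():
        counts[host] = sum(1 for line in lines if CONFLICT_RE.search(line))
    return counts

# Hypothetical sample data for illustration:
sample = {
    "esx01": ["scsi2 (0,0,1) : RESERVATION CONFLICT", "vmkernel: boot"],
    "esx02": ["scsi2 (0,0,1) : RESERVATION CONFLICT"] * 3,
    "esx03": ["vmkernel: boot"],
}
for host, n in sorted(conflict_counts(sample).items()):
    print(host, n)
```

If every host shows conflicts against the same LUNs, the reservation is almost certainly stuck on the array side (or held by one host), and a LUN reset or SP failover is the right lever, not per-host fixes.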
After rebooting the ESX host I saw the following messages on the console. After rebooting, the ESX host was not able to come back online. One of the ESX
some trouble by a customer: 4 of my 8 datastores weren't visible/accessible on the 6 ESX 3.5u2 hosts connected to a (FC) HP MSA1500. Some datastores became unavailable and some were not affected. Numerous VMs were down, some of them with warning messages like Orphaned and Inaccessible. OK, let's troubleshoot.

Checking for active paths:

esxcfg-mpath -l | grep -i active
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:2 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:6 On active preferred
Local 70:0.0 vmhba2:0:0 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:1 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:5 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:3 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:7 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:0 On active preferred
FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:4 On active preferred

All online!

Checking for dead paths:

esxcfg-mpath -l | grep -i dead
[root@esxmeri01 /]#

No dead paths!

Checking my HBAs connected to the SAN: cd /vmfs/devices/disks and list:

ls vmh*
vmhba1:0:1:0 vmhba1:0:2:1 vmhba1:0:4:0 vmhba1:0:5:1 vmhba1:0:7:0 vmhba2:0:0:1  vmhba2:0:0:3 vmhba2:0:0:6 vmhba2:0:0:9
vmhba1:0:1:1 vmhba1:0:3:0 vmhba1:0:4:1 vmhba1:0:6:0 vmhba1:0:7:1 vmhba2:0:0:10 vmhba2:0:0:4 vmhba2:0:0:7
vmhba1:0:2:0 vmhba1:0:3:1 vmhba1:0:5:0 vmhba1:0:6:1 vmhba2:0:0:0 vmhba2:0:0:2  vmhba2:0:0:5 vmhba2:0:0:8

All online!

Re-presenting LUNs from HP ACU to the ESX hosts: I unpresented and re-presented the LUNs in the HP ACU to the hosts and did a rescan, but still no success. Reservation conflicts??
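With this many paths, eyeballing the `esxcfg-mpath -l` output for stragglers gets error-prone. A throwaway sketch for tallying path states; the field layout is assumed from the output quoted above (state words such as "active" or "dead" appear as whitespace-separated columns):

```python
from collections import Counter

def path_states(mpath_lines):
    """Tally path states from `esxcfg-mpath -l` style output lines."""
    states = Counter()
    for line in mpath_lines:
        fields = line.split()
        if "active" in fields:
            states["active"] += 1
        elif "dead" in fields:
            states["dead"] += 1
    return states

output = [
    "FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:2 On active preferred",
    "FC 6:2.1 210100e08bb27a58<->500508b30091aac9 vmhba1:0:6 On active preferred",
    "Local 70:0.0 vmhba2:0:0 On active preferred",
]
print(path_states(output))
# all three sample lines are active, none dead
```

Here all paths being active while datastores stay inaccessible is itself the clue: the transport layer is healthy, so the problem must be higher up, at the SCSI reservation level.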
After some troubleshooting and trying to get my datastores online, I found some information to point me in the right direction:

cd /var/log
# cat dmesg
resize_dma_pool: unknown device type 12
scsi2 (0,0,1) : RESERVATION CONFLICT
scsi2 (0,0,1) : RESERVATION CONFLICT
scsi2 (0,0,1) : RESERVATION CONFLICT
scsi2 (0,0,1) : RESERVATION CONFLICT
scsi2 (0,0,1) : RESERVATION CONFLICT
scsi2 (0,0,1) : RESERVATION CONFLICT
VMWARE: Device that would have been attached as scsi disk sda at scsi2, channel 0, id 0, lun 1
Has not been attached because this path could not complete a READ command eventhough a TUR worked.
result = 0x18 key = 0x0, asc = 0x0, ascq = 0x0
VMWARE: Device that would have been attached as scsi disk sda at scsi2, channel 0, id 0, lun 1
Has not been attached because it is a duplicate path or on a passive path
resize_dma_pool: unknown device type 12
VMWARE SCSI Id: Supported VPD pages for sda : 0x0 0x80 0x83 0xc0 0xb0 0xc1
VMWARE SCS
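The `result = 0x18` line is the telling detail: 0x18 is the SCSI status code for RESERVATION CONFLICT in the SPC standard, and the all-zero sense key/asc/ascq means the target returned no further sense data with it (a reservation conflict is a status, not a check condition). A minimal lookup sketch; treating the logged `result` value directly as the SCSI status byte is an assumption based on how this particular message is formatted:

```python
# Subset of SCSI status byte values from the SPC standard.
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

def describe_status(result):
    """Map a SCSI status byte to its SPC name."""
    return SCSI_STATUS.get(result, "unknown status 0x%02x" % result)

print(describe_status(0x18))  # RESERVATION CONFLICT
```

So the array was refusing READ commands outright because another initiator still held a SCSI-2 reservation on the LUN, which is exactly the situation a LUN reset (or an SP failover) is meant to clear.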