7096 Reservation Error Timeout
3 Replies. Latest reply: Mar 11, 2011 7:57 AM by EricD201110141

Lose a storage path, Lose an entire ESX Server
Mar 9, 2011 8:33 AM

OK, I have a flaky iSCSI storage path/device. That's not really the part that bothers me; rather, when I lose the path/device I lose the entire ESX host and all of the VMs running on it! What's worse, although I cannot connect with the vSphere client, I can connect via SSH or iLO (the COS), and although HA is configured, it does not recognize a host failure and therefore no VMs restart. I have chosen not to enable Virtual Machine Monitoring for other reasons, although I understand it might resolve the situation with the VMs themselves.

I am concerned mostly with the fact that losing a storage path/device seems to disable the entire affected ESX host. Here's the vmkernel log for the affected ESX host.

Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)NMP: nmp_PathDetermineFailure: SCSI cmd RESERVE failed on path vmhba32:C0:T2:L0, reservation state on device naa.5000144f25649955 is unknown.
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.5000144f25649955" state in doubt; requested fast path state update...
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)ScsiDeviceIO: 1672: Command 0x16 to device "naa.5000144f25649955" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu3:4113)WARNING: FS3: 7096: Reservation error: IO was aborted
Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4107)FS3: 7412: Waiting for timed-out heartbeat [HB state abcdef02 offset 4161536 gen 599 stamp 922837575795 uuid 4d698103-23122782-17dd-001a4bd16da2 jrnl
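When a vmkernel log is long, it can help to pull out only the failed SCSI commands and their status triplets. The following is a minimal sketch, not a VMware tool: the regular expression is written against the single ScsiDeviceIO line quoted in this post and is assumed, not guaranteed, to match other ESX builds. The names `parse_scsi_failures` and `sample_log` are hypothetical. Opcode 0x16 is the SCSI RESERVE(6) command, which agrees with the "SCSI cmd RESERVE failed" message logged just before it.

```python
import re

# Pattern derived from the one ScsiDeviceIO line quoted in the post;
# other ESX versions may format this message differently (assumption).
SCSI_FAILURE = re.compile(
    r'ScsiDeviceIO: \d+: Command (0x[0-9a-fA-F]+) to device "([^"]+)" failed '
    r'H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)'
)

def parse_scsi_failures(lines):
    """Return one dict per failed SCSI command found in the given log lines."""
    failures = []
    for line in lines:
        match = SCSI_FAILURE.search(line)
        if match is None:
            continue  # not a ScsiDeviceIO failure line
        opcode, device, host, dev, plugin = match.groups()
        failures.append({
            "device": device,
            "opcode": int(opcode, 16),        # SCSI command opcode
            "host_status": int(host, 16),     # the H: field
            "device_status": int(dev, 16),    # the D: field
            "plugin_status": int(plugin, 16), # the P: field
        })
    return failures

# Two lines copied from the log excerpt in the post.
sample_log = [
    'Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu2:4199)ScsiDeviceIO: 1672: '
    'Command 0x16 to device "naa.5000144f25649955" failed H:0x5 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0.',
    'Mar 9 09:20:53 control13 vmkernel: 10:17:41:58.157 cpu3:4113)WARNING: FS3: 7096: '
    'Reservation error: IO was aborted',
]

failures = parse_scsi_failures(sample_log)
for failure in failures:
    print(failure)
```

Grouping the resulting records by device would show whether the aborted reservations are confined to the one flaky LUN or spread across the host.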
Determining root cause for a SCSI Reservation Conflict Issue
Posted on May 10, 2012 by Rick Blythe
http://blogs.vmware.com/kb/2012/05/determining-root-cause-for-a-scsi-reservation-conflict-issue.html
Related thread: https://communities.vmware.com/thread/305738?start=0&tstart=0

Previous deep dive posts have dealt with performance issues or faulty hardware. This week Nathan Small (Twitter handle: vSphereStorage) takes us through the determination of root cause for a SCSI Reservation Conflict issue:

History of issue: Customer performed a firmware upgrade to their IBM SVCs. Upon completing the firmware update and bringing them back online, command failures and SCSI reservation conflicts were observed across all hosts. Issue was resolved by sending a LUN reset (vmkfstools -L lunreset /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxxx) to the affected LUNs. Root cause analysis requested.

We need to start by ensuring that all hosts saw the issue the same way, and that the firmware update sequence logically makes sense. Let's start by looking at the firmware update sequence to determine when it began and finished. We have to locate when the paths to one target dropped:

2012-04-29T13:18:19.684Z cpu10:4768)<3> rport-5:0-4: blocked FC remote port time out: saving binding
2012-04-29T13:18:19.685Z cpu2:4794)<3>lpfc820 0000:04:00.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:30:59:fd NPort x3f010d Data: x0 x7 x0
2012-04-29T13:18:19.713Z cpu12:4759)<3> rport-6:0-4: blocked FC remote port time out: saving binding
2012-04-29T13:18:19.714Z cpu7:4795)<3>lpfc820 0000:04:00.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:10:59:fd NPort x5d0010 Data: x0 x7 x0

From the above we can see there were 2 paths dropped, 1 per HBA.
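The "1 per HBA" reading can be checked mechanically when the log is too long to scan by eye. This is a rough sketch rather than part of the original analysis: the regular expression is modelled only on the two lpfc820 "Devloss timeout" lines quoted above, and `find_devloss_events` and `sample_log` are hypothetical names.

```python
import re
from collections import Counter

# Pattern modelled on the two Emulex lpfc820 lines quoted above; other
# drivers or driver versions may log device loss differently (assumption).
DEVLOSS = re.compile(
    r'(?P<timestamp>\S+) cpu\d+:\d+\)<3>\s*(?P<driver>\S+) '
    r'(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): '
    r'.*?Devloss timeout on WWPN (?P<wwpn>(?:[0-9a-f]{2}:){7}[0-9a-f]{2})'
)

def find_devloss_events(lines):
    """Return timestamp, HBA PCI address and target WWPN for each devloss line."""
    events = []
    for line in lines:
        match = DEVLOSS.search(line)
        if match:
            events.append({
                "timestamp": match.group("timestamp"),
                "pci": match.group("pci"),    # PCI address of the HBA port
                "wwpn": match.group("wwpn"),  # WWPN of the target that went away
            })
    return events

# The four log lines quoted in the post.
sample_log = [
    "2012-04-29T13:18:19.684Z cpu10:4768)<3> rport-5:0-4: blocked FC remote port time out: saving binding",
    "2012-04-29T13:18:19.685Z cpu2:4794)<3>lpfc820 0000:04:00.0: 0:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:30:59:fd NPort x3f010d Data: x0 x7 x0",
    "2012-04-29T13:18:19.713Z cpu12:4759)<3> rport-6:0-4: blocked FC remote port time out: saving binding",
    "2012-04-29T13:18:19.714Z cpu7:4795)<3>lpfc820 0000:04:00.1: 1:(0):0203 Devloss timeout on WWPN 50:05:07:68:01:10:59:fd NPort x5d0010 Data: x0 x7 x0",
]

events = find_devloss_events(sample_log)
drops_per_hba = Counter(event["pci"] for event in events)

for event in events:
    print(event["timestamp"], event["pci"], event["wwpn"])
print(dict(drops_per_hba))
```

Counting by PCI address rather than by vmhba name is a choice made here only because the PCI address is what appears in these particular log lines.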
In fact, these Emulex device loss messages even give you the WWPN of the IBM SVC targets:

WWPN 50:05:07:68:01:30:59:fd
WWPN 50:05:07:68:01:10:59:fd

For our example, we will refer to LUN 42 when comparing WWPNs of targets and tracking path failures:

naa.60050768018900725000000000000810 : IBM Fibre Channel Disk (naa.60050768018900725000000000000810)
   vmhba1:C0:T2:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:87 WWPN: 10:00:00:00:c9:a2:26:87 Target: WWNN: 50:05:07:68:01:00:59:fd WWPN: 50:05:07:68:01:10:59:fd
   vmhba1:C0:T1:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:87 WWPN: 10:00:00:00:c9:a2:26:87 Target: WWNN: 50:05:07:68:01:00:59:fc WWPN: 50:05:07:68:01:20:59:fc
   vmhba0:C0:T3:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:86 WWPN: 10:00:00:00:c9:a2:26:86 Target: WWNN: 50:05:07:68:01:00:59:fc WWPN: 50:05:07:68:01:40:59:fc
   vmhba0:C0:T2:L42 LUN:42 state:active fc Adapter: WWNN: 20:00:00:00:c9:a2:26:86 WWPN: 10:00:00:00:c9:a2:26:86 Target: WWNN: 50:05:07:68:01:00:59:fd WW