Failed to Modify QP to ERROR State
Hi Hal,

Thanks again, I will try this in a minute. I think I have found the moment it went bad on Machine A using dmesg:

    ib_mthca 0000:87:00.0: Catastrophic error detected: unknown error
    ib_mthca 0000:87:00.0: buf[00]: ffffffff
    ib_mthca 0000:87:00.0: buf[01]: ffffffff
    ib_mthca 0000:87:00.0: buf[02]: ffffffff
    ib_mthca 0000:87:00.0: buf[03]: ffffffff
    ib_mthca 0000:87:00.0: buf[04]: ffffffff
    ib_mthca 0000:87:00.0: buf[05]: ffffffff
    ib_mthca 0000:87:00.0: buf[06]: ffffffff
    ib_mthca 0000:87:00.0: buf[07]: ffffffff
    ib_mthca 0000:87:00.0: buf[08]: ffffffff
    ib_mthca 0000:87:00.0: buf[09]: ffffffff
    ib_mthca 0000:87:00.0: buf[0a]: ffffffff
    ib_mthca 0000:87:00.0: buf[0b]: ffffffff
    ib_mthca 0000:87:00.0: buf[0c]: ffffffff
    ib_mthca 0000:87:00.0: buf[0d]: ffffffff
    ib_mthca 0000:87:00.0: buf[0e]: ffffffff
    ib_mthca 0000:87:00.0: buf[0f]: ffffffff
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib0: ib_query_gid() failed
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib0: ib_query_port failed
    ib0: Failed to modify QP to ERROR state
    ib0: timing out; 1 sends 250 receives not completed
    ib0: Failed to modify QP to RESET state
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
    ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)

Does this help to pinpoint what might have caused this?
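One way to make that question concrete is to tally what actually failed. Below is a minimal Python sketch, assuming the log has been saved as plain text; the regular expressions are fitted to the lines shown here and are not a general parser for ib_mthca output. It counts the failing firmware commands, decodes the negative return code with the standard errno tables (on Linux, 11 is EAGAIN, "Resource temporarily unavailable", the same wording as the thread subject), and checks whether the catastrophic-error buffer read back as all ones. All-ones data is what a PCI read commonly returns when the device has stopped responding, so treat that flag as a hint rather than a diagnosis.

```python
import errno
import os
import re
from collections import Counter

# Abridged excerpt of the dmesg output above (a subset, in original order).
DMESG = """\
ib_mthca 0000:87:00.0: Catastrophic error detected: unknown error
ib_mthca 0000:87:00.0: buf[00]: ffffffff
ib_mthca 0000:87:00.0: buf[01]: ffffffff
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib0: Failed to modify QP to ERROR state
ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
"""

# "<COMMAND> failed (<negative errno>)" as printed by ib_mthca above.
CMD_FAIL = re.compile(r"(\w+) failed \((-\d+)\)")
# "buf[NN]: XXXXXXXX" words dumped after a catastrophic error.
CATAS_BUF = re.compile(r"buf\[[0-9a-f]+\]: ([0-9a-f]{8})")


def triage(log: str):
    """Tally failed firmware commands, collect return codes, inspect buffer."""
    failures = Counter()
    codes = set()
    buf_words = []
    for line in log.splitlines():
        m = CMD_FAIL.search(line)
        if m:
            failures[m.group(1)] += 1
            codes.add(int(m.group(2)))
        b = CATAS_BUF.search(line)
        if b:
            buf_words.append(b.group(1))
    # True only if a buffer was dumped and every word in it is all ones.
    all_ones = bool(buf_words) and all(w == "ffffffff" for w in buf_words)
    return failures, codes, all_ones


if __name__ == "__main__":
    failures, codes, all_ones = triage(DMESG)
    for cmd, n in failures.most_common():
        print(f"{cmd}: {n} failure(s)")
    for code in sorted(codes):
        # errno names and messages are platform-specific; these are Linux's.
        print(code, errno.errorcode.get(-code, "?"), os.strerror(-code))
    print("catastrophic buffer all ones:", all_ones)
```

Run against the full log, the same function would show HW2SW_MPT as by far the most frequent failure, all with the same code.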
Thanks,
Rob

-----Original Message-----
From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
Sent: 25 November 2008 15:19
To: Robert Dunkley
Subject: Re: [ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"

Hi Rob,

On Tue, Nov 25, 2008 at 10:01 AM, Robert Dunkley
Thu, 17 Mar 2011 14:13:42 +0200 Cc: RDMA list
Modify QP error (HCA reset)
https://community.mellanox.com/thread/2069

pnot, Mar 9, 2015 8:48 PM

I'm having an issue, as of yesterday, with a system that has a 40Gb dual-port daughter card in it. The network connections for the 2 ports show as disconnected. I'm in a home lab with a single unmanaged switch, running 2 instances of OpenSM on two separate servers. The error appears in Event Viewer and repeats until I stop the OpenSM service on the host:

    Mellanox ConnectX-2 IPoIB Adapter device reports a "Modify QP error" on qpn #0x58 Status #0xffffffea. Therefore, the HCA Nic will be reset. (The issue is reported in Function CMcast::CompleteJoinMcastWi).

My other four 40Gb IB cards are functioning properly. Some of the things I've tried:

1. Restarted the OpenSM service on both hosts
2. Reset the daughter card
3. Tried a different set of cables
4. Reset the switch
5. Reinstalled the device drivers (4.90)
6. Compared the advanced driver settings with those of the daughter cards on another host

I've attached a snapshot and would appreciate any help.

Thanks

Re: Modify QP error (HCA reset)
pnot, Mar 9, 2015 8:28 PM (in response to pnot)

I finally figured this one out. I have a C6100 with 4 nodes and have the dual-port daughter cards installed.
When I purchased the server, I ran through each node and updated the firmware to 2.10.720 from the 2.7... well, one node was missed, and that was my issue. With the Microsoft base drivers dated 2013, the cards would show as online, but PowerShell reported RDMA capable as false. This immediately prompted me to check the firmware version, as I initially had to update the firmware for RDMA to work.

Consider this one closed.
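The fix here came down to one node running older firmware than the rest. Below is a minimal Python sketch of that check, assuming the firmware strings have already been collected per node (collecting them is outside this sketch); the inventory is hypothetical, and the floor is simply the 2.10.720 version the poster standardized on. Versions are compared as integer tuples because plain string comparison would rank "2.9.1000" above "2.10.720". The sketch also reinterprets the Event Viewer status word: 0xffffffea is -22 as a signed 32-bit value. On Linux, -22 would be -EINVAL, but whether the Windows driver uses errno-style codes is not stated, so treat that mapping as a guess.

```python
from __future__ import annotations


def parse_fw(version: str) -> tuple[int, ...]:
    """Turn a dotted firmware string such as '2.10.720' into (2, 10, 720)."""
    return tuple(int(part) for part in version.split("."))


def nodes_below(inventory: dict[str, str], minimum: str) -> list[str]:
    """Return the names of nodes whose firmware is older than `minimum`."""
    floor = parse_fw(minimum)
    return sorted(n for n, v in inventory.items() if parse_fw(v) < floor)


def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit status word as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    # Hypothetical inventory for a four-node chassis; versions are illustrative.
    inventory = {"node1": "2.10.720", "node2": "2.10.720",
                 "node3": "2.9.1000", "node4": "2.10.720"}
    print(nodes_below(inventory, "2.10.720"))   # ['node3']
    print("2.9.1000" < "2.10.720")              # False: string order misleads
    print(as_signed32(0xFFFFFFEA))              # -22
```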
Thanks for sending this out. We'll try to reproduce this issue and see how best to resolve it. In the meantime, it should be safe for you to continue using MV2_USE_MPIRUN_MAPPING=0 as a workaround.

On Tue, Jul 28, 2015 at 11:30 AM Martin Pokorny
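If the job is launched with MVAPICH2's mpirun_rsh, environment settings are commonly passed as NAME=VALUE pairs placed between the host list and the executable. Below is a minimal Python sketch that builds such a command line with the suggested workaround; the host names and ./a.out are placeholders. The exported-environment variant is an alternative whose effect on remote ranks depends on the launcher, so check it against your launcher's documentation before relying on it.

```python
import os
import shlex

# The workaround suggested in the reply above.
WORKAROUND = {"MV2_USE_MPIRUN_MAPPING": "0"}


def mpirun_rsh_argv(np: int, hosts: list, program: list) -> list:
    """Build an mpirun_rsh command line with the workaround applied.

    NAME=VALUE pairs go after the host list and before the executable;
    hosts and program are placeholders supplied by the caller.
    """
    env_pairs = [f"{k}={v}" for k, v in WORKAROUND.items()]
    return ["mpirun_rsh", "-np", str(np), *hosts, *env_pairs, *program]


def exported_env(base=None) -> dict:
    """Alternative: a copy of the environment with the variable exported."""
    env = dict(os.environ if base is None else base)
    env.update(WORKAROUND)
    return env


if __name__ == "__main__":
    argv = mpirun_rsh_argv(2, ["node1", "node2"], ["./a.out"])
    print(shlex.join(argv))
    # mpirun_rsh -np 2 node1 node2 MV2_USE_MPIRUN_MAPPING=0 ./a.out
```

Building the argument list rather than a shell string keeps host names and arguments from being re-split by a shell if the list is later passed to `subprocess.run`.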