Multiple iSCSI volumes, connection bounces if one connects/disconnects

mloebl · Feb 18, 2013

Sorry for the awful title... but here's the issue. I'm running FreeNAS 8.3 x64 and seeing an iSCSI connection bounce. I've got a single portal and two volumes: machine 1 is connected to Volume1 and machine 2 to Volume2. I noticed the iSCSI connection on machine 1 kept bouncing, at what I thought were random times. After some digging, it looks like any time machine 2 is turned on or off, suspended, or woken up, the connection on Volume1 bounces. Machine 1 is a Windows 7 box (always on); machine 2 is an Ubuntu 12.10 box that is usually off during the day. I don't see anything unusual in the FreeNAS logs, other than machine 1 reconnecting every time I do something with machine 2. I'm open to ideas, as I'm not an iSCSI expert, so it's possible I have a mistake in here somewhere...

Thank you,

-Mike

cyberjock · Feb 18, 2013

Can you provide a LOT more info, like your version of FreeNAS, hardware on the server, etc.? Kinda hard to help you if you're just saying "it's broke". :)

mloebl · Feb 19, 2013

Sorry, guess this cold has messed me up more than I thought :)

FreeNAS-8.3.0-RELEASE-x64 (r12701M)
Intel(R) Core(TM) i3-2120T CPU @ 2.60GHz
16GB RAM
LSI MegaRAID 9260CV-8i Controller Card (latest firmware 12.12.0-0139)
Running 2 native RAID containers (non-ZFS), one RAID 5+1, other RAID 1
Dual Intel GbE NIC using round-robin LAGG for the data side (isolated network), 192.168.50.2
Marvell GbE NIC for the management network

Portal -
192.168.50.2:3260

The target config is, I believe, pretty much stock from the FreeNAS wiki example, with no encryption or CHAP secret.

Initiators -
1, ALL, 192.168.50.1/32
2, ALL, 192.168.50.3/32

Targets -
Volume01, Init group 1, Portal 1, Auth Group 1, Auto Auth
Volume02, Init group 2, Portal 1, Auth Group 2, Auto Auth

Associated Targets -
Volume01:Volume01
Volume02:Volume02

Windows 7 Box (Machine 1)
Intel Dual NIC, LACP, 192.168.50.1

Ubuntu 12.10 Box (Machine 2)
Intel Dual NIC, LACP, 192.168.50.3

Example Log output from FreeNAS log when Machine 2 powers up and kicks Machine 1:
Feb 19 08:41:59 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=8, CID=1, HeaderDigest=off, DataDigest=off
Feb 19 08:42:00 NAS01 istgt[6781]: Login from iqn.1993-08.org.debian:01:16631398338e (192.168.50.3) on iqn.2011-03.nas01._____.local:volume02 LU2 (192.168.50.2:3260,1), ISID=23d010000, TSIH=5, CID=0, HeaderDigest=off, DataDigest=off
Feb 19 08:42:00 NAS01 istgt[6781]: istgt_lu_disk.c:6737:istgt_lu_disk_execute: ***ERROR*** unsupported SCSI OP=0x85
Feb 19 08:42:00 NAS01 last message repeated 2 times
Feb 19 08:42:46 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=9, CID=1, HeaderDigest=off, DataDigest=off
Feb 19 08:42:47 NAS01 istgt[6781]: Login from iqn.1993-08.org.debian:01:16631398338e (192.168.50.3) on iqn.2011-03.nas01._____.local:volume02 LU2 (192.168.50.2:3260,1), ISID=23d010000, TSIH=6, CID=0, HeaderDigest=off, DataDigest=off
Feb 19 08:42:51 NAS01 kernel: GEOM: mfid0: the secondary GPT header is not in the last LBA.
Feb 19 08:42:52 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=10, CID=1, HeaderDigest=off, DataDigest=off
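
For anyone wanting to check the same pattern in their own logs, here is a minimal Python sketch that pulls the `Login from` lines out of the log and counts how often each initiator IP logs in again. The regular expression is fitted only to the istgt lines quoted above, so other istgt versions may need a different pattern, and the function names `parse_logins` and `relogins_by_ip` are made up for this illustration.

```python
import re
from collections import defaultdict

# Pattern fitted to the istgt "Login from" lines quoted in this thread.
# This is an assumption about the format, not an official istgt log spec.
LOGIN_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+istgt\[\d+\]:\s+"
    r"Login from (?P<initiator>\S+) \((?P<ip>[\d.]+)\) on (?P<target>\S+)"
    r".*?TSIH=(?P<tsih>\d+)"
)


def parse_logins(lines):
    """Return (timestamp, initiator_ip, target, tsih) for each login line."""
    logins = []
    for line in lines:
        m = LOGIN_RE.search(line)
        if m:
            logins.append((m["ts"], m["ip"], m["target"], int(m["tsih"])))
    return logins


def relogins_by_ip(logins):
    """Count repeat logins per initiator IP (every login after the first)."""
    counts = defaultdict(int)
    for _ts, ip, _target, _tsih in logins:
        counts[ip] += 1
    return {ip: n - 1 for ip, n in counts.items()}
```

Feeding it the lines of the FreeNAS log (for example via `open(path)`, with whatever path your system logs to) and printing `relogins_by_ip(parse_logins(lines))` gives a quick per-initiator tally. A climbing TSIH for the same initiator, as in the excerpt above, indicates a new session each time rather than a session that stayed up.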

At this time the Windows logs on Machine 1 are filled with messages saying it was disconnected from its iSCSI device and then reconnected. I initially suspected something weird network-wise, so I also monitored the log on the switch. I see the LACP group for machine 2 start as it powers on, but the LACP group for machine 1 and the LAGG group for NAS01 do not cycle or bounce, so they appear OK. I would run LACP on FreeNAS as well, but have this issue, hence only round robin.

Thanks!

-Mike

mloebl · Feb 19, 2013

I should also note it had been working VERY stably for about a year, until I recently added the second volume for Machine 2.

-Mike

mloebl · Mar 11, 2013

*BUMP*

cyberjock · Mar 11, 2013

When you say native RAID controllers, but not ZFS, are you saying the partitions are UFS? Or are they still ZFS but acting like a single disk because of the hardware RAID?

mloebl · Mar 11, 2013

Thanks for the response; I created the volumes through the RAID controller directly. I believe they should be UFS formatted, as I specifically did not use ZFS, knowing ZFS + HW RAID can be taboo. They are disks mfid0 (Volume01) and mfid1 (Volume02).

Thank you,

-Mike

cyberjock · Mar 11, 2013

Well, since you did know better than to use ZFS with hardware RAID, my first 2 guesses would be that either you accidentally used ZFS anyway and are suffering from the "ZFS and iSCSI don't go well together in some situations" problem, or you are using UFS and may have a disk failing in your array. Have you done any SMART tests on your drives lately, or checked the SMART data on your drives?

Got2GoLV · Mar 11, 2013

I would disable LACP first and test that way...then go from there.
(Simplify)

cyberjock · Mar 11, 2013

Got2GoLV said:
I would disable LACP first and test that way...then go from there.
(Simplify)

Oh, I didn't notice the LACP. Yeah, I'd disable that too.

mloebl · Mar 12, 2013

Drives look good and, strangely, it again only happens when machine 2 connects to or disconnects from FreeNAS. I'll try disabling LACP on the network interfaces tonight, see what it does, and let you guys know.

My other thought: could something be resetting because it's getting unsupported queries (i.e. the 0x85 ATA pass-through query)? I'm not sure which log would show it if a service is resetting when this happens.

Thanks!

-Mike

mloebl · Apr 5, 2013

Finally found the issue, and you guys were right: I realized LACP was still enabled on my Ubuntu NICs. I disabled it and haven't had any problems since. I was talking to my buddy who used to work for a storage company, and he's wondering if my TP-LINK switch with LACP enabled may be crashing the TCP stack somewhere.

I'm going to see about swapping out the switch to one of the Cisco SG200 switches and see if it helps...

Thanks again!

-Mike
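
As a footnote for anyone hitting the same thing: on a Linux initiator that uses the kernel bonding driver, the active mode is reported in `/proc/net/bonding/<bond>`, which makes it easy to confirm whether LACP (IEEE 802.3ad) is really off. Below is a minimal Python sketch that reads that text. The helper names are hypothetical, the bond name `bond0` in the usage note is an assumption, and nothing in this thread confirms how the Ubuntu box was actually configured.

```python
import re


def bonding_mode(proc_text):
    """Extract the 'Bonding Mode:' value from /proc/net/bonding/<bond> text."""
    m = re.search(r"^Bonding Mode:\s*(.+)$", proc_text, re.MULTILINE)
    return m.group(1).strip() if m else None


def uses_lacp(proc_text):
    """Return True if the bond reports IEEE 802.3ad (LACP) mode."""
    mode = bonding_mode(proc_text)
    return mode is not None and "802.3ad" in mode
```

To use it, read the file for your bond (for example `open("/proc/net/bonding/bond0").read()`, assuming the bond is named `bond0`) and pass the text to `uses_lacp`.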
