Multiple iSCSI volumes, connection bounces if one connects/disconnects

Status
Not open for further replies.

mloebl

Dabbler
Joined
Mar 5, 2012
Messages
16
Sorry for the awful title, but here's the issue. I'm running FreeNAS 8.3 x64 and seeing an iSCSI connection bouncing. I've got a single portal and two volumes: Volume1 has machine 1 connected to it, and Volume2 has machine 2 connected to it. I noticed the iSCSI connection on machine 1 kept bouncing, seemingly at random. After some digging, it looks like any time machine 2 is turned on or off, suspended, or woken up, the connection on Volume1 bounces. Machine 1 is a Windows 7 box (always on); machine 2 is an Ubuntu 12.10 box that is usually off during the day. I don't see anything unusual in the FreeNAS logs, other than machine 1 reconnecting every time I do something with machine 2. I'm open to ideas. I'm not an iSCSI expert, so it's possible I have a mistake in here somewhere...

Thank you,

-Mike
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Can you provide A LOT more info, like your version of FreeNAS, the hardware on the server, etc.? Kinda hard to help you if you're just saying "it's broke". :)
 

mloebl

Sorry, guess this cold has messed me up more than I thought :)

FreeNAS-8.3.0-RELEASE-x64 (r12701M)
Intel(R) Core(TM) i3-2120T CPU @ 2.60GHz
16GB Ram
LSI MegaRAID 9260CV-8i Controller Card (latest firmware 12.12.0-0139)
Running 2 native RAID containers (non-ZFS): one RAID 5+1, the other RAID 1
Dual Intel Gb NIC using Round Robin LAGG for the data side (isolated network), 192.168.50.2
Marvell Gb NIC for the management network

Portal -
192.168.50.2:3260

Target config is, I believe, pretty much stock per the FreeNAS wiki example, with no encryption or CHAP secret.

Initiators -
1, ALL, 192.168.50.1/32
2, ALL, 192.168.50.3/32

Targets -
Volume01, Init group 1, Portal 1, Auth Group 1, Auto Auth
Volume02, Init group 2, Portal 1, Auth Group 2, Auto Auth

Associated Targets -
Volume01:Volume01
Volume02:Volume02
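To rule out the two initiators overlapping, the access rules above can be modeled as a small lookup. This is a minimal sketch in Python, not FreeNAS or istgt code: `INITIATOR_GROUPS`, `TARGETS` and `allowed_targets` are hypothetical names that simply encode the initiator groups and target mappings listed above (initiator name "ALL" is ignored, only the network restriction is checked).

```python
import ipaddress

# Hypothetical model of the settings listed above: initiator group number ->
# allowed initiator networks (the "ALL" initiator-name rule is not modeled).
INITIATOR_GROUPS = {
    1: ["192.168.50.1/32"],
    2: ["192.168.50.3/32"],
}

# Target name -> initiator group number, as in the Targets list above.
TARGETS = {
    "Volume01": 1,
    "Volume02": 2,
}


def allowed_targets(initiator_ip: str) -> list[str]:
    """Return the targets whose initiator group admits this IP address."""
    addr = ipaddress.ip_address(initiator_ip)
    result = []
    for target, group in TARGETS.items():
        networks = [ipaddress.ip_network(n) for n in INITIATOR_GROUPS[group]]
        if any(addr in net for net in networks):
            result.append(target)
    return result


if __name__ == "__main__":
    for ip in ("192.168.50.1", "192.168.50.3", "192.168.50.2"):
        print(ip, "->", allowed_targets(ip))
```

If the model matches the real config, each box can reach only its own volume, so the bounce is unlikely to come from both initiators logging into the same LU.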


Windows 7 Box (Machine 1)
Intel Dual NIC, LACP, 192.168.50.1

Ubuntu 12.10 Box (Machine 2)
Intel Dual NIC, LACP, 192.168.50.3

Example log output from FreeNAS when Machine 2 powers up and kicks Machine 1:
Feb 19 08:41:59 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=8, CID=1, HeaderDigest=off, DataDigest=off
Feb 19 08:42:00 NAS01 istgt[6781]: Login from iqn.1993-08.org.debian:01:16631398338e (192.168.50.3) on iqn.2011-03.nas01._____.local:volume02 LU2 (192.168.50.2:3260,1), ISID=23d010000, TSIH=5, CID=0, HeaderDigest=off, DataDigest=off
Feb 19 08:42:00 NAS01 istgt[6781]: istgt_lu_disk.c:6737:istgt_lu_disk_execute: ***ERROR*** unsupported SCSI OP=0x85
Feb 19 08:42:00 NAS01 last message repeated 2 times
Feb 19 08:42:46 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=9, CID=1, HeaderDigest=off, DataDigest=off
Feb 19 08:42:47 NAS01 istgt[6781]: Login from iqn.1993-08.org.debian:01:16631398338e (192.168.50.3) on iqn.2011-03.nas01._____.local:volume02 LU2 (192.168.50.2:3260,1), ISID=23d010000, TSIH=6, CID=0, HeaderDigest=off, DataDigest=off
Feb 19 08:42:51 NAS01 kernel: GEOM: mfid0: the secondary GPT header is not in the last LBA.
Feb 19 08:42:52 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=10, CID=1, HeaderDigest=off, DataDigest=off
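One way to confirm the pattern in the excerpt above is to group the istgt login lines by initiator. The following is a minimal sketch in Python, assuming the syslog line format shown in this thread (other istgt versions may log differently); `LOGIN_RE`, `SAMPLE` and `logins_by_initiator` are hypothetical names, and `SAMPLE` is the excerpt copied verbatim.

```python
import re
from collections import defaultdict

# Matches istgt "Login from ..." syslog lines in the format shown above.
# Assumption: this layout is specific to the istgt build in FreeNAS 8.3.
LOGIN_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ istgt\[\d+\]: Login from (?P<initiator>\S+) "
    r"\((?P<ip>[\d.]+)\) on (?P<target>\S+) .*?ISID=(?P<isid>\w+), TSIH=(?P<tsih>\d+)"
)

# The log excerpt from this thread.
SAMPLE = """\
Feb 19 08:41:59 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=8, CID=1, HeaderDigest=off, DataDigest=off
Feb 19 08:42:00 NAS01 istgt[6781]: Login from iqn.1993-08.org.debian:01:16631398338e (192.168.50.3) on iqn.2011-03.nas01._____.local:volume02 LU2 (192.168.50.2:3260,1), ISID=23d010000, TSIH=5, CID=0, HeaderDigest=off, DataDigest=off
Feb 19 08:42:00 NAS01 istgt[6781]: istgt_lu_disk.c:6737:istgt_lu_disk_execute: ***ERROR*** unsupported SCSI OP=0x85
Feb 19 08:42:00 NAS01 last message repeated 2 times
Feb 19 08:42:46 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=9, CID=1, HeaderDigest=off, DataDigest=off
Feb 19 08:42:47 NAS01 istgt[6781]: Login from iqn.1993-08.org.debian:01:16631398338e (192.168.50.3) on iqn.2011-03.nas01._____.local:volume02 LU2 (192.168.50.2:3260,1), ISID=23d010000, TSIH=6, CID=0, HeaderDigest=off, DataDigest=off
Feb 19 08:42:51 NAS01 kernel: GEOM: mfid0: the secondary GPT header is not in the last LBA.
Feb 19 08:42:52 NAS01 istgt[6781]: Login from iqn.1991-05.com.microsoft:mikehome._____.local (192.168.50.1) on iqn.2011-03.nas01._____.local:volume01 LU1 (192.168.50.2:3260,1), ISID=400001370000, TSIH=10, CID=1, HeaderDigest=off, DataDigest=off
""".splitlines()


def logins_by_initiator(lines):
    """Group login events by initiator IP as (timestamp, ISID, TSIH) tuples."""
    events = defaultdict(list)
    for line in lines:
        m = LOGIN_RE.match(line)
        if m:  # non-login lines (errors, kernel messages) are skipped
            events[m["ip"]].append((m["ts"], m["isid"], int(m["tsih"])))
    return dict(events)


if __name__ == "__main__":
    for ip, events in logins_by_initiator(SAMPLE).items():
        print(ip, events)
```

On this excerpt the Windows initiator logs in three times with the same ISID but a new TSIH each time (8, 9, 10), i.e. istgt set up a fresh session for it each time, and the first two of those logins sit one second before an Ubuntu login.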

At those times the Windows logs on Machine 1 are filled with messages saying it was disconnected from its iSCSI device and then reconnected. I initially suspected something odd network-wise. I'm also monitoring the log on the switch: I see the LACP group for Machine 2 start as it powers on, but the LACP group for Machine 1 and the LAGG group for NAS01 do not cycle or bounce, so network-wise they appear OK. I would run LACP on FreeNAS as well, but I'm having this issue, hence round robin only.


Thanks!

-Mike
 

mloebl

I should also note it had been working VERY stably for about a year, until I recently added the second volume for Machine 2.

-Mike
 

cyberjock

When you say native RAID controllers, but not ZFS, are you saying the partitions are UFS? Or are they still ZFS but acting like a single disk because of the hardware RAID?
 

mloebl

Thanks for the response. I created the volumes through the RAID controller directly. I believe they should be UFS-formatted, as I specifically did not use ZFS, knowing ZFS + HW RAID can be taboo. They are disks mfid0 (Volume01) and mfid1 (Volume02).

Thank you,

-Mike
 

cyberjock

Well, since you did know better than to use ZFS with hardware RAID, my first 2 guesses would be that either you accidentally used ZFS anyway and are suffering from the "ZFS and iSCSI don't go well together in some situations" problem, or you are using UFS and may have a disk failing in your array. Have you run any SMART tests on your drives lately, or checked the SMART data on your drives?
 

Got2GoLV

Dabbler
Joined
Jun 2, 2011
Messages
26
I would disable LACP first and test that way...then go from there.
(Simplify)
 

cyberjock

Got2GoLV said:
I would disable LACP first and test that way...then go from there.
(Simplify)

Oh, I didn't notice the LACP. Yeah, I'd disable that too.
 

mloebl

The drives look good, and strangely, it again only happens when Machine 2 connects to or disconnects from FreeNAS. I'll try disabling LACP on the network interfaces tonight, see what it does, and let you guys know.

My other thought: could something be resetting because it's getting unsupported queries (i.e. the 0x85 ATA pass-through support query)? I'm not sure which log that would show up in if a service resets when this happens.
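The opcode in istgt's `unsupported SCSI OP` error can be pulled out and labeled from the log. A minimal sketch in Python, assuming the istgt message format from the log earlier in this thread; `OPCODE_NAMES` is deliberately partial (0x85 is ATA PASS-THROUGH (16) and 0xA1 is ATA PASS-THROUGH (12) in the SCSI/ATA Translation command set), and `unsupported_opcodes` is a hypothetical helper.

```python
import re

# Deliberately partial opcode table: only the ATA pass-through commands
# defined by the SCSI/ATA Translation (SAT) standard.
OPCODE_NAMES = {
    0x85: "ATA PASS-THROUGH (16)",
    0xA1: "ATA PASS-THROUGH (12)",
}

# Assumes the istgt error format from the log in this thread.
UNSUPPORTED_RE = re.compile(r"unsupported SCSI OP=0x([0-9A-Fa-f]+)")


def unsupported_opcodes(lines):
    """Return (opcode, name) for each istgt 'unsupported SCSI OP' line."""
    found = []
    for line in lines:
        m = UNSUPPORTED_RE.search(line)
        if m:
            op = int(m.group(1), 16)
            found.append((op, OPCODE_NAMES.get(op, "unknown opcode")))
    return found


if __name__ == "__main__":
    line = ("Feb 19 08:42:00 NAS01 istgt[6781]: istgt_lu_disk.c:6737:"
            "istgt_lu_disk_execute: ***ERROR*** unsupported SCSI OP=0x85")
    print(unsupported_opcodes([line]))  # [(133, 'ATA PASS-THROUGH (16)')]
```

Syslog's "last message repeated 2 times" line collapses duplicates, so a count from a parser like this undercounts unless those lines are expanded as well.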

Thanks!

-Mike
 

mloebl

Finally found the issue. You guys were right: I realized LACP was still enabled on my Ubuntu NICs. I disabled it and haven't had any problems since. I talked to my buddy who used to work for a storage company, and he's wondering if my TP-LINK switch with LACP enabled may be crashing the TCP stack somewhere.

I'm going to look into swapping out the switch for one of the Cisco SG200 switches and see if it helps...

Thanks again!

-Mike
 