ESXi 5.5 host reboots, doesn't mount FreeNAS iSCSI target

Status
Not open for further replies.

paulatmig

Dabbler
Joined
Jul 14, 2014
Messages
41
Poking around the forums, I didn't see anyone else having this particular issue, so I think I'm just missing something completely. After rebooting the ESXi hosts, they can see two LUNs - which is expected and good - but they don't mount them. I have to manually go back in and add the storage via esxcfg-volume -M ... etc., and that works fine, no errors or anything.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Was this host running FreeNAS 9.2 before? If yes, then this issue may be a consequence of the changed LUN IDs in FreeNAS 9.3, which make VMware think it is seeing a snapshot instead of the real volume. I don't know of an easier fix, but at least one user reported the problem went away after he created new zvols/LUNs, moved the VMs there, and deleted the old ones.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The proper way to handle this is to completely disconnect ESXi from the iSCSI device. Then reconnect it after rebooting the ESXi host.
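For reference, the disconnect/reconnect can be driven from the ESXi CLI. A hedged sketch, assuming a software iSCSI adapter at vmhba37 and a FreeNAS portal at 10.0.0.5:3260 (both are placeholders - substitute your own adapter name and portal address):

```shell
# Remove the FreeNAS send-target so ESXi forgets the iSCSI device, then rescan
esxcli iscsi adapter discovery sendtarget remove -A vmhba37 -a 10.0.0.5:3260
esxcli storage core adapter rescan -A vmhba37

# ... reboot the host, then re-add the portal and rescan again
esxcli iscsi adapter discovery sendtarget add -A vmhba37 -a 10.0.0.5:3260
esxcli storage core adapter rescan -A vmhba37
```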

The issue seems to be with ESXi as it is the only platform to get upset. But I can't tell you more than that as ESXi isn't exactly going to share code with iXsystems ;)
 

paulatmig

Dabbler
Joined
Jul 14, 2014
Messages
41
One other thing I saw briefly on my cellphone on Friday, and can no longer find the link for, was a VMware KB about VMFS corruption (or something like that) and repairing/rebuilding the volumes. Since, thank goodness, this system isn't in production yet, that's something I can do. I'd already totally disconnected the storage from the ESXi hosts earlier, so I'm starting to think it's the file system itself. I could, and probably should, just nuke it. Will update shortly, thanks!
 

paulatmig

Dabbler
Joined
Jul 14, 2014
Messages
41
Okay, digging further in and looking at the vmkernel logs, I see these:

2015-04-21T14:11:43.080Z cpu16:32784)WARNING: LinScsi: SCSILinuxProcessCompletions:773: Error BytesXferred > Requested Length Marking transfer length as 0 - vmhba = vmhba1, Driver Name = hpvsa, Requested length = 512, Resid = 4080

2015-04-21T14:37:26.560Z cpu10:37897 opID=baa5e653)Vol3: 716: Couldn't read volume header from control: Not supported
2015-04-21T14:37:26.560Z cpu10:37897 opID=baa5e653)Vol3: 716: Couldn't read volume header from control: Not supported
2015-04-21T14:37:26.560Z cpu10:37897 opID=baa5e653)FSS: 5091: No FS driver claimed device 'control': Not supported
2015-04-21T14:37:26.602Z cpu10:37897 opID=baa5e653)FSS: 5091: No FS driver claimed device 't10.FreeBSD_iSCSI_Disk______90e2ba47af78002_________________:1': Not supported

2015-04-21T14:37:25.975Z cpu6:33007 opID=ccc2e5f9)ScsiUid: 153: vmhba37:C3:T0:L1: Invalid IQN SCSI Name String iqn.2000-03.com.migcom.istgt:caber-emerald-nas,lun,1. Identifiers of this type are not supported by this version of ESX
2015-04-21T14:37:25.975Z cpu8:33010 opID=ccc2e5f9)WARNING: ScsiUid: 411: vmhba37:C3:T0:L2: NAA identifier type has an unknown naa value of 0x3

Doing a quick Google search, I see there's a bug listed for 9.3 - #7252. I can't view the bug (asking for a login), but if anyone else can could you post it up? Thanks!
 

paulatmig

Dabbler
Joined
Jul 14, 2014
Messages
41
Also dug around and found the bit about the block size - fixed that, going to try again. Finding all sorts of little problems the more I dig!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Was this host running FreeNAS 9.2 before? If yes, then this issue may be a consequence of the changed LUN IDs in FreeNAS 9.3, which make VMware think it is seeing a snapshot instead of the real volume. I don't know of an easier fix, but at least one user reported the problem went away after he created new zvols/LUNs, moved the VMs there, and deleted the old ones.

Experienced this same problem during the 9.2 to 9.3 upgrade; it has to do with the new LUN IDs resulting from the change from istgt -> CTL.

@paulatmig - here's the Google cache of the bug page. http://webcache.googleusercontent.c...bsd.org/issues/7252+&cd=5&hl=en&ct=clnk&gl=ca

Brandt Winchell has it nailed: "ESXi is finding a VMFS volume with an existing signature but is being presented from a different controller (what ESXi thinks due to the upgrade process of FreeNAS)" and it thinks it's seeing a snapshot.

You can resignature and reimport your VMs, but this is definitely a major issue on upgrades.
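The resignature step can be done from the ESXi CLI rather than the vSphere Client; a minimal sketch, assuming the affected datastore is labeled "datastore1" (a placeholder). Note that resignaturing assigns the datastore a new "snap-xxxxxxxx-" name, so VMs have to be re-registered afterwards:

```shell
# List VMFS volume copies that ESXi has detected as snapshots
esxcli storage vmfs snapshot list

# Write a new signature so the volume mounts as a distinct datastore
esxcli storage vmfs snapshot resignature -l "datastore1"

# Alternatively, force-mount with the existing signature
# (only safe if the original volume is never presented again)
esxcli storage vmfs snapshot mount -l "datastore1"
```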
 

paulatmig

Dabbler
Joined
Jul 14, 2014
Messages
41
Noticed the identification labels prefixed with "snap-", so that makes sense. I read about re-signaturing the volumes via VMware's KB late last week and tried their method, but that didn't seem to fix the issue. Since I only had the one VM in place, I just wiped it and re-imported the datastores. Will note this for the next major upgrade, as it'll be a huge problem once I get this system into production.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It shouldn't be an issue unless they switch target layers again, or something in the update causes CTL to re-ID the LUNs.

But my attitude towards major upgrades is the same as most extreme-sports enthusiasts:

"Looks safe to me ... You go first."
 

shaithis

Cadet
Joined
Jan 27, 2013
Messages
5
I think it's the same issue I've been having for a while... I've been manually working around it by SSHing to each host and doing

esxcfg-volume -l (to get the volume GUIDs)

and then

esxcfg-volume -M <Volume GUID>

Would be nice to get a proper solution... I can also confirm the problem still exists in ESXi 6.
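That two-command workaround can be scripted so each host re-mounts everything in one pass after a reboot. A hedged sketch for the ESXi busybox shell - the `esxcfg-volume -l` output format ("VMFS UUID/label: <uuid>/<label>") is assumed from ESXi 5.x and may differ between versions, so verify it on your hosts first:

```shell
#!/bin/sh
# Re-mount all unresolved VMFS volumes reported by esxcfg-volume.

# extract_guids: pull the VMFS UUIDs out of `esxcfg-volume -l` output.
# Matching lines look like:
#   VMFS UUID/label: 54a1b2c3-d4e5f6a7-0001-0123456789ab/datastore1
extract_guids() {
  grep 'VMFS UUID/label:' | sed 's/.*: //; s,/.*,,'
}

for guid in $(esxcfg-volume -l | extract_guids); do
  # -M mounts the volume persistently, so it survives future reboots
  esxcfg-volume -M "$guid"
done
```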
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Here's what I know about this issue...

There are a few users whose LUNs imported as snapshots (as shown above).

If you happened to have used 9.2.1.x, enabled the easter egg for CTL iSCSI there, and mounted the LUNs as snapshots on ESXi (possibly in combination with some other component we aren't 100% sure of, given the small sample size), it seems there may have been a bug or something that forces a resignature to be required.

Resignaturing is the proper procedure (technically the only known procedure, aside from wiping the LUN and recreating it) to recover from this scenario. Instructions can be found on VMware's website.

Unfortunately, this seems to be where we as a community hit an impasse. I'm not sure what other machinations may be involved, but this also prevents you from expanding the VMFS partition after expanding a LUN. So I have no doubt that, as time goes on, a small number of people will occasionally see this problem for the foreseeable future.
 