Can't find zvol from ESXi 6.5

Okay, where to begin... there are so many things on fire.

Basic background - I show up to solve problems because this company has suffered high turnover in system/network admins for the past few years. Basically no one documented anything useful, so no one really knew how anything worked; stuff just got duct-taped and baling-wired on top of existing systems.

I am not a networking or storage genius. I'm figuring most of this out as I go along. I know VMware decently well and know my way around most tools ranging from software to hammer drills. (Hence "JackAlltrades".) Put it this way - I own my own hardhat and I'm not afraid of Linux.

We have over a dozen ESXi hosts that are (thanks to me) now centrally managed across two sites with a pair of linked vCenters with SSO and all the trimmings. We also have an aging Dell EqualLogic with about 17TB in RAID 6 and a dead drive. (yay) It sits on its own switch, a pair of stacked Dell Force10 S25s (stacking cables in the back), on a 172.31.0.0/24 storage network. Everything runs over 1Gb/s Ethernet - the SAN and the hosts mostly run 4 cables each to this switch.

The new SAN is a Dell PowerEdge R510 with 128GB RAM and 12x 3TB 7200 rpm disks in RAIDZ2 running FreeNAS 11.2-U6. (Probably not the "best" config, but there's a reason for it based on my admittedly limited understanding of ZFS.)

Critical VMs run on local datastores in specific hosts with RAID'd SSD drives. Non-critical VMs run mostly on the old SAN or on HDD local datastores on older hosts.

The new SAN is intended to take over for the old one so that it can be safely "reconditioned" and put back into service as a backup datastore. (There is a frankly terrifying lack of backups right now.) It doesn't have to be fast; it needs to be big and reasonably resilient. Hence Z2. This will be a production environment, but one of non-critical systems that A: should be backed up to the old SAN and B: can be rebuilt from templates with unfortunate but acceptable downtime. We have other solutions in place for things that need fast disk.


The Problem:
I followed the guides I found online for connecting ESXi hosts to iSCSI shares on ZFS. There are a bunch of them, they're all basically the same, and I hit the same problem no matter what I do: the hosts do not find the zvol datastore when I rescan storage.

I set one Ethernet interface to the 172.16.1.0/24 LAN for management and one to the 172.31.0.0/24 LAN for storage, and connected them to the correct switches. Since I already have these ESXi hosts connected to another SAN on the same network, I should just need to add the FreeNAS storage interface's IP to the Dynamic Discovery list, right? I've added additional hosts to the existing old SAN before - set up the virtual switches, done the network port binding, etc. - and that worked. But none of them see the FreeNAS datastore.
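
One thing I keep double-checking is whether the VMkernel port I bound for iSCSI is actually on the same subnet as the FreeNAS portal, since a port-bound vmknic won't route to a portal on another network. Here's a rough Python sketch of the check I keep doing in my head (the addresses and masks are placeholders for my setup, not anything exported from ESXi or FreeNAS):

# Sanity check: is the iSCSI VMkernel port on the same subnet as the FreeNAS portal?
# All addresses below are placeholders for my storage network.
from ipaddress import ip_interface

vmk_iscsi = ip_interface("172.31.0.51/24")       # ESXi VMkernel port bound for iSCSI (assumed)
freenas_portal = ip_interface("172.31.0.20/24")  # FreeNAS storage interface (assumed)

if vmk_iscsi.network == freenas_portal.network:
    print("Same subnet - dynamic discovery should reach", freenas_portal.ip)
else:
    print("Different subnets - a port-bound iSCSI vmknic won't route to this portal")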

I tried moving it to a different network (172.16.1.0/24) and matching a Port Group in ESXi to it. Still nothing.

I tried connecting to a separate switch I found lying around and reset to be completely flat (a Cisco Catalyst 2960G). Nothing.

I plug my Surface into that switch and scan the network and there's FreeNAS, right where it should be, answering the port scan.
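
For what it's worth, the "scan" is nothing fancy - just confirming that TCP 3260 answers. This minimal Python sketch does the same thing from any box plugged into that switch (the FreeNAS IP below is a placeholder):

# Minimal reachability test for the iSCSI portal: can we open TCP 3260?
# 172.31.0.20 is a placeholder for the FreeNAS storage IP.
import socket

def portal_reachable(host: str, port: int = 3260, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("FreeNAS portal answering on 3260:", portal_reachable("172.31.0.20"))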

I'm at a loss and I assume I'm making some stupid, simple mistake. Here are the settings:
17.57TiB zvol: "vmware-target"

Block iSCSI:
Target Global Configuration: left it alone; it set a Base Name, and ISNS Servers and Pool Available Space Threshold are blank.
Portals: 0.0.0.0:3260, Discovery Auth Method - None, Discovery Auth Group - None (just want to get it working for now, we can lock it down later.)
Initiators: ALL/ALL
Authorized Access: blank
Targets: vmware-target - Portal Group ID - 1, Initiator Group ID - 2 (the ALL/ALL one), Auth Method - None, Authentication Group number - None
Extents: name - vmware-extent, Extent type - Device, Device - FreePool/FreeNAS (17.5T), Serial - random, Logical block size - 512, Enable TPC - checked, LUN RPM - SSD, everything else blank.
Associated Targets: target - vmware-target, LUN - 0, Extent - vmware-extent


iSCSI Services are turned on and set to autostart.
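
Since I've renamed things a few times while fiddling, I also threw together a rough consistency check on those settings. The dicts below are hand-typed from the GUI (not pulled from any FreeNAS API), so treat it as a sketch of the cross-check, nothing official:

# Rough cross-check of the Block iSCSI pieces as I have them configured.
# Everything here is hand-typed from the GUI, not pulled from an API.
targets = {"vmware-target": {"portal_group": 1, "initiator_group": 2}}
extents = {"vmware-extent": {"type": "Device", "device": "FreePool/FreeNAS", "block_size": 512}}
associated = [{"target": "vmware-target", "lun": 0, "extent": "vmware-extent"}]

for assoc in associated:
    assert assoc["target"] in targets, f"Associated Target points at unknown target {assoc['target']}"
    assert assoc["extent"] in extents, f"Associated Target points at unknown extent {assoc['extent']}"
    assert assoc["lun"] == 0, "LUN should be 0, matching what I set in the GUI"
print("target / extent / LUN mapping is consistent")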


At this point I've pretty much hit a wall and I'm just trying random changes to settings. I need some guidance on what to try next.

Once the new SAN is up and we can migrate the VMs to it, I can finally start fixing the other giant fires in this server room (particularly the problem that there are no backups to speak of, or anywhere to store them).
 
With regards to the critical hosts, the plan is to use the replication side of Veeam to replicate them to the old (future backup) SAN, so that if we lose a host or its datastore we can launch the synchronized replicas from that datastore with minimal downtime (we don't need HA, just "pretty quick"). Everything else will get backed up by Veeam to the old SAN. A few "disposable" hosts, where resiliency doesn't matter, are being reconfigured to go fast (RAID-0 HDDs) so the software guys can stress-test VMs they blow away when they're done anyway; that gives them speed without tying up one of the critical production hosts (the ones with the SSDs). If those arrays croak, we just pull the dead drive and start over, and they only lose the time it takes to rebuild the VM from a template.
 
So I decided to try something else and I got it to work.

We have a couple of hosts that aren't using every network connection they have, so I plugged a cable into an open port on each host and ran it to the "test" switch, along with one cable from the FreeNAS.

Set them up in ESXi and it worked.

The question is, shouldn't I be able to put multiple SANs on the same network/switch and have the hosts see them both from the same iSCSI interfaces? Or does every SAN require its own network and its own host interfaces (because that doesn't sound very scalable to me)?
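
My mental model - which may be exactly what's wrong - is that one iSCSI vmknic should be able to talk to every portal on its subnet, so two SANs on the shared storage network would just mean two Dynamic Discovery entries on the same adapter. A little Python sketch of that assumption (every address below is a placeholder):

# My working assumption: one VMkernel port, one subnet, one SendTargets
# (Dynamic Discovery) entry per SAN portal. All addresses are placeholders.
from ipaddress import ip_address, ip_interface

vmk_iscsi = ip_interface("172.31.0.51/24")   # the host's single iSCSI vmknic (assumed)
sendtargets = [
    "172.31.0.10:3260",                      # old EqualLogic portal (assumed)
    "172.31.0.20:3260",                      # new FreeNAS portal (assumed)
]

for entry in sendtargets:
    addr = entry.rsplit(":", 1)[0]
    if ip_address(addr) in vmk_iscsi.network:
        print(f"{entry}: same subnet as the vmknic, should be discoverable")
    else:
        print(f"{entry}: different subnet - needs routing or another vmknic")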

Also, we need a lot of other hosts connected, and that means either getting authorization to buy more NICs or reassigning some of the available interfaces if we can't "double up" on the SANs.

I'd like to get this working as simply as possible with as much performance as I can given the limited hardware and non-existent budget - I bought the R510 running FreeNAS with my own money because I'm called in to solve problems, not sit around waiting for bean-counters to authorize an expense. I've since gotten authorization to expense it back to the company but as far as I was concerned I was going to either end up doing that or taking home an absolutely kick-ass Plex Media Server.

I have a few SFP+ cards at my disposal, and if I can figure out how to enable those ports on the switches I'll have at least some 10Gb/s links (the new SAN and the two main production hosts). But that raises the question: given the speed limits of the physical disks themselves, how many connections, and how fast, do we really need between these hosts and the SANs?
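
My back-of-the-envelope math on that (all of the per-disk figures are generic assumptions for 7200 rpm SATA drives and a single RAIDZ2 vdev, not benchmarks from my R510):

# Back-of-the-envelope throughput math. Per-disk figures are generic
# assumptions for 7200 rpm SATA drives, not measurements from my R510.
link_1g_mb_s = 1_000_000_000 / 8 / 1_000_000     # one 1Gb/s path: ~125 MB/s line rate
link_10g_mb_s = 10 * link_1g_mb_s                # one SFP+ path: ~1250 MB/s

seq_per_disk_mb_s = 120                          # sequential MB/s per disk (assumed)
data_disks = 12 - 2                              # RAIDZ2: 12 drives minus 2 of parity
pool_seq_mb_s = seq_per_disk_mb_s * data_disks   # optimistic sequential ceiling

iops_per_disk = 75                               # random IOPS per disk (assumed)
io_size_mb = 0.064                               # 64KB I/Os, roughly VM-ish
pool_rand_mb_s = iops_per_disk * io_size_mb      # one RAIDZ vdev ~ one disk of IOPS (rule of thumb)

print(f"1Gb/s link ceiling  : ~{link_1g_mb_s:.0f} MB/s")
print(f"10Gb/s link ceiling : ~{link_10g_mb_s:.0f} MB/s")
print(f"Pool sequential est.: ~{pool_seq_mb_s:.0f} MB/s (could saturate several 1Gb/s links)")
print(f"Pool random est.    : ~{pool_rand_mb_s:.1f} MB/s (one 1Gb/s link is already plenty)")

Which is why my gut says put the 10Gb/s on the new SAN for sequential-heavy work (backups, Storage vMotion), while the existing 1Gb/s links probably aren't the bottleneck for ordinary random VM I/O.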
 