SOLVED FreeNAS iSCSI read speed slow (15-30MB/s)

Incremental

Cadet
I've been banging my head against the wall trying to figure this out, but I've run out of things to try. Hopefully someone has an idea of what it could be. This is a system built by iXsystems.

FreeNAS 11.1-U7
64GB of RAM
Twelve 12TB 7200 RPM disks in raidz2 config with two vdevs, 84TB
10Gbps Intel NIC pre-installed (X540, I think)

So the gist of it is that I cannot get this thing to continuously read data from the array (using either Windows or FastCopy) at more than about 30MB/s from a server running Server 2016 (or 2012R2, for that matter, which I also tried). This system is used as storage for backups, so the only thing that really matters is actual hardware read/write speed. ARC does nothing for me; Netdata shows my cache misses are close to 100% most of the time. This is because when something is written to disk, it's probably not going to be read again for a long time. So ultimately, what I care about is how fast the actual hardware and ZFS can deliver the data. No trickery.
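In case it's useful, the miss rate isn't just a Netdata artifact; you can read the raw ARC counters straight from the standard FreeBSD kstat sysctls (nothing specific to my build):

sysctl kstat.zfs.misc.arcstats.hits     # cumulative ARC hits since boot
sysctl kstat.zfs.misc.arcstats.misses   # cumulative ARC misses since boot
sysctl kstat.zfs.misc.arcstats.size     # current ARC size in bytes

On this box the misses dwarf the hits, which is what you'd expect from a write-once, read-rarely backup target.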

The only reason I even care about the read speed is when I actually need to restore data. We had a client with an on-prem SAN experience a simultaneous two-drive failure (I didn't believe it either, but that's what the logs showed). The part you'll really get a kick out of is that the SAN had two arrays on it: a RAID 5 and a RAID 10. You'll never guess which one failed... (hint: it wasn't the RAID 5, surprising, right?). Anyway, they opted to have us move the downed server to our datacenter, so I started copying the data off of the server that this FreeNAS is connected to (which is actually on ESXi 6.7, but to keep things simple, I've ruled that all out, as you'll see below).

It took about 6 hours to copy roughly 500GB (to a drive connected to a USB 3.0 interface). That seemed pretty slow. I took the backup to our datacenter and restored it to a new VM, and that whole process took a little over an hour... hmm. Reading from one external USB drive was much faster than reading from the FreeNAS? Something must be going on.

Some time later, I decided to copy another backup chain off of the FreeNAS. A much bigger one (over 1TB). The estimate finally settled down to around 15 hours. That was just way too long. Luckily, we really believe in backups, so we ALSO back up everything on this FreeNAS to an older QNAP once a week. I wondered if I could get the data off of the QNAP any quicker. I plugged in my USB drive and copied that exact same 1TB+ server off in a little over 2 hours (!!). The QNAP, by comparison, is running a single RAID 6 array. It shouldn't be faster, especially not to this degree.

So I've been troubleshooting, and I've even gone as far as building a new physical server running Server 2016 with an Intel X550 10Gbps adapter and attaching it directly to the FreeNAS with a crossover cable, and the performance does not change. BTW, my tests at this point always use FastCopy, and I copy a random 3-4GB backup file (I never use the same one) to a RAM drive. If I copy a file, delete it from the RAM drive, and copy the same file again, I get a crazy fast speed. So fast, the copy is over in about a second. I think this is just ARC doing its thing and proving that the network connection is OK.

So it's gotta be the hard drives, right? You'd think so, BUT I can plug my laptop into the switch (or use a crossover cable direct to the FreeNAS) and, using Windows 10's iSCSI Initiator, get 100MB/s!! (limited only by the 1Gbps adapter in my laptop). But a clean, updated install of Server 2016 is slow even when plugged in to the same exact crossover cable I used with the laptop. I've tried using the 1Gbps adapter on the motherboard of the server (also Intel) and I get the same poor performance.

It seems as though there is something holding the FreeNAS back. It's waiting for something. I've tried turning delayed ACK off, but it had minimal impact. Plus, I am not convinced it is a network issue, especially since I've tried numerous network cards and driver versions. I will say that the laptop that works is actually running an older driver than anything else I have. I tried comparing the advanced settings of that adapter with the newly built 2016 server, and the settings don't completely match. Some are the same, but the notebook has many more options than the server (seems backwards, I know).
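For reference, the delayed ACK change was just the standard FreeBSD sysctl (set from the shell to test, or added under System -> Tunables to persist across reboots); if I'm remembering the knob right, it's:

sysctl net.inet.tcp.delayed_ack        # show the current setting (1 = delayed ACK on, the default)
sysctl net.inet.tcp.delayed_ack=0      # turn delayed ACK off on the running system

Like I said, it made no real difference here.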

Oh, and I also have another almost identical FreeNAS (just half as much storage, also built by iXsystems) at our datacenter, and I don't appear to have the same problem with that one. I'm using the same Intel X550 NICs in the hosts. I've configured both with the same tunables, as recommended on this forum. Those settings seem to work fine at the datacenter, but not at our office. I had tried turning on jumbo frames on the one at the office, but have since turned all of that off while troubleshooting. Jumbo frames work fine at the datacenter, though, using identical switching (not that that matters, since the problem exists even with a crossover cable).
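To be clear about what I mean by "tunables": these are the usual 10GbE network buffer sysctls that get recommended around here, entered under System -> Tunables as sysctl-type entries. From memory, the set looks roughly like the following; treat the names and values as illustrative rather than the exact list I applied:

kern.ipc.maxsockbuf=16777216         # allow larger socket buffers
net.inet.tcp.recvbuf_max=16777216    # maximum TCP receive buffer
net.inet.tcp.sendbuf_max=16777216    # maximum TCP send buffer
net.inet.tcp.recvspace=262144        # default TCP receive window
net.inet.tcp.sendspace=262144        # default TCP send buffer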

Any ideas?
 

Chris Moore

Hall of Famer
You might want to try this utility to test the performance of the drives. It will show you if any of them are underperforming. A single slow drive in a RAIDZ2 vdev can slow the vdev, which will slow the pool. I had a system where three drives were running slow, and after I replaced them, the performance of the pool more than doubled.

solnet-array-test (for drive / array speed) non destructive test
https://forums.freenas.org/index.php?resources/solnet-array-test.1/
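If you want a quick manual spot check before running the full script, the same idea can be approximated with a plain sequential read from each raw device using dd (read-only, so it's non-destructive; the da numbers below are examples, substitute your own):

# read ~10GB from one disk and note the bytes/sec that dd reports when it finishes
dd if=/dev/da12 of=/dev/null bs=1m count=10000

# or loop over several disks in turn (Bourne shell syntax)
for d in da12 da13 da14; do echo $d; dd if=/dev/$d of=/dev/null bs=1m count=10000; done

The script is much more thorough (serial and parallel passes), so use it for the real numbers.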
 

Incremental

Cadet
So I ran the benchmark (twice) and the drives seem to be working normally. This is the result from the second test:

Performing initial serial array read (baseline speeds)
Tue Feb 26 13:07:08 PST 2019
Unable to determine disk da23 size from dmesg file (not fatal but odd!)
Tue Feb 26 13:34:12 PST 2019
Completed: initial serial array read (baseline speeds)
Array's average speed is 238.05 MB/sec per disk
Disk    Disk Size   MB/sec %ofAvg
------- ----------  ------ ------
da12    11444224MB     237    100
da13    11444224MB     243    102
da14    11444224MB     235     99
da15    11444224MB     255    107 ++FAST++
da16    11444224MB     239    100
da17    11444224MB     242    102
da18    11444224MB     234     98
da19    11444224MB     232     97
da20    11444224MB     233     98
da21    11444224MB     234     98
da22    11444224MB     236     99
da23           0MB     236     99
Performing initial parallel array read
Tue Feb 26 13:34:12 PST 2019
The disk da12 appears to be 11444224 MB.
Disk is reading at about 238 MB/sec
This suggests that this pass may take around 803 minutes
                    Serial  Parall  % of
Disk    Disk Size   MB/sec  MB/sec  Serial
------- ----------  ------  ------  ------
da12    11444224MB     237     237    100
da13    11444224MB     243     243    100
da14    11444224MB     235     233    100
da15    11444224MB     255     255    100
da16    11444224MB     239     239    100
da17    11444224MB     242     243    100
da18    11444224MB     234     234    100
da19    11444224MB     232     232    100
da20    11444224MB     233     232    100
da21    11444224MB     234     234    100
da22    11444224MB     236     236    100
da23           0MB     236     236    100
Awaiting completion: initial parallel array read
 

Incremental

Cadet
I have a question that I can't seem to find the answer to. You'll notice in the above results that the drive numbers start at 12 and end at 23. If I look at Netdata, it shows disks 0 through 11 also exist; they're just inactive (mostly?). I only have 12 physical disks, so why does it look like I have 24? Is this a "feature" or a problem?
 

Incremental

Cadet
Oh, and if I boot from various disk utilities (such as GParted Live), they also show 24 disks present in the system. This is on a pre-configured iXsystems FreeNAS with a SuperMicro X10SRH-CLN4F and its on-board LSI/Broadcom SAS3 3008 controller. It would seem that the controller is presenting 24 disks to the operating system when there are actually only 12.
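In case anyone wants to check for the same thing from the FreeNAS shell, a few standard commands should make the duplicate paths visible (the controller number in the last line is just an example, and sas3ircu may or may not be present on your build):

camcontrol devlist    # lists every device the HBA reports; a double-cabled drive shows up under two different da numbers
gmultipath status     # shows whether FreeNAS has built multipath devices over the duplicated paths
sas3ircu LIST         # lists LSI SAS3 controllers, if the sas3ircu utility is installed
sas3ircu 0 DISPLAY    # shows what is attached to controller 0, port by port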
 

Incremental

Cadet
So, in case anyone else runs into this, I figured out why there are 24 disks instead of 12, and it was a rather simple problem: the server had been built incorrectly (by iXsystems). In fact, we have two of these systems, sold to us at different times (about 6-8 months apart), and it looks like they were both built incorrectly. That leads me to believe they've been building them all this way for some period of time, and perhaps others haven't realized it yet?

The onboard LSI 3008 adapter on the motherboard has two internal headers. Both were plugged in to the same drive cage. In the BIOS of the adapter, you could see the adapter listed twice and under each adapter it showed all twelve drives. I removed the extra cable connecting the second header to the drive cage and now I see only 12 drives in gparted, operating systems, etc.

I've been trying to figure out if there was meant to be a reason for this. Redundancy? I don't know every nuance of FreeNAS, but I don't see how you're supposed to configure an entire second set of hard drives to fail over to if the first set becomes inaccessible. If there is a way, maybe that just wasn't configured and is the source of my problem? And if it was for redundancy, it would seem to be a false sense of it: even though there are two headers, there's really only one adapter, and if that adapter fails, both connectors fail. For proper redundancy, there should be a second adapter in a slot.

My other theory is that the system builder misunderstood the markings on the motherboard. There are ranges next to each header. I don't remember the exact wording but it was something like '0-3' and '4-7'. Those could have been interpreted as, "I need one half of the drives attached to the first header and the second half to the second header." In fact, that's how the cables were connected. But there are multiple ports on the drive cage and it seems that every port can access every drive in the whole cage, so only one cable was needed. Maybe those number ranges make more sense for a different drive cage?

So anyway, after I removed the unnecessary cable and reinstalled FreeNAS, my original performance problem vanished. I can't say with 100% certainty that it was the cause, though, because I never tried a wipe and rebuild of the volume before fixing the cabling. I HAD tried a clean reload of just the FreeNAS OS, but in that attempt I simply imported the existing volume, and that had no effect. So it is possible that a rebuild of the volume may have solved it without fixing the cabling; I just never tried that. The evidence against the cabling being the cause is that the other, identical FreeNAS we have, which still has the incorrect cabling, did not have the same performance issue. I can now get the same performance numbers on both FreeNAS systems (when using the same RAID configuration, OS version, etc.).

However, during all of this troubleshooting, I also experimented by installing Server 2019 on the ixsystems hardware. Before I figured out the cabling problem, Server 2019 would show very strange behavior when trying to configure those drives. It also saw 24 drives, but you couldn't really use any of them. If you put 12 in a pool, suddenly another 12 would appear. And then sometimes drives would appear and disappear randomly. Once the extra cable was removed, everything behaved as it should.

With the cabling right, I started running some performance tests (FastCopy to a RAM drive) with all sorts of array configurations (with both FreeNAS and Server 2019). Here's where it gets really interesting. With the tests I'm running, Server 2019 blows the doors off of FreeNAS! Like it's not even close. The same file copied to a RAM disk from a RAID 10 array with 12 drives gives me 1.3GB/s on the 2019 server but only 400MB/s on the FreeNAS (over 10Gbps iSCSI). This was running on the exact same hardware (specs listed at the start of this thread). I also rebooted the FreeNAS after copying the file to the volume to ensure there wasn't anything in the ARC (as far as I can tell there's minimal, if any, caching on Windows, but I rebooted that as well). I didn't test the 2019 box over iSCSI, because I only really needed iSCSI on the FreeNAS, since it isn't Windows. With Windows as the OS, I can run my utilities directly on the storage server and shut down the VMware VM that I would otherwise need with FreeNAS. It's possible that iSCSI is limiting the speed to some degree, but a 300%+ hit?

Windows has several added advantages in my case. One is that it turns out it will save us money by not having to pay the licensing costs on the VMs that I otherwise needed (as a service provider, we have to adhere to strict monthly reporting of usage to VMware). Another is that, since we're using these boxes to store backups, it's better to have the backup and recovery software separate from the live infrastructure and as easily accessible as possible when emergencies happen.

I also want to be clear that I'm not saying FreeNAS has no other uses. For instance, my priority in this case is on sequential read and write of large files, so that's the only thing I'm testing for. Other storage access patterns may be completely different and having a system with large amounts of read caching may be exactly what is needed.
 