Incremental
Cadet
- Joined: Feb 25, 2019
- Messages: 5
I've been banging my head against the wall trying to figure this out, but I've run out of things to try. Hopefully someone has an idea of what it could be. This is a system built by iXsystems:
FreeNAS 11.1-U7
64GB of RAM
Twelve 12TB 7200 RPM disks in raidz2 config with two vdevs, 84TB
10Gbps Intel NIC pre-installed (X540, I think)
So the gist of it is that I cannot get this thing to continuously read data from the array (using either Windows' built-in copy or FastCopy) at more than about 30MB/s from a server running Server 2016 (or 2012R2, which I also tried). This system is used as storage for backups, so the only thing that really matters is actual hardware read/write speed. ARC does nothing for me: Netdata shows my cache misses are close to 100% most of the time, because once something is written to disk, it's probably not going to be read again for a long time. So ultimately, what I care about is how fast the actual hardware and ZFS can deliver the data. No trickery.
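For anyone who wants to check the same thing on their own box, something like this on the FreeNAS shell shows the ARC counters and a raw local sequential read, taking the network out of the picture entirely (the file path below is a placeholder; use any multi-GB file that hasn't been read recently, so it isn't sitting in ARC):

```shell
# ARC hit/miss counters on FreeBSD/FreeNAS
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# Raw sequential read of a large, uncached backup file straight off the pool.
# /mnt/tank/backups/somefile.vbk is an example path -- substitute your own.
# dd reports throughput when it finishes; this is what the disks + ZFS can
# actually deliver with no network or SMB/iSCSI layer involved.
dd if=/mnt/tank/backups/somefile.vbk of=/dev/null bs=1M
```

If dd reports hundreds of MB/s locally while the network copy crawls at 30MB/s, the pool itself is fine and the bottleneck is somewhere in the sharing/network path.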
The only reason I even care about the read speed is for when I actually need to restore data. We had a client whose on-prem SAN experienced a simultaneous two-drive failure (I didn't believe it either, but that's what the logs showed). The part you'll really get a kick out of is that the SAN had two arrays on it: a RAID 5 and a RAID 10. You'll never guess which one failed... (hint: it wasn't the RAID 5, surprising, right?). Anyway, they opted to have us move the downed server to our datacenter, so I started copying the data off of the server that this FreeNAS is connected to (which actually runs on ESXi 6.7, but to keep things simple, I've ruled all of that out, as you'll see below).
It took about 6 hours to copy roughly 500GB (to a drive connected to a USB 3.0 interface). That seemed pretty slow. I took the backup to our datacenter and restored it to a new VM, and that whole process took a little over an hour... hmm. Restoring from one external USB drive was much faster than reading from the FreeNAS? Something must be going on.
Some time later, I decided to copy another backup chain off of the FreeNAS, a much bigger one (over 1TB). The estimate finally settled down to around 15 hours. That was just way too long. Luckily, we really believe in backups, so we ALSO back everything on this FreeNAS up to an older QNAP once a week. I wondered if I could get the data off of the QNAP any quicker. I plugged in my USB drive and copied that exact same 1TB+ server off in a little over 2 hours (!!). The QNAP, by comparison, is running a single RAID 6 array. It shouldn't be faster, especially not to this degree.
So I've been troubleshooting, and I've even gone as far as building a new physical server running Server 2016 with an Intel X550 10Gbps adapter and attaching it directly to the FreeNAS with a crossover cable, and the performance does not change. BTW, my tests at this point always use FastCopy: I copy a random 3-4GB backup file (never the same one twice) to a RAM drive. If I copy a file, delete it from the RAM drive, and then copy the same file again, I get a crazy fast speed, so fast the copy is over in about a second. I think that's just ARC doing its thing, and it proves the network connection is OK.
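The same "is the wire OK" question can be answered without involving the disks at all using iperf3 (available in FreeNAS 11.x; the IP below is a placeholder for the FreeNAS interface):

```shell
# On the FreeNAS box: start a listener
iperf3 -s

# On the Windows server (using the iperf3 Windows build):
# run a 10-second test with 4 parallel TCP streams
iperf3 -c 192.168.1.50 -t 10 -P 4
```

If this reports something near line rate (~9.4 Gbit/s on 10GbE), raw TCP between the two hosts is fine and the slowdown lives in the storage or file-sharing layer; if iperf3 also crawls, it's the NIC, driver, or TCP settings.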
So it's gotta be the hard drives, right? You'd think so, BUT: I can plug my laptop into the switch (or connect directly to the FreeNAS with a crossover cable) and, using Windows 10's iSCSI Initiator, get 100MB/s (limited only by the 1Gbps adapter in my laptop). Yet a clean, updated install of Server 2016 is slow even when plugged into the same exact crossover cable I used with the laptop. I've also tried the 1Gbps adapter on the server's motherboard (also Intel) and I get the same poor performance.
It seems as though something is holding the FreeNAS back; it's waiting for something. I've tried turning delayed ACK off, but it had minimal impact. Plus, I'm not convinced it is a network issue, especially since I've tried numerous network cards and driver versions. I will say that the laptop that works is actually running an older driver than anything else I have. I compared the advanced settings of that adapter with the newly built 2016 server, and they don't line up completely: some are the same, but the notebook exposes many more options than the server (seems backwards, I know).
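For reference, this is the delayed ACK toggle on the FreeNAS (FreeBSD) side; a change made this way lasts only until reboot unless it's also added as a tunable in the GUI:

```shell
# Show the current value (1 = delayed ACKs enabled, the FreeBSD default)
sysctl net.inet.tcp.delayed_ack

# Disable delayed ACKs on the running system
sysctl net.inet.tcp.delayed_ack=0
```
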
Oh, and I also have another almost identical FreeNAS (just half as much storage, also built by iXsystems) at our datacenter, and I don't appear to have the same problem with that one. I'm using the same Intel X550 NICs in the hosts, and I've configured both with the same tunables, as recommended on this forum. Those settings seem to work fine at the datacenter, but not at our office. I had tried turning on jumbo frames on the one at the office, but have since turned all of that off while troubleshooting. Jumbo frames work fine at the datacenter, though, on identical switching (not that that matters, since the problem exists even with a crossover cable).
Any ideas?