I figured I'd do an update, as I've been busy.
1. I won't go into the hardware RAID discussion, but I will say that FreeNAS Certified and TrueNAS systems do not ship with hardware RAID.
2. slogs that are mirrored should perform only modestly slower. For mirrored slogs, both devices have to return the writes as complete before the sync write can be acknowledged. This is likely only microseconds of difference for fast slogs, but could be milliseconds if your slog device sucks as badly as the one I listed in my example.
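For reference, a mirrored slog is just a mirrored log vdev. A rough sketch of adding one (the pool and device names here are placeholders, not a recommendation for your system):
zpool add tank log mirror nvme0 nvme1
Both devices in that mirror have to commit each sync write before ZFS acknowledges it, which is where the extra latency comes from.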
3. To answer the question about whether 200us versus 300us (insert any random comparison for time here) matters: the issue is a lot deeper. Until you start hitting the upper limits for writes, you probably aren't going to see the difference unless you're talking extremely large latency differences. I promise you'll be able to tell the difference between a device at 300us and one at 3ms, even before it is fully loaded. But if it's a 100us difference between devices, you'll probably never know (or at least not until you start hitting the upper limits of what the devices can do).
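Some back-of-the-envelope math on why that is: for a single sync stream at queue depth 1, a 300us round trip caps you at roughly 1/0.0003 ≈ 3,300 sync writes per second, while 3ms caps you at roughly 330. A 100us difference (say 300us versus 400us) only moves that ceiling from about 3,300 to about 2,500, and most workloads never push a single stream anywhere near that, which is why you only notice it once the device is heavily loaded.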
4. I'm gonna paraphrase a little here, so I apologize if the info isn't correct or is a bit confusing.
NFS is the typical workload for sync writes. Because of this, it is necessary to consider the NFS client as well as the NFS server in the equation. FHA (File Handle Affinity) allows NFS to do some pretty crazy things. I won't go into the full details, as it gets complicated and someone else probably has a better way of explaining it than I do, but the short version is that FHA tries to keep requests for the same file handle on the same nfsd thread, which effectively serializes NFS write (or read) requests. When everything is going smoothly, this is good for reads, but not-so-good for writes. Having multiple slogs striped means we'd want multiple "streams", and having FHA enabled limits that.

Until FreeNAS/TrueNAS 11.1, you could only enable or disable FHA as a whole, and it affected all NFS workloads (both reads and writes), so you had to pick which workload to optimize for. The tunable was vfs.nfsd.fha.enable, and the default was 1. This meant that if you had more than 1 slog device striped and FHA enabled, you likely weren't getting much more than "a little better than 1 device" because of the serialization. In my own testing, having 2 slog devices only gave you 20-40% more throughput in an NFS write workload than a single slog device, and adding more devices gave even more sharply diminishing returns. Unfortunately, turning FHA off had serious performance limitations for many read workloads, so our
@mav@ went to work to fix it. There are now 3 tunables as of FreeNAS/TrueNAS 11.1:
vfs.nfsd.fha.write=0
vfs.nfsd.fha.read=1
vfs.nfsd.fha.enable=1
The numbers above are the defaults in 11.1-U4 based on my test system (these are the new defaults since 11.1). This lets us enjoy the performance benefits of FHA for reads while removing the bottleneck for writes. These are probably the appropriate settings for the vast majority of FreeNAS servers out there and shouldn't be changed. Note that if you do want to experiment, a reboot is required for changes to these values to take effect.
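If you want to confirm what your own system is running with, something like this from the shell should list the current values (exact output varies by version):
sysctl vfs.nfsd.fha
On FreeNAS/TrueNAS you would typically set these as Tunables in the GUI rather than editing anything by hand, and per the note above, plan on a reboot when changing them for testing.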
Until this change (it was my bug ticket that originally brought this problem to our attention), having more than 2 slog devices, even if striped, really didn't give the kind of benefit we wanted.
As the bug ticket is internal to us, here's the summary from the developer:
Prior to the change going public, he wrote:
I've committed a patch to the nightly train adding two more sysctls to control NFS FHA. I expect that with that patch, setting vfs.nfsd.fha.write=0 should dramatically improve synchronous write performance with multiple parallel requests, especially if the NFS server is configured with a sufficiently large number of threads.
The final code comments are:
Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing serialization to be controlled separately for read and write NFS requests. It does not change the default behavior, since there are too many factors to consider, but it gives additional room for further experiments and tuning.
The main motivation for this change is very low write speed in the case of ZFS with sync=always, or when NFS clients request synchronous operation, where every separate request has to be written/flushed to the ZIL and requests are processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows ZIL throughput to be increased several times over by coalescing writes and cache flushes. There is a worry that doing so may increase data fragmentation on disk, but I suppose that should not happen for a pool with a SLOG.
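For context on the sync=always case he mentions, that's just the ZFS dataset property; a quick sketch of forcing it (the pool/dataset name is a placeholder):
zfs set sync=always tank/nfs_share
zfs get sync tank/nfs_share
With that set, every write goes through the ZIL (and therefore the slog, if you have one), which is exactly the situation where vfs.nfsd.fha.write=0 lets the server coalesce writes and cache flushes instead of handling each request one at a time.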