I figured I'd do an update, as I've been busy.
1. I won't go into the hardware RAID discussion, but I will say that FreeNAS Certified and TrueNAS systems do not ship with hardware RAID.
2. slogs that are mirrored should perform only modestly slower. For mirrored slogs, both devices have to return the writes as complete before the sync write can be acknowledged. This is likely only microseconds of difference for fast slogs, but could be milliseconds if your slog device sucks as badly as the one I listed in my example.
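For reference, a mirrored slog is just a mirrored log vdev. A rough sketch of adding one (the pool and device names here are placeholders, not a recommendation for your system):
zpool add tank log mirror nvme0 nvme1
Both devices in that mirror have to commit each sync write before ZFS acknowledges it, which is where the extra latency comes from.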
3. To answer the question about whether 200us versus 300us (insert any random comparison for time here) matters: the issue is a lot deeper. Until you start hitting the upper limits for writes, you probably aren't going to see the difference unless you're talking extremely large latency differences. I promise you'll be able to tell the difference between a device at 300us and one at 3ms, even before it is fully loaded. But if it's a 100us difference between devices, you'll probably never know (or at least not until you start hitting the upper limits of what the devices can do).
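Some back-of-the-envelope math on why that is: for a single sync stream at queue depth 1, a 300us round trip caps you at roughly 1/0.0003 ≈ 3,300 sync writes per second, while 3ms caps you at roughly 330. A 100us difference (say 300us versus 400us) only moves that ceiling from about 3,300 to about 2,500, and most workloads never push a single stream anywhere near that, which is why you only notice it once the device is heavily loaded.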
4. I'm gonna paraphrase a little here, so I apologize if the info isn't correct or is a bit confusing.
NFS is the typical workload for sync writes. Because of this, it is necessary to consider the NFS client as well as the NFS server in the equation. FHA (File Handle Affinity) allows NFS to do some pretty crazy things. I won't go into the full details, as it gets complicated and someone else probably has a better way of explaining it than I do, but the short version is that FHA tries to keep requests for the same file handle on the same nfsd thread, which effectively serializes NFS write (or read) requests. When everything is going smoothly, this is good for reads, but not-so-good for writes. Having multiple slogs striped means we'd want multiple "streams", and having FHA enabled limits that.

Until FreeNAS/TrueNAS 11.1, you could only enable or disable FHA as a whole, and it affected all NFS workloads (both reads and writes), so you had to pick which workload to optimize for. The tunable was vfs.nfsd.fha.enable, and the default was 1. This meant that if you had more than 1 slog device striped and FHA enabled, you likely weren't getting much more than "a little better than 1 device" because of the serialization. In my own testing, having 2 slog devices only gave you 20-40% more throughput in an NFS write workload than a single slog device, and adding more devices gave even more sharply diminishing returns. Unfortunately, turning FHA off had serious performance limitations for many read workloads, so our
@mav@ went to work to fix it. There are now 3 tunables as of FreeNAS/TrueNAS 11.1:
vfs.nfsd.fha.write=0
vfs.nfsd.fha.read=1
vfs.nfsd.fha.enable=1
The numbers above are the defaults in 11.1-U4 based on my test system (these are the new defaults since 11.1). This lets us enjoy the performance benefits of FHA for reads while removing the bottleneck for writes. These are probably the appropriate settings for the vast majority of FreeNAS servers out there and shouldn't be changed. Note that if you do want to experiment, a reboot is required for changes to these values to take effect.
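If you want to confirm what your own system is running with, something like this from the shell should list the current values (exact output varies by version):
sysctl vfs.nfsd.fha
On FreeNAS/TrueNAS you would typically set these as Tunables in the GUI rather than editing anything by hand, and per the note above, plan on a reboot when changing them for testing.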
Until this change (it was my bug ticket that originally brought this problem to our attention), having more than 2 slog devices, even if striped, really didn't give the kind of benefit we wanted.
As the bug ticket is internal to us, here's the summary from the developer:
Prior to the change going public, he wrote:
I've committed a patch to the nightly train adding two more sysctls to control NFS FHA. I expect that with that patch, setting vfs.nfsd.fha.write=0 should dramatically improve synchronous write performance with multiple parallel requests, especially if the NFS server is configured with a sufficiently large number of threads.
The final code comments are:
Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing serialization to be controlled separately for read and write NFS requests. It does not change the default behavior, since there are too many factors to consider, but it gives additional room for further experiments and tuning.
The main motivation for this change is very low write speed in the case of ZFS with sync=always, or when NFS clients request synchronous operation, where every separate request has to be written/flushed to the ZIL and requests are processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows ZIL throughput to be increased several times over by coalescing writes and cache flushes. There is a worry that doing so may increase data fragmentation on disk, but I suppose that should not happen for a pool with a SLOG.
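For context on the sync=always case he mentions, that's just the ZFS dataset property; a quick sketch of forcing it (the pool/dataset name is a placeholder):
zfs set sync=always tank/nfs_share
zfs get sync tank/nfs_share
With that set, every write goes through the ZIL (and therefore the slog, if you have one), which is exactly the situation where vfs.nfsd.fha.write=0 lets the server coalesce writes and cache flushes instead of handling each request one at a time.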