Very sporadic write performance

Status
Not open for further replies.

BrentI

Dabbler
Joined
Nov 29, 2014
Messages
12
We have recently upgraded to v9.3.
The system is a Dual E5430, 32GB RAM, 256GB L2ARC, LSI 9211-8i, 12x WD20EARX.
During this test I am using a single Samsung 850 Pro SSD connected to the LSI 9211-8i, to rule out slow HDDs as the cause of the speed issues.
I have set sync=disabled to try to optimise NFS write speed.
When writing over NFS from Linux, using fio to simulate random reads and writes, I get the following behaviour: performance is good, but it drops to 0KB/s at times while ZFS appears to flush the data to disk, and then it resumes.
At times this produces latencies on the order of 8 seconds (not milliseconds).
Below is a small extract of the fio output.
When the test first starts it runs at around 27MB/s of writes while zpool iostat shows 0MB/s. Then, after say 5-10 seconds, zpool iostat spikes up to 200-400MB/s, which is around the time that throughput via NFS drops to 0KB/s and the latency occurs.
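For anyone wanting to reproduce this, a fio invocation along these lines produces the same pattern. I'm reconstructing it here rather than pasting the exact job, so the block size, file size, iodepth, ioengine and the mount point are assumptions; the four mixed random read/write jobs match the output below.

# Approximate reconstruction of the test -- numjobs=4 and the mixed random
# read/write workload match the output below; bs, size, iodepth, ioengine and
# the /mnt/nfs-test mount point are placeholders, not the exact values used.
fio --name=random-readwrite --directory=/mnt/nfs-test \
    --rw=randrw --rwmixread=50 --bs=4k --size=1g \
    --numjobs=4 --iodepth=16 --ioengine=libaio --direct=1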

Is there any reason it blocks I/O completely? I have done a lot of searching about the ZFS write throttle and the tweaks related to it, but I cannot get it to perform consistently. I don't mind sacrificing speed to get consistent latency.
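These are the write-throttle sysctls I have been looking at. I'm listing them as a sketch only; the names come from the FreeBSD dirty-data based write throttle that replaced the old write_limit_* tunables, and I haven't confirmed that every one of them exists on FreeNAS 9.3.

# Inspect the current write-throttle related tunables (read-only, changes nothing)
sysctl vfs.zfs.dirty_data_max
sysctl vfs.zfs.dirty_data_max_percent
sysctl vfs.zfs.delay_min_dirty_percent
sysctl vfs.zfs.txg.timeout
sysctl vfs.zfs.vdev.async_write_active_min_dirty_percent
sysctl vfs.zfs.vdev.async_write_active_max_dirty_percent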

Jobs: 4 (f=4): [mmmm] [3.7% done] [31404KB/32224KB/0KB /s] [7851/8056/0 iops]
Jobs: 4 (f=4): [mmmm] [5.3% done] [31480KB/33208KB/0KB /s] [7870/8302/0 iops]
Jobs: 4 (f=4): [mmmm] [6.8% done] [30977KB/30701KB/0KB /s] [7744/7675/0 iops]
Jobs: 4 (f=4): [mmmm] [8.1% done] [27724KB/26984KB/0KB /s] [6931/6746/0 iops]
Jobs: 4 (f=4): [mmmm] [9.3% done] [28616KB/27900KB/0KB /s] [7154/6975/0 iops]
Jobs: 4 (f=4): [mmmm] [10.7% done] [27184KB/27872KB/0KB /s] [6796/6968/0 iops]
Jobs: 4 (f=4): [mmmm] [12.0% done] [28288KB/28732KB/0KB /s] [7072/7183/0 iops]
Jobs: 4 (f=4): [mmmm] [13.5% done] [33672KB/34484KB/0KB /s] [8418/8621/0 iops]
Jobs: 4 (f=4): [mmmm] [15.1% done] [32600KB/32704KB/0KB /s] [8150/8176/0 iops]
Jobs: 4 (f=4): [mmmm] [16.4% done] [31476KB/30424KB/0KB /s] [7869/7606/0 iops]
Jobs: 4 (f=4): [mmmm] [17.8% done] [29768KB/29420KB/0KB /s] [7442/7355/0 iops]
Jobs: 4 (f=4): [mmmm] [19.2% done] [32288KB/32280KB/0KB /s] [8072/8070/0 iops]
Jobs: 4 (f=4): [mmmm] [19.2% done] [1124KB/1120KB/0KB /s] [281/280/0 iops]
Jobs: 4 (f=4): [mmmm] [19.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 01m:08s]
Jobs: 4 (f=4): [mmmm] [19.3% done] [10836KB/10652KB/0KB /s] [2709/2663/0 iops]
Jobs: 4 (f=4): [mmmm] [20.7% done] [26028KB/26228KB/0KB /s] [6507/6557/0 iops]
Jobs: 4 (f=4): [mmmm] [21.9% done] [29872KB/30468KB/0KB /s] [7468/7617/0 iops]
Jobs: 4 (f=4): [mmmm] [23.4% done] [31564KB/31676KB/0KB /s] [7891/7919/0 iops]
Jobs: 4 (f=4): [mmmm] [24.8% done] [31040KB/30268KB/0KB /s] [7760/7567/0 iops]
Jobs: 4 (f=4): [mmmm] [26.2% done] [30408KB/30292KB/0KB /s] [7602/7573/0 iops]
Jobs: 4 (f=4): [mmmm] [27.6% done] [32504KB/32856KB/0KB /s] [8126/8214/0 iops]
Jobs: 4 (f=4): [mmmm] [29.1% done] [30764KB/31092KB/0KB /s] [7691/7773/0 iops]
Jobs: 4 (f=4): [mmmm] [30.7% done] [31288KB/31304KB/0KB /s] [7822/7826/0 iops]
Jobs: 4 (f=4): [mmmm] [32.0% done] [31580KB/31412KB/0KB /s] [7895/7853/0 iops]
Jobs: 4 (f=4): [mmmm] [33.7% done] [31096KB/30440KB/0KB /s] [7774/7610/0 iops]
Jobs: 4 (f=4): [mmmm] [35.4% done] [31540KB/30516KB/0KB /s] [7885/7629/0 iops]
Jobs: 4 (f=4): [mmmm] [35.4% done] [12536KB/12124KB/0KB /s] [3134/3031/0 iops]
Jobs: 4 (f=4): [mmmm] [36.0% done] [17272KB/17252KB/0KB /s] [4318/4313/0 iops]
Jobs: 4 (f=4): [mmmm] [37.4% done] [29872KB/29816KB/0KB /s] [7468/7454/0 iops]
Jobs: 4 (f=4): [mmmm] [38.8% done] [31060KB/30644KB/0KB /s] [7765/7661/0 iops]
Jobs: 4 (f=4): [mmmm] [40.6% done] [31196KB/30200KB/0KB /s] [7799/7550/0 iops]
Jobs: 4 (f=4): [mmmm] [41.2% done] [8276KB/8032KB/0KB /s] [2069/2008/0 iops]
Jobs: 4 (f=4): [mmmm] [41.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:59s]
Jobs: 4 (f=4): [mmmm] [41.7% done] [2584KB/2384KB/0KB /s] [646/596/0 iops]

Here is the summary output showing the latency
random-readwrite: (groupid=0, jobs=1): err= 0: pid=2001: Fri Jan 9 13:52:48 2015
  read : io=524704KB, bw=4972.4KB/s, iops=1243, runt=105524msec
    slat (usec): min=247, max=9841.7K, avg=788.99, stdev=44548.90
    clat (usec): min=22, max=9890.9K, avg=12179.98, stdev=166754.67
     lat (usec): min=716, max=9891.4K, avg=12969.34, stdev=172636.71
  write: io=523872KB, bw=4964.5KB/s, iops=1241, runt=105524msec
    slat (usec): min=5, max=51600, avg= 9.68, stdev=142.78
    clat (usec): min=3, max=9889.8K, avg=12764.23, stdev=184508.29
     lat (usec): min=9, max=9889.8K, avg=12774.06, stdev=184511.08

Any help or a pointer in the right direction would be greatly appreciated.
 

BrentI

Dabbler
Joined
Nov 29, 2014
Messages
12
My apologies, that didn't format well. But basically it runs at around 20-30MB/s for about 10 seconds, then drops to 0KB/s for a few seconds, and then resumes.
 

BrentI

Dabbler
Joined
Nov 29, 2014
Messages
12
In all the guides I have read, 32GB of RAM is not a problem for a 256GB L2ARC. Have I been mistaken about that?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That is probably mistaken. The usual rule of thumb is a 1:4 or maybe 1:5 ratio for RAM:L2ARC, and you shouldn't be running an L2ARC at all with less than 64GB of RAM. Once you understand the workload and have analyzed it over time by watching the ARC and L2ARC, you can tinker considerably with the ratio to make for happier situations. Try actually disabling (removing) the L2ARC as a first step.
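Something along these lines will show what the ARC and L2ARC are actually doing and will pull the cache device back out. The pool name "tank" and the gptid are placeholders here; use whatever zpool status shows on your box.

# Watch the ARC/L2ARC counters (FreeBSD ZFS kstat sysctls)
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
sysctl kstat.zfs.misc.arcstats.l2_size kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses

# Remove the L2ARC (cache) device from the pool -- "tank" and the gptid are placeholders
zpool status tank
zpool remove tank gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx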

How full is the pool?

There are some notes near the end of bug 1531 that may still be relevant. I haven't tried this under 9.3 at all so I don't have much more to say.
 

BrentI

Dabbler
Joined
Nov 29, 2014
Messages
12
Yes, bug 1531 describes pretty much exactly the same symptoms.
Unfortunately, it appears that the write_limit_shift tunable that was used to tweak performance back then is no longer available in v9.3.
The pool is at 1% usage. It is a 256GB SSD, and there is only 4GB of data on it.
NAME      SIZE   ALLOC   FREE   EXPANDSZ   FRAG   CAP   DEDUP   HEALTH   ALTROOT
testssd   236G   4.02G   232G   -          25%    1%    1.00x   ONLINE   /mnt

Sorry, I should have been clearer: the L2ARC is not on this pool, since there was no point adding one to a pool that is already an SSD.
I will, however, note that during the testing there was no I/O on the other pool that does have the L2ARC.
As the dataset is so small, it would have all been cached in the ARC.

Sequential writes to the pool are fast, with 450-500MB/s of throughput. It's only the random writes that seem to cause the long pauses.
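For anyone following along, this is the sort of server-side monitoring that shows the pattern. The zpool iostat figures above came from watching the pool during the fio run; gstat is just an additional way to see the same flush at the disk level. The one-second interval is simply what I'd suggest, not a magic value.

# Per-second pool throughput -- shows the burst when the transaction group flushes
zpool iostat testssd 1

# Per-second view of the underlying disk activity
gstat -I 1s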
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I got nothin'. On the flip side, I've got hardware in the shop to build a 9.3-based E5 filer for VM storage, and I intend to "start small" and see what's what in the current code. I haven't seriously tried to create a VM storage pool (as in "buying optimal hardware") since 8.mumble... so I'll probably run into any real problems too.
 