How to interpret zilstat output?

Status
Not open for further replies.
Joined
Dec 2, 2015
Messages
730
I'm pondering the pros and cons of adding a SLOG to my server. Looking at how I use the server, I don't expect many sync writes, so in theory there should be no point in adding a SLOG. But looking at the output of zilstat, while I often see nothing but zeros, there are periods when I see writes to the ZIL.

The server serves several purposes:

  1. Document archiving, for documents that are not actively used. The documents are on SMB shares.
  2. Data storage for Apple iTunes and Photos libraries (very large numbers of small files).
  3. Plex server, with Plex running in a jail, and the media stored on the server.
  4. Apple Time Machine backups from three computers, using AFP.

There are no NFS shares, and no VMs. ZFS sync is set to "standard". There are two pools. The main pool serves the above purposes. The backup pool is a complete copy of the main pool, kept in sync with zfs replication. There is also a second server that has another complete copy, kept in sync with zfs replication from the main pool on the main server.

The problem is that I have no idea what magnitude of ZIL activity is significant. I'd appreciate any comments from those who know how to interpret the output of zilstat. Note: although my workload shouldn't benefit from a SLOG according to accepted wisdom, I have seen at least one report of a benefit when copying large numbers of small files, which is something that I do on occasion when using iTunes.app and Photos.app.

I'd appreciate any knowledgeable comment on the following zilstat output.

Code:
zilstat -t 60
TIME                   N-Bytes N-Bytes/s N-Max-Rate   B-Bytes B-Bytes/s B-Max-Rate   ops <=4kB 4-32kB >=32kB
2018 Oct  3 20:46:50  24008128    400135     973624 105529344   1758822    4980736  1370     0     61   1309
2018 Oct  3 20:47:50  24481248    408020    1138984 120053760   2000896    4395008  1440     0     58   1382
2018 Oct  3 20:48:50  31471344    524522    1654392 126468096   2107801    5033984  1314     0     68   1246
2018 Oct  3 20:49:50  37227904    620465    2068936 137191424   2286523    6455296  1533     0    125   1408
2018 Oct  3 20:50:50  40671992    677866    1849368 154189824   2569830    6946816  1529     0     33   1496
2018 Oct  3 20:51:50  45287640    754794    1943512 176201728   2936695    7593984  1579     0      2   1577
2018 Oct  3 20:52:50  40446360    674106    1941648 163663872   2727731    5570560  1528     0     14   1514
2018 Oct  3 20:53:50  31088448    518140    1826176 110940160   1849002    5111808  1048     0     21   1027
2018 Oct  3 20:54:50  28072592    467876    1227072 110628864   1843814    4845568  1026     0     39    987
2018 Oct  3 20:55:50  23556800    392613    1366704  95031296   1583854    4112384   974     0     13    961
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Hi Kevin,

FreeNAS 11.1 merged Samba 4.7.3, which changed the Samba default to strict sync = yes

https://forums.freenas.org/index.php?threads/smb-share-slow-and-capped-at-30mbps.63748/#post-456938

Strict sync is discussed here:
https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html

While Windows is still writing async, we've seen reports from OSX users that it is requesting (and using) sync writes, which FreeNAS 11.1 and later will honor, thus using your SLOG.

For your data, your interval is 60 seconds - thankfully, zilstat also reports the values per second so we don't have to do any math ourselves.

You're only writing about 2.5MB/s on average, combined between the data (N-Bytes) and buffer (B-Bytes) columns, with the max rate staying under 10MB/s combined. I assume that's because your client machines aren't requesting a lot of actual sync write activity - if all of your traffic were being choked down to under 10MB/s, you'd notice that on the progress bars.
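If you'd rather check that arithmetic than eyeball it, here is a rough Python sketch that parses the rows posted above and averages the per-second columns. The names `parse_rows`, `summarize` and `FIELDS` are made up for this example, and it assumes the `-t` layout shown in this thread (four timestamp tokens followed by ten numeric columns). Adding N-Max-Rate to B-Max-Rate within a row assumes both peaks happened at the same moment, so treat that figure as an upper bound.

```python
# The ten data rows from the zilstat -t 60 output posted in this thread.
SAMPLE = """\
2018 Oct  3 20:46:50  24008128  400135   973624 105529344 1758822 4980736 1370 0  61 1309
2018 Oct  3 20:47:50  24481248  408020  1138984 120053760 2000896 4395008 1440 0  58 1382
2018 Oct  3 20:48:50  31471344  524522  1654392 126468096 2107801 5033984 1314 0  68 1246
2018 Oct  3 20:49:50  37227904  620465  2068936 137191424 2286523 6455296 1533 0 125 1408
2018 Oct  3 20:50:50  40671992  677866  1849368 154189824 2569830 6946816 1529 0  33 1496
2018 Oct  3 20:51:50  45287640  754794  1943512 176201728 2936695 7593984 1579 0   2 1577
2018 Oct  3 20:52:50  40446360  674106  1941648 163663872 2727731 5570560 1528 0  14 1514
2018 Oct  3 20:53:50  31088448  518140  1826176 110940160 1849002 5111808 1048 0  21 1027
2018 Oct  3 20:54:50  28072592  467876  1227072 110628864 1843814 4845568 1026 0  39  987
2018 Oct  3 20:55:50  23556800  392613  1366704  95031296 1583854 4112384  974 0  13  961
"""

# Hypothetical field names for the ten numeric columns, in header order.
FIELDS = ["n_bytes", "n_bytes_s", "n_max_rate",
          "b_bytes", "b_bytes_s", "b_max_rate",
          "ops", "le_4k", "k4_to_32k", "ge_32k"]


def parse_rows(text):
    """Turn zilstat -t output into a list of dicts, skipping header/blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has 4 timestamp tokens plus 10 numeric columns.
        if len(parts) != 14 or not all(p.isdigit() for p in parts[4:]):
            continue
        row = dict(zip(FIELDS, (int(p) for p in parts[4:])))
        row["time"] = " ".join(parts[:4])
        rows.append(row)
    return rows


def summarize(rows):
    """Average the per-second columns and find the largest combined max rate."""
    count = len(rows)
    avg_n = sum(r["n_bytes_s"] for r in rows) / count
    avg_b = sum(r["b_bytes_s"] for r in rows) / count
    # Upper bound: assumes both max rates occurred at the same instant.
    peak = max(r["n_max_rate"] + r["b_max_rate"] for r in rows)
    return {
        "avg_combined_mb_s": (avg_n + avg_b) / 1e6,  # MB taken as 10**6 bytes
        "peak_combined_mb_s": peak / 1e6,
    }


summary = summarize(parse_rows(SAMPLE))
print(f"average combined: {summary['avg_combined_mb_s']:.2f} MB/s")
print(f"peak combined:    {summary['peak_combined_mb_s']:.2f} MB/s")
```

On the ten rows above this comes out at roughly 2.7 MB/s average and about 9.5 MB/s for the busiest interval, which is in line with the ballpark figures quoted.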

I would suggest using a smaller interval with the zilstat script and trying to identify exactly which workload is generating the sync writes. If you feel its performance could be improved, an SLOG device might be beneficial - check the link in my signature for some options.
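One way to make the short-interval output easier to read is to hide the all-zero lines, so that only the seconds with ZIL activity scroll past while you start and stop each workload. The sketch below is only an illustration: `ops_in` and `watch` are made-up names, and it assumes zilstat is on the PATH and accepts `-t <interval>` the same way as the command used earlier in this thread.

```python
import subprocess


def ops_in(line):
    """Return the ops count from a `zilstat -t` data row, or None otherwise."""
    parts = line.split()
    # 4 timestamp tokens + 10 numeric columns; ops is the 7th numeric column.
    if len(parts) != 14 or not parts[10].isdigit():
        return None
    return int(parts[10])


def watch(interval=1, min_ops=1):
    """Run zilstat and print only the intervals with at least min_ops ZIL ops."""
    # Assumption: same invocation style as "zilstat -t 60" from the first post.
    proc = subprocess.Popen(
        ["zilstat", "-t", str(interval)],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        ops = ops_in(line)
        if ops is not None and ops >= min_ops:
            print(line.rstrip())
```

Calling `watch()` on the server and then kicking off one workload at a time (a Time Machine backup, a Photos import, an SMB copy) should show which of them lines up with the non-zero rows.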
 
Joined
Dec 2, 2015
Messages
730
...

I would suggest using a smaller interval with the zilstat script and trying to identify exactly which workload is generating the sync writes. If you feel its performance could be improved, an SLOG device might be beneficial - check the link in my signature for some options.
Thanks for the detailed and very useful reply. I'll reduce the interval, and try to identify which workloads create the sync writes.
 