ZILSTAT Missing

jlw52761

Explorer
Joined
Jan 6, 2020
Messages
87
TrueNAS-SCALE-22.12.3.3
zilstat does not exist on a fresh install. I looked under /usr/sbin, /sbin, /bin, and /usr/bin. Before anyone suggests "maybe it's your PATH", PATH has nothing to do with listing the contents of a specific directory. I only mention this because a ton of past posts lead with that comment.
So, where is zilstat, and if it's not already on the system, where do I get it?
 
Joined
Oct 22, 2019
Messages
3,641
Maybe the version of zfsutils-linux that ships with Debian doesn't include the zilstat executable?
 

jlw52761

Explorer
Joined
Jan 6, 2020
Messages
87
Yep, zilstat is not part of zfsutils-linux. So how does one add it to TrueNAS SCALE? Oddly enough, I don't see a package that distributes it.
I did find a script called zilstat that requires ksh (Korn shell) to run; it looks like it samples the ZIL device itself and inspects the raw data. The cleanest fix would be adding ksh to TrueNAS SCALE. Otherwise, I'm going to have to rewrite it for bash or something.
 

jlw52761

Explorer
Joined
Jan 6, 2020
Messages
87
Unfortunately that doesn't help much, because I can't find which OID to use to pull the data. Seems like a slip-up on iX's part in getting SCALE out the door.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hey @jlw52761

The zilstat script in place on CORE relies heavily on stats from dtrace, which can't be queried on Linux.

You can pull raw statistics from /proc/spl/kstat/zfs/zil but I'll admit it's not as well formatted as the zilstat output.
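
For reference, the counters in that file most relevant to SLOG usage are the zil_itx_metaslab_* ones. Here's a quick way to dump just the names and values; this is a sketch based on the OpenZFS zil kstat, and exact counter names can vary between versions:

Code:
# The kstat is a three-column table: name, type, value.
# Counters of interest (names from OpenZFS; verify on your build):
#   zil_commit_count                     - zil_commit() calls (sync writes/fsyncs)
#   zil_itx_metaslab_slog_count/_bytes   - records/bytes written to the SLOG
#   zil_itx_metaslab_normal_count/_bytes - records/bytes written to the main pool
awk '/^zil_/ {print $1, $3}' /proc/spl/kstat/zfs/zil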

What metrics specifically are you looking for insight into?
 

jlw52761

Explorer
Joined
Jan 6, 2020
Messages
87
Knowing the usage of the SLOG relative to incoming data can help determine whether there are bottlenecks there, and whether adding a second SLOG for increased IO is needed. I did cat the path you mentioned, but there's no documentation on what each item measures, so writing a script against it isn't going to happen.
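
For what it's worth, the counter names are fairly stable across OpenZFS releases, so a rough sampler is still feasible. Here's a minimal bash sketch, assuming the zil_itx_metaslab_slog_bytes counter from the kstat above (adjust the name if your version differs):

Code:
#!/bin/bash
# zilstat-ish sampler: report bytes committed to the SLOG per interval.
KSTAT=/proc/spl/kstat/zfs/zil
INTERVAL=${1:-5}   # seconds between samples, default 5

read_counter() {
    awk -v k="$1" '$1 == k {print $3}' "$KSTAT"
}

prev=$(read_counter zil_itx_metaslab_slog_bytes)
while sleep "$INTERVAL"; do
    cur=$(read_counter zil_itx_metaslab_slog_bytes)
    echo "$(date +%T)  $(( (cur - prev) / INTERVAL )) bytes/s to SLOG"
    prev=$cur
done

Run it with an interval argument (e.g. ./zilslog.sh 5, a hypothetical filename) while pushing sync writes to see the per-second rate.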
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Usage and throughput can probably be pulled out of iostat -x. You can have it repeat every few seconds and filter by device, e.g. iostat -x sdf sdg 5 for those example devices with a 5-second reporting window.

Code:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.31    0.00    0.80    0.00    0.00   97.89

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdf              0.00      0.00     0.00   0.00    0.00     0.00   20.00    104.80     0.00   0.00    0.04     5.24    0.00      0.00     0.00   0.00    0.00     0.00    0.40    0.00    0.00   0.32
sdg              0.00      0.00     0.00   0.00    0.00     0.00   22.00    119.20     0.00   0.00    0.04     5.42    0.00      0.00     0.00   0.00    0.00     0.00    0.40    0.00    0.00   0.32
 

jlw52761

Explorer
Joined
Jan 6, 2020
Messages
87
I suppose that helps me understand the IO and MB/s going to and from the disk that houses the ZIL, and based on the benchmark I ran on the disk before adding it to the pool, the numbers I'm seeing look good.
For reference, I benchmarked about 5200 sequential read and write IOPS, and 1200 random read and write IOPS. I figure the ZIL should be more sequential than random, since it's more of a FIFO buffer. So with w/s and r/s around 1200 under heavy load and await numbers above 0.2, the disk is being used but not overwhelmed, and adding a second disk isn't required at this point.
That's really what I was after overall. I'd also like to get this data into Grafana so I can track it over time, but I'm not sure how to collect it yet, since I don't know whether it's exposed via SNMP.
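
If SNMP turns out not to expose these, one alternative for Grafana is Prometheus's node_exporter with its textfile collector: sample the kstat on a cron and write it out in the Prometheus text format. This is a sketch, and it assumes node_exporter is installed and running with --collector.textfile.directory pointed at the directory below:

Code:
#!/bin/bash
# Hypothetical cron job: export the ZIL counters for node_exporter's
# textfile collector. Write to a temp file, then mv for atomicity.
DIR=/var/lib/node_exporter
awk '/^zil_/ {printf "zfs_%s %s\n", $1, $3}' /proc/spl/kstat/zfs/zil \
    > "$DIR/zfs_zil.prom.$$" \
 && mv "$DIR/zfs_zil.prom.$$" "$DIR/zfs_zil.prom"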
 