SLOG on 2x Optane - mirrored vs striped SLOG peculiarity

bonox · Dec 12, 2021

Point of interest though, when you've got a single 300GB file like these, (any suitably large file really, ie anything bigger than your SLOG size) you're really only home once the whole file makes it to disk. A SLOG won't help you if the write fails anywhere in the middle - from the first byte to the last - which is true of all default values (64GB txg trigger or 5 seconds etc) or stuffing the whole thing asynchronously into RAM (with much bigger values than the defaults like 320GB and 300 seconds) and writing it out to disk slowly as the performance of the pool will allow.

I've not seen anywhere in the literature a suggestion that the SLOG should be at minimum the size of the largest file you want to be sure you don't lose to a kernel panic or power out. This should surely mean that a SLOG is only of value to any synchronous write smaller than the 10%/4GB margin in general or at least up to the size of the SLOG. The Jim Salter article notes that the SLOG only needs to be small - to account for a few seconds worth of writes.

It [SLOG] also doesn’t need to be very large – just enough to hold a few seconds’ worth of writes is the quote.

Any single file larger than that will automatically be corrupted with a failure, SLOG or not. And a continuation of my earlier thoughts on "you'll lose what's in a large dirty RAM cache if you get a power fail/kernel panic etc", yes that's true, but i still don't see a difference between losing a complete file in RAM (after a full bore transfer from the file source) on the filer part way through the pool write, and the alternative of having the server panic and stop part way through a file write when the data is still trickling in more slowly over the network with the filer throttling the input to control a slow writing pool. They will both fail to write the file and a verify from the file source will pick up that failure. That's true of both asynchronous and sync/slog behaviour isn't it?

Have I understood that right or not? Is ZFS's SLOG capable for example of picking exactly where you left off, so a fail in the last half of a sync 60GB file write would still succeed on restart with a 32GB SLOG? But a fail in the first half would always produce a corrupted result.

I realise i've wandered in and out of sync/async in my rambling above, hopefully you work out what i mean....

sretalla · Dec 12, 2021

bonox said:
I've not seen anywhere in the literature a suggestion that the SLOG should be at minimum the size of the largest file you want to be sure you don't lose to a kernel panic or power out.

Probably because it isn't a factor as you're theorizing...

ZFS works in blocks, not files, so blocks are what are reported back as written to the requesting service when sync writes are requested.

As long as the calling service isn't told that the entire file has finished having all its blocks written, all is good in the world since it can't possibly think it was finished, so should not have let go of it's end of feeding the data in, feeling like it was done.

The only danger we've been talking about here is when you tell the requesting service you've written to disk (but haven't really) and then lose that content.

bonox said:
Is ZFS's SLOG capable for example of picking exactly where you left off, so a fail in the last half of a sync 60GB file write would still succeed on restart with a 32GB SLOG?

Maybe... it's 100% dependent on the calling service (and application behind it). If it can handle some blocks already being there and continue, no problem. ZFS doesn't care... all the blocks it said were on disk are there.

HoneyBadger · Dec 12, 2021

bonox said:
Have I understood that right or not? Is ZFS's SLOG capable for example of picking exactly where you left off, so a fail in the last half of a sync 60GB file write would still succeed on restart with a 32GB SLOG? But a fail in the first half would always produce a corrupted result.

That's going to be more based on the client side and if the protocol you're using tolerates remote failures gracefully, and with enough retries/duration for you to detect the failure, complete the reboot, and reimport the pool. If it's a particularly patient client and a fast server, you might get it done before the timeout, but I believe the popular expression these days is "press X to doubt" if I'm honest.

To paraphrase, having an SLOG is more about having the server and client agree on the state of the file. In a sync-write scenario the client will at least have the same idea if the file is safe or corrupted. Async, you could write the entire file to the server, the client would think it's safe, and then on recovery be very disappointed to find out that isn't true.

jgreco · Dec 12, 2021

bonox said:
The Jim Salter article notes that the SLOG only needs to be small - to account for a few seconds worth of writes.

Which is how you know not to take the article particularly seriously. There's no characterization of what the factors are, just things he read in random places on the Internet.

bonox said:
I've not seen anywhere in the literature a suggestion that the SLOG should be at minimum the size of the largest file you want to be sure you don't lose to a kernel panic or power out

Correct. The ZIL/SLOG has *nothing* to do with files or file sizes. It has to do with protecting individual data or metadata blocks being written to the pool, for the reasons @HoneyBadger discusses above. Otherwise, how would you ever write a 1TB file to your filer without a 1TB SLOG device? That'd be crazy.

bonox said:
Any single file larger than that will automatically be corrupted with a failure, SLOG or not.

False, unless you're defining "corrupted" to include truncated, where it COULD be true in some cases. But as @HoneyBadger says, this is really more about the protocol and resilience. The very REASON for SLOG devices to exist is to prevent the loss of write data, so that you can be writing with NFS or iSCSI (let's say from a hypervisor, but it really doesn't matter), the filer panicks, reboots, reimports the pool, commits the ZIL, and picks up exactly where it left off, as long as the client reconnects and retries the unacknowledged block in process. Zero loss, zero corruption, zero truncation. That is the purpose of sync writes, and it is GUARANTEED.

bonox · Dec 12, 2021

Thanks all. I do regard a truncated file as corrupted but i've obviously been erroneously labouring under the misapprehension that NFS and SMB are file and not block related mechanisms like iSCSI. Certainly my experience in general has been that none of my clients would wait 20+ minutes for this file server to restart so any large file caught in that trap would be lost regardless of slog. It may be possible I suppose, but i've also never seen a built in windows OS file copy pick up a partial file and continue like FTP/rsync can. Much to learn I have.

jgreco · Dec 12, 2021

bonox said:
i've obviously been erroneously labouring under the misapprehension that NFS and SMB are file and not block related mechanisms

Well, they ARE file mechanisms, obviously.

bonox said:
Certainly my experience in general has been that none of my clients would wait 20+ minutes for this file server to restart

NFS will *absolutely* be happy to stall until your fileserver recovers unless you have soft mount enabled. As a matter of fact, you cannot get any other behaviour out of a hard mount, and only a reboot of the client will cause the stuck processes to clear, in most cases. If you are able to interrupt, you may have soft mounts enabled. This would allow for NFS client I/O on the NFS filesystem to be interrupted.

However, underneath it all, a filesystem essentially works on TOP of a block data store. This is very clear with things such as UFS/FFS. It's less clear with ZFS, because the storage manager and the filesystem are merged in a way that allows them to work cooperatively. Still, certain aspects of ZFS, such as how the ZIL/SLOG works, are really block-level issues and have very little to do with files.

sretalla · Dec 12, 2021

bonox said:
i've also never seen a built in windows OS file copy pick up a partial file and continue like FTP/rsync can

Robocopy is a built-in utility... does that count?

jgreco · Dec 12, 2021

sretalla said:
Robocopy is a built-in utility

It is? Well, $#!+. You learn something new every day.

HoneyBadger · Dec 12, 2021

jgreco said:
It is? Well, $#!+. You learn something new every day.

To be fair, it's a Microsoft utility. Can't really have expected you to know that one.

jgreco · Dec 13, 2021

HoneyBadger said:
To be fair, it's a Microsoft utility. Can't really have expected you to know that one.

Well, it's not that I'm totally unfamiliar with CP/M and DOS, but I try hard not to be a Windows washer, even if I have written some pretty complicated scripting to do semi-automated Windows installs.

One of my clients used to be Exec-PC BBS, at the time the quite possibly the largest file exchange and download site on the planet for PC's, and PKware (of PKZIP fame) was local too, so I often heard about this-amazing-thing or that-amazing-thing for DOS. I know I'd heard of robocopy a bunch of times. Google says it was the heir-apparent to XCOPY like 25 years ago, but, honestly, I'm still using XCOPY.

Pretty much anyone watching the forums for any length of time knows I don't really do the SMB, Samba, or AD questions, or NTFS/Windows ACL questions, I figure many of the posters asking questions probably know more about the bits they're asking about than I do.

So I'm a weird mess on the Windows stuff.

Important Announcement for the TrueNAS Community.

SLOG on 2x Optane - mirrored vs striped SLOG peculiarity

bonox

Dabbler

sretalla

Powered by Neutrality

HoneyBadger

actually does care

jgreco

Resident Grinch

bonox

Dabbler

jgreco

Resident Grinch

sretalla

Powered by Neutrality

jgreco

Resident Grinch

HoneyBadger

actually does care

jgreco

Resident Grinch

Similar threads

Important Announcement for the TrueNAS Community.

SLOG on 2x Optane - mirrored vs striped SLOG peculiarity

Dabbler

Powered by Neutrality

actually does care

Resident Grinch

Dabbler

Resident Grinch

Powered by Neutrality

Resident Grinch

actually does care

Resident Grinch

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "SLOG on 2x Optane - mirrored vs striped SLOG peculiarity"

Similar threads