Time to Reevaluate the "Mirror your Slog (ZIL Drive)" recommendation?


mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Seemingly the rest of the ZFS community outside of FreeNAS has done just this.

Back in the v15 pool days, it was CERTAINLY a good idea to mirror your SLOG, because if you lost it, your pool was garbage.

These days this is no longer the case, but for some reason FreeNAS still recommends it as best practice. The question is: is it time to reevaluate this recommendation?

To support this argument, let's go over again how the ZIL (ZFS Intent Log) works:

First off, the SLOG is NOT a cache device, and thus the ZIL has nothing to do with caching. It has no cache function whatsoever.

The ZIL is never read from during normal use (and it is never read from or written to for async writes).

If a SLOG is present, the system ZIL is placed on it.

During async writes, ZFS accepts the write data into RAM and immediately reports back (lies) to the writing client that the data has been committed to disk, even though it hasn't; a power outage at that point could still cause data loss.

This is why sync writes are recommended for important data. With the default FreeNAS configuration, the ZIL resides on the regular pool. When a sync write is received, ZFS makes a fast log write to the ZIL, then reports back that the data has been committed to disk. This is a duplicate copy: the data still resides in RAM and is written to the main pool on the next write cycle. Once the data has been written to the main pool, the ZIL entry is no longer necessary and is discarded.
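
For anyone who wants to play with this, the sync policy is a per-dataset property. A minimal sketch (pool/dataset names here are made up):

  # see how a dataset currently handles sync requests
  zfs get sync tank/important
  # force every write to go through the ZIL, even async ones
  zfs set sync=always tank/important

The default, sync=standard, simply honors whatever the client requests.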

The ZIL is never read from UNLESS there is a power outage or other interruption before the data is written to the main pool. In that case the ZIL is read the next time the pool is mounted, and the writes it recorded are replayed and committed to the pool.

When you have a SLOG (a separate log device) it still works the same way, but the ZIL is moved to the separate device, which is hopefully faster than the spinning disks in your pool. With a fast enough SLOG, your sync writes can start to approximate async writes, because the system can continue as soon as the data has been committed to the ZIL.
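
Adding a SLOG is a one-liner; a sketch, with hypothetical pool and device names:

  # attach a fast SSD as a dedicated log (SLOG) vdev
  zpool add tank log /dev/ada3
  # it then shows up under a separate "logs" section
  zpool status tank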

So, how would one lose data if a SLOG is lost?

Well, the SLOG would have to die. But just the SLOG dying is not enough (remember that the data is still in RAM and is written to the pool from there). In addition to the SLOG dying, you would also need to lose power (or otherwise freeze the system) in the second or so before the next write cycle from RAM completes.

How likely is this anyway?

On a stable system with a UPS, I would consider this VERY unlikely to happen.

If your FreeNAS server is air-starved and likely to overheat, or you have bad non-ECC RAM, or you don't have a UPS, that likelihood goes up, but you'd still have to have the SLOG die and the system hang/reset at almost exactly the same time.

I'm starting to think that for a system with ECC RAM that is well cooled and has a UPS, there is absolutely no reason at all to mirror your slog. Save the $200 you'd spend on a second Intel S3700 SLOG device and take your significant other out to a nice dinner instead. :p

Thoughts? (I'm sure Cyberjock will have some)
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
Yes, I haven't mirrored my ZIL in, well, as long as I've been rolling out ZFS servers. I do have ECC RAM and rather beefy UPSs with the ability to shut down systems gracefully when power is cut. My UPSs have an hour of runtime, and an SMS alert pretty much wakes me from a deep slumber, as it does my team.

I've tried to corrupt my pools by simply yanking power cords during writes, etc., to no avail; the pool is always fine. However, a note about saving $200: it's $500ish for me, but I digress.

Also, Nexenta ships a single ZeusRAM as the SLOG on their systems as well. One can argue, "hey, it's a $1K device, so it had better never go bad," but it all fails eventually; it's just a matter of when. I've seen cheap stuff outlast expensive stuff and vice versa.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, good writeup.

Second, we've been around this before... like 2 years ago.

Everything you said is completely valid and accurate. The statistical chances are low.

But at the end of the day you can ask yourself this very simplified question:

If you are putting redundancy on your data, isn't your slog 'your data' too? If so, don't you want redundancy on it?

This is no different than arguing ECC vs. non-ECC, or RAIDZ1 versus RAIDZ2. There are tradeoffs with the choices you make. The "future you" may not be so happy if you decide to do 10 disks in a stripe. You're literally setting your "fate" level (if you believe in fate) based on how you configure your system.

The biggest problem I'm seeing with SSDs as slogs is that they seem to have a propensity to work just fine and appear to be healthy until they are power cycled. Then they are never detected again.
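
Which is an argument for at least watching slog SSDs with smartmontools (included with FreeNAS). A sketch with a hypothetical device name, with the caveat that a drive that dies on power cycle may report clean SMART data right up to the end:

  # dump health, attributes and error logs for the slog device
  smartctl -a /dev/ada3
  # kick off a short self-test
  smartctl -t short /dev/ada3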

You also should consider the people who benefit from the slog the most. They typically have the largest systems and are running critical applications (usually VMs) from their FreeNAS box. If you are already dropping that kind of cash on L2ARCs, 128+GB of RAM, etc., adding a second slog is not an excessive extra cost.

There are some people that have tried to argue about mirroring L2ARCs. The basis: If you lose an L2ARC your pool's performance will tank. This will hurt VM performance and some VMs may end up offlined as a result.

The reality is that this is a discussion in which there is no 'yes/no' answer. You and I can discuss everything until we're blue in the face. It probably won't change either of our minds.

At the end of the day, it's your data and your risk. If you feel the money for the extra L2ARC or slog is better off in your pocket, then go for it. You'll be the one that misses your data if things go bad.
 

Carl Thompson

Dabbler
Joined
May 22, 2017
Messages
15
I'll add to this old topic to put things more concisely. If you need a SLOG device it's not just because you need sync writes. ZFS can do sync writes just fine without a SLOG. If you need a SLOG it's because you need to do sync writes more quickly than your pool could on its own.

So the point of mirroring your SLOG devices isn't so that you won't lose data on your ZFS server if a SLOG device dies. The point of mirroring is to ensure that if you need sync write IOPS to be >= X for your storage, then even if one of the component SLOG devices dies, sync write IOPS stay >= X. If your storage can run sufficiently without the SLOG device, why would you have one in the first place?

So mirrored SLOG devices aren't to protect data on the ZFS server. They're to protect everything else that uses the storage server.
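
Put differently: if your sizing depends on the SLOG being there, add it as a mirror from the start. A sketch (names hypothetical):

  # mirror the log so one device failing doesn't drop
  # sync write IOPS back down to pool speed
  zpool add tank log mirror /dev/ada3 /dev/ada4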

Carl
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
And if your data is so critical that you need to ensure transactions aren't lost in the event of a sudden system restart/failure combined with a failed SLOG, then you really need an HA system anyway.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I'll add to this old topic to put things more concisely. If you need a SLOG device it's not just because you need sync writes. ZFS can do sync writes just fine without a SLOG. If you need a SLOG it's because you need to do sync writes more quickly than your pool could on its own.

So the point of mirroring your SLOG devices isn't so that you won't lose data on your ZFS server if a SLOG device dies. The point of mirroring is to ensure that if you need sync write IOPS to be >= X for your storage, then even if one of the component SLOG devices dies, sync write IOPS stay >= X. If your storage can run sufficiently without the SLOG device, why would you have one in the first place?

So mirrored SLOG devices aren't to protect data on the ZFS server. They're to protect everything else that uses the storage server.

Carl
I know this is an older thread, but I was thinking about your statement. If my design requires a slog and that requirement is satisfied by one SSD (or similar device) and I have the cash for two, ... never mind, I just checked and it would seem OpenZFS only supports one slog device.

My thought was that if ZFS could support two or more devices, it could stripe them and, if one fails, automagically remove it from the pool of slog devices.

Sorry, I'm a bit off track now, but with a pool of slogs and multiple zpools, one could scale a single ZFS server further and allow for zpools with varying performance targets. I'm curious as to why this has not been implemented. I'm still new to ZFS, but it doesn't seem too far-fetched. Please enlighten me!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Fairly certain ZFS supports mirrored slog.
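
It does; a mirrored log vdev shows up in zpool status like any other mirror. Abridged, illustrative output (names hypothetical):

  logs
    mirror-2  ONLINE       0     0     0
      ada3    ONLINE       0     0     0
      ada4    ONLINE       0     0     0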
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I was thinking of striping, not mirroring, which would have the benefit of double the write performance. I understand mirroring for redundancy; with a stripe you would have to plan for a member failing and still getting acceptable performance. It would be nice if that choice were there for the architect.
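
For what it's worth, zpool already accepts multiple independent log vdevs and spreads log writes across them, so the striping half is expressible today; it's the automatic "drop a dead member and carry on" part that isn't. A sketch (names hypothetical):

  # two separate log vdevs; ZFS stripes log writes across them
  zpool add tank log /dev/ada3 /dev/ada4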
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If part of a stripe fails, performance is the least of your concerns...
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
How would you implement that?
If you're referring to the planning: you would plan on a set of SLOG devices fast enough for any single device to provide the minimum needed performance. The idea is to get the redundancy of multiple SLOG devices AND the extra performance when everything is running as expected.

If part of a stripe fails, performance is the least of your concerns...
If your SLOG device fails, that's not an issue except for performance. As long as ZFS knows the device is bad, my understanding is that it will fall back to keeping the ZIL on the pool. If ZFS were managing a stripe of SLOGs and saw one go offline, or saw a SMART failure, etc., it could drop the failed drive from the stripe and keep using the good one. Much like using two SSDs in RAID 0 as a SLOG device, except the bad drive is automagically removed and everything keeps working. As noted above, the log is always in memory anyway; the log/SLOG on disk is only there in case the power fails/kernel panics/CPU halts/etc. Therefore, if the last 5 seconds of the log needed to be rewritten from memory to the SLOG due to a detected stripe failure, the risk would be MINIMAL.

Keep in mind I'm not talking about a traditional RAID 0 stripe; I'm saying ZFS would manage this and have full control over the disks. I also understand that for this to work correctly you would be limited by the capacity and number of members (assuming they're all the same size).
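
The manual version of that drop already exists, since log vdevs are removable. A sketch (names hypothetical):

  # drop a failed or unwanted log device; with no log devices
  # left, ZFS falls back to keeping the ZIL on the main pool
  zpool remove tank ada3
  zpool status tank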

I understand this is all theoretical and academic at this point. I'm just trying to test my sanity.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
If you're referring to the planning: you would plan on a set of SLOG devices fast enough for any single device to provide the minimum needed performance. The idea is to get the redundancy of multiple SLOG devices AND the extra performance when everything is running as expected.

No, I meant implementing the case where a piece of a stripe goes down and the SLOG continues with the remaining pieces. Seems like that would be difficult/impossible.

I suppose the ultimate answer to mirroring or not, or striping, depends on the use case, as most answers do. There are a bunch of scenarios where not mirroring can cause data loss, so if that is a concern, I would continue to mirror. If a bit of data loss on the pool is OK in a failure scenario, then go for it and skip the mirror (with HW that allows it).
 