Time to Reevaluate the "Mirror your Slog (ZIL Drive)" recommendation?


mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Seemingly the rest of the ZFS community outside of FreeNAS has done just this.

Back in the v15 pool days, it was CERTAINLY a good idea to mirror your SLOG, because if you lost it, your pool was garbage.

These days this is no longer the case, but for some reason FreeNAS still recommends it as best practice. The question is: is it time to reevaluate this recommendation?

To support this argument, let's go over again how the ZIL (ZFS Intent Log) works:

First off, the SLOG is NOT a cache device, and thus the ZIL has nothing to do with caching. It has no cache function whatsoever.

The ZIL is never read from during normal use (and it is never read from or written to for async writes).

If a SLOG is present, the system ZIL is placed on it.

During async writes, ZFS accepts the write data into RAM and immediately reports back (lies) to the writing client that the data has been committed to disk, even though it hasn't; a power outage at that point could still cause data loss.

This is why sync writes are recommended for important data. With the default FreeNAS configuration, the ZIL resides on the regular pool. When a sync write is received, ZFS makes a fast log write to the ZIL, then reports back that the data has been committed to disk. This is a duplicate copy: the data still resides in RAM and is written to the main pool on the next write cycle. Once the data has been written to the main pool, the ZIL entry is no longer necessary and is discarded.
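
For anyone who wants to play with this, the sync policy is a per-dataset property. A minimal sketch (pool/dataset names here are made up):

  # see how a dataset currently handles sync requests
  zfs get sync tank/important
  # force every write to go through the ZIL, even async ones
  zfs set sync=always tank/important

The default, sync=standard, simply honors whatever the client requests.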

The ZIL is never read from UNLESS there is a power outage or other interruption before the data is written to the main pool. In that case the ZIL is read the next time the pool is mounted, and the writes it recorded are replayed and committed to the pool.

When you have a SLOG (a separate log device) it still works the same way, but the ZIL is moved to the separate device, which is hopefully faster than the spinning disks in your pool. With a fast enough SLOG, your sync writes can start to approximate async writes, because the system can continue as soon as the data has been committed to the ZIL.
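
Adding a SLOG is a one-liner; a sketch, with hypothetical pool and device names:

  # attach a fast SSD as a dedicated log (SLOG) vdev
  zpool add tank log /dev/ada3
  # it then shows up under a separate "logs" section
  zpool status tank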

So, how would one lose data if a SLOG is lost?

Well, the SLOG would have to die. But just the SLOG dying is not enough (remember that the data is still in RAM and is written to the pool from there). In addition to the SLOG dying, you would also need to lose power (or otherwise freeze the system) in the second or so before the next write cycle from RAM completes.

How likely is this anyway?

On a stable system with a UPS, I would consider this VERY unlikely to happen.

If your FreeNAS server is air-starved and likely to overheat, or you have bad non-ECC RAM, or you don't have a UPS, that likelihood goes up, but you'd still have to have the SLOG die and the system hang/reset at almost exactly the same time.

I'm starting to think that for a system with ECC RAM that is well cooled and has a UPS, there is absolutely no reason at all to mirror your slog. Save the $200 you'd spend on a second Intel S3700 SLOG device and take your significant other out to a nice dinner instead. :p

Thoughts? (I'm sure Cyberjock will have some)
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
Yes, I haven't mirrored my ZIL in, well, as long as I've been rolling out ZFS servers. I do have ECC RAM and rather beefy UPSs with the ability to shut down systems gracefully when power is cut. My UPSs have an hour of runtime, and an SMS alert pretty much wakes me from a deep slumber, as it does my team.

I've tried to corrupt my pools by simply yanking power cords during writes, etc., to no avail; the pool is always fine. However, a note about saving $200: it's $500ish for me, but I digress.

Also, Nexenta ships a single ZeusRAM as the SLOG on their systems as well. One can argue, "hey, it's a $1K device, so it had better never go bad," but it all fails eventually; it's just a matter of when. I've seen cheap stuff outlast expensive stuff and vice versa.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, good writeup.

Second, we've been around this before... like 2 years ago.

Everything you said is completely valid and accurate. The statistical chances are low.

But at the end of the day you can ask yourself this very simplified question:

If you are putting redundancy on your data, isn't your slog 'your data' too? If so, don't you want redundancy on it?

This is no different than arguing ECC vs. non-ECC, or RAIDZ1 versus RAIDZ2. There are tradeoffs with the choices you make. The "future you" may not be so happy if you decide to do 10 disks in a stripe. You're literally setting your "fate" level (if you believe in fate) based on how you configure your system.

The biggest problem I'm seeing with SSDs as slogs is that they seem to have a propensity to work just fine and appear to be healthy until they are power cycled. Then they are never detected again.
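
Which is an argument for at least watching slog SSDs with smartmontools (included with FreeNAS). A sketch with a hypothetical device name, with the caveat that a drive that dies on power cycle may report clean SMART data right up to the end:

  # dump health, attributes and error logs for the slog device
  smartctl -a /dev/ada3
  # kick off a short self-test
  smartctl -t short /dev/ada3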

You also should consider the people who benefit from the slog the most. They typically have the largest systems and are running critical applications (usually VMs) from their FreeNAS box. If you are already dropping that kind of cash on L2ARCs, 128+GB of RAM, etc., adding a second slog is not an excessive extra cost.

There are some people that have tried to argue about mirroring L2ARCs. The basis: If you lose an L2ARC your pool's performance will tank. This will hurt VM performance and some VMs may end up offlined as a result.

The reality is that this is a discussion in which there is no 'yes/no' answer. You and I can discuss everything until we're blue in the face. It probably won't change either of our minds.

At the end of the day, it's your data and your risk. If you feel the money for the extra L2ARC or slog is better off in your pocket, then go for it. You'll be the one that misses your data if things go bad.
 

Carl Thompson

Dabbler
Joined
May 22, 2017
Messages
15
I'll add to this old topic to put things more concisely. If you need a SLOG device it's not just because you need sync writes. ZFS can do sync writes just fine without a SLOG. If you need a SLOG it's because you need to do sync writes more quickly than your pool could on its own.

So the point of mirroring your SLOG devices isn't so that you won't lose data on your ZFS server if a SLOG device dies. The point of mirroring is to ensure that if you need sync write IOPS to be >= X for your storage, then even if one of the component SLOG devices dies, sync write IOPS stay >= X. If your storage can run sufficiently without the SLOG device, why would you have one in the first place?

So mirrored SLOG devices aren't to protect data on the ZFS server. They're to protect everything else that uses the storage server.
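
Put differently: if your sizing depends on the SLOG being there, add it as a mirror from the start. A sketch (names hypothetical):

  # mirror the log so one device failing doesn't drop
  # sync write IOPS back down to pool speed
  zpool add tank log mirror /dev/ada3 /dev/ada4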

Carl
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
And if your data is so critical that you need to ensure transactions aren't lost in the event of a sudden system restart/failure combined with a failed SLOG, then you really need an HA system anyway.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I'll add to this old topic to put things more concisely. If you need a SLOG device it's not just because you need sync writes. ZFS can do sync writes just fine without a SLOG. If you need a SLOG it's because you need to do sync writes more quickly than your pool could on its own.

So the point of mirroring your SLOG devices isn't so that you won't lose data on your ZFS server if a SLOG device dies. The point of mirroring is to ensure that if you need sync write IOPS to be >= X for your storage, then even if one of the component SLOG devices dies, sync write IOPS stay >= X. If your storage can run sufficiently without the SLOG device, why would you have one in the first place?

So mirrored SLOG devices aren't to protect data on the ZFS server. They're to protect everything else that uses the storage server.

Carl
I know this is an older thread, but I was thinking about your statement. If my design requires a slog and that requirement is satisfied by one SSD (or similar device) and I have the cash for two, ... never mind, I just checked and it would seem OpenZFS only supports one slog device.

My thought was that if ZFS could support two or more devices, it could stripe them and, if one fails, automagically remove it from the pool of slog devices.

Sorry, I'm a bit off track now, but with a pool of slogs and multiple zpools, one could scale a single ZFS server further and allow for zpools with varying performance targets. I'm curious as to why this has not been implemented. I'm still new to ZFS, but it doesn't seem too far-fetched. Please enlighten me!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Fairly certain ZFS supports mirrored slog.
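
It does; a mirrored log vdev shows up in zpool status like any other mirror. Abridged, illustrative output (names hypothetical):

  logs
    mirror-2  ONLINE       0     0     0
      ada3    ONLINE       0     0     0
      ada4    ONLINE       0     0     0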
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I was thinking of striping, not mirroring, which would have the benefit of double the write performance. I understand mirroring for redundancy; with a stripe you would have to plan for a member failing and still getting acceptable performance. It would be nice if that choice were there for the architect.
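
For what it's worth, zpool already accepts multiple independent log vdevs and spreads log writes across them, so the striping half is expressible today; it's the automatic "drop a dead member and carry on" part that isn't. A sketch (names hypothetical):

  # two separate log vdevs; ZFS stripes log writes across them
  zpool add tank log /dev/ada3 /dev/ada4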
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If part of a stripe fails, performance is the least of your concerns...
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
How would you implement that?
If you're referring to the planning: you would plan on a set of SLOG devices fast enough for any single device to provide the minimum needed performance. The idea is to get the redundancy of multiple SLOG devices AND the extra performance when everything is running as expected.

If part of a stripe fails, performance is the least of your concerns...
If your SLOG device fails, that's not an issue except for performance. As long as ZFS knows the device is bad, my understanding is that it will fall back to keeping the ZIL on the pool. If ZFS were managing a stripe of SLOGs and saw one go offline, or saw a SMART failure, etc., it could drop the failed drive from the stripe and keep using the good one. Much like using two SSDs in RAID 0 as a SLOG device, except the bad drive is automagically removed and everything keeps working. As noted above, the log is always in memory anyway; the log/SLOG on disk is only there in case the power fails/kernel panics/CPU halts/etc. Therefore, if the last 5 seconds of the log needed to be rewritten from memory to the SLOG due to a detected stripe failure, the risk would be MINIMAL.

Keep in mind I'm not talking about a traditional RAID 0 stripe; I'm saying ZFS would manage this and have full control over the disks. I also understand that for this to work correctly you would be limited by the capacity and number of members (assuming they're all the same size).
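
The manual version of that drop already exists, since log vdevs are removable. A sketch (names hypothetical):

  # drop a failed or unwanted log device; with no log devices
  # left, ZFS falls back to keeping the ZIL on the main pool
  zpool remove tank ada3
  zpool status tank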

I understand this is all theoretical and academic at this point. I'm just trying to test my sanity.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
If you're referring to the planning: you would plan on a set of SLOG devices fast enough for any single device to provide the minimum needed performance. The idea is to get the redundancy of multiple SLOG devices AND the extra performance when everything is running as expected.

No, I meant implementing the case where a piece of a stripe goes down and the SLOG continues with the remaining pieces. Seems like that would be difficult/impossible.

I suppose the ultimate answer to mirroring or not, or striping, depends on the use case, as most answers do. There are a bunch of scenarios where not mirroring can cause data loss, so if that is a concern, I would continue to mirror. If a bit of data loss on the pool is OK in a failure scenario, then go for it and skip the mirror (with HW that allows it).
 