SLOG Device

Status
Not open for further replies.

hoboville

Dabbler
Joined
May 4, 2013
Messages
14
Hello, I've read several of jgreco's posts regarding SLOG devices for sync writes. If the SLOG device fails, is ZFS smart enough to know to start doing ZIL writes back to the pool or is there a danger of data loss and corruption?

People talk of mirroring ZIL devices, are they referring to mirrored SLOG devices?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The requirement for mirrored SLOG is historical; with older ZFS pool versions, loss of SLOG == loss of pool so there was a little incentive there to worry.

That is not supposed to be a problem anymore, and, yes, it is supposed to revert to using in-pool ZIL writes.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
But with newer versions, if your ZIL fails and you actually need it due to an unplanned shutdown (which is basically the whole reason why you do a ZIL and not just set sync=disabled and play the game of chance), then you'll have to accept the data loss and force the pool to become consistent with the data lost.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Incorrect, or, mostly incorrect.

If your SLOG device fails and then the system fails quickly thereafter, yes, data loss. But remember the SLOG is only needed to roll out sync transactions for at most two (IIRC) transaction groups. So that's a very small window.

If you have SLOG fail, then a minute passes, and the system's writing ZIL updates to the pool now, then you are protected. It should only be that little window that is a problem.

By way of comparison sync=disabled is almost always guaranteed to lose you some data for the obvious reasons.

I will note I haven't actually tested fallback of SLOG to in-pool ZIL.
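
To put a rough number on that window, here is a back-of-the-envelope sketch in Python. It assumes the "at most two transaction groups" figure above and a 5-second transaction group interval, which is a common default but an assumption here, so check the tunable on your own system. The function name and the 50 MiB/s write rate are made up for illustration.

```python
def at_risk_bytes(sync_write_rate_bytes_per_s, txg_interval_s=5.0, txgs_outstanding=2):
    """Rough upper bound on sync-write data that lives only in the SLOG.

    Assumptions (not measured values):
      - sync writes arrive at a steady rate
      - each transaction group closes every txg_interval_s seconds
      - at most txgs_outstanding groups are still uncommitted to the pool
    """
    if sync_write_rate_bytes_per_s < 0 or txg_interval_s < 0 or txgs_outstanding < 0:
        raise ValueError("all inputs must be non-negative")
    return sync_write_rate_bytes_per_s * txg_interval_s * txgs_outstanding


if __name__ == "__main__":
    rate = 50 * 1024 * 1024  # hypothetical steady 50 MiB/s of sync writes
    exposed = at_risk_bytes(rate)
    print(f"roughly {exposed / 1024**2:.0f} MiB exposed if SLOG and system die together")
```

This only bounds the data exposed when the SLOG and the system fail inside the same window; once the pool has taken over ZIL writes, the figure no longer applies.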
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
I don't mean to thread hijack at all (it's basically the same topic), but is sync really required on a system with redundant backplanes, controllers, PDUs, power supplies and backup generator power? If you are essentially 99.9999% sure it won't lose power?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If you don't care about your data, then why don't you just plug your filer into the wall?

What happens if you have a system panic? Or someone hits RESET?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, what has happened to a few users (which is where I got some experience with this) is that you will lose data if you don't have a redundant ZIL and have an unexpected loss of power. When power is restored, if your ZIL doesn't exist the pool will refuse to mount. It'll require you to pass some parameter that basically kicks the missing/corrupt ZIL out of the pool, removes the incomplete transaction data, and continues on its way. So you'll lose the data in the ZIL when the power loss/kernel panic occurs. That's not usually much data, but... read below.

diehard said:
I don't mean to thread hijack at all (it's basically the same topic), but is sync really required on a system with redundant backplanes, controllers, PDUs, power supplies and backup generator power? If you are essentially 99.9999% sure it won't lose power?

That's for you to decide. There's no right or wrong answer, except for that 0.0001% chance that you'll still lose power and therefore lose data.

Imagine your bank transferred $1000 to Apple for a computer you had just purchased, but Apple had decided that the 0.0001% chance was zero and disabled all sync, so it has no record of your purchase. Your bank gave Apple the money, and Apple has your money, but they have no way of proving what you ordered, whether you got your order, etc. You'd be a little pissed that you are now out $1000 because you can't prove anything.

ZFS is about having complete trust that the data either IS there or ISN'T there. There is no "might". When you do sync=disabled, you create a huge "might". It's your choice whether you like that "might" or not.

But think about this... didn't you go to ZFS because you wanted faith that your data wasn't being corrupted? So why would you go to ZFS for that kind of faith, then deliberately destroy that faith? Sounds kind of stupid to go with ZFS, but then to disable sync. Either you care about your data (which is why you went to ZFS in the first place) or you can't figure out what your priorities are (in which case you need to walk away from the keyboard and mouse and get your priorities straight first).
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
I think if you have people hitting reset buttons on servers in your datacenter, you probably have larger problems lol

Have you seen a system panic on FreeNAS with server grade hardware? (genuine question)

Absolutely, cyberjock, just some "what if" questions... not challenging anything, just curious. I just think there is a higher chance of, say, two drives failing in a RAIDZ2 or even two drives in the same vdev in a RAID10 than of said system losing power.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
cyberjock said:
Actually, what has happened to a few users (which is where I got some experience with this) is that you will lose data if you don't have a redundant ZIL and have an unexpected loss of power.

And for the TL;DR'ers, I will point out that I am very strategic about use of specific phrases like "supposed to." This is something that you should validate during testing.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
diehard said:
Have you seen a system panic on FreeNAS with server grade hardware? (genuine question)

Yes. Yesterday. HP MicroServer N36L. Upgraded from 8.3.1 to 9.2.1.2 and it panicked on pool import. Had to roll it back actually.

We could argue of course that the N36L isn't server grade or that a panic during startup ought not count. But you asked. :smile:
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I panicked my FreeNAS box twice last year because I ran 2 commands from the CLI and had a single character out of place in the parameters I was entering. So it does happen, no matter how much you try to engineer it out of the equation. Those are also the only 2 panics I've ever had on my box. :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
cyberjock said:
Actually, what has happened to a few users (which is where I got some experience with this) is that you will lose data if you don't have a redundant ZIL and have an unexpected loss of power. When power is restored, if your ZIL doesn't exist the pool will refuse to mount. It'll require you to pass some parameter that basically kicks the missing/corrupt ZIL out of the pool, removes the incomplete transaction data, and continues on its way. So you'll lose the data in the ZIL when the power loss/kernel panic occurs. That's not usually much data, but... read below.

This is how most people find out their SSD is dead - it won't be recognized on boot, or after a power outage. On a consumer desktop where it maybe held your OS, some applications, a game or two - you gnash your teeth and wail, then replace and reinstall. When that's your ZIL, you might be in for a little bit more hurt depending on how much data was in flux. That all depends on how hard you were hitting that array and how critical those writes were.

Mirrored SLOG is really about strengthening a link in the chain. Regular SLOG with UPS is very resilient. Mirrored SLOG just makes it that much stronger. Neither is weak, but if you can be "99.99% safe" or "99.999% safe" ... well, to some people that extra "9" is worth the second SLOG.
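
For anyone who wants to play with the "extra 9" idea, here is a minimal Python sketch. It assumes the devices in the mirror fail independently, which is optimistic (two identical SSDs on the same power rail can easily die together), so treat the result as a best case rather than a prediction. The function name and the 99.99% starting figure are illustrative only.

```python
from fractions import Fraction


def mirrored_loss_probability(single_loss_probability, devices=2):
    """Chance that every device in a SLOG mirror is gone at the same time.

    Assumes independent failures (an optimistic simplification).
    Accepts a string such as "0.0001" so the arithmetic stays exact.
    """
    p = Fraction(single_loss_probability)
    if not 0 <= p <= 1:
        raise ValueError("probability must be between 0 and 1")
    if devices < 1:
        raise ValueError("need at least one device")
    return p ** devices


if __name__ == "__main__":
    single = "0.0001"  # hypothetical: one SLOG is "99.99% safe"
    mirrored = mirrored_loss_probability(single)
    print(f"single SLOG loss chance:   {float(Fraction(single)):.8f}")
    print(f"mirrored SLOG loss chance: {float(mirrored):.8f}")
```

Correlated failures pull the real-world gain well below this best case, so the second device buys less than the independent-failure math suggests.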
 