SSD SLOG better than pool always?

Status: Not open for further replies.

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
I have an old SSD I plan to use as a SLOG. It isn't the fastest drive, but it's well into SATA III territory. It's not an 870 Evo or an Optane drive, but I figure any SSD-based SLOG has to be better than leaving the ZIL on a RAIDZ2 array, correct? Strictly speaking, I don't think I need it, but theoretically it can't hurt if I understand it right, right?
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Strictly speaking, I don't think I need it, but theoretically it can't hurt if I understand it right, right?
If you're not performing any synchronous writes, then a SLOG device is not needed.
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
If you're not performing any synchronous writes, then a SLOG device is not needed.

Correct. The ZIL is only used for sync writes. I guess I am not sure which of my applications even use sync vs. async, but I figured some writes must be sync, and thus it can't hurt to have one. But that said, how do I go about determining what is sync vs. async?


 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Anything typical like SMB shares, jails, etc. will usually perform asynchronous writes. A synchronous write is something like an ESXi datastore connected via NFS.
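If it helps, you can see (and override) this per dataset via the ZFS sync property. A minimal sketch, assuming a hypothetical dataset called tank/media; the property values themselves (standard, always, disabled) are stock ZFS:

  # show whether the dataset honours application sync requests (standard),
  # forces every write to be synchronous (always), or ignores sync (disabled)
  zfs get sync tank/media

  # the default; the application decides which writes are synchronous
  zfs set sync=standard tank/media

With sync=standard, an SMB share will mostly generate async writes, while an NFS-mounted ESXi datastore will request sync on nearly everything.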
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Anything typical like SMB shares, jails, etc. will usually perform asynchronous writes. A synchronous write is something like an ESXi datastore connected via NFS.
I mostly do use SMB, with writes from a Syncthing jail as well. I am running under ESXi, but everything is mounted via SMB... So does this negate the need for a fast SLOG entirely? Is there a way to see how active the ZIL is...?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Some reliability issues:
  • SLOGs without power loss protection can lead to data loss on an unexpected power failure.
  • Unmirrored SLOGs can also lead to data loss on block or device failure.
Remember, a separate intent log is not just about speed. If you want speed, you can turn off lots of things to tune your pool (and tune it into a disaster).

That said, for pools that contain only datasets and no zvols, on SLOG failure during crash recovery (for either of the 2 causes above), you only lose the data in flight. The pool will not be left in an inconsistent state.

But if you have VMs inside your zvols, that's a whole other disaster waiting to happen. Those VMs in zvols can be left in a state where a full restore (or a return to an earlier snapshot) is required. (Just note my wording: can be... not a certainty.)
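For anyone who does decide a SLOG is worth it, mirroring the log device is a single command. A rough sketch only, with hypothetical FreeBSD device names (ada1/ada2) standing in for whatever SSDs you actually use:

  # attach a mirrored SLOG to an existing pool named "tank"
  zpool add tank log mirror /dev/ada1 /dev/ada2

  # verify the log vdev now shows up as a mirror
  zpool status tank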
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Some reliability issues:
  • SLOGs without power loss protection can lead to data loss on an unexpected power failure.
  • Unmirrored SLOGs can also lead to data loss on block or device failure.
Remember, a separate intent log is not just about speed. If you want speed, you can turn off lots of things to tune your pool (and tune it into a disaster).

That said, for pools that contain only datasets and no zvols, on SLOG failure during crash recovery (for either of the 2 causes above), you only lose the data in flight. The pool will not be left in an inconsistent state.

But if you have VMs inside your zvols, that's a whole other disaster waiting to happen. Those VMs in zvols can be left in a state where a full restore (or a return to an earlier snapshot) is required. (Just note my wording: can be... not a certainty.)

Thankfully I have no VM worries. I'm running all VMs outside FreeNAS under ESXi, and I'm not passing storage back via iSCSI either. So my data should be pretty safe during a power loss, theoretically.

But I'm also not sure an SSD SLOG is actually needed for my use case. The one I'm planning to use is just sitting around doing nothing, though...


 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
I have an old SSD I plan to use as a SLOG. It isn't the fastest drive, but it's well into SATA III territory. It's not an 870 Evo or an Optane drive, but I figure any SSD-based SLOG has to be better than leaving the ZIL on a RAIDZ2 array, correct? Strictly speaking, I don't think I need it, but theoretically it can't hurt if I understand it right, right?
You might want to take a look at the results of these tests that I did:

https://forums.freenas.org/index.ph...-partitioned-for-two-pools.62787/#post-483761
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
  • Unmirrored SLOGs can also lead to data loss on block or device failure.
I have been thinking about this recently. Many argue that a mirrored SLOG is not warranted because you would need an unclean shutdown and a SLOG failure at the same time before any data loss occurs. Because these two events are unlikely on their own, the chance of them happening simultaneously is so low that it becomes an acceptable risk. While this makes a lot of sense, I am wondering if these two are truly independent events, e.g. is device failure more frequent during a power cycle? I know it is for HDDs, because of the stress of stopping/starting the motor; I'm just not sure about SSDs.

Edit: power losses --> data loss
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Ender117, it's all about the level of risk. You have to decide. That's why I put the information in. It may generate confusion at times, but I believe that more information is better than less.
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
@Ender117, it's all about the level of risk. You have to decide. That's why I put the information in. It may generate confusion at times, but I believe that more information is better than less.
Agreed. But to make an educated decision, I would like to know the interplay between SSD reliability and power cycles (e.g. are they more prone to failure at times of sudden power loss and power-on?). Most articles seem to only talk about how many writes you can put on them. If you can point me to resources on that, it would be helpful.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Ender117, what I meant by information is what I put in my post. As for the actual statistics, I don't have those details.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Failure on power-on is statistically more of an issue for devices that have moving parts (e.g. hard drives), but as @Arwen says, it's ultimately a risk-acceptance decision.

Using an SLOG with PLP is probably 99.9% safe - maybe better.

But let me ask you this, @Ender117 - have you ever rolled snake eyes (two 1's) or boxcars (two 6's) twice in a row? If the answer is "yes" then congratulations; you successfully got something with a 1 in 1296 chance - less than 0.1%.

So you have to ask - what is the cost of a failed array in terms of time, money, reputation, etc - versus the cost of a second SLOG device?
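For anyone checking the arithmetic behind that figure: a specific double (say, snake eyes) comes up on 1 roll in 36, so getting it twice in a row is

  (1/36) × (1/36) = 1/1296 ≈ 0.077%

which is indeed under 0.1%.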
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
This is all great discussion, but I'm still curious about the original question. I am still not sure if I would actually gain anything from an SSD SLOG if I chose to accept the stated risks. I also haven't had time to check out Chris Moore's link; I'll have to give that a read as well to understand the potential performance differences.


 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
This is all great discussion, but I'm still curious about the original question. I am still not sure if I would actually gain anything from an SSD SLOG if I chose to accept the stated risks. I also haven't had time to check out Chris Moore's link; I'll have to give that a read as well to understand the potential performance differences.


Run zilstat for your pool; if the numbers are close to 0 (I bet yours are), then you won't see any benefit from a SLOG.
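For reference, a minimal way to run it; the exact options vary a little between versions of the zilstat script, so treat this as a sketch:

  # sample ZIL activity once per second, ten samples, while your normal workload runs
  zilstat 1 10

If the byte counters and the ops column stay at or near zero, nothing is issuing sync writes and a SLOG would just sit idle.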
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Checked out Chris's post; it doesn't show any data on pool-only vs. SSD SLOG. But if all of my data transfer is async anyway, I guess it doesn't matter much. How do I determine this?

Or should I just assume that, since all writes come in only two ways, SMB from multiple VMs/my main Windows machine and Syncthing running in a jail, all of my writes must be async and thus an SSD SLOG will have no effect?


Never mind, the answer was given as I typed this. I will check that later today, thanks!

 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
Failure on power-on is statistically more of an issue for devices that have moving parts (e.g. hard drives), but as @Arwen says, it's ultimately a risk-acceptance decision.

Using an SLOG with PLP is probably 99.9% safe - maybe better.

But let me ask you this, @Ender117 - have you ever rolled snake eyes (two 1's) or boxcars (two 6's) twice in a row? If the answer is "yes" then congratulations; you successfully got something with a 1 in 1296 chance - less than 0.1%.

So you have to ask - what is the cost of a failed array in terms of time, money, reputation, etc - versus the cost of a second SLOG device?
Yeah, it's all about statistics and probability, but it's full of things you can overlook here and there:

1. The correlation (or lack thereof) between power failure and SSD failure that I mentioned above.
2. You mentioned the dice example, but how often do you roll them? E.g. having a UPS could mean the difference between rolling once a month and once a year.
3. Write endurance of SSDs. If you just mirror them, they will see the same amount of writes and wear at the same rate, so they may eventually fail in close enough succession to make the mirror useless. OTOH, you could stripe them to reduce the wear and MAYBE achieve better reliability over a given period.

I am sure there is a lot more. Ultimately this is case-dependent, but I feel more thorough discussion and investigation is justified.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
And some SSDs fail by never coming back after a reboot once their write capacity has been exceeded...

So, if this is a problem, use hot-swap mirrored SSDs and maintain them by replacing them before they reach EOL.
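One way to keep an eye on that, assuming the drive reports wear over SMART (the attribute names differ by vendor, so this is only a sketch with a hypothetical device ada1):

  # dump SMART data and look for wear/endurance attributes such as
  # Wear_Leveling_Count, Percent_Lifetime_Remain, or Total_LBAs_Written
  smartctl -a /dev/ada1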
 