NVMe SLOG without PLP (power loss protection) - mitigation?

IOSonic

Explorer
Joined
Apr 26, 2020
Messages
54
Hello,

I have a single NVME port available on my mobo, and I want to use it as SLOG for a disk pool. As I see it my options are:

1. Enterprise NVME drive with power-loss protection. No-go...too expensive.

2. Consumer, high-performance NVME drive. Contains DRAM cache which may not be flushed to disk following a power-loss event, i.e., lost data & corrupted pools. No-go.

Now, how about this:

3. Consumer, low performance NVME drive. I would pick a drive like the WD Blue that does not have a DRAM cache or use host memory for cache (Host Memory Buffer), eliminating the chance that data would be lost in a disk buffer during a power loss event. Such a drive would be more than fast enough to support the needs of my mechanical drive pool, and it's so cheap that I don't care about wearing it out with SLOG writes.

Can someone do a sanity check on my reasoning here?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Ask yourself first if you have the right pool design, since the SLOG only really helps with a well-designed pool (usually a bunch of pairs of mirrors) and in the case where you do lots of sync writes. Otherwise, you're wasting your time and money trying to make something faster in a way that won't help.

Clearly you know enough to think about the power loss situation, so hopefully you've done your research to the point that you know SLOG will help... in the case you put forward, you're effectively saying you want to put money above data security, which is fine if you understand what you're doing.

You can't entirely avoid undesired outcomes in the case of unexpected power loss with your design, but it seems like the best compromise.

More than 30GB of SLOG is wasted, so consider under-provisioning the drive to increase its lifespan.
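
A rough way to see where a ceiling like this comes from is sketched below. It assumes the SLOG only ever has to hold the sync writes that arrive during a few transaction groups, uses the OpenZFS default `zfs_txg_timeout` of 5 seconds, and treats the network link as the bottleneck. The function name and the choice of three transaction groups in flight are illustrative assumptions, not figures from this thread.

```python
def slog_size_gb(link_gbps: float,
                 txg_timeout_s: float = 5.0,
                 txgs_in_flight: int = 3) -> float:
    """Rough upper bound on the SLOG space that can be in use, in GB.

    Assumptions (not from the thread): incoming sync writes are limited by
    the network link, and at most `txgs_in_flight` transaction groups'
    worth of data is waiting to be committed to the pool at any time.
    """
    bytes_per_second = link_gbps * 1e9 / 8  # line rate, ignoring protocol overhead
    return bytes_per_second * txg_timeout_s * txgs_in_flight / 1e9


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"{gbps} Gbps link: about {slog_size_gb(gbps):.1f} GB of SLOG in use at most")
```

Under these assumptions even a saturated 10Gbps link stays below 20GB, so the rest of the drive can be left unallocated as spare area.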
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
SLOG can only accelerate a sync write workload, so if you don't have one, it's pointless to add one.

Depending on your performance and endurance requirements, you might find that there are economical SLOG options within your reach.

Regarding devices without power-loss-protection for data in cache - they are still safe SLOGs, but they won't be fast.
 

IOSonic

Explorer
Joined
Apr 26, 2020
Messages
54
Hey there,

First and foremost, thank you for weighing in on this. I am going to call WD tomorrow to confirm that there are no volatile storage areas of this particular drive, and I'll report back for anyone wondering the same.

sretalla said: "since the SLOG only really helps with a well designed pool (usually a bunch of pairs of mirrors) and in the case where you do lots of sync writes"

Yes, my spinning disk pools will be RAIDZ1 & RAIDZ2 pools made up of 8 disks, so I am already expecting some write penalty. They will also be hosting NFS for VMs, a DB that will grow with time, and FTP for backup targets that I eventually expect to be used by multiple people, so lots of sync writes there. My thinking is that SLOG will alleviate this.

sretalla said: "you're effectively saying you want to put money above data security, which is fine if you understand what you're doing."

Yes sir. However, I don't want to subject it to unnecessary risk, given my constraints. Assuming that my understanding of this particular NVME drive design is correct (again, I will be confirming with WD that there is no volatile RAM in use for it), my understanding is that the only data-loss potential comes in the event of needing to replay transactions from the SLOG following a power event/crash, and then having your SLOG die before that can happen. In my mind, the chances of that happening are so small, that I am willing to risk it. If I have failed to consider something, please let me know.

sretalla said: "More than 30GB of SLOG is wasted, so consider under-provisioning the drive to increase its lifespan."

An excellent idea. I will actually carve up a few small partitions for use as L2ARC and SLOG, respectively.

HoneyBadger said: "Regarding devices without power-loss-protection for data in cache - they are still safe SLOGs, but they won't be fast."

For my education, are you saying ZFS is smart enough to wait for acknowledgment of a write for drives that lack PLP? In the case of a drive without a DRAM cache like this WD blue, that's exactly what I'm wanting it to do. I realize a drive with some sort of power-protected DRAM cache would be waaay faster--this WD blue is a dog compared to many other NVME drives. However, my thinking was that it represented such a massive improvement over the spinny disks, and the cost was so cheap, it was a worthwhile improvement.

If I'm right that it uses no volatile disk caching mechanism at all (I'm pretty sure it just uses a bit of SLC for cache), for my purposes it could make things faster without introducing additional risk. Let me know what you think. I'll get back to you about that drive.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Even if WD support won't, I can tell you the WD Blue NVMe (SN550) is a DRAMless TLC device (Edit: Recent reports indicate they've pulled another bait-and-switch to QLC, check your drives!) that uses spare area as pseudo-SLC.

IOSonic said: "Yes, my spinning disk pools will be RAIDZ1 & RAIDZ2 pools made up of 8 disks, so I am already expecting some write penalty. They will also be hosting NFS for VMs, a DB that will grow with time, and FTP for backup targets that I eventually expect to be used by multiple people, so lots of sync writes there. My thinking is that SLOG will alleviate this."

RAIDZ doesn't handle heavy random workloads like that very well - an SLOG will act as a band-aid, but if your back-end vdevs have difficulty sustaining the necessary IOPS then ultimately you'll feel the hurt. I'd strongly recommend a pool of mirrors for your VMs and DB - your backup target can be RAIDZ though, and strictly speaking doesn't need to have sync enabled at all.

IOSonic said: "Yes sir. However, I don't want to subject it to unnecessary risk, given my constraints. Assuming that my understanding of this particular NVME drive design is correct (again, I will be confirming with WD that there is no volatile RAM in use for it), my understanding is that the only data-loss potential comes in the event of needing to replay transactions from the SLOG following a power event/crash, and then having your SLOG die before that can happen. In my mind, the chances of that happening are so small, that I am willing to risk it. If I have failed to consider something, please let me know."

Your analysis is correct here - frequent backups will also help alleviate this.

IOSonic said: "An excellent idea. I will actually carve up a few small partitions for use as L2ARC and SLOG, respectively."

Underprovision for longevity, but don't share a single device between L2ARC and SLOG. The only device type that can handle that well is Optane, which is overkill for L2ARC purposes anyways.

IOSonic said: "For my education, are you saying ZFS is smart enough to wait for acknowledgment of a write for drives that lack PLP? In the case of a drive without a DRAM cache like this WD blue, that's exactly what I'm wanting it to do. I realize a drive with some sort of power-protected DRAM cache would be waaay faster--this WD blue is a dog compared to many other NVME drives. However, my thinking was that it represented such a massive improvement over the spinny disks, and the cost was so cheap, it was a worthwhile improvement.

If I'm right that it uses no volatile disk caching mechanism at all (I'm pretty sure it just uses a bit of SLC for cache), for my purposes it could make things faster without introducing additional risk. Let me know what you think. I'll get back to you about that drive."

ZFS is always waiting for that "write acknowledgement" regardless of the device - a fully PLP-enabled device is simply able to respond faster, because its own internal firmware says "The data is in RAM, but I have sufficient capacitor/etc devices to flush my RAM contents to NAND."
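
The cost of waiting for that acknowledgement can be measured from user space. The sketch below is a minimal illustration, not a ZFS benchmark: it writes small blocks to a file and calls `fsync` after each one, so every write has to be reported as stable before the next begins. The function name, block size, and write count are arbitrary choices for the example, and the numbers only mean something when the file lives on the device or pool you want to test.

```python
import os
import tempfile
import time


def sync_write_latencies(path, block_size=4096, count=50):
    """Write `count` blocks, calling fsync after each one.

    Returns a list of per-write latencies in seconds.
    """
    payload = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # blocks until the OS reports the data as on stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # A temporary directory is used here only so the example runs anywhere;
    # point the path at the storage under test for a meaningful result.
    with tempfile.TemporaryDirectory() as tmp:
        lat = sync_write_latencies(os.path.join(tmp, "probe.bin"))
    median_us = sorted(lat)[len(lat) // 2] * 1e6
    print(f"median sync write latency: {median_us:.0f} us")
```

A device with power-loss protection would be expected to show a lower per-write latency in a test like this, because it can acknowledge the flush while the data is still in its protected RAM.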

The questions to ask yourself are

1. What recordsize of sync writes are you pushing? (Sort of answered above with NFS for VMs and DBs - mostly 4K/8K for the VMs, DB can vary wildly depending on the DB used and operation)

2. How fast do you need those sync writes to go? Are you expecting 1Gbps (~100MB/s) or 10Gbps (~1000MB/s)?

3. How much data (in terms of GB/day) are you planning to sync-write to the unit? (A couple GB per day? Easy. Multiple TBs? Harder. This lines up with "your backups don't necessarily have to be done sync.")

That will determine what kind of device you'll need.

But I'd suggest having a look at the Optane M10 memory cards. The 16GB one will handle gigabit speed writes, but is only rated for about 365TBW (terabytes written) total lifespan. With that said, you can also buy them surplus online for about $20 at most. So feel free to burn through them like popcorn if you can handle downtime. ;)
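
To put the third question and the endurance rating together, here is a back-of-the-envelope lifespan estimate. It is only a sketch: it takes the 365TBW figure quoted above at face value, assumes 1TB = 1000GB, and ignores write amplification. The function name and the sample daily volumes are made up for illustration.

```python
def years_until_worn_out(rated_tbw: float, sync_gb_per_day: float) -> float:
    """Estimate how many years a log device lasts at a steady sync-write rate.

    Assumes every sync-written GB costs exactly one GB of rated endurance
    (no write amplification), which is optimistic.
    """
    total_gb = rated_tbw * 1000  # TB written -> GB written
    return total_gb / sync_gb_per_day / 365


if __name__ == "__main__":
    # Sample daily sync-write volumes, chosen only to show the range.
    for gb_per_day in (5, 100, 2000):
        years = years_until_worn_out(365, gb_per_day)
        print(f"{gb_per_day:>5} GB/day -> about {years:.1f} years")
```

At a few GB per day the rating is effectively irrelevant; at multiple TB per day the same device would be used up in well under a year.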
 
Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829