SLOG m.2 OR HHHL ?

amp88

Explorer
Joined
May 23, 2019
Messages
56
If an SLOG breaks absolutely nothing bad will happen to your data. As long as the system is still running.
The SLOG is only ever (!) read at crash recovery, e.g. an unexpected reboot, power loss, ...

HTH,
Patrick
If that's the case then I apologise for providing incorrect information.

However, I'd still make the case for mirrored SLOG (bear with me!). With a single SLOG device, if it fails FreeNAS will revert to writing its ZIL on your storage pool. In your specific case I believe you're planning on using an all-flash pool, so the performance impact of this would be lower than if you had spinning drives in your pool. However, there would still be a performance impact (there's a reason you're planning on using a super fast SLOG in your system). It's possible this reduction in performance could be significant, especially in an environment with heavy VM usage. You'd also have to replace the missing SLOG device (which I guess would probably arrive on the next business day?), which would require system downtime to swap in a new M.2 PCIe device. You have to consider whether the cost of an extra drive for the SLOG is worth it for your use case.
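To make the trade-off above concrete, here is a toy Python model of the write path as described in this thread. It is only a sketch, not real ZFS code: the class `ToyPool` and all of its method names are made up for illustration. It assumes sync writes are logged and also held in RAM, that a periodic commit moves them to the pool, and that the log is read back only after a crash.

```python
class ToyPool:
    """Toy model of a pool with an intent log (hypothetical, not real ZFS code)."""

    def __init__(self, has_slog=True):
        self.has_slog = has_slog
        self.slog = []         # log records on the separate log device
        self.in_pool_zil = []  # log records written to the main pool instead
        self.pending = []      # writes held in RAM until the next commit
        self.committed = []    # data safely written to the pool

    def sync_write(self, data):
        # A sync write is logged first, then kept in RAM for the next commit.
        log = self.slog if self.has_slog else self.in_pool_zil
        log.append(data)
        self.pending.append(data)

    def commit(self):
        # Periodic commit: RAM contents reach the pool, so log records are obsolete.
        self.committed.extend(self.pending)
        self.pending.clear()
        self.slog.clear()
        self.in_pool_zil.clear()

    def slog_fails(self):
        # Log device lost while running: RAM still holds every pending write,
        # and later sync writes fall back to the in-pool log.
        self.has_slog = False
        self.slog = []

    def crash_and_recover(self):
        # Power loss: RAM is gone, so the log is replayed (the only time it is read).
        self.pending.clear()
        self.committed.extend(self.slog + self.in_pool_zil)
        self.slog.clear()
        self.in_pool_zil.clear()
```

In this model, calling `slog_fails()` and then `commit()` loses nothing, and `crash_and_recover()` with a healthy log device recovers every logged write. Only a log device failure followed by a crash before the next commit drops data, and that window is what a mirrored SLOG narrows.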
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
You shouldn't recommend devices without PLP (Power Loss Protection) for SLOG use, as they're inherently unsafe in the case of a sudden power loss or system crash/panic.
Crucial SSDs all have PLP to the same degree non-enterprise Optane drives do. It's useless for in-flight data, but you can't get write corruption for data on the drive. And if it's not for an enterprise deployment with Service Level Agreements, paying 3x more for an E2E PLP solution is generally (but not always) an exorbitant waste of money for no gains.
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
If that's the case then I apologise for providing incorrect information.

However, I'd still make the case for mirrored SLOG (bear with me!). With a single SLOG device, if it fails FreeNAS will revert to writing its ZIL on your storage pool. In your specific case I believe you're planning on using an all-flash pool, so the performance impact of this would be lower than if you had spinning drives in your pool. However, there would still be a performance impact (there's a reason you're planning on using a super fast SLOG in your system). It's possible this reduction in performance could be significant, especially in an environment with heavy VM usage. You'd also have to replace the missing SLOG device (which I guess would probably arrive on the next business day?), which would require system downtime to swap in a new M.2 PCIe device. You have to consider whether the cost of an extra drive for the SLOG is worth it for your use case.
There are PCIe riser cards with NVMe M.2 hotplug capability. No downtime necessarily required.
 

amp88

Explorer
Joined
May 23, 2019
Messages
56
Crucial SSDs all have PLP to the same degree non-enterprise Optane drives do. It's useless for in-flight data, but you can't get write corruption for data on the drive. And if it's not for an enterprise deployment with Service Level Agreements, paying 3x more for an E2E PLP solution is generally (but not always) an exorbitant waste of money for no gains.
Isn't there a significant performance impact with that implementation (where the host has to wait for the data to be written from the drive's DRAM cache to the NAND before it reports to the host that it's completed)? Especially in comparison to Optane, but even in comparison to an M.2 solution with full PLP?
 

amp88

Explorer
Joined
May 23, 2019
Messages
56
There are PCIe riser cards with NVMe M.2 hotplug capability. No downtime necessarily required.
Does the Supermicro AOC-SLG3-2M2 the OP said they intended to use support that?
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
Isn't there a significant performance impact with that implementation (where the host has to wait for the data to be written from the drive's DRAM cache to the NAND before it reports to the host that it's completed)? Especially in comparison to Optane, but even in comparison to an M.2 solution with full PLP?
Define full PLP. The Optane M.2s don't have the same PLP level that your traditional SAS Enterprise HDDs do. If the Power Supply dies in a native PCIe NVMe deployment, the motherboard has to have PLP, not the drive, to protect data in flight over the bus. That's not true in SAS where the power provision circuit flows both directions, and a capacitor on the drive can power the entire bus too.

And no, the performance impact on Micron's partial PLP solution is a complete myth. If there's data waiting to be written in the DRAM cache in a power loss event, it gets written to the SLC write cache and is safe. It will be written in MLC/TLC form on the next power up.
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
Does the Supermicro AOC-SLG3-2M2 the OP said they intended to use support that?
Nope. https://www.supermicro.com/manuals/other/AOC-SLG3-2M2.pdf. No mention of "hot" and only 2 mentions of "plug" in reference to power.

It's one reason I actively avoid Supermicro and AOC products. Overpriced and lacking basic features.

If you have a 5.25" bay available, or a suitable adapter... https://www.icydock.com/nvme_m2

Otherwise, find a native U.2 port.

Or, wait for M.3 to come out of the labs.
 

amp88

Explorer
Joined
May 23, 2019
Messages
56
Define full PLP. The Optane M.2s don't have the same PLP level that your traditional SAS Enterprise HDDs do. If the Power Supply dies in a native PCIe NVMe deployment, the motherboard has to have PLP, not the drive, to protect data in flight over the bus. That's not true in SAS where the power provision circuit flows both directions, and a capacitor on the drive can power the entire bus too.

And no, the performance impact on Micron's partial PLP solution is a complete myth. If there's data waiting to be written in the DRAM cache in a power loss event, it gets written to the SLC write cache and is safe. It will be written in MLC/TLC form on the next power up.
With regards to Optane, the Serve The Home review has this to say:
One question we get often is around the Intel Optane 905P and 900P power loss protection. Officially, Intel keeps the spec for its highest-end Optane DC P4800X parts. If you have the budget, and your job depends on it, just get the P4800X. If you are on a tight budget, the lower-end Intel Optane drives are great.

The reason for this is that unlike NAND-based write SSDs, the Intel Optane drives do not have large DRAM caches on-device. Without that DRAM, host writes are acknowledged when data is written to the device media. For Optane, this is a direct write, as they do not need buffering through DRAM like NAND SSDs.

I'd say "full PLP" means that once the device has confirmed to the host the data is on the device, a sudden power failure will not result in any lost data on the drive. This means for devices which use a volatile cache/buffer (e.g. the majority of NAND SSDs) they must either have an in-built mechanism to cope with sudden power loss (e.g. capacitors or battery backup), or they must wait until the data is written to a non-volatile storage medium on the drive. Does the "power loss immunity" protection offered on Crucial M.2 drives (e.g. the MX500) actually flush data from the DRAM cache/buffer to NAND in the event of power loss? I'm not sure that it does, and in a few reviews it doesn't appear that's how it works either. For example, from AnandTech:
Micron's partial power loss protection feature for data at rest is preserved, but implemented in a different fashion; they're now also branding it as "power loss immunity". The impact is still the same: you don't get the full protection that is standard for enterprise SSDs, but data that has already been written to the flash will not be corrupted if the drive loses power while writing a second pass of more data to the same cells.

If that's accurate, then in order for the drive to work well as a SLOG device it must wait for data to be flushed from the DRAM buffer/cache to the NAND before reporting the write is complete to the host. I don't know if it does that.
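One way to get a rough feel for how long a drive takes to acknowledge a synchronous write is to time small write-plus-flush pairs. Below is a minimal Python sketch of that idea. The function name `sync_write_latencies` and its parameters are my own for illustration. It only measures latency as seen through the filesystem: it cannot prove whether a drive really persists data before acknowledging, and on a RAM-backed temp directory the numbers are meaningless.

```python
import os
import tempfile
import time


def sync_write_latencies(directory, count=100, size=4096):
    """Time `count` write+fsync pairs in `directory`; returns seconds per write."""
    payload = os.urandom(size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data down to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # Point this at a directory on the drive under test instead of the temp dir.
    results = sorted(sync_write_latencies(tempfile.gettempdir()))
    print(f"median: {results[len(results) // 2] * 1e6:.0f} us")
    print(f"worst:  {results[-1] * 1e6:.0f} us")
```

Very low sync latency on a NAND drive without capacitors could mean the flush is being acknowledged from the DRAM cache, but a timing test alone cannot settle that question.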
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
It's one reason I actively avoid Supermicro and AOC products. Overpriced and lacking basic features.
Which server vendor would you recommend? We are running about a hundred 1U systems and switched from Fujitsu to Supermicro two years ago. Main reason being that Fujitsu along with all the other "big" brands (HP, Dell, Lenovo ...) won't ship functional empty drive cages with their systems so you are forced to buy disk drives and SSDs from them at insane prices (and often ridiculously small capacity).

Thanks,
Patrick
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
With regards to Optane, the Serve The Home review has this to say:


I'd say "full PLP" means that once the device has confirmed to the host the data is on the device, a sudden power failure will not result in any lost data on the drive. This means for devices which use a volatile cache/buffer (e.g. the majority of NAND SSDs) they must either have an in-built mechanism to cope with sudden power loss (e.g. capacitors or battery backup), or they must wait until the data is written to a non-volatile storage medium on the drive. Does the "power loss immunity" protection offered on Crucial M.2 drives (e.g. the MX500) actually flush data from the DRAM cache/buffer to NAND in the event of power loss? I'm not sure that it does, and in a few reviews it doesn't appear that's how it works either. For example, from AnandTech:


If that's accurate, then in order for the drive to work well as a SLOG device it must wait for data to be flushed from the DRAM buffer/cache to the NAND before reporting the write is complete to the host. I don't know if it does that.
Notice how I specified data in flight isn't protected for either Optane or Crucial? The Power Loss Immunity on the Crucial drives flushes the DRAM to SLC Cache on power failure. You're free to read Crucial's spec on this. The reason it isn't considered true PLP is because the write isn't truly finalized to MLC/TLC form (which depends on the drive). It's a distinction without a difference.

Crucial and Optane have the same level of Power Loss Protection, which is less than any enterprise SAS drive which can maintain power to the entire circuit whereas in NVMe they cannot. In that case, if you are worried about data in flight to an NVMe drive, you need a dedicated PLP motherboard; and those, such as the Asrock Rack board I have in my "Lightweight, Speedy Proposal", cost north of $500 just for an M-ITX.
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
Which server vendor would you recommend? We are running about a hundred 1U systems and switched from Fujitsu to Supermicro two years ago. Main reason being that Fujitsu along with all the other "big" brands (HP, Dell, Lenovo ...) won't ship functional empty drive cages with their systems so you are forced to buy disk drives and SSDs from them at insane prices (and often ridiculously small capacity).

Thanks,
Patrick
Asrock Rack hasn't let my company down yet when deploying x86 systems (we just use AWS for anything ARM-specific), and the prices have been reasonable. As for functional empty drive cages, that depends on how many drives you want. We've deployed roughly 80 of these pretty much pain free. https://www.45drives.com/products/storinator-xl60-configurations.php

If you can order 50 or more in a batch, they start providing bulk discounts, a benefit from using an up-and-coming vendor in this space.

If you're 100% limited to buying 1U chassis for your storage solutions, all I can do is offer my condolences.
 

amp88

Explorer
Joined
May 23, 2019
Messages
56
Notice how I specified data in flight isn't protected for either Optane or Crucial? The Power Loss Immunity on the Crucial drives flushes the DRAM to SLC Cache on power failure. You're free to read Crucial's spec on this. The reason it isn't considered true PLP is because the write isn't truly finalized to MLC/TLC form (which depends on the drive). It's a distinction without a difference.

Crucial and Optane have the same level of Power Loss Protection, which is less than any enterprise SAS drive which can maintain power to the entire circuit whereas in NVMe they cannot. In that case, if you are worried about data in flight to an NVMe drive, you need a dedicated PLP motherboard; and those, such as the Asrock Rack board I have in my "Lightweight, Speedy Proposal", cost north of $500 just for an M-ITX.
Optane doesn't have "in flight" data, since there's no volatile storage on the drive (i.e. there's no DRAM buffer to flush). That was the point of the quote from the Serve The Home review. If the quote from AnandTech is accurate, Crucial don't flush the DRAM buffer to flash in the event of a power loss. If you have a source from Crucial/Micron which says consumer M.2 drives (e.g the MX500) do in fact flush the DRAM buffer to flash in the event of a power loss, I'd appreciate a link.
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
Optane doesn't have "in flight" data
Yes it does. What do you call data sitting on the wire between the CPU and the PCB of the Optane drive? This goes back to why I mentioned the technological and electrical differences between SAS and NVMe. The data on that wire is in flight. In the implementations of SAS, the drive not only has enough power in the capacitors to ensure data in-cache (DRAM) is written to the disks themselves, along with graceful shutdown and stowing of the spindle and head, but it actually has enough to ensure the data signal from the CPU TO THE DISK is also maintained until THAT data is written to the disk as well. NVMe has no such capability, and the reason for it believe it or not has more to do with hardware security on the PCIe bus than anything else (you can look into Thunderbolt exploits if you really want to go down the rabbit hole).

Optane as far as I know cannot save your in-flight data any more than the Crucial drive can.

Optane without a certified End to End PLP motherboard does not have the capability of saving on-bus in-flight data in a power loss event the way your traditional SAS drives do. End of discussion.

I believe there is provision for this in the NVMe 2.0 standard, but it's under incredible contention right now.

If the quote from AnandTech is accurate,
The Anandtech quote is wrong and I've sent a letter to the author to correct it.

Crucial don't flush the DRAM buffer to flash in the event of a power loss. If you have a source from Crucial/Micron which says consumer M.2 drives (e.g the MX500) do in fact flush the DRAM buffer to flash in the event of a power loss, I'd appreciate a link.
Yes it does.


HoneyBadger has experience with this and has some NDA access to the official documentation which I now lack for position title reasons (yay Strategic Architects?...). The DRAM cache is flushed to SLC cache for a proper rewrite to MLC on the next power up event.
 

dror

Dabbler
Joined
Feb 18, 2019
Messages
43
If that's the case then I apologise for providing incorrect information.

However, I'd still make the case for mirrored SLOG (bear with me!). With a single SLOG device, if it fails FreeNAS will revert to writing its ZIL on your storage pool. In your specific case I believe you're planning on using an all-flash pool, so the performance impact of this would be lower than if you had spinning drives in your pool. However, there would still be a performance impact (there's a reason you're planning on using a super fast SLOG in your system). It's possible this reduction in performance could be significant, especially in an environment with heavy VM usage. You'd also have to replace the missing SLOG device (which I guess would probably arrive on the next business day?), which would require system downtime to swap in a new M.2 PCIe device. You have to consider whether the cost of an extra drive for the SLOG is worth it for your use case.

Thanks for your response.
You're right, I'll have to shut down the server if I want to replace the failed disk.
But I think I'm still going to mirror them because the data is very important (DB).
I'd rather not, but it's the capabilities that the server currently has, and I don't think that with at least 10 DWPD (905P 380GB) it will fail any time soon.
The environment usually consists of web hosting servers that include databases.
I'm not expecting a load like the Giants (Google Cloud, Amazon Cloud etc) :)

I personally love working with Supermicro servers.
I think they are very reliable and have a lot of support.


I think, though, I'll go for the 905P because right now it's the fastest and relatively cheap.
The P4801X looks more reliable and of course they are an enterprise version but slower and insanely expensive!

What do you think?
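For a sense of scale on the 10 DWPD figure above, here is a quick back-of-the-envelope calculation in Python. The function names are my own, and both the 5-year rating period and the 500 GB/day workload are assumptions picked for illustration, not numbers from this thread.

```python
def lifetime_writes_tb(dwpd, capacity_gb, rated_years):
    """Total rated writes in TB for a given drive-writes-per-day rating."""
    return dwpd * capacity_gb * 365 * rated_years / 1000


def years_until_rated_limit(dwpd, capacity_gb, rated_years, daily_writes_gb):
    """Years to reach the rated write total at a steady daily write volume."""
    total_gb = dwpd * capacity_gb * 365 * rated_years
    return total_gb / (daily_writes_gb * 365)


# Assumed: 10 DWPD on a 380 GB drive, rated over 5 years.
print(lifetime_writes_tb(10, 380, 5))            # 6935.0 TB, about 6.9 PB
# Assumed workload: 500 GB of SLOG writes per day.
print(years_until_rated_limit(10, 380, 5, 500))  # 38.0 years
```

Under those assumptions the rated endurance far outlasts a typical web hosting and database workload, which fits the expectation that the drive will not wear out any time soon.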
 

patrickjp93

Dabbler
Joined
Jan 3, 2020
Messages
48
Thanks for your response.
You're right, I'll have to shut down the server if I want to replace the failed disk.
But I think I'm still going to mirror them because the data is very important (DB).
I'd rather not, but it's the capabilities that the server currently has, and I don't think that with at least 10 DWPD (905P 380GB) it will fail any time soon.
The environment usually consists of web hosting servers that include databases.
I'm not expecting a load like the Giants (Google Cloud, Amazon Cloud etc) :)

I personally love working with Supermicro servers.
I think they are very reliable and have a lot of support.


I think, though, I'll go for the 905P because right now it's the fastest and relatively cheap.
The P4801X looks more reliable and of course they are an enterprise version but slower and insanely expensive!

What do you think?
I think you've let some of the new and old guard slug it out in your thread with grace and have made a sensible decision. The 905P will serve you well, probably for a ridiculously long time :D
 

amp88

Explorer
Joined
May 23, 2019
Messages
56
Yes it does. What do you call data sitting on the wire between the CPU and the PCB of the Optane drive? This goes back to why I mentioned the technological and electrical differences between SAS and NVMe. The data on that wire is in flight. In the implementations of SAS, the drive not only has enough power in the capacitors to ensure data in-cache (DRAM) is written to the disks themselves, along with graceful shutdown and stowing of the spindle and head, but it actually has enough to ensure the data signal from the CPU TO THE DISK is also maintained until THAT data is written to the disk as well. NVMe has no such capability, and the reason for it believe it or not has more to do with hardware security on the PCIe bus than anything else (you can look into Thunderbolt exploits if you really want to go down the rabbit hole).

Optane as far as I know cannot save your in-flight data any more than the Crucial drive can.

Optane without a certified End to End PLP motherboard does not have the capability of saving on-bus in-flight data in a power loss event the way your traditional SAS drives do. End of discussion.

I believe there is provision for this in the NVMe 2.0 standard, but it's under incredible contention right now.


The Anandtech quote is wrong and I've sent a letter to the author to correct it.


Yes it does.


HoneyBadger has experience with this and has some NDA access to the official documentation which I now lack for position title reasons (yay Strategic Architects?...). The DRAM cache is flushed to SLC cache for a proper rewrite to MLC on the next power up event.
I take your point re: data "in flight" including data that's on the bus on the way to the drive; you are correct. If the Crucial/Micron "power loss immunity" feature does indeed guarantee the capability to flush data from the DRAM buffer to NAND, then it would offer the same level of protection as an Optane drive. I'd be interested to see if AnandTech updates the article in the future, as more clarity on what features devices support is always welcome.
 