905P Optane Fusion Drive Mirror Sufficient?

LuxTerra

Dabbler
Joined
Dec 11, 2016
Messages
17
Variations of this topic get posted from time to time, but I didn't find my exact question. Sorry for the long post.

tldr;
1) Intel 905P Optane in a two-way mirror vs. a triple mirror as a metadata-only Fusion Drive (special vdev).
2) A single mirror vdev is easy, but I could probably fit a few more as striped mirrors or a pair of triple mirrors. Thoughts on configuration and reliability are appreciated.


I'm considering grabbing some 960GB U.2 Intel 905P Optanes for a Fusion Drive or L2ARC (metadata only?). These are on sale at the moment and, while not datacenter grade, I believe they will be sufficient for my use case unless there's a solid argument against them. My primary interest is making huge directory searches, listings, thumbnails, etc. much faster than the HDD pool can support. I believe one of these two is the best option beyond RAM (which is maxed out on RDIMMs for me, and I don't want to pay the premium for used 8x 256GB DIMMs).

I have a server chassis with 45 drives (3 rows of 15 drives, 2x RAIDZ2 of 7 drives plus 1 spare per row), EPYC w/512GB RAM, 2x 25GbE bonded. VMs are on a separate striped mirror pool of SSDs. I have three PCIe slots left and a pair of 4x Oculink internal ports. My metadata typically runs at <0.5% of the data on the zpool. Almost everything is async, so no SLOG. I also have no L2ARC right now, but have considered it. I have a pair of 2.5" drive slots left, but beyond that I have to get creative (which is fine if the value is there).

The easy thing to do is to put a pair of 960GB Optanes in a mirror and move my metadata onto them (ZFS send, since I have room on the HDD array to hold two copies temporarily). Conceptually, with a metadata use rate of <0.5%, that would support approximately 200TB (my current capacity). However, this would only be a two-way mirror, and it's often recommended to use a triple mirror for metadata. In theory these drives are so much more reliable than HDDs that I should be well protected. The typical argument against a mirror is that once one drive fails, you no longer have two copies to compare, so any errors propagate forward. However, given the reliability of Optane and ZFS's robust checksums, I assume a mirror is a reasonable risk for my pool configuration?
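
Conceptually, the commands I have in mind look something like this (sketch only; "tank" and the device paths are placeholders, and existing metadata only moves to the special vdev once blocks are rewritten, hence the send/receive shuffle):
```
# Add the two 905Ps as a mirrored special (metadata) vdev:
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Existing metadata stays on the HDDs until it's rewritten, so rewrite each
# dataset via send/receive and swap names afterwards (placeholder dataset):
zfs snapshot -r tank/data@migrate
zfs send -R tank/data@migrate | zfs receive tank/data_new
```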

Alternatively, I could put these as a striped L2ARC (not pool critical) or I could get a bit creative with adding more internal U.2 drive slots, a Broadcom 9600 HBA, and several more 905Ps to get a triple mirror; at that point, it's probably worth going with four or six mirrored drives.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
1) Intel 905P Optane in a two-way mirror vs. a triple mirror as a metadata-only Fusion Drive (special vdev).
That depends how paranoid you are… But since a failure of the special vdev would take down the whole pool, I'd be very, very wary here.

I have a server chassis with 45 drives (3 rows of 15 drives, 2x RAIDZ2 of 7 drives plus 1 spare per row), EPYC w/512GB RAM, 2x 25GbE bonded. VMs are on a separate striped mirror pool of SSDs.
Nice setup!

Almost everything is async, so no SLOG. I also have no L2ARC right now, but have considered it.
With 512 GB RAM, it's not obvious what a regular L2ARC would bring.

My primary interest is making huge directory searches, listings, thumbnails, etc. much faster than the HDD pool can support.
However, if your purpose is to speed up reads, and writes are less important, a persistent metadata-only L2ARC is an alternative to the special vdev. No need for redundancy here.
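
Roughly, that looks like the sketch below ("tank" is a placeholder pool name, and I'm assuming TrueNAS SCALE / Linux for the module parameter path):
```
# Add a single Optane as an L2ARC (cache) device - no redundancy required,
# losing it never endangers the pool:
zpool add tank cache /dev/nvme0n1

# Restrict the L2ARC to metadata only (inherited by child datasets):
zfs set secondarycache=metadata tank

# Persistent L2ARC across reboots is governed by this tunable,
# enabled by default on OpenZFS 2.0+:
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled
```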

Alternatively, I could put these as a striped L2ARC (not pool critical) or I could get a bit creative with adding more internal U.2 drive slots, a Broadcom 9600 HBA, and several more 905Ps to get a triple mirror; at that point, it's probably worth going with four or six mirrored drives.
Not sure what you're aiming for here? An Optane-only pool to break speed records?
 

LuxTerra

Dabbler
Joined
Dec 11, 2016
Messages
17
That depends how paranoid you are… But since a failure of the special vdev would take down the whole pool, I'd be very, very wary here.
Yes, that's a bit of my worry. It would be a good reason to consider an L2ARC as a metadata-only option. I think there are some scripts out there that touch each file; that would populate the ARC/L2ARC with metadata, and it wouldn't need to run on every boot given persistent L2ARC. This isn't a common solution AFAIK?
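
Something as simple as the sketch below is what I had in mind for the warm-up ("/mnt/tank" is a placeholder mountpoint; stat-ing every file and directory pulls metadata through the ARC and, from there, into the L2ARC):
```
# Walk the pool and stat everything; the output is discarded, only the
# metadata reads matter. -xdev keeps it to this filesystem.
find /mnt/tank -xdev -ls > /dev/null
```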

Nice setup!
Thanks. I find that the end of a socket/memory generation is a great time to buy a mix of new and used parts and build a great home server that will last. This is a recent upgrade; my previous build was a DDR3 Xeon for the same reasons. You just have to be careful to buy from reputable parts sources. Also, there will tend to be used parts available for another 3+ years as DDR4 servers that were new last year get decommissioned; e.g. I should be able to get 256GB LRDIMMs relatively cheap in a few years. That's how I've cost-optimized my home lab. HDDs get upgraded once or twice between server builds.

Not sure what you're aiming for here? An Optane-only pool to break speed records?
Sorry if that wasn't clear. The older Optane SSDs like the 905P that are on sale max out at 1.5TB, but those tend not to be as good in cost per GB as the <1TB SSDs. Given my ~0.5% metadata use today, a mirrored vdev of 960GB 905Ps would only support a ~200TB array, and that's roughly what I have today. Thus, any upgrade of HDD capacity beyond that would require more U.2 SSDs to stripe across the Fusion vdevs.
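
(For anyone who wants to sanity-check the metadata fraction on their own pool, a zdb block-statistics pass along these lines gives the per-type breakdown; "tank" is a placeholder and it can take a long time on a large pool. It's the kind of output a "metadata is ~0.5% of the pool" estimate can be read from.)
```
# Dump block statistics, including total metadata vs. data space;
# -L skips leak detection to speed it up.
zdb -Lbbbs tank
```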

Since I need to at least mirror them for reliability, without getting creative I only have two 2.5" slots left, which I'd run off the unused dual 4x Oculink ports. Beyond that, I'd need a PCIe tri-mode HBA to get more PCIe lanes for additional U.2 SSDs. I could go cheaper and use a PCIe reclocker card to get to U.2 of some sort. The goal here would be to support a larger Fusion Drive capacity with these "cheap" 905P SSDs by striping several mirrors or triple mirrors; e.g. a triple stripe of the SSDs would give ~3TB and support arrays up to ~600TB, giving me more headroom. Optane for QD1 performance and latency, not for bandwidth; striping for capacity.

However, if your purpose is to speed up reads, and writes are less important, a persistent metadata-only L2ARC is an alternative to the special vdev. No need for redundancy here.
Yes, this is the crux of my question, I guess: persistent metadata-only L2ARC vs. a metadata special vdev using Optane 905Ps. My worst use cases tend to be: write large quantities of data (1-10 TB), then read it multiple times as I process it down to a much smaller final product. However, since those many TB can consist of a large number of files that need to be sorted, pre-processed, etc., and they greatly exceed any quantity of RAM I can afford, getting the small random I/O off the HDDs and onto SSDs makes sense.

I'm now thinking persistent metadata L2ARC makes more sense?

Edit: my worst case workflows are what I describe above and it's a scientific/video/photo hobby. Everything else is more or less irrelevant wrt hardware capability. I'll typically spin up a Linux VM to pre-process some of the data and take advantage of the 512GB RAM and then finish up on my 5950X w/128GB RAM.
 
Last edited:

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Based on what I have read above, the metadata-only L2ARC makes a lot of sense. It's not pool critical and doesn't require a lot of space. Note that a special vdev will only contain metadata by default; it won't contain small files until you tell it to, and even then you have to somehow populate it.

L2ARC (metadata only) caches metadata, so presumably (and I don't know for sure) it will end up containing all of the metadata.
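
For completeness, "telling it to" means setting the special_small_blocks property on a dataset (sketch below; the dataset name and 32K threshold are just examples), and even then only newly written blocks land on the special vdev:
```
# Route blocks of 32K and smaller for this dataset to the special vdev.
# Existing files only move there when they are rewritten.
zfs set special_small_blocks=32K tank/data
```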
 

LuxTerra

Dabbler
Joined
Dec 11, 2016
Messages
17
Based on the discussion, I'm leaning towards using the two Oculink ports to stripe two 960GB 905P Optanes as L2ARC (probably metadata only). However, here's a cheaper (no Broadcom HBA required) creative solution to get a special vdev setup working.

Use the two Oculink ports plus one of these four-port PCIe x16 to SFF-8643 adapters (Amazon Link - $22), since my motherboard supports x4/x4/x4/x4 PCIe bifurcation (EPYC FTW), and then put 6x Optanes in a stripe of two triple mirrors, giving ~2TB of highly redundant metadata storage. I think I can solve the 2.5" U.2 mounting with a simple internal bracket (this is a home lab, not a datacenter, and there's room in my 4U chassis to do so). Now, this is far more expensive (effectively 3x the cost for six Optanes instead of two) and becomes pool critical, but it opens up small-block storage if I desire. I'd put two drives per triple mirror on the PCIe adapter and one each on the pair of Oculinks to avoid a single point of failure.
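
Laid out explicitly (sketch only; "tank" and the device paths are placeholders; the point is that each three-way mirror spans the PCIe adapter and an Oculink port, so no single card takes out a whole mirror):
```
# Two three-way mirrors striped together as the special vdev, ~2TB usable.
zpool add tank special \
  mirror /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
  mirror /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1
```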

Besides cost, for my configuration above, any thoughts on a striped pair giving 2TB of L2ARC (probably metadata only, as my worst-case working sets exceed RAM + L2ARC capacity) vs. a stripe of triple mirrors giving 2TB of special vdev?

Edit: conceptually, I'd mount something like this in an empty space internal to the 4U chassis to get the needed 2.5" slots (IcyDock Link). There's space and airflow isn't an issue.
 
Last edited:

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
This is way way waaaaaay overkill for a home lab - I thought my setup was overkill (it is)
L2ARC metadata will fit on a single Optane (many times over), and it doesn't matter if the drive dies - the pool will still work. Metadata is a few GB.
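
If you want to see how little of a cache device actually gets used, plain old zpool iostat lists the cache device's allocation separately ("tank" below is a placeholder):
```
# The cache section at the bottom of the output shows alloc/free
# for each L2ARC device.
zpool iostat -v tank
```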
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Persistent L2ARC only speeds up reads but is safe, requires a single drive and is reversible.
Special vdev speeds up reads and writes but is pool-critical, requires redundancy and, because the main storage is on raidz2, is not reversible. If the pool is to be moved to another system (or if the motherboard is replaced), the new system will need enough connectivity for the many NVMe devices.
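
To make the reversibility point concrete (sketch only; pool and vdev names are placeholders):
```
# A cache device can always be removed again:
zpool remove tank nvme0n1

# A special vdev can only come out via device evacuation, which OpenZFS
# refuses on pools that contain raidz top-level vdevs - so on this pool it
# would be a one-way door:
zpool remove tank mirror-3
```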
 