HDDs or SSDs for 24-bay 2.5", and which models?

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
I have a 24-bay (2.5") SuperMicro chassis that I'm about to put to good use. The question I'm hitting as I'm about to pull the trigger on storage: what's the recommended path, as of 2022, for filling this thing up?

My understanding of the general guidelines: 5400 RPM spinners are still the standard, while SSDs have gotten very competitive. With the 2.5" limitation, SSDs are easier for me to find, but I want the highest capacity at reasonable cost.

Is the recommendation for 2.5" to still try to get spinners at maximum capacity, and if so, which specific brands/models are considered best of breed in that form factor? I see the general recommendations for WD Red Plus (or should I be looking at Pro or Ultrastar?), but the 2.5" options appear to be kind of small.

Or, given the form factor, should I be opting for pure SSDs -- and if so, again, which brands/models are considered best of breed?

Thank you for any pointers!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
2.5" is basically SSD territory these days.

High-RPM HDDs are hot, expensive, small and not really fast by any modern standard.
Laptop-style HDDs are limited to 1 TB; all the 2 TB stuff is SMR.

So, that leaves SSDs. SATA or NVMe, depending on what your chassis supports.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
That helps narrow it down for sure! So I'll go with SATA SSDs.

How much do I need to concern myself with brand/model on these? Is consumer stuff going to be garbage, or good enough with appropriate spares? Do I need to look to enterprise-level stuff? Any pointers on the best $/TB (and reliability) would be appreciated. :)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
SSDs -- ... which brands/models are considered best of breed?
Best is subjective, depending on what you're after: capacity, IOPS, endurance...

Intel has models that would win most endurance contests.

Samsung has fast options with OK endurance.

For capacity only (without much change to data in the pool... hence OK to think about low endurance), you could even consider QVO, which have OK IOPS/performance and great capacity for the cost.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
We run an entire DC full of "prosumer" Samsung 860 PRO. The "PRO" is important here. Quite satisfied, no lost drive so far. A TBW rating of 1200 per TB of capacity isn't too shabby. I'd stay away from everything that has EVO/QVO written on it for DC use.
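As a rough back-of-the-envelope for what such a rating means in practice -- a minimal sketch, assuming a 2 TB drive at that rating (2400 TBW total) and an illustrative 100 GB/day of writes:

```python
# Back-of-the-envelope endurance estimate (illustrative numbers only).
def endurance_years(tbw: float, daily_writes_gb: float) -> float:
    """Years until the rated TBW is reached at a constant daily write volume."""
    return tbw * 1000 / daily_writes_gb / 365

# Assumed: 2 TB drive rated 1200 TBW per TB of capacity -> 2400 TBW total,
# absorbing an assumed 100 GB of writes per day.
print(f"~{endurance_years(2400, 100):.0f} years of rated endurance")  # ~66 years
```

Even at several hundred GB of writes per day, that rating outlasts the likely service life of the hardware.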
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
@sretalla -- Thanks for the reply. I am mostly concerned with capacity, within reason. A failed drive every six months is not what I want, but this is also not something that has to run untouched for 5+ years. And realistically, the write loads are not going to be dramatic -- there will be a lot of long-term storage (write once, read often, especially for taking backup snapshots). I may also decide to use this pool for VM storage, though, in which case those loads would be significantly more write-heavy. But no crazy HPC kind of thing going on. So let's say, "capacity first, performance/endurance tied for second." If performance has to be increased, especially for writing, I assume I can add a SLOG.

@Patrick M. Hausen thanks for the advice on the 860 PRO -- I'll take a look at those.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
We run an entire DC full of "prosumer" Samsung 860 PRO. The "PRO" is important here.

Well more than a hundred 860/870 EVOs deployed here. I deploy these things in RAID1, but the Samsung EVOs basically never fail (maybe 1%?).

On the other hand, out of a small set of 980 PROs, I just had one that corrupted a few VMs. It had seen nearly zero usage; I'd only had some test VM builds on it in the last few weeks.

The success you will enjoy with consumer/prosumer-grade SSDs depends highly on proper characterization of your workload, redundancy/backup coverage, etc.

More discussion at

 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Isn't the 980 PRO an entirely new architecture with rather aggressive caching and a slower "mass storage" device behind it? I remember reading a piece on Ars or StH and deciding those were not for me. And yes, I have a bunch of 970 EVO Plus, too.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Let's assume I can do 2 TB SSDs as the base storage layer. I have 24 front-accessible bays on the primary controller, and 2 rear-accessible bays most likely attached to SATA ports on the motherboard.

I believe the best balance of capacity, reliability, and performance is a RAID-Z2, so I'm looking at either 6-drive or 10-drive vdevs for the primary storage. For the best granularity, I'm considering the 6-drive option. Assuming that, my initial setup would be:
  • 24 front bays:
    • 6 bays -- 6 x 2 TB in RAID-Z2 (4 data + 2 parity) = 8 TB usable in vdev 1
    • 12 bays -- open for additional vdevs 2 & 3
    • 1 bay -- hot spare (automatic failover)
    • 5 bays -- future spares or other special-purpose drives
  • 2 rear bays -- mirror vdev for boot
  • Optional: add SLOG mirror in PCI-E (e.g., dual Optanes? or is a single Optane with two drives recommended?)
Is there an obviously better setup for me to start with? E.g., is it smarter to plan for 10-drive vdevs (x2) and get the higher capacity with less granularity, thus leaving 4 bays for hot spares?
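To put rough numbers on the two layouts being weighed -- a quick sketch of nominal usable capacity (data drives per vdev x drive size), ignoring ZFS metadata overhead and the usual advice to keep pools below roughly 80% full; the 2 TB drive size is the assumption from above:

```python
# Nominal RAID-Z2 capacity for the two candidate layouts (2 TB drives assumed).
# Ignores ZFS metadata/allocation overhead and the ~80% fill guideline.
def raidz_usable_tb(vdevs: int, width: int, parity: int, drive_tb: float) -> float:
    return vdevs * (width - parity) * drive_tb

drive_tb = 2
option_a = raidz_usable_tb(vdevs=3, width=6, parity=2, drive_tb=drive_tb)   # 18 bays used
option_b = raidz_usable_tb(vdevs=2, width=10, parity=2, drive_tb=drive_tb)  # 20 bays used
print(f"3 x 6-wide Z2:  {option_a:.0f} TB nominal (6 bays left)")   # 24 TB
print(f"2 x 10-wide Z2: {option_b:.0f} TB nominal (4 bays left)")   # 32 TB
```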
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If you are planning to do VM storage, RAIDZ2 is a poor choice.
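For context: the usual reasons are that a RAIDZ vdev delivers roughly the IOPS of a single drive for small random I/O, and that small zvol blocks allocate inefficiently on RAIDZ. Here is a rough sketch of the allocation point, assuming ashift=12 (4 KiB sectors) and the commonly described RAIDZ rule (parity added per stripe row, total padded to a multiple of parity + 1); the volblocksize values are illustrative:

```python
# Rough sketch of how a single block allocates on RAIDZ, assuming ashift=12
# (4 KiB sectors): parity sectors are added per stripe row, then the total is
# padded up to a multiple of (parity + 1).
import math

def raidz_sectors(block_bytes: int, width: int, parity: int, sector: int = 4096) -> int:
    data = math.ceil(block_bytes / sector)      # data sectors needed
    rows = math.ceil(data / (width - parity))   # stripe rows used
    total = data + rows * parity                # add parity sectors
    pad_to = parity + 1                         # RAIDZ padding rule
    return math.ceil(total / pad_to) * pad_to

# Space efficiency of a zvol block on a 6-wide RAID-Z2 (nominal is 4/6 = 67%):
for volblocksize in (8 * 1024, 16 * 1024, 128 * 1024):
    used = raidz_sectors(volblocksize, width=6, parity=2)
    data = volblocksize // 4096
    print(f"{volblocksize // 1024:>3} KiB block: {data}/{used} sectors "
          f"({100 * data / used:.0f}% efficient)")
# 8 KiB -> 2/6 (33%), 16 KiB -> 4/6 (67%), 128 KiB -> 32/48 (67%)
```

Mirrors sidestep both issues: the space cost is a flat 2x regardless of block size, and every mirror vdev adds IOPS.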

 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
If you are planning to do VM storage, RAIDZ2 is a poor choice.


Hmm -- makes sense. Am I going to be forced into a multi-pool scenario, then? E.g., 3x 6-disk vdevs for bulk storage in pool 1, a 3-way mirror for VMs in pool 2, and 3 bays left over for hot spares?
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Or, maybe better: 2x 10-disk vdevs for pool 1, 3-way mirror for pool 2, 1 bay for a hot spare (and keep another drive or two on hand, but not plugged in)?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Hmm -- makes sense. Am I going to be forced into a multi-pool scenario, then?
If you want capacity storage and VM/block storage, yes. SSDs are more reliable than HDDs, so 2-way mirrors would be fine for your VMs, with the number of vdevs depending on your need for IOPS. For bulk storage, take raidz1 as the base unit.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Isn't the 980 PRO an entirely new architecture with rather aggressive cacheing and a slower "mass storage" device behind that? I remember reading a piece on Ars or StH and deciding those were not for me. And yes, I have a bunch of 970 EVO Plus, too.
The 980 Pro would've been called the 980 Evo if Samsung had kept the naming consistent. TLC instead of MLC. Otherwise, they're pretty similar.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The 980 Pro would've been called the 980 Evo if Samsung had kept the naming consistent. TLC instead of MLC. Otherwise, they're pretty similar.

It's based on a new PCIe 4.0 controller, the Samsung Elpis. There are some so-called "pseudo-SLC" hacks, but yes, it's a TLC device.

I have a bunch of 970 EVO Plus, too.

It's worth noting that the brouhaha about 970 EVO Plus involved swapping from the original Phoenix controller to an Elpis S4LV003 controller -- the same one used on the 980 PRO. So if you bought them in the latter half of 2021, you may have gotten a 970 EVO Plus that is similar to the 980 PRO in controller terms.

Additionally, the 980 (non-PRO non-EVO) seems to differ primarily from the 980 PRO in the controller. The flash is supposedly the same between the 980 and the 980 PRO, same write endurance too.

Obviously this is just all a jumbled confusing mess. I don't appreciate the rebranding either.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The non-Pro 980 is DRAM-less, relying on the NVMe Host Memory Buffer feature (DMA into the host's RAM) to get a meaningfully-sized cache.
In practical terms, for client applications, the difference shows up in sustained writes. For server applications, I imagine that many SSDs each eating a chunk of the host's memory can start adding up to something meaningful.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
If you want capacity storage and VM/block storage, yes. SSDs are more reliable than HDDs, so 2-way mirrors would be fine for your VMs, with the number of vdevs depending on your need for IOPS. For bulk storage, take raidz1 as the base unit.

Okay, so my assumption has been that I don't want to rely on redundancy that covers only one disk failure. The idea of losing everything if 2 disks fail might keep me up at night. Or are you saying RAID-Z1 is fine because you'll always have a hot spare ready, and it'll be rapidly assimilated into the vdev that needs it?

On the VMs, it's sort of the same thing for me. I don't want to lose everything if both drives fail semi-concurrently. I would have to know that there's a hot spare ready to assimilate, and that it will happen quickly and without a likely secondary failure, to feel okay about it, I think.

Under your guidance, I could go...
  • 2x 9-disk RAID-Z1 for bulk storage pool
  • 2x 2-way mirror for VM pool
  • 2x hot spares (able to be pulled into any vdev across either pool, dynamically as needed, assuming all disks are the same?)
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
The thinking was that SSDs have URE rates of 1E-17 rather than the 1E-14 or 1E-15 of HDDs, so "raid5 (raidz1) is dead (above 1TB)" does not apply to SSDs (and the corollary for 2-way mirrors around 10TB does not apply either). And then there's not spending more than necessary for one level of redundancy, considering the cost per TB of SSDs.
"ZFS is not a backup", so even with raidz3 you'd still want some other external backup so you don't lose everything in the event of an unlikely catastrophe (2 disks failing simultaneously, server going up in flames, etc.).

9-wide is perhaps a little wide (but not obviously WAY too wide), but indeed 2*(9-wide Z1) or 3*(6-wide Z1), plus 2*(2-way mirror) and spares (not sure whether they can be "server-wide spares" rather than pool-specific spares), looks like a reasonable configuration.

If you don't mind the costs, of course. With 2 TB SSDs, the above would give 15*2 or 16*2 TB of raw storage space (24-25 TB usable before the pool has to be expanded). One could get the same storage space from just 4*16 TB HDDs in raidz2 (and twice as much space from a 6-wide raidz2…), and these, together with the 2*2 SSD VM pool, would fit in a much smaller case.
Capacity is 3.5" territory.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
"raid5 (raidz1) is dead (above 1TB)" does not apply to SSDs (and the corollary for 2-way mirrors around 10TB does not apply either).

Matter of opinion. When SSDs fail, they often fail completely, which presents a more significant rebuild challenge than with HDDs.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
For fear that the "matter of opinion" is rather a matter of assumptions or, worse, of incomplete reasoning, let's put it straight and in full.

"ZFS is not a backup", so there should always be at least one backup of the data, in any form, so that it is possible to restore after a catastrophe—even if that would be a major pain in the lower back…
The goal is for the main NAS to sustain common failures and low probability events, NOT to protect against ultra-low probability and fringe events—these are when the above pain would kick in. (The wordings "low", "ultra-low" and "fringe" are intentionally vague and subjective.)
Drives do fail: It is expected that one drive will fail.
What next?
With large HDDs and u=1e-14, the probability of getting an URE during a raidz1 resilver is considered "significant", i.e. raidz1 does not meet the above goal (even if the result would be the loss of select files to the URE and not of the entire pool as with raid5). There's no need to further consider the risk of a second drive failure during resilver—and possibly caused by strain from the resilver.
With SSDs and u=1e-17, the probability of an URE during a raidz1 resilver is considered "fringe". SSDs are faster than HDDs, I expect a SSD resilver to proceed faster and complete faster than a similar resilver on HDDs, i.e to be a matter of hours, not a matter of days (wrong assumption?). If so, the probability of a second SSD failure in the same vdev during the limited 'at-risk' time window appears to be somewhere between "ultra-low" and "fringe". (This may assume that the initial failure was accidental rather than due to wear, underestimate the strain caused by raidz resilver and/or fail to consider the risk that the surviving SSDs wear out and fail due to the resilver itself.)
Based on the above, in a scenario where a failing drive can be replaced quickly, in particular where there is a hot spare, I expect raidz1 to meet the security goal. Specifically, should the failure occur at night (while scrubbing?), I expect that the NAS administrator would find out the next morning that a drive failed, that the spare kicked in, that resilver is complete and all data is fine. (This is a home NAS: The user/administrator does NOT sleep with a pager and does NOT wake up to fix issues on the spot!)
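To put rough numbers on the URE point -- a minimal sketch, using the usual (simplistic) reading of the URE spec as an independent per-bit error probability, and assuming a 9-wide raidz1 of 2 TB drives, so roughly 16 TB read from the surviving drives during a full resilver:

```python
# Probability of hitting at least one URE while reading the surviving drives
# during a raidz1 resilver. Assumes a 9-wide vdev of 2 TB drives (8 survivors
# read end to end, ~16 TB) and treats the URE spec as a per-bit probability.
import math

def p_ure(bytes_read: float, ure_per_bit: float) -> float:
    bits = bytes_read * 8
    # 1 - (1 - u)^bits, approximated as 1 - exp(-u * bits) because (1 - u)
    # is indistinguishable from 1.0 in double precision for u this small.
    return -math.expm1(-ure_per_bit * bits)

bytes_read = 8 * 2e12  # 8 surviving 2 TB drives
for u in (1e-14, 1e-15, 1e-17):
    print(f"URE rate {u:g}: P(at least one URE) ~ {p_ure(bytes_read, u):.4f}")
# ~0.72 at 1e-14, ~0.12 at 1e-15, ~0.0013 at 1e-17
```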

Any obvious shortcoming here?
 