Dedup seems like a really expensive idea. Two 480GB Optanes are around USD 1,200, plus memory (I'm assuming 256GB) - that buys a lot of storage.
Not to say it can't be worth it - but the space savings have to be above the roughly 60ish TiB you could add to your pool with that.
A fair idea, but flawed. There are two questions, and these are their tl;dr answers, elaborated a bit below:
- SSD quality - is Optane really needed?
Dunno, but it seems plausible *in my case*, given I'm seeing 1/4M IOPS on each drive in routine use - though quite likely "YMMV" for many other people's servers. Note that my backup server is happy with Samsung Pro or even EVO - but it's also not under the same complex mixed in-use demands, and if an SSD dies there I don't care, as it's unlikely my main server will also die in the 36 hours it'll take to fit a new cheap SSD and re-replicate.
- Cost effectiveness - cost of dedup vs. cost of extra disks without dedup?
Is it better simply to buy HDDs and not use dedup? No. Flat-out impractical for many reasons: cost, feasibility, and a plain jaw-dropping disk count. Your estimate of the equivalent disk count is an entire order of magnitude out - see below. So dedup's a given here, unfortunately. The only question is the cheapest way to make it work fast, not whether or not to use it.
SSD QUALITY
I've said elsewhere that good SSDs may be enough. I'm doing a replication to a test pool with Samsung Pros instead, and it's not showing any signs of stalling.
What I don't know is what's actually needed to be *sure* it'll keep up in actual use, where there might be heavy reads and writes in parallel (a 2TB file transfer against a background scrub or a second big read?). I'm not a tech lab or a ZFS dev. I just like it to work well, and to be sure what's in there isn't a problem under load. Would the usual enthusiast SSDs be okay? They might well be fine. Nobody's actually got data either way, so when I bought, I played safe. There's no data to say anything else.
Whether it's overkill is unknown at this point. The advantages are clear, but are they advantages a pool *needs*? Enough to be worth the extra price tag? I wish I could say; I don't have a clue yet. I'm not even sure I know how to test it, except by time and circumstance, because I want it to work for *my* workload - and if there are tests of Optane vs. enthusiast SSDs for workloads like mine (deduped, fast, nearline, mirrored ZFS with periodic parallel use), I'm not sure what they are or how to construct them.
COST EFFECTIVENESS - DEDUP+SSD+RAM VS MORE HDD?
Your estimate is really badly out - massively, by an entire order of magnitude. Because (to use your figure) you aren't just adding 60TB of storage space. You're adding 60TB of *redundant*, *used* space on a *single server*.
Let's look at that. To add 1TB of usable pool space with 3-way mirrors (the only sensible choice if one is avoiding RaidZ parity calcs and other RaidZ disadvantages) takes 3TB of raw space. But ZFS likes to run at most 50-60% full (it slows down if it gets past about 60-70% - there are stats on that even with big pools), so really you need to add 5-6TB of raw space to get that 1TB of extra capacity. Now double it, because you take backups too, and of course the backup server is also similarly redundant. So there's a 10-12x multiplier going on: add 1TB of actual pool capacity in use = add 10-12TB of raw disk space. Plus HBA capacity, power use, and of course HDDs have an additional ongoing replacement cycle too.
Your "just add 60TB and don't dedup" has just added about 600-700TB of raw storage, a ton of support hardware (PSUs/HBAs/backplanes) - and a commitment to buy that much HDD space again every 5 years or whatever it is, when HDDs wear out and the warranty has expired. Even that frankly jaw-dropping figure ignores the fact that new data dedups against old, which is probably another 30-50% reduction/expansion.
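The multiplier is just arithmetic, so here's a quick sketch of it (a hypothetical helper, not ZFS tooling; the 50-60% fill target, 3-way mirrors, and identical backup server are the assumptions from this post):

```python
def raw_needed_tb(usable_tb, mirror_ways=3, max_fill=0.55, servers=2):
    """Raw disk TB needed to add `usable_tb` of in-use pool capacity.

    Assumptions from the post: 3-way mirrors, pool kept roughly
    50-60% full (0.55 here), and an identically redundant backup server.
    """
    pool_capacity = usable_tb / max_fill   # headroom so the pool stays ~55% full
    return pool_capacity * mirror_ways * servers

print(round(raw_needed_tb(1), 1))   # ~10.9 TB raw per 1 TB of used capacity
print(round(raw_needed_tb(60)))    # ~655 TB raw for the suggested 60 TB
```

Tweak `max_fill` between 0.5 and 0.6 and the multiplier moves between 10x and 12x, which is where the 600-700TB figure comes from.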
I posted the real-world calculation for my own pool in another thread a while back.
Dedup's use case is *extreme* data size reduction. In my case, for example, the maths goes roughly like this:
- 40TB of data now; say 80TB in a while. ZFS likes to run with quite a lot of free space, so that 80TB should still ideally be only about 60% of the pool. That's about 133TB of raw pool capacity. I like to run 3-way mirrors, so that's about 400TB of raw disk space. Double it for the backup server: 800TB of raw capacity in theory between them. 0.8PB for a home server and backup? Ridiculous. Using, say, enterprise 8TB disks, that's 100 disks at £200 each. And connectors/backplanes. And power costs.
- Enable dedup? Now 13TB, a third of the size. But my future writes will also dedup more (they're more likely to already have copies in the existing 40TB), so my deduped size won't double in 5 years. It might go up by 50%, say 18TB deduped. At 60% full and 3-way mirrors, that's 90TB, or 11 disks. Maximum 22 disks including a 2nd backup server.
- This is when and why one uses dedup: not to just shave off a few tens of percent, but when storage cost and scale are actually prohibitive otherwise.
- Other use cases are limited disk size, or limited bandwidth (historically one could cut the data sent by a huge amount when replicating or backing up/restoring).
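The worked example above can be checked with the same back-of-envelope arithmetic (a sketch; the 60% fill target, 3-way mirrors, 8TB disks, and data-size estimates are this post's own figures):

```python
def raw_tb(data_tb, mirror_ways=3, max_fill=0.6, servers=2):
    """Total raw TB across main + backup server for `data_tb` of stored data."""
    pool_capacity = data_tb / max_fill        # pool kept ~60% full
    return pool_capacity * mirror_ways * servers

no_dedup = raw_tb(80)   # ~800 TB raw without dedup
deduped = raw_tb(18)    # ~180 TB raw at ~18 TB deduped
print(round(no_dedup / 8))   # ~100 x 8TB disks
print(round(deduped / 8))    # ~22-23 x 8TB disks
```

The disk counts are estimates either way; the point is the order-of-magnitude gap (about 100 disks vs. about two dozen), not the exact number.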
The point is, this is inherent in the data size, the choice of mirroring with 2-failure tolerance per set as protection, and the fact that disks die/fail, so redundancy is needed across 2 servers in order to never need to worry about it. Short of switching to RaidZ (a Bad Idea for performance and resilver speed), there's little one can do to avoid a lot of disks for that level of safety across 2 servers.
Running dedup is the only way I know to make it practical and still get 250-500 MB/sec to the server when I'm moving 1TB directories around. If it's not clear what quality of SSDs is needed, that's a secondary problem, not a primary one. But Optane also adds a redundant ultra-low-latency SLOG and guarantees the lowest latency on all metadata and all of the DDT. Given that they're pulling a 1/4 *million* IOPS each at times, I'd say it was a decent call compared to another 80-100 HDDs and their HBAs/PSUs.