SOLVED Write cache aside from RAM

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Hello guys,

I am new to TrueNAS, and as far as my research went, there seems to be no way to install an additional SSD write cache. I know that RAM is used for this task, but why not also have the possibility to use much larger and cheaper SSDs for that task once the RAM cache is full? Sure, they are not nearly as fast as RAM, but still way faster than spinning rust. Am I missing something?
Btw. I know that ZFS LOG devices are a thing, but as far as I understood the matter, they are only for synchronous writes and cannot be treated like a general write cache.

So is there really no way other than increasing RAM size or adding more HDDs to improve write speeds? I am using 6x 8TB HDDs in a layout of three 2-way mirror vdevs (RAID 10).

Thank you guys, I really want to wrap my head around this.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Can you describe your workload? It would be interesting to understand where ZFS's way of doing things causes you issues.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello,

You're correct here: ZFS has no provision for a general "SSD write cache" and only does its short-term write caching/logging in RAM (and, for synchronous writes, to the ZIL).

Increasing RAM and adjusting the tunables will help absorb larger bursts of writes, but the pool vdevs have to be able to commit those larger bursts to disk, or else ZFS ends up applying a write throttle to say "your pool isn't fast enough, please wait" to incoming data.
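If you want to see where that buffering and throttling kicks in on your own box, here is a minimal sketch that just reads the relevant OpenZFS dirty-data tunables. It assumes a Linux-based OpenZFS system (TrueNAS SCALE) where the module parameters live under /sys/module/zfs/parameters; on CORE the equivalents are exposed as vfs.zfs.* sysctls instead.

Code:
from pathlib import Path

# OpenZFS module parameters that shape the async write buffer / write throttle.
PARAMS = [
    "zfs_dirty_data_max",          # max bytes of dirty (uncommitted) write data held in RAM
    "zfs_dirty_data_max_percent",  # same cap expressed as a percentage of physical RAM
    "zfs_txg_timeout",             # target seconds between transaction group commits
]

base = Path("/sys/module/zfs/parameters")
for name in PARAMS:
    path = base / name
    value = path.read_text().strip() if path.exists() else "n/a (not a Linux OpenZFS system?)"
    print(f"{name} = {value}")

Roughly speaking, zfs_dirty_data_max is the ceiling on how much async write data ZFS will buffer in RAM before it starts delaying incoming writes, so it marks the size of the "burst" your RAM can absorb before the pool vdevs become the limit.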

Unfortunately you can't simply say "write to these large SSDs first and then spool off asynchronously to disk later" - consider investigating something like bcachefs if this is a critical use-case, but make sure you understand the limitations (data placement isn't quite as guaranteed as in ZFS, removing devices still seems to require manually re-replicating metadata before and after, erasure coding isn't ready yet AFAIK)

As @ChrisRJ asked - describing your workload might be worthwhile to see if it's something inherent to ZFS causing you these issues or if there's a way that it can be adjusted/optimized to fit your needs better.
 

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Thank you guys for your answers. I am building this NAS out of scrap parts since I have literally zero money left, so I have to work with what I've got. The system is intended to help my company through a hard time and will then be replaced with proper hardware once that's possible again. So please be gentle with me; I am well aware that the hardware is far from optimal, not to say complete crap. I am trying to get the most out of it nonetheless.

CPU: i7 860 (no I did not forget some numbers it is really this old :wink:)
RAM: 16GB non ECC
HDD: 6x 8TB TOSHIBA MG05ACA800E
SATA SSD: 4x Samsung SSD 870 QVO 1TB
NIC: Asus XG-C100C


It is used by ~5 people working with media files, so lots of reads with files >50 MB: scrolling through timelines, rendering and compositing. As far as I can tell, this kind of workload is probably among the most demanding for a NAS system.

Unfortunately you can't simply say "write to these large SSDs first and then spool off asynchronously to disk later" - consider investigating something like bcachefs if this is a critical use-case, but make sure you understand the limitations (data placement isn't quite as guaranteed as in ZFS, removing devices still seems to require manually re-replicating metadata before and after, erasure coding isn't ready yet AFAIK)

Thank you for the brief overview. I will not do this, since my question was more academic in nature and the downsides you just listed are not worth the effort in my case. I think my write speeds are okay for what I need, but more without hassle would always be appreciated.

By the way, another rookie question:
Is there a smart way to do versioned backups of the NAS to a cloud provider (Dropbox Business Advanced in my case)? I know I can do snapshots on my local NAS, but there seems to be no way to upload these to a cloud (other than rsync.net, which I don't want to switch to). I am specifically looking for the functionality QNAP's HBS3 has with its Smart Versioning feature --> https://www.qnap.com/en/how-to/faq/article/how-works-hbs3-with-smart-versioning-enabled

Additionally: Is there any benefit in adding the 4 TB of SSDs as a read cache? As far as I understand it, with a 3x 2-way mirror my performance should be:
  • Read IOPS: (N * Read IOPS of a single drive)*M
  • Write IOPS: (Write IOPS of a single drive)*M
  • Streaming read speed: (N * Streaming read speed of a single drive)*M
  • Streaming write speed: (Streaming write speed of a single drive)*M
  • Storage space efficiency: 50%
  • Fault tolerance: 1 disk per vdev
with N=2 and M=3 in my case. So if my SSD has slower read speeds than (N * Streaming read speed of a single drive)*M, which I guess would be the case in my scenario, then my read cache would actually be slower than my HDD pool and thus decrease read performance. Same for read IOPS, though I am not sure if the IOPS of an SSD are higher than those of 6 HDDs combined.
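For reference, here is a quick sketch of those scaling rules in code, with purely illustrative per-drive numbers rather than measured figures for these Toshiba drives:

Code:
# Sketch of the scaling rules above for a pool of M mirror vdevs, each N drives
# wide. Per-drive numbers are illustrative assumptions, not measured values.
def mirror_pool_estimate(n, m, drive_read_iops, drive_write_iops,
                         drive_read_mbps, drive_write_mbps):
    return {
        "read_iops":        n * m * drive_read_iops,
        "write_iops":       m * drive_write_iops,
        "read_mbps":        n * m * drive_read_mbps,
        "write_mbps":       m * drive_write_mbps,
        "space_efficiency": 1 / n,
    }

# N=2 (2-way mirrors), M=3 (three vdevs), ~150 IOPS and ~200/180 MB/s per HDD assumed
print(mirror_pool_estimate(n=2, m=3, drive_read_iops=150, drive_write_iops=150,
                           drive_read_mbps=200, drive_write_mbps=180))

With assumptions like these it pencils out to roughly 1200 MB/s streaming reads and ~540 MB/s streaming writes, so a single SATA SSD (~550 MB/s) would indeed be slower than the pool for streaming reads, as suspected; the IOPS picture is a different story.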

Thank you again for your kind answers, you have already helped me a lot. Sorry if this post blew up a little.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Yes I know, sadly I have no choice. I got the NIC running and it's seeing 10 GBit.

Here are some results:

CrystalDiskMark default setting:
Default - 3 HDD Mirror + 2 cache + log mirror.png


CrystalDiskMark Realworld setting (32 and 8 GiB):
RealWorld - 3 HDD Mirror + 4 cache 32GB.png
RealWorld - 3 HDD Mirror + 4 cache.png
 

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
So, does anybody have suggestions for my other questions in post #4?

By the way, another rookie question:
Is there a smart way to do versioned backups of the NAS to a cloud provider (Dropbox Business Advanced in my case)? I know I can do snapshots on my local NAS, but there seems to be no way to upload these to a cloud (other than rsync.net, which I don't want to switch to). I am specifically looking for the functionality QNAP's HBS3 has with its Smart Versioning feature --> https://www.qnap.com/en/how-to/faq/article/how-works-hbs3-with-smart-versioning-enabled

Additionally: Is there any benefit in adding the 4 TB of SSDs as a read cache? As far as I understand it, with a 3x 2-way mirror my performance should be:

  • Read IOPS: (N * Read IOPS of a single drive)*M
  • Write IOPS: (Write IOPS of a single drive)*M
  • Streaming read speed: (N * Streaming read speed of a single drive)*M
  • Streaming write speed: (Streaming write speed of a single drive)*M
  • Storage space efficiency: 50%
  • Fault tolerance: 1 disk per vdev
with N=2 and M=3 in my case. So if my SSD has slower read speeds than (N * Streaming read speed of a single drive)*M, which I guess would be the case in my scenario, then my read cache would actually be slower than my HDD pool and thus decrease read performance. Same for read IOPS, though I am not sure if the IOPS of an SSD are higher than those of 6 HDDs combined.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I am not sure if the IOPS of an SSD are higher than those of 6 HDDs combined
HDD IOPS are usually in the 100-300 range; SSD IOPS run from 10,000 to 500,000+.

SSD will always be better for IOPS.

But can you really take advantage of it with only 16GB of RAM if you add it as L2ARC... I suspect not.

Depending on the file structure you have, if it's low numbers of media files which are individually large, then there's a small chance you can see an improvement in reads (at the cost of ARC to map the tables, which may slow you down).
 

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Alright, thank you. Maybe I've got something wrong, but wouldn't an SSD read cache always boost performance, given the IOPS are as high as you said? Why is this dependent on the amount of RAM, aka ARC? When the ARC is full, the L2ARC will jump in. This process would be the same with 16GB or 512GB, right?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
L2ARC also needs RAM. Taking the latter away from an already small ARC is overall not helpful. I don't have L2ARC, but the consensus seems to be that the minimum amount of RAM for bringing L2ARC into the game is 64 GB.
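As a rough back-of-the-envelope sketch of that RAM cost: every block cached on the L2ARC SSD needs a header kept in ARC. The ~70 bytes per cached record used below is an approximation that varies between OpenZFS versions, and the 128 KiB average record size plus a 4 TB L2ARC are assumptions matching this thread.

Code:
def l2arc_header_ram_gib(l2arc_bytes, avg_record_bytes, header_bytes=70):
    # one in-RAM header per block cached on the L2ARC device
    records = l2arc_bytes / avg_record_bytes
    return records * header_bytes / 2**30

four_tb = 4 * 10**12  # 4x 1 TB QVO used as L2ARC
print(f"~{l2arc_header_ram_gib(four_tb, 128 * 2**10):.1f} GiB of ARC spent on L2ARC headers")

Call it roughly 2 GiB of headers: a big bite out of a 16 GB system, but trivial once you are at 64 GB or more.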
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Depending on what your use case really is and where your bottleneck is coming from, there are some things to say here about what may work or not. I'm taking into account your statement about $0 extra to spend.

As you're talking about mostly reading being the workload and you're suggesting that there are large files (and not a large number of small ones), there are a few points that will contribute to what might be best:

An ARC of 16GB isn't going to hold very many of those files in their entirety (assuming 2+GB per file, with 5 people accessing just one file each, that's all your ARC gone). And that's assuming no RAM for the system, so you can already think of 8GB as the real number.

If you add L2ARC, you'll be giving away (a smaller amount of) ARC to map the L2ARC blocks... but maybe that will work in your favour here if the speed of L2ARC is enough for your workflow.

If you assume that none of your files will be held in ARC and only the L2ARC tables, then all the files should be in L2ARC and served from SSD.

SSDs have superior read speeds to HDDs (in addition to the massive IOPS advantage), but one or two of them may not exceed the throughput of your striped mirror disks (since you're working with large files which may be read into cache on the client anyway, maybe that's not going to make any difference at all).

If it turns out that your clients don't cache anything, there may be an IOPS intensive load on the server, which L2ARC may help to boost (once your content is all in there... look up persistent L2ARC to see about how that may help on that too).

There may be a case for using the L2ARC as a metadata only cache if the workload is actually focused on IOPS rather than throughput.

EDIT: For the point about using QVO SSD drives: you may find that these are OK if your workload is mostly reads (it's write endurance where the QVO is massively disadvantageous). If your pool has a lot of changes going on, then those QVO drives will wear out quickly, as suggested by others. In the case of a metadata-only cache, you may find that they last a lot longer, but that may not serve your use case of seeking within large files.

If you've already got the users doing what they do, you can start by looking into arc_summary and try to get an idea of what's currently happening, then add the L2ARC and see what changes. You can always remove the L2ARC if it doesn't work out.
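If you'd rather script that check than eyeball arc_summary's output, here is a minimal sketch that reads the same counters from the kstat file on a Linux-based OpenZFS system (TrueNAS SCALE); on CORE the same fields are available via sysctl kstat.zfs.misc.arcstats.

Code:
# Read ARC/L2ARC counters straight from the kstat file and report hit ratios.
stats = {}
with open("/proc/spl/kstat/zfs/arcstats") as f:
    for line in f.readlines()[2:]:            # first two lines are kstat headers
        name, _kind, value = line.split()
        stats[name] = int(value)

hits, misses = stats["hits"], stats["misses"]
print(f"ARC hit ratio:   {100 * hits / (hits + misses):.1f}%")

# L2ARC counters are only meaningful once a cache device is attached and warmed up
if stats.get("l2_size") and (stats["l2_hits"] + stats["l2_misses"]):
    l2_hits, l2_misses = stats["l2_hits"], stats["l2_misses"]
    print(f"L2ARC hit ratio: {100 * l2_hits / (l2_hits + l2_misses):.1f}%")
    print(f"L2ARC size:      {stats['l2_asize'] / 2**30:.1f} GiB on SSD")

A consistently low ARC hit ratio under your real workload is the hint that more RAM (or, with enough RAM, an L2ARC) could actually pay off; comparing the numbers before and after adding the cache device tells you whether it did.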
 
Last edited:

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Thank you guys, very helpful answers! I expected more of a shitstorm when I posted my config, but you guys are really accepting my boundary conditions here. Thanks again! In a few months this thing will get replaced anyway. But I have learned a lot from it.
Cheers
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Just noticed those Samsung QVOs.
Based on my experience - no - just no. They are not robust enough.
A friend put them into his NAS (not TrueNAS) and used them as a VM store. They died after a month or two. Good job he had backups.
OK - it's one experience out of many - but they are cheap consumer drives for a reason
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Just noticed those Samsung QVOs.
Based on my experience - no - just no. They are not robust enough.
Seconding this, QVOs are QLC drives with very low endurance. Using them in any kind of frequent write scenario is going to end poorly as @NugentS points out. For a "write-few read-many" workflow they may hold up, such as if you are expecting to have collaborative read efforts against a set of files that doesn't change too often, but if you planned to use them as the "scratch" or "working" area they will quickly burn through their P/E cycles.
 

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Alright, good to know, thanks! Which kind of SSDs would be more suitable? Just an example, or is there an entire class of SSDs for that purpose that has slipped past me?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
SLC (Single-level cell) is "best" for endurance, but those aren't mainstream.

TLC/MLC are next and more readily available in the EVO line of Samsung (usually three-level).

QLC (Quad-level cells) are in the QVO line and, as we said, aren't great for write endurance.

Look for the TBW (Terabytes Written) figure from the manufacturer on the drive.

My local reseller quoted the 1TB QVO as 360 TBW, whereas the 1TB EVO is 600 TBW.

Intel is the way to go if you want super-high endurance.

For example, the Intel DC S3710 (1200GB) has a TBW of 24300.

The much cheaper Intel D3-S4510 (960GB) has a TBW of 3400.
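To turn those TBW ratings into a rough lifetime, here is a small worked example assuming a steady 200 GB of writes per day (that daily figure is a made-up assumption, not something from this thread; the TBW values are the ones quoted above):

Code:
# Convert a TBW endurance rating into an expected lifetime at a given write rate.
def years_until_tbw(tbw_terabytes, gb_written_per_day):
    return tbw_terabytes * 1000 / gb_written_per_day / 365

drives = [("1TB QVO", 360), ("1TB EVO", 600),
          ("Intel DC S3710 1.2TB", 24300), ("Intel D3-S4510 960GB", 3400)]
for name, tbw in drives:
    print(f"{name}: ~{years_until_tbw(tbw, 200):.1f} years at 200 GB/day")

At that (modest) write rate even the QVO survives on paper (~5 years vs ~8 for the EVO and decades for the Intel DC parts); the problem is that cache or scratch duty and heavy VM traffic can push daily writes far higher, which is where the low-TBW drives fall over quickly.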
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Alright, thank you. Maybe I've got something wrong, but wouldn't an SSD read cache always boost performance, given the IOPS are as high as you said? Why is this dependent on the amount of RAM, aka ARC? When the ARC is full, the L2ARC will jump in. This process would be the same with 16GB or 512GB, right?

This has been answered elsewhere in extreme depth, but for the sake of completeness here:

ZFS has to identify what would be useful and helpful to place in L2ARC. In order to do this, it wants to put cached blocks that have been accessed multiple times but not accessed recently into the L2ARC. The less ARC you have, the harder it is to correctly identify such blocks, and the more likely it is that ZFS just ends up pushing less-useful stuff into the L2ARC, needlessly wearing out your SSD.

This means that your ARC really needs to be proportionally sized to the amount of data flowing through your NAS such that frequently accessed data has a chance to be cached. If ARC cannot do a good job of caching it, L2ARC is going to be wasted. This is highly dependent on your workload.

We generally find that 16GB systems are too small to do a good job of this, though if you really aren't doing much with your filer, or working exclusively with lots of small files, it's totally possible for 16GB to be able to support L2ARC. It seems like 64GB is the minimum realistic ARC size for many general workloads at which point L2ARC becomes useful, but you really need to look at your ARC statistics to see if there's a lot of cache wins going on to know for sure.
 

sidjpr

Dabbler
Joined
Jul 20, 2021
Messages
10
Alright, you guys are super helpful. I did know of SLC, TLC/MLC and QLC, but I really didn't expect differences of orders of magnitude. @jgreco Nice, short and on-point explanation, got it!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Alright, good to know, thanks! Which kind of SSDs would be more suitable? Just an example, or is there an entire class of SSDs for that purpose that has slipped past me?

I have had good results with the Crucial MX series. I had some in a RAID 5 (non-ZFS) pool for 5 years, using them as a VM datastore under ESXi, and they weren't worn out. Having said that, they may not last quite so well. I did post the SMART stats on Reddit - but I can't find the post now.

Now I would be looking at second-hand SSDs (Intel DC) from eBay etc. Check the model numbers: some are endurance drives and some are read-optimised (aka less write endurance).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
but they are cheap consumer drives for a reason

Don't disrespect the cheap consumer drives. Well, yes, the QVOs are read-mostly drives with a tragically low write endurance... but out of the hundreds of drives I've deployed over the last year, almost all to hypervisor environments, 98%+ are Samsung EVO, Pro, or WD Blue, i.e. consumer-grade drives.

There are definitely people who write craptons of stuff to VM datastores in a frenzy that can cause excessive wear; the people who like to create and destroy lots of VMs every other hour are bad candidates for what I'm talking about here, granted. But that's not true for a lot of VM uses, and I've talked in the past about how I was throwing lots of traffic at Intel 535 480GB drives (73TBW endurance), and even with that paltry endurance I didn't manage to kill them all (I was throwing north of 100GB/day at some of them).

Now the thing is, I'm running them in RAID1, with a warm spare, which works out to have better reliability at my write levels than a data center grade SSD does, so it's definitely not for everyone...
 