L2ARC vs Fusion pool

2fanatyk2 · Dec 13, 2020

Hi,

First of all I've been using forum for over 3 years - but this is my first post, since I really try to do my research before posting new threads. Thanks to all of you for valuable insights and sharing knowledge!

My question is the following: I have 2 sata ssd's (~500MB/s read speed, ~300 MB/s write speed, 256gb and 128gb drives) that I'd like to improve performance of my main server (setup in description). Here's what I'm considering:

Option 1: Just put one or 2 of them as L2arc stripe
Option 2: Mirror them as metadata / small file storage in existing pool (Fusion pool)

In both cases I'd attach each to it's own SATA 6gbps port directly on mainboard.

In your opinion what would give me better results? My usage case is as follows:

I mostly care about video editing performance. Currently in my workflow I dump ~40-50 GB of new video files on the server and edit it straight away. Out of that ~15-20GB are files that end up in final video and I keep working with them, and rest are bad takes - I just watch them once or twice, but they are not used much after that. I guess due to size of my ARC most of project files should end up in RAM and be served straight from there.

Also mind that while the workload seems to be sequential, there are functionalities of my software that run on all project files at once and read just tiny bits of it (e.g. to generate media thumbnails / audio waveforms). Speed of these processes is important to me as well. Usually I have 50-120 files in each project.

However some projects I work with are much bigger, and in their case "hot data" may be in excess of 80GB. I also open older projects to re-use some of the footage/animations, and due to that I guess some additional unnecessary files may be loaded to L2arc in the process taking up space. I do experience occasional lag while working, that I'd like to eliminate. I guess increasing L2arc would help in that case (and increasing it's fill-rate to e.g. 100-200MB/s, I'm not that concerned with SSD longetivity since I don't edit all the time and from my calculation it should be fine for at least 5-6 years). There's also higher chance that when I have a break waiting for customer feedback for 5-6 days, the project will still be in L2arc and will work better when I get back to it, especially for media thumbnail generation I mentioned.

I believe in that case increasing RAM wouldn't help much, since it already fits majority of "RED-HOT" files, and I'm more concerned with "fairly warm" files - so I'm happy to sacrifice some "max performance" for having "high performance" more often. L2arc size in that case would be ~240GB (twice size of smaller drive).

There's also one more consideration. My editing software (Davinci Resolve) creates it's own "Disk database" with all project information - just a bunch of files organized in folder structure. Since it auto-save every 5-minutes, I want those saves to be as quick as possible, since it stutters software for a couple of seconds. Currently my main database is ~1.5GB, and average file size is ~1.5MB. Here to my best understanding ZIL will not help at all (async writes), and I guess L2arc will not help either (I believe it is for reads only).

And here come the Fusion pools. My thinking is the following:

Metadata on SSD improves access time to video file on HDD's. That would help with responsiveness for files, that are not yet in ARC. It would also help with medie thumbnails generation, since access to each file would be faster. It would also help with directory listing speed, which also helps quite a bit.

Also what I'd do is setting it up small file storage on SSD's as well - to accelerate opening&saving project files in my editing software - as far as I understand that is doable with fusion pools. Since average project size is 1.5MB, I'd set it to store files smaller than 2MB. Majority of project files should fall into that category.

Currently I have 1 014 837 files on my NAS taking up ~12TB. Out of that I can eliminate large video files, photos (mostly>3MB) what leaves me at ~870000 files and ~1.8TB (average size: 2,2MB). A lot of those are 3-7MB photographs

Assuming 1/2 of those files is under 256kB, 1/4 of those files is under 256kB, another 1/6 of them under 1 MB, and 1/8 under 2 MB it adds up to ~460 GB of data eligible for SSD storage as of now. I'm not planning to re-create data, so old data wouldn't go to SSD storage immediately, but rather new files would start filling it up. However considering planned server storage will be twice current and at some point I will most likely recreate the pool from backup (e.g. while adding new vdev), I need to take it under consideration and plan for ~1TB per 24TB of HDD storage (after RAIDZ2 "losses").

So while for current usage I can just mirror 2 ssd's I have currently (giving me ~120GB of storage), I have to plan for exchanging them for 500GB-1TB drives in not-so-far future, which becomes fairly costly. Or decrease max file size, what would not help that much with my editing software saves&loads, but I guess would make system snappier thanks to metadata storage on SSD.

Sorry for long analysis, but I hope on one hand it explains situation fairly well, on the other gives some food for thought to people considering Fusion / metadata pools. What I believe would be even more beneficial is the discussion that (hopefully) will emerge below.

Final question to those experienced with L2arc and/or Fusion pools: Is my thinking solid, or did I miss any important factors? What real impact I can expect in both scenarios and which one would you choose: L2arc or Fusion? How do you define SSD vdev size compared to HDD storage and what file size limit did you set? What is your actual SSD vdev usage?

Or maybe do you suggest SSD vdev of 2x120GB SSD drives + 250GB SSD L2arc? But wouldn't single SSD L2arc device slow down max throughput of the pool (SSD max speed ~500MB/s, while pool delivers ~800MB/s)?

From my side if there's anybody with similar use case - I'm more than happy to share my experiences, although I'm certainly not a pro in this area

2fanatyk2 · Dec 13, 2020

Sorry for some typos - didn't notice them and it seems I can't edit the post (maybe due to being new to the forum). I believe it's understandable nevertheless.

2fanatyk2 · Dec 13, 2020

Update: on editing software (Davinci resolve) project files. Sadly after going a bit more into it it seems, that project file sizes (the "disk database" files) are rather 10-15MB size, so I believe it rules out metadata+small files fusion pool setup. If I was to set all files >15MB to be stored there, the size of required SSD's would be impractical. Therefore for my use case I'd set it as "metadata + files<0,5MB", so that while project files wouldn't be served from SSD's, it would at least take some burden from HDD's, and would allow me to utilize otherwise wasted SSD space (metadata only would probably take 2-5 GB, maybe 10GB). So is this scenario still worth it from your experience?

sretalla · Dec 14, 2020

If the performance is determined by reading a large list of files or by being able to quickly re-read the same large volume of files repeatedly, you may see some benefit from either L2ARC with a metadata only setting or by using a metadata VDEV.

If your performance is driven by writing large numbers of small files, the metadata VDEV may help (by making metadata writes a little faster), but a SLOG would probably be of more help (assuming you are hitting a peak IOPS number for only short bursts).

In almost all cases, more RAM is probably going to speed it up.

2fanatyk2 · Dec 17, 2020

sretalla said:
If the performance is determined by reading a large list of files or by being able to quickly re-read the same large volume of files repeatedly, you may see some benefit from either L2ARC with a metadata only setting or by using a metadata VDEV.

If your performance is driven by writing large numbers of small files, the metadata VDEV may help (by making metadata writes a little faster), but a SLOG would probably be of more help (assuming you are hitting a peak IOPS number for only short bursts).

In almost all cases, more RAM is probably going to speed it up.

Hi Sretalla,

Here you touched on a couple of my doubts:
1. SLOG - from my knowledge it is only beneficial for sync writes - and all (or most) of my writes are async. From my current knowledge it would actually hurt performance instead of improving it, would it not?
2. Does ARC actually impact write speeds? I believe it should. But if so, L2ARC (which I believe is mostly similar, only different tier/storage medium) should behave simillary, although what I've read on the forum kind of contradicts it, as everybody says L2ARC improves READ speed, not WRITE speed.

Best regards!

sretalla · Dec 17, 2020

2fanatyk2 said:
SLOG - from my knowledge it is only beneficial for sync writes - and all (or most) of my writes are async. From my current knowledge it would actually hurt performance instead of improving it, would it not?

It certainly can't help with async writes and it can hurt if you then set sync=always (turning your async writes into slower sync ones).

2fanatyk2 said:
Does ARC actually impact write speeds? I believe it should. But if so, L2ARC (which I believe is mostly similar, only different tier/storage medium) should behave simillary

No, not at all. It's not tiered.

2fanatyk2 said:
what I've read on the forum kind of contradicts it, as everybody says L2ARC improves READ speed, not WRITE speed.

This is correct (sort-of). Some reads will go faster if they aren't already in ARC and are still in L2ARC... that's no guarantee, but it can help if your working set is bigger than ARC can be.

Tony-1971 · Dec 17, 2020

Hi,
If I understand the doc https://www.truenas.com/docs/hub/initial-setup/storage/fusion-pool/

It is recommended to have metadata vdev redundancy match the redundancy of the other normal devices in the pool.

So in your configuration is better to use raidz2 also for metadata special dev.
Best Regards,
Antonio

sretalla · Dec 17, 2020

Tony-1971 said:
So in your configuration is better to use raidz2 also for metadata special dev.

Not really, it's recommended to have the same level of redundancy (for RAIDZ2, you can lose 2 disks and still be OK), so in this case, a 3-way mirror is actually best.

jenksdrummer · Dec 17, 2020

Everything gets written to ARC then to disk; flushed every 5sec with the intent that they would be large stripes of data to the disk, where spindles perform best (sequential writes). That said it also acts as a read-back buffer, so when a system writes a file and asks to confirm it; arc can kick that back quickly. Large enough ARC and you read/write small amounts of data, it all stays in RAM; with updated writes happening behind the scenes. Also, anything read gets written to ARC as a read cache...

Sync-Writes = ON requires the disks to report back, not ARC, that the data has written. This is much slower; which is why people who want this or consider it critical for data integrity in their environment, will use a SSD of some form to do this. Incoming data to ARC gets also written to SLOG, then when ARC flushes to disk, SLOG data is dumped. A literal high rate of churn and burn; the only time that data is read is if there's is an issue (OS crash, power fail, etc) that requires it.. It will not speed up writes, but it does raise the level of sanity behind those writes.

L2ARC is a read cache; as items age out of ARC, they get shifted to L2ARC, which is ideally a fast SSD; data can be read from there rather than from slower spindles. This does not improve writes speeds; only helps with read speeds for data that was recently accessed or written; depending on the level of churn from ARC aging.

MetavDevs in the default configuration just do metadata. It can do small block IO. Metadata in it's nature is small; so when it comes to sequential writes that spinning disks like, vs random IO that SSDs do well at, metadata is more suited to SSD; to that, small block IO, fits that too; but I'm not a fan of splitting off small block IO unless I have a LOT of SSD to mess with; once you use up your MetadataVDEV; those writes all go back to the slower data vdev (spindles) - so the benefit is lost. It's also not a hot/cold aware, so it's literally the first data that gets written to it will be there until it's deleted.

DedupeVDEV - if you turn on metadata vdevs; along with that will also get dedupe tables allocated to the metadatavdev. If you turn on dedupe, both metadata and dedupe table writes/lookups will happen from the disks assigned. And let me tell ya, dedupe makes disks take a POUNDING. So, you can dedicate devices as dedupe vdev and offload that to more specialized hardware. I do this with a pair of M.2 (Evo 970 Plus, actually) and I see no, zero ,notta performance drop with turning on dedupe. I have a pair of 200GBish Intel DC-series SATA SSDs for metadata.

2fanatyk2 · Dec 17, 2020

Tony-1971 said:
Hi,
If I understand the doc https://www.truenas.com/docs/hub/initial-setup/storage/fusion-pool/

So in your configuration is better to use raidz2 also for metadata special dev.
Best Regards,
Antonio

Thanks Tony,

Not sure I actually agree with this. First of all, main pool is much bigger, therefore more disks that can potentially fail (put very simply and fairly inaccurately: 3/8 is less than 1/2 of the pool

). On top of that SSD resilvering is MUCH faster, and SSD life is mostly dependent on writes, not reads - in my case most of data is written once and not re-written (or re-written like once per year), and only small subset of data is rewritten constantly.

I do get that similar level of vdev safety is required, but it does not necessarily mean same layout/redundancy, as described above. Also, my main server is replicated to backup server every 3 hours during my working hours, so in very unlikely event both SSD's fail at the same time I'm not risking much.

@jendksdrummer - thanks for your reply, some of my doubts are solved now

You also addressed something I've been considering, but after research have decided to give up on - dedupe. Since I have files I re-use in many projects, I have a fair number of duplicates I want to keep. All my reading led me to conclusion, that dedupe affects pool performance heavily, so I gave up on the idea, since it's cheaper and easier to add bigger/new disks, than meet dedupe requirements. However your approach with M.2 drives makes sense. Would you mind sharing, what is your system and what is the read/write performance?

I also wonder, if performance with SATA / SAS 6G connections will be similar, since I don't have M.2 slots, and I don't want to waste PCI-e slot for M.2 adapter. Don't think I'd be doing it in my system anyway, but seriously considering it for backup server.

Currently I'm gravitating towards:
One ssd as L2ARC (250GB one)
One ssd as separate pool only for video editing software's "disk database". No redundancy, just frequent (like every 10-30 minutes) replication to main pool or backup server (all changes I do during working day is probably ~10-20 MB of data, so replication shouldn't take more than 1 second).

However I'm still thinking of metadata vdev, since it would probably speed up searching for footage from older projects.

sretalla · Dec 17, 2020

2fanatyk2 said:
I do get that similar level of vdev safety is required, but it does not necessarily mean same layout/redundancy, as described above. Also, my main server is replicated to backup server every 3 hours during my working hours, so in very unlikely event both SSD's fail at the same time I'm not risking much.

Just to spell out for you how the Metadata (and other special) VDEV works, that VDEV becomes the ONLY location of the data stored on it in the entire pool, so if you were to lose that VDEV (completely different from SLOG or L2ARC, which can be lost without pool loss), your pool will be dead and unrecoverable.

Fair warning.

2fanatyk2 · Dec 17, 2020

sretalla said:
Just to spell out for you how the Metadata (and other special) VDEV works, that VDEV becomes the ONLY location of the data stored on it in the entire pool, so if you were to lose that VDEV (completely different from SLOG or L2ARC, which can be lost without pool loss), your pool will be dead and unrecoverable.

Fair warning.

Hi Sretalla,

I really don't need "spelling out", and I don't think such patronizing language is in order here.

I do perfectly understand that. I understood it before even asking my first question here. My question is (and always was) about performance, not data security. I gave explanation why I don't consider that as a significant risk in my scenario (redundancy PLUS easily accessible backup with ADDITIONAL RAIDZ1 redundancy).

So to sum up, even with 2-way ssd mirror I'd lose data only if:
1. Both SSD's fail within 1 hour (or even less - resilvering SSD's is a matter of minutes - having 8-disk RAIDZ2 fail during resilver is more likely in my opinion, especially since during resilver existing drives are read instead of written to, which in case of SSD is preferable, while new drive is being written to)
AND
2. At least 2 drives from my backup server fail while restoring data to main server (so within 1-2 days based on current data stored)

Having said that, I've had exactly 1 HDD failure over the last 5 years and exactly 0 SSD failures in the last 8 years. I really think my setup is secure enough for my use case.

I hope now that is clear. Can we please get back to performance topic?

sretalla · Dec 17, 2020

2fanatyk2 said:
such patronizing language

I'm only seeking to avoid unhappiness... If you knew it already, great! But at least now you know it twice, which is much better than 0 and didn't cost much in the process (also I am not replying just to you... others will see the posts and not be as clear on it, so I'm spelling it out for them too... this is new and dangerous stuff if not done correctly... pools will be lost).

Apologies for sounding patronizing. It was not my intention.

2fanatyk2 · Dec 17, 2020

sretalla said:
I'm only seeking to avoid unhappiness... If you knew it already, great! But at least now you know it twice, which is much better than 0 and didn't cost much in the process (also I am not replying just to you... others will see the posts and not be as clear on it, so I'm spelling it out for them too... this is new and dangerous stuff if not done correctly... pools will be lost).

Apologies for sounding patronizing. It was not my intention.

Ok, thank you for explanation. In my mother tongue "spelling it out" is extremely disrespectful (as in: "you're an idiot, so I'll try to explain that your way"), and while I do believe in english it is also far from polite I get what you mean now, and I do accept (and appreciate) apology.

Following your goal of educating others, from my side what I consider crucial for anybody considering metadata vdev's is:
1. If metadata vdev is lost, all data is lost - redundancy is a must and backup is *strongly* recommended
2. SSD's are way less likely to fail during resilvering, since:

read operations do not wear them out as much as writes
less data = quicker resilvers
way faster reads/writes = even quicker resilvers (if your CPU is fast enough)

3. SSD's are not strained by powering system down if replacement is not readily available. If one of them fails (and if you can) shutdown the system, until you can replace failed SSD, unless you still have sufficient redundancy (e.g RAIDZ2+ or three+ way mirror)
4. Points 2 and 3 mean, that generally you can have lower level of redundancy with SSD pools while having similar level of safety, compared to HDD. How much lower is debatable. Also, this only applies to *good* SSD's, not cheap chinese brands

This is based on my knowledge and research to date, I don't have extensive experience on server maintenance or heavy SSD usage - so if anyone has different experiences, feel free to correct me.

Herr_Merlin · Dec 17, 2020

I already had it two times that a RAID1 of two consumer SSDs within a hours as they where's used in Servers.
Plus some server appliances vendor used a single SSD within the unit and went for the cheapest possible.. it blew after about 250days both units with the HA cluster had reached 40TBW and died within 48h..
Yeah not going down that route again.
Spend a ton of money on any kind of SSD you are using for IO heavy operation let alone writes..
There is a reason why 1+ DWPD SSDs cost so much. Invest that money.
Go for at least double digits PB TBW..

For L2ARC it depends as the system will continue to work but slower. You may cheap it out with some samsung pro mlc / tlc ssds

2fanatyk2 · Dec 18, 2020

Herr_Merlin said:
I already had it two times that a RAID1 of two consumer SSDs within a hours as they where's used in Servers.
Plus some server appliances vendor used a single SSD within the unit and went for the cheapest possible.. it blew after about 250days both units with the HA cluster had reached 40TBW and died within 48h..
Yeah not going down that route again.
Spend a ton of money on any kind of SSD you are using for IO heavy operation let alone writes..
There is a reason why 1+ DWPD SSDs cost so much. Invest that money.
Go for at least double digits PB TBW..

For L2ARC it depends as the system will continue to work but slower. You may cheap it out with some samsung pro mlc / tlc ssds

What was the load on that server? I have a feeling this experience while valid is irrelevant to question originally asked (home/small-business single user)? So I wouldn't cross-out consumer SSD's of reputable brands so definitevely (as you say yourself - those were cheapest SSD's possible).

I also believe your comment adds one important point, that has been said over and over again on the forum - if you have same disks in pool (HDD or SSD) and install them at once, their mileage will be the same, so failure is much more likely to happen within short timespan. Therefore installing disks from different batches / vendors / at different times is preferable. Kind of asumed this is obvious, and in my case (described above) I said I'm using different disks, that I have been using already, and each of them is on totally different stage of its lifecycle.

I think I don't need 1+DWPD SSD for my use case. In previous 5 years I've written 10 TB TOTAL to the server and read approx 40 TB TOTAL. If metadata vdev holds only a tiny bit of overall data, it's writes are probably of the order of magnitude of tens of gigabytes, maybe hundreds, not terabytes. So when is investing in 1+ DWPD SSDs becoming profitable - in 30 years or so?

Sorry, but I have a feeling that while the input is valuable, it is very inaccurate. You're not taking into account the nature of metadata vdevs (according to my research metadata only is of the order of magnitude of gigabyte/TB total storage or less), and you're basing your advice on TOTALLY different system and usage scenario.

I have a feeling that lots of advice on the forum comes from large server admins (and is valuable), but what is missing is further discussion on usage scenario, and a lot of community readers are people with homelabs, small business usage etc., where some factors are worth considering, but fairly irrelevant due to usage scenario. Due to that I'd strongly suggest not giving definite advice (like investing in PB TBW) without considering usage scenario, which makes such devices a false economy due to declining costs of SSD storage.

Still, this whole discussion mostly focuses on data security instead of performance. Big shoutout to sretalla and jenksdrummer, who were the only ones who actually addressed question I have actually asked, but I'm still looking for input from somebody with practical experience with fusion pool performance/L2Arc. Can we please get back to topic of that thread?

P.S. I don't want to underestimate anyone's input, it's just that I'm really interested in performance and it's hard to find any quantitative info on that. On security - there's a lot on the forum already and as (I think) you can see - I have that topic figured out for my use case already. Still, I'm willing to share my thought process and experience also on data security topic, so that this topic is (hopefully) useful to others considering those options.

SweetAndLow · Dec 18, 2020

You should try to run the arc_summary script built in to see your cach hit and miss statistics. This would help you understand the performance advantage of you add the l2arc.

jenksdrummer · Dec 18, 2020

2fanatyk2 said:
@jendksdrummer - thanks for your reply, some of my doubts are solved now You also addressed something I've been considering, but after research have decided to give up on - dedupe. Since I have files I re-use in many projects, I have a fair number of duplicates I want to keep. All my reading led me to conclusion, that dedupe affects pool performance heavily, so I gave up on the idea, since it's cheaper and easier to add bigger/new disks, than meet dedupe requirements. However your approach with M.2 drives makes sense. Would you mind sharing, what is your system and what is the read/write performance?

I have a pair of Supermicro SSG-5029P-E1CTR12L, identical boxes, just I operate them differently.

Hardware installed with both boxes:
Intel Silver 4210
192GB RAM
3008-IR onboard
3108 Raid Controller
2x 120GB SATA DOM in onboard DOM/SATA slots
2x SATA intel DC 240 SSD in the back
6x WDC Gold 10TB HE drives (new ones are back to air, FWIW)
2x 10GTek 10G-T dual port (total of 6 10G-T ports between onboard and addon cards)
1 SuperMicro 2x M.2 PCI Addon Card
1x Samsung EVO Plus 1TB M.2 onboard
2x Samsung EVO Plus 1TB M.2 on the card

SAN-A I use the 3108 in RAID 10. Not a recommended setup on this forum for a number of reasons but it works. FOR ME, I keep multiple copies of my data on other systems, this is more about me seeing what the performance differences are. 3008 is still enabled, just not used.

SAN-B I use the 3008-IR. I could flash it to IT as recommended here, but SuperMicro says that will kill off my warranty and I have another 2 years left of warrantied support. Might do it later, might not. No issues I've noticed...3108 is installed, but just not plugged into anything.

I tend to take one box or the other and reload it with alternatives and do comparisons; having two of them lets me keep a baseline to run multiple examples.

This is my at home lab; well part of it. Only data I care about is on a 3rd Supermicro mini-chassis; and I occasionally swap to one of the others to be the primary, but I generally keep at least 2 copies of data.

In terms of data throughput, using Acronis against my gaming desktop (4x 250GB RAID-0 950 Pro M.2 on a 4x AddonCard + 2x 10TB WDC Gold 10TB SATA (how I know new ones are not HE...) in RAID0 - I can run a full 4.1TB backup in about 2.5 hours; datarate of that is averaged at 3.8gb/sec. I could likely improve that; things like source compression and such, but the majority of the bottleneck is the 2x SATA RAID; it sticks around 87% / 3.5-4GB/s for about 4 of the 4.1TB of data. The 4x250GB will run solid at about 6GB/sec.

Running benchmarks, I can saturate out a 3x 10GB NIC setup easily enough. I didn't try more than that as I didn't have additional at the time.

Mainly, it's about the 2x AoC M.2 taking dedupe workloads (mirrored). The Intel DC pair are my metadata vdev...then I have the onboard M.2 for L2ARC.

Something to also consider with metadata vdevs; if you leave it at metadata alone, you'll generally see a factor of 1% compared to the data vdev consumption; IE, I have 3.8TB consumed at the moment, my metadata vdev is at 3.8GB. If you throw small block IO in there, that will skew the results quite a bit.

Important Announcement for the TrueNAS Community.

L2ARC vs Fusion pool

2fanatyk2

Dabbler

2fanatyk2

Dabbler

2fanatyk2

Dabbler

sretalla

Powered by Neutrality

2fanatyk2

Dabbler

sretalla

Powered by Neutrality

Tony-1971

Contributor

sretalla

Powered by Neutrality

jenksdrummer

Patron

2fanatyk2

Dabbler

sretalla

Powered by Neutrality

2fanatyk2

Dabbler

sretalla

Powered by Neutrality

2fanatyk2

Dabbler

Herr_Merlin

Patron

2fanatyk2

Dabbler

SweetAndLow

Sweet'NASty

jenksdrummer

Patron

Similar threads

Important Announcement for the TrueNAS Community.

L2ARC vs Fusion pool

Dabbler

Dabbler

Dabbler

Powered by Neutrality

Dabbler

Powered by Neutrality

Contributor

Powered by Neutrality

Patron

Dabbler

Powered by Neutrality

Dabbler

Powered by Neutrality

Dabbler

Patron

Dabbler

Sweet'NASty

Patron

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "L2ARC vs Fusion pool"

Similar threads