What's the nature of the TrueNAS Scale clusters? Quo vadis?

abufrejoval

Dabbler
Joined
May 9, 2023
Messages
20
Conceptual issues:

I operate various oVirt hyperconverged HCI clusters today, both in my home-lab and in a corporate research lab. They are built from cheap/low-power Atoms and NUCs at home and mostly from left-over hardware in the corporate lab, using NBASE-T Ethernet and a mix of SSD and HDD local storage converted into fault-tolerant distributed storage via Gluster, some replica volumes, some dispersed/erasure-coded volumes (tricky with oVirt!): there are no SANs or NFS filers, just servers, which is why HCI is so attractive.

oVirt is the upstream variant of Redhat Virtualization and basically a vSphere/Nutanix look-alike. It originally did mostly VM orchestration using KVM (from Qumranet) on SAN or NFS storage, and was then modded into a kind of HCI (hyperconverged infrastructure) using things like GlusterFS when Nutanix made that popular. It also included features like VDO, a de-duplication and compression technology from Permabit, which ZFS people typically regard as just another option, not as a separate product or layer.

oVirt/RHEV was built from quite a few Redhat acquisitions, was refactored in Ansible some years ago, and never achieved full integration or maturity, one of many reasons it's now being discontinued as a commercial offering while the community project is still hoping to find community leaders.

That's extremely inconvenient, because now that I managed to get it stable enough to use, I was hoping to use oVirt for as long as I lived but Redhat has EoL'ed most of the components and derailed CentOS.

I believe Proxmox is doing a similar job on VM orchestration, but does not include a native HCI-type storage layer; there used to be Linstor for that, and the two companies are less than 2 km from each other in Vienna but evidently no longer talking...

And there is also XCP-ng, a Xen-based orchestrator, which is looking for a better way of doing storage now that Gluster seems in dire straits, with all commercial downstream products from Redhat cancelled. I've evaluated that for quite a while; it's much faster and more stable than oVirt in many ways, but suffers from a lot of technical debt carried over from Citrix and a Linux 4.9 kernel.

In many ways oVirt/XCP-ng and Proxmox have long ago achieved functionalities which TrueNAS seems to be aiming for, and I wonder just how much both projects are aware of each other, or where exactly TrueNAS is trying to take the SCALE clusters in terms of functionality?

Are you aiming for HCI, with VM/container orchestration or will you stop clusters at the storage level?

And incidentally, what is the nature of the TrueNAS storage clustering exactly, which protocols are supported and how?

My understanding is that ZFS has no native clustering mechanism, nor is it remote storage which is why things like Lustre exist and single node remote access would be done via NFS or iSCSI, right? And there is no cluster support built into NFS or iSCSI, so high-availability is either part of the appliance or somehow implemented on the client side, correct?

Now there does seem to be some cluster support built into modern CIFS/SMB variants, and support for that is the major aim for TrueNAS SCALE clusters. For Linux clients, TrueNAS clusters simply rely on GlusterFS, with all the good and the bad that implies: do I read all that correctly, or am I missing something important?

Practical issues:

I've bootstrapped three VMs with the latest TrueNAS SCALE to try to create a cluster, and I've launched a Docker container with TrueCommand to manage it.

I managed to overcome the limitation that the current TrueCommand container only works with TrueNAS nodes that use "root" as the admin account and am now stuck in the cluster creation wizard, where it can't find cluster interfaces on the three VMs.

Evidently it wants distinct East-West interfaces there, but even after I added those to the VMs, they wouldn't become selectable, and I've not found any documentation, hints or log files which would allow me to understand what the wizard is looking for.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's extremely inconvenient, because now that I managed to get it stable enough to use, I was hoping to use oVirt for as long as I lived but Redhat has EoL'ed most of the components and derailed CentOS.

Sorry to hear it, it was nice stuff.

In many ways oVirt/XCP-ng and Proxmox have long ago achieved functionalities which TrueNAS seems to be aiming for, and I wonder just how much both projects are aware of each other, or where exactly TrueNAS is trying to take the SCALE clusters in terms of functionality?

I'll let @morganL speak for iXsystems, but the feeling I get is that Proxmox is never going to be the storage/NAS engine that TrueNAS is. Proxmox isn't likely to scale up into the petabytes or have the ZFS/NAS features like replication, Samba, etc. On the other hand, adding KVM features to TrueNAS isn't that rough, and the Kubernetes stuff isn't that rough either. It would probably be harder for Proxmox to "become" TrueNAS than for TrueNAS to "become" Proxmox.

My understanding is that ZFS has no native clustering mechanism, nor is it remote storage which is why things like Lustre exist and single node remote access would be done via NFS or iSCSI, right? And there is no cluster support built into NFS or iSCSI, so high-availability is either part of the appliance or somehow implemented on the client side, correct?

As far as I know, it's the standard ZFS dodge. Nexenta was one of the first to engineer the dual-head system, where you had a heartbeat in between two heads with a shared pool. Because ZFS is not a cluster-aware filesystem (a term that means that multiple hosts can simultaneously access the shared storage safely), the basic idea was that if the primary node got hung up, the secondary node would quickly do a forced mount, CARP failover, etc., and take over fileservice duties. It's a bit luddite, but with the ZFS SLOG and persistent L2ARC capabilities, it works reasonably well. Protocols like NFS and iSCSI are merely expected to reconnect in response to the inevitable RST from the acquiring head. In the dozen years since they did that, I am not sure anything better has been devised. I'm patiently waiting for iXsystems to one day offer me a nice small TrueNAS Enterprise system but that hasn't happened yet. :smile:
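
For illustration, a minimal sketch of that failover loop, assuming a hypothetical TCP heartbeat and the stock `zpool import -f`; the CARP/VIP takeover is left as a comment because those details are platform-specific:

```python
import socket
import subprocess
import time

PEER = ("192.168.10.1", 7000)   # hypothetical heartbeat address of the primary head
POOL = "tank"                   # hypothetical name of the shared pool
TIMEOUT_S = 5                   # how long the primary may stay silent before takeover
CHECK_EVERY_S = 1

def primary_alive() -> bool:
    """Return True if the primary head still answers its heartbeat port."""
    try:
        with socket.create_connection(PEER, timeout=1):
            return True
    except OSError:
        return False

def take_over() -> None:
    """Force-import the shared pool, then do the platform-specific VIP takeover."""
    # -f is needed because the pool still looks "in use" by the dead head.
    subprocess.run(["zpool", "import", "-f", POOL], check=True)
    # CARP/VIP promotion and service start would go here; NFS and iSCSI clients
    # are simply expected to reconnect after the RST from the acquiring head.

def main() -> None:
    silent_since = None
    while True:
        if primary_alive():
            silent_since = None
        else:
            silent_since = silent_since or time.monotonic()
            if time.monotonic() - silent_since >= TIMEOUT_S:
                take_over()
                break
        time.sleep(CHECK_EVERY_S)

if __name__ == "__main__":
    main()
```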
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I've bootstrapped three VMs with the latest TrueNAS SCALE to try to create a cluster, and I've launched a Docker container with TrueCommand to manage it.

I managed to overcome the limitation that the current TrueCommand container only works with TrueNAS nodes that use "root" as the admin account and am now stuck in the cluster creation wizard, where it can't find cluster interfaces on the three VMs.

Evidently it wants distinct East-West interfaces there, but even after I added those to the VMs, they wouldn't become selectable, and I've not found any documentation, hints or log files which would allow me to understand what the wizard is looking for.

Best to follow this documentation.... then report any issues
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I'll let @morganL speak for iXsystems, but the feeling I get is that Proxmox is never going to be the storage/NAS engine that TrueNAS is. Proxmox isn't likely to scale up into the petabytes or have the ZFS/NAS features like replication, Samba, etc. On the other hand, adding KVM features to TrueNAS isn't that rough, and the Kubernetes stuff isn't that rough either. It would probably be harder for Proxmox to "become" TrueNAS than for TrueNAS to "become" Proxmox.
Generally agree...Proxmox and Linbit both do good stuff.

TrueNAS has a more file and ZFS centric view of the world. We want to provide very reliable storage that can scale to hundreds of PetaBytes.
At the same time, we find there's a large homelab community that wants integrated storage, Apps and networking. Supporting both makes sense for our business.. we want a large and happy community, but we also want to support larger customers.

As pointed out, we still have work to do on clustered hyperconvergence... we recommend using CSI in the meantime.

As far as I know, it's the standard ZFS dodge. Nexenta was one of the first to engineer the dual-head system, where you had a heartbeat in between two heads with a shared pool. Because ZFS is not a cluster-aware filesystem (a term that means that multiple hosts can simultaneously access the shared storage safely), the basic idea was that if the primary node got hung up, the secondary node would quickly do a forced mount, CARP failover, etc., and take over fileservice duties. It's a bit luddite, but with the ZFS SLOG and persistent L2ARC capabilities, it works reasonably well. Protocols like NFS and iSCSI are merely expected to reconnect in response to the inevitable RST from the acquiring head. In the dozen years since they did that, I am not sure anything better has been devised. I'm patiently waiting for iXsystems to one day offer me a nice small TrueNAS Enterprise system but that hasn't happened yet. :smile:

You deserve it...
If only we could download the hardware from GitHub :frown:
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Cue the MPAA's "you wouldn't download a car" campaign...

What, you want me to build a mobile NAS? Well I did just get 60 6TB drives from one of my clients, so let's see, that's 360TB, need five 2U chassis, and probably about 1500W power....

Grinch's crazy downloadmobile. Get a Starlink for the roof, 10 gig networking with Wifi 6E, ...
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Practical issues:

I've bootstrapped three VMs with the latest TrueNAS SCALE to try to create a cluster, and I've launched a Docker container with TrueCommand to manage it.

I managed to overcome the limitation that the current TrueCommand container only works with TrueNAS nodes that use "root" as the admin account and am now stuck in the cluster creation wizard, where it can't find cluster interfaces on the three VMs.

Evidently it wants distinct East-West interfaces there, but even after I added those to the VMs, they wouldn't become selectable, and I've not found any documentation, hints or log files which would allow me to understand what the wizard is looking for.
In fairness, this feature is still marked as experimental. While I certainly did also encounter the same oddities, it really does work. I believe the word on the street is that a more polished experience will hit in Cobia and whatever newer version of TrueCommand comes with it.
Now there does seem to be some cluster support built into modern CIFS/SMB variants, and support for that is the major aim for TrueNAS SCALE clusters. For Linux clients, TrueNAS clusters simply rely on GlusterFS, with all the good and the bad that implies: do I read all that correctly, or am I missing something important?
That is the current architecture, yes.

The project is just starting to hit production enterprise systems with the most recent release, so obviously there is growth potential. On a personal note, I really like SCALE as a hypervisor/storage appliance in my homelab, and I am excited to see HCI "SCALE" out in more ways than SMB cluster :)

To me, the most interesting aspect of this conversation is the value add. Many of the technologies and products out there now that "scale-out" assume that each individual node in the cluster is expendable. I think a lot of this stems from Google's approach to servers in the early days. Am I the only one asking: Why is that the assumption? In every other aspect of the stack, we strive for redundancy. We have RAID, we have LAGs, we have DRS, we even have VMWare Fault Tolerance (which I still think is a friggen miracle that it works). Just because technologies exist to lessen the importance of some single critical systems, doesn't mean that use cases don't still exist. In Active Directory, we have a multi-master database so the death of any one DC isn't the end of the world. But what if that DC held roles that your others don't? What if that DC is doing LDAP auth for some random integral system? What about DirSync for Azure?

With iX's HA offerings, coupled with SCALE, there's certainly potential to ensure redundancy at all layers of the stack, whether it be ZFS for individual drive failures, highly available controllers like @jgreco described above, or things like Gluster for SMB so that even an entire HA unit can fail. If we're chasing more 9s, SCALE today on an iX HA system is a good place to start.
I'm patiently waiting for iXsystems to one day offer me a nice small TrueNAS Enterprise system but that hasn't happened yet. :smile:
Me too, I just decided to buy an M50 on eBay instead. It's not even HA though o_O
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Many of the technologies and products out there now that "scale-out" assume that each individual node in the cluster is expendable. I think a lot of this stems from Google's approach to servers in the early days. Am I the only one asking: Why is that the assumption?

You're looking at it the wrong way. The goal through much of the previous century was to make servers bigger, so they could handle more load, and therefore they also needed to be more reliable, so you ended up needing dual power supplies, link aggregation, memory mirroring, etc., and yet stuff didn't scale too well.

Back when FreeBSD was born, SOL (one of my companies) was heavily invested in Sun servers and workstation gear, but it was expensive and a bit feature barren. The PC architecture had stuff like well designed UARTs with buffering and interrupt mitigation, inexpensive SCSI and network controllers, etc. and I moved as much stuff as possible over to inexpensive FreeBSD systems as quickly as I could.

It was a focus on the RoI; you could spend a lot on pricey Sun gear to do a task, or instead spend a little on two or three FreeBSD servers to do that task. This meant that you could use design and strategy to mitigate failures rather than doing it through brute-force 99.999% uptime strategies for a single expensive server.

And here's where we get to kinda the point: scale-out generally doesn't assume that each individual node in the cluster is expendable. Given enough nodes knocked out, the service will still fail. But you can make it resilient against individual failures, and you can do so at a much cheaper cost than buying the big "reliable" hardware.

For a bit of a deeper insight, I can tell you a bit about this, because I was doing radical server designs at the time, and at least within the Usenet industry, I gave what was the defining presentation on scalability and redundancy. The content explosion on Usenet in the mid 90's led to a race to store a vast quantity of data, and this made the SAN-based solutions used by some of the biggest industry players at the time impractical, because you'd get to a certain size and then you were at the maximum practical size. How much storage can you practically add to a big UNIX system? We didn't have ZFS at the time.

So the dodge was to create multiple storage nodes. The early ones were based on 24-drive-in-4U chassis, and then frontend servers could connect to dozens of these, allowing each frontend access to hundreds of drives' worth of storage. Usenet messages were stored by message-ID, so this was essentially what today is called "object storage", but done long before that term existed. Any Usenet user who remembers companies like Newshosting in the early aughts probably remembers this stunning era of massive retention. I put a lot of time and effort into designing the software and strategies that made these things possible, because it was essentially making the network do the hard work of keeping lots of storage attached to the individual frontend servers.

The loss of a frontend server was not significant because there were dozens of them. The loss of a backend storage server was a bit more critical, but there was at least simple redundancy for every article, and careful design of distribution through clever hashing kept hot spots from forming.

So the thing is, you're really just moving the redundancy up a level, and this has lots of benefits because it means that you don't have to try quite as hard to squeeze 9's out of each individual piece of gear.
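
For the curious, a toy sketch of the kind of hash-based placement that keeps hot spots from forming: rendezvous hashing picking two replica backends per message-ID. The node names and replica count are made up, and this is not the actual scheme from back then.

```python
import hashlib

# Hypothetical backend storage nodes; in reality there were dozens.
NODES = ["store01", "store02", "store03", "store04", "store05", "store06"]
REPLICAS = 2  # simple redundancy: every article lives on two backends

def _score(node: str, message_id: str) -> int:
    """Deterministic pseudo-random score for a (node, article) pair."""
    digest = hashlib.sha1(f"{node}/{message_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def placement(message_id: str) -> list[str]:
    """Rendezvous (highest-random-weight) hashing: articles spread evenly,
    and losing one node only remaps the articles that lived on it."""
    ranked = sorted(NODES, key=lambda n: _score(n, message_id), reverse=True)
    return ranked[:REPLICAS]

# Example: where would this article be stored, and where is its replica?
print(placement("<1996Apr01.123456@example.net>"))
```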
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I kinda think we're saying the same thing here. You hit the nail on the head with the front-end vs. back-end servers bit. Storage has always been a more conservative market when compared to other industry trends, for the very reasons one might imagine. I guess all I am saying is: just because we have shiny new tech that we're bolting onto an older car, let's not throw the baby out with the bathwater. It's important to remember the lessons from Christmas past. Redundancy should still exist at all possible layers, and single points of failure are going to give you a bad day.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Redundancy should still exist at all possible layers,

You can get rid of a lot of the costly redundancy, though. I don't think you need it at all possible layers. Stuff like memory mirroring serves primarily to line the pockets of Intel and vendors. Look at ZFS itself. You don't need a $1000+ RAID controller. Look at networking. You can use OSPF to build a redundant and resilient network that doesn't require pricey Juniper or Cisco networking gear. I spent an entire career showing people how to do it cheaper AND better. It's fun and fascinating. You need to focus on what is actually important, and I can tell you, that's often not what actually happens in the real world.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
You can get rid of a lot of the costly redundancy, though. I don't think you need it at all possible layers. Stuff like memory mirroring serves primarily to line the pockets of Intel and vendors. Look at ZFS itself. You don't need a $1000+ RAID controller. Look at networking. You can use OSPF to build a redundant and resilient network that doesn't require pricey Juniper or Cisco networking gear. I spent an entire career showing people how to do it cheaper AND better. It's fun and fascinating. You need to focus on what is actually important, and I can tell you, that's often not what actually happens in the real world.
Sure, but at the same time you are also advising folks to use an IR mode HBA for mirrored boot disk. We can certainly both be correct here :P
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sure, but at the same time you are also advising folks to use an IR mode HBA for mirrored boot disk. We can certainly both be correct here :P

Can we? :smile:

The solution I suggested is functionally similar to the Dell BOSS card which retails for $110, but the H200 used is quite a bit cheaper. Plus the use-case focus is a bit different; the point is that you can make stuff *known* to fail on a server much more reliable at a relatively low cost.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Listen, I hear you, grey areas are grey and there has to be a point of "good enough" for a given application. I think a USB-to-SATA dongle is probably good enough for most folks as a boot drive and costs under 50 bucks :)
 

abufrejoval

Dabbler
Joined
May 9, 2023
Messages
20
Best to follow this documentation.... then report any issues
Was there a link meant to be included here? I can't find the documentation... and where would I look for log files?

I've tried to have a look at the Javascript code of the UI, but that doesn't seem to be made for easy debugging...

In the case of oVirt, documentation was also extremely hard to come by, but there is an abundance of log files around everywhere. Unfortunately you have to first understand their architecture to know where to go hunting for log files, but that's not for here and now...
 

abufrejoval

Dabbler
Joined
May 9, 2023
Messages
20
Generally agree...Proxmox and Linbit both do good stuff.

TrueNAS has a more file and ZFS centric view of the world. We want to provide very reliable storage that can scale to hundreds of PetaBytes.
At the same time, we find there's a large homelab community that wants integrated storage, Apps and networking. Supporting both makes sense for our business.. we want a large and happy community, but we also want to support larger customers.

As pointed out, we still have work to do on clustered hyperconvergence... we recommend using CSI in the meantime.
Sorry, but CSI stands for what? (pretty sure it's not Crime Scene Investigation)...

But if you're planning to go there eventually, then I suggest you have a look at oVirt and consider taking it on, because they are hurting to find it a new home!

And it certainly does have VM orchestration done rather well; it's the integration that was always a bit rough around the edges, but that was also because it tried to remain as open and flexible as possible, allowing just about every type of storage out there (and overlay networks).

Home-lab and scale-out use are largely antagonistic, and true scale-out means clouds using proprietary technology, nothing where TrueNAS could sell (nor RHV, which is why they're betting on OpenShift instead...).

But I can see HCI having an industrial edge use case that carries over into a lot of other areas, from remote farms to ships, trains, military etc., where you need fault resilience built from identical low-cost Lego pieces.

In a way that was what I was trying to create for my family (and for my colleagues at work): An IT infrastructure for relatively low compute but high-availability services at the price of a reasonable PC but without a single point of failure (or moving parts, as a critical element of achieving that).

So I started with three fully passive Mini-ITX Goldmont Plus Atoms (oVirt won't fit on Raspberry Pis) with N5005 10-Watt SoCs, 32GB of RAM, 1TB of SATA SSD, 2.5Gbit Ethernet on USB3 and a 3-replica Gluster. No ECC option there, but I hear that certain Jasper Lake SKUs will actually do in-band ECC...

The idea was that I could go on a business trip for a week or two without a wave of wailing phone calls from home because the primary AD, NextCloud or whatnot was down and I wasn't next to the box to fix it.

Mind you, oVirt was never designed for low-cost hardware and I had to jump through quite a few hoops to make it work, but it has become super stable... also because the nodes never failed: Ultimately I only proved that I could do patching without down time, but that's another story.

In the corporate lab it proved its mettle far better. There I had an on-board RAID controller failure in a three-node HCI cluster with 2-replica + 1-arbiter Gluster storage, and the ops team replaced the mainboard with dirty data still in the BBU-protected controller write-back cache...

I lost the VDO (Permabit dedup and compression) control structures and all data on the affected node, but thanks to the Gluster, things were back without real data loss within hours...
 

abufrejoval

Dabbler
Joined
May 9, 2023
Messages
20
You can get rid of a lot of the costly redundancy, though. I don't think you need it at all possible layers. Stuff like memory mirroring serves primarily to line the pockets of Intel and vendors. Look at ZFS itself. You don't need a $1000+ RAID controller. Look at networking. You can use OSPF to build a redundant and resilient network that doesn't require pricey Juniper or Cisco networking gear. I spent an entire career showing people how to do it cheaper AND better. It's fun and fascinating. You need to focus on what is actually important, and I can tell you, that's often not what actually happens in the real world.

We used to run Stratus machines for fault tolerance until 9/11 increased the radius of the bullet beyond planned scenarios. Today it's Linux systems built on "PC" technology with application-level eventual consistency for the front-end, mindful of the CAP theorem, which I knew had to exist before I found it was already proven.

But the fabric needs to be more reliable than the nodes, otherwise you'll get a very expensive random number generator and with Ethernet, that's hard to do.

Since I had Thunderbolt NUCs in one cluster I tried to make that work (I was also hoping for higher bandwidth) with direct connect cables and IP over TB.

Actually only one of the three NUCs had dual TB connectors so I made that a router during the experiment using 64k packets for throughput...

Biggest issue there is that TB ports don't have a MAC and they will get a random one from the emulation layer on any topology event (restart or replug): nothing you'd want in a cluster...

I'm sure some type of topology discovery layer could fix that, but TB networking is limited to 10Gbit/s anyway (though with Infiniband-like latencies!), so I went with TB Aquantia NICs instead.

I'd draw the line at ECC, pretty much at every layer from RAM across buses to storage and fabrics. Software has to do the rest.

Stratus-type hardware redundancy is way too expensive and doesn't always work, either: e.g. we had a machine where the clock failed, the only non-redundant element in a Continuum machine, but without a working clock it was just a pile of trash. And Stratus didn't even have a spare part in Europe, ...because clocks don't fail! We had to kill a QA machine instead...
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Sorry, but CSI stands for what? (pretty sure it's not Crime Scene Investigation)...

CSI is Container Storage Interface... it allows a K8s cluster (or Docker node) to manage storage.
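
For anyone wondering what that looks like in practice: a CSI driver pointed at a TrueNAS box (the community democratic-csi project is one example) lets a Kubernetes workload simply claim a volume, and the driver provisions a ZFS dataset or zvol behind the scenes. A rough sketch using the Kubernetes Python client; the storage class name here is hypothetical and depends on how the driver is configured:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod

# Ask for a 10Gi volume from a storage class backed by a TrueNAS CSI driver.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="demo-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="truenas-iscsi",  # hypothetical class name
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```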
 

abufrejoval

Dabbler
Joined
May 9, 2023
Messages
20
You're looking at it the wrong way. The goal through much of the previous century was to make servers bigger, so they could handle more load, and therefore they also needed to be more reliable, so you ended up needing dual power supplies, link aggregation, memory mirroring, etc., and yet stuff didn't scale too well.
That's a lovely debate, one which might have cost man-decades of arguments, fights, tears, sweat, perhaps even blood.

And the main issue is that clusters might have been thought up for one purpose (scale-out) or the other (HA), but are only seen from one of those two perspectives by most. Not sure what Dave Cutler had in mind originally when they did the VAX clusters...

I don't know how much of my life expectancy I've wasted hotly debating with people who felt that Oracle RAC solved all their problems: by the time I told them that Oracle MAA includes a standby which gets toggled from physical standby to logical on a RAC failure, they had long stopped listening, because the budget wouldn't allow them to do it all. I liked Streams replication much better, because it gave me application control.

I'd completely agree on doing the practical thing: having unreliable nodes is obviously the wrong thing to do, but making your nodes much more resilient via local redundancy than the fabric is a waste of money and adds more problems than it solves.

HCI makes things a bit more complicated, because a VM has a lot more state at stake than a storage block. Too large an object is just as much of an issue for object storage, which is why Gluster and oVirt like to chunk things into 64MB blocks underneath the file abstraction. Checkpointing VMs or running application clusters with negative affinity is the sort of thing that VM orchestrators support for good reason.
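
To make the chunking concrete, a toy calculation, assuming the 64MB shard size commonly used for VM images on Gluster; the numbers are just for illustration:

```python
SHARD_SIZE = 64 * 2**20  # 64 MiB shards underneath the file abstraction

def shard_index(offset: int) -> int:
    """Which shard of a sharded file a given byte offset falls into."""
    return offset // SHARD_SIZE

# A 100 GiB VM disk image becomes 1600 shards; a write at the 10 GiB mark touches shard 160.
print(shard_index(10 * 2**30))  # -> 160
```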

Fault-tolerant VMs across two physical hosts are something I proposed to a VMware representative several years before they implemented it: he offered me a job in return, but my wife didn't want to leave Europe. And to be honest, my only contribution was to make it work via a hypervisor, because a physical cluster solution using 4 PCs, two as I/O nodes and two to host the hardware-virtualized machine, had already been done by a company whose name I can't remember any more.

Their major innovation beyond the classical Stratus approach was to not try using CPU lock-step but pure I/O, network and storage comparison, since their target was a fault-tolerant Windows terminal server.

(Turns out those then failed mostly because NT 4 had printer (& graphics) drivers moved from user mode (in NT 3.51) into kernel mode for performance, where most of those cheap printer drivers then turned out not to be thread-safe and ran entire TS populations into blue screens ;-))

To my understanding ZFS requires a certain degree of fault resilience in the hardware because it is all about aggressive caching, keeping as many things in RAM as possible for performance. I don't actually know if it goes as far as to truly only write when RAM is needed for something new, or if it will flush buffers when there is truly nothing to do.

I also don't know if it does some background patrol checksumming both in RAM and on storage, because even with ECC, bit-rot is just less likely, not eliminated. I also don't know if it keeps compression dictionaries and de-dup information mostly or only in RAM, because single bit errors there could have catastrophic consequences, invalidating the consistency of huge amounts of data, while a flipped pixel in a movie will often go unnoticed.

It's one of the reasons I've never considered ZFS without ECC memory, because it seemed to actually rely on it, but that could have been sales people talking.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
To my understanding ZFS requires a certain degree of fault resilience in the hardware because it is all about aggressive caching, keeping as many things in RAM as possible for performance. I don't actually know if it goes as far as to truly only write when RAM is needed for something new, or if it will flush buffers when there is truly nothing to do.
I think you've missed a lot of the details of what's going on behind the scenes there... there is a lot of use of RAM for things, but it's not really the only location of storage for async write data for long (just a few seconds at a time) and never for any pool critical data nor sync write data.

This article (and particularly the diagrams on page 3) may help with that. https://arstechnica.com/information...101-understanding-zfs-storage-and-performance
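
If you want to see how short that write-back window is on a given box, the relevant OpenZFS tunables are easy to read, assuming the usual ZFS-on-Linux location under /sys/module/zfs/parameters (a quick, illustrative sketch):

```python
from pathlib import Path

# OpenZFS on Linux exposes its tunables here; other platforms differ.
PARAMS = Path("/sys/module/zfs/parameters")

# zfs_txg_timeout: max seconds an open transaction group of async writes stays in RAM.
# zfs_dirty_data_max: cap (in bytes) on dirty data held in RAM before writes are throttled.
for name in ("zfs_txg_timeout", "zfs_dirty_data_max"):
    value = (PARAMS / name).read_text().strip()
    print(f"{name} = {value}")
```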
 

abufrejoval

Dabbler
Joined
May 9, 2023
Messages
20
I think you've missed a lot of the details of what's going on behind the scenes there... there is a lot of use of RAM for things, but it's not really the only location of storage for async write data for long (just a few seconds at a time) and never for any pool critical data nor sync write data.

This article (and particularly the diagrams on page 3) may help with that. https://arstechnica.com/information...101-understanding-zfs-storage-and-performance
Thanks for the link!

Well, it looks like RAM isn't used nearly as aggressively as I thought. With all the de-dup and compression data also on disk, scrubbing should be enough, good to know.
 