So I look at YouTube sometimes and get recommended things about TrueNAS, generally from some folks that I watch a bit.
TechnoTim is one of those folks and he made a video about TrueNAS, so I guess that put it high on my list of recommended videos.
I watched it. I had high hopes that it would help people understand how to get good performance, and it sort-of does some of that, but I feel compelled to call out several key misunderstandings about ZFS that came up during the video:
Getting the Most Performance out of TrueNAS and ZFS (www.youtube.com)
"After setting up your TrueNAS server there are lots of things to configure when it comes to tuning ZFS. From pools, to disk configuration, to cache to netwo..."
If you watched that video, be sure to read the points below so you don't come away confused about those things.
I'll go by concept:
ZIL/SLOG
ZFS Intent Log (ZIL) is something that every ZFS pool has.
A Separate LOG (SLOG), if a disk or disks are provided to the pool for it, is used as the ZIL, not in addition to it as shown in the video.
Writes go to RAM and the ZIL and are then flushed from RAM to the pool disks (never from the ZIL/SLOG itself unless recovery is invoked due to some kind of loss).
This Ars Technica article covers it well on page 3: https://arstechnica.com/information...-understanding-zfs-storage-and-performance/3/
You don't really "need" to mirror SLOG as it's already a backup and is never read from except in cases where you would otherwise lose data still in RAM (power cut or whatever else).
What is recommended is a very fast SLOG with Power Loss Protection (PLP) ... usually an Intel Optane. (full reading here: https://www.truenas.com/community/threads/slog-benchmarking-and-finding-the-best-slog.63521/)
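For anyone wondering what that looks like in practice, here's a minimal sketch of attaching a SLOG to an existing pool (the pool name tank and the device path /dev/nvme0n1 are placeholders; your names will differ):

```sh
# Attach a single fast PLP device as the pool's SLOG; it takes over the
# ZIL role, it does not add a second log alongside the in-pool ZIL.
zpool add tank log /dev/nvme0n1

# Confirm the pool now shows a "logs" section in its layout.
zpool status tank
```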
SLOG can help to smooth out performance peaks on a busy system, but it's only ever going to reduce the write penalty of sync writes, never "cache" writes as such (that can't be done for sync writes)... it's only "protecting" those writes, which would otherwise be left "hanging" only in RAM while waiting for the next transaction group write.
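Since a SLOG only ever matters for sync writes, it's worth checking what your datasets actually do with them; a quick sketch with a placeholder dataset name:

```sh
# "standard" honors the application's sync requests, "always" forces every
# write to be synchronous, "disabled" ignores sync entirely (risky for
# VMs/databases, and makes the SLOG irrelevant).
zfs get sync tank/vms

# Force sync on a block-storage dataset so the SLOG protects its writes:
zfs set sync=always tank/vms
```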
I really don't like the implication of the video's wording that the writes in SLOG "can later be written out to disk"... of course they're written out from RAM, not SLOG, almost all the time, and my point is that they aren't kept there very long anyway. Transaction groups go to disk every 5 seconds, and at most 3 transaction groups are held in memory: one "new" one being readied to replace the "current" one, the "current" one where writes are going until it's closed, and the "last" one, currently being written out to disk. (Sense and logic says that at absolute most that's 15 seconds of writes you need to cover, realistically just over 10... say 100Gbit/s = 12.5GB/s, x15 = 187.5GB, and that's only if you can saturate a 100Gbit network.)
Also a good reference in general for understanding SLOG: https://www.truenas.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/
ARC
As already pointed out in the video comments by @LawrenceSystems, ARC tweaking isn't needed in Dragonfish, as memory behavior is "fixed" to be like CORE in that version (and presumably subsequent ones).
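If you're curious what your system is actually set to, on SCALE you can check the ARC limit from the shell (a sketch; these are the standard OpenZFS-on-Linux locations, and 0 means "use the default"):

```sh
# Current ARC size limit in bytes (0 = let ZFS pick the default).
cat /sys/module/zfs/parameters/zfs_arc_max

# Live ARC size vs. its ceiling, straight from the kernel stats.
grep -E "^(size|c_max)" /proc/spl/kstat/zfs/arcstats
```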
L2ARC
Level 2 Adaptive Replacement Cache can be useful for some people, but only in limited circumstances.
You should really have 64GB of RAM before you consider using it.
You should see in arc_summary that you are getting a lot of misses/evictions and that it's hurting your read speeds for files already read multiple times (more than twice)... implying that your "working set" (the files you use for regular activities) is bigger than ARC (see the sketch after the link below).
There are improvements to ARC coming:
ZFS "ARC" is getting smarter with version 2.2+ :smiley_face_emoji:
This is a continuation and update of this thread: https://www.truenas.com/community/threads/zfs-arc-doesnt-seem-that-smart.100423/ Looks like OpenZFS 2.2+ introduces some cleaned up and rewritten code¹ to more intelligently and gracefully handle ARC data / metadata eviction from RAM. It's such...
www.truenas.com
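For reference, this is roughly how you'd check (a sketch; arc_summary section names and layout vary a little between OpenZFS versions):

```sh
# Summary of ARC size, hit ratios, and eviction activity.
arc_summary -s arc

# Or just pull the headline hit ratio out of the full report.
arc_summary | grep -i "hit ratio"
```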
And you can set some (or all) datasets to store only metadata in L2ARC to get the benefit for file listing on large directory structures, particularly to help with things like rsyncs of large structures (see the example after these two links):
L2ARC impact on rsync performance for largely dormant data (www.truenas.com)
"I've been doing some more testing and the evidence suggests that using a L2ARC with a metadata only flag has a potential substantial positive impact on rsync performance over AFP. For largely dormant data, the combination of L2ARC dedicated to metadata seems to benefit rsync performance a lot..."
and for interest:
Impact of sVDEV on rsync vs. L2ARC (www.truenas.com)
"Good evening, A few years past, I looked into the impact of L2ARC on rsync performance with FreeNAS. Back then (i.e. FreeNAS 11.x and earlier), the L2ARC was not persistent and for my use case (metadata only) it usually took about three passes before the L2ARC got "hot" (as of TrueNAS 12, the..."
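The property that controls this is secondarycache; a minimal sketch, assuming a dataset named tank/archive (a placeholder):

```sh
# Keep only metadata for this dataset in L2ARC, so directory walks and
# rsync scans stay fast without flooding L2ARC with file contents.
zfs set secondarycache=metadata tank/archive

# Verify the setting (valid values are all, none, and metadata).
zfs get secondarycache tank/archive
```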
Snapshots
I get the impression that Tim has not quite understood that snapshots aren't the data...
A snapshot is a copy of the block allocation at a certain point in time, which freezes all referenced blocks from deletion in order to be able to present that "snapshot" of the dataset/ZVOL/pool as it stood at that moment until the snapshot is released (destroyed).
That means a snapshot is literally never a "backup", as it refers (almost always) to the very blocks holding the files (and maybe some that were since deleted) in the spot where they sat at that moment... if the pool holding the "original" is lost, the snapshot can't help.
"Copying" a snapshot is really just reading out the blocks as they were at the time the snapshot was taken, so it can be very useful for creating a backup (it's great not to have files changing in the middle of being copied) and also for quickly getting at copies of files deleted since the snapshot(s) were taken.
Mirrors
While mirrors are great for performance (when the pool was created with all member VDEVs present), if additional VDEVs are added later to create more space in the pool, there's no automatic process to re-distribute (referred to as "rebalance") the existing data evenly across all the VDEVs, including the new one.
What this means is that the new VDEV will often be chosen as the location for new blocks ahead of the fuller existing VDEVs, potentially reducing your IOPS.
You can do nothing and not care about it if you're just storing files/media that don't need the IOPS, but if you're doing VMs/block storage, you need those IOPS to hold up and ideally even improve now that you have an additional VDEV's worth of them...
Using a rebalancing script like this one: https://github.com/markusressel/zfs-inplace-rebalancing may help with that, or you can just backup, recreate your pool and restore the data to have it balanced.
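A quick way to see how (im)balanced the VDEVs are, before and after any rebalancing (assuming a pool named tank):

```sh
# Per-VDEV capacity breakdown: a freshly added mirror will show far less
# ALLOC than the older, fuller VDEVs until the data gets rewritten.
zpool list -v tank
```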
Finishing notes
Overall, I don't want this to come across as a "panning" of the video... not enough of us do those kinds of things that make it easier for new folks to get into the technology and get a setup that works for them.
I applaud the effort and intent of the video, but it would be great if it didn't result in the ~65K viewers (at the time of writing this) coming away with misguided ideas about how ZFS works and getting into issues where they think it's not what they signed up for.
I like Tim's approach to his lab and technology in general and he covers a lot of interesting subjects in a very watchable and well-resourced way. I hope he might eventually see this and revise (or create an updated version of) his video.