rungekutta
@HoneyBadger thanks for the nuance. I like nuance. I find that’s often where “truth” lies rather than in absolutes, particularly if they don’t move with the times and evolving circumstances.
I saw the SNIA presentation, thanks. The gist of it seems to be that with modern NVMe drives, L2ARC is more broadly useful, including for streaming/sequential workloads (which previously risked turning it into a bottleneck vs arrays of spinning disks). Useful tuning advice in there too.
Look, here are the major points I’m trying to get across:
- SSD prices per byte have come down much faster than RAM prices. Compared to 10 years ago, RAM prices are roughly 1/4 and very volatile, SSD prices about 1/20 and much more stable (https://jcmit.net/memoryprice.htm, https://jcmit.net/flashprice.htm)
- Meanwhile, SSDs are (or at least can be) faster by a factor of 10-20x over the same period, thanks to a combination of improved chip and controller design and protocol changes (SATA -> NVMe).
- ZFS has evolved too, with less memory overhead from L2ARC, everything else being equal (@jgreco I know you don’t agree, but I think you’re wrong, see below [1]), and some smart design choices too (as alluded to by @HoneyBadger ).
All this combined strongly suggests that the efficient frontier in terms of “best bang for the buck”, given a finite budget, also needs to move with the times.
For what it’s worth, my own experience:
- Memory pressure from L2ARC is significantly less than one would expect from other threads similar to this one. This suggests that many on this forum have preconceived and inaccurate views on the trade-off between the benefits of L2ARC and its negative effect on ARC.
- Some of the L2ARC parameters, perhaps most notably vfs.zfs.l2arc_write_max, default to values that are ridiculous in 2023 (8MB) and need to be adjusted upwards for L2ARC to be effective.
- HOWEVER, in doing so, more pressure will be put on the L2ARC SSD(s) – make sure your L2ARC doesn’t itself become the bottleneck (see the sketch after this list). My starting point was that “any NVMe must be faster than my spinning RAIDZ2”, but I found that to be wrong: the SSD was actually pegged at 100% and rather slowed things down under certain workloads. So I upgraded from a cheap generic NVMe to a Samsung EVO and had a much better experience all round. (I have since moved on to Optanes and a much beefier setup overall.)
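To make that concrete, here’s a minimal sketch of how I’d watch the cache device under a representative workload – the device names (nvd0, nvme0n1) are placeholders for your own:

```sh
# FreeBSD / TrueNAS CORE: per-device GEOM stats; a %busy column pinned
# near 100 while reads queue up means the cache SSD is the bottleneck.
gstat -p -f nvd0

# Linux / TrueNAS SCALE: same idea via sysstat's iostat; watch %util.
iostat -x nvme0n1 1
```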
In my view, the recommended advice should be:
- Check and know your stats – ARC and L2ARC hit rates (a sketch of how to pull these follows below this list)
- Keep an eye on the RAM usage for L2ARC headers (l2_hdr_size). Manage this trade-off vs ARC for your workload (based on hit rates). If you know what you’re doing, you can adjust the fraction of ARC which is allowed to be used for this (/sys/module/zfs/parameters/zfs_arc_meta_limit_percent).
- Tune vfs.zfs.l2arc_write_max, vfs.zfs.l2arc_write_boost and possibly l2arc_headroom (how far through the ARC lists to scan for L2ARC-cacheable content, expressed as a multiplier of l2arc_write_max; default 2; a higher value means smarter evictions from ARC to L2ARC, but at the cost of CPU).
- In line with tuning up (3), be careful that your L2ARC doesn’t become the bottleneck: check drive utilization and realise that NVMe is just a protocol – suppliers can still build crappy drives on top of it. 32GB NVMe Optane sticks are cheap and quite fast; Samsung EVO generally seems a good compromise too.
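To make (1)-(3) concrete, here’s a minimal sketch of what checking and tuning could look like on TrueNAS CORE (FreeBSD sysctls). The numbers are illustrative assumptions for a fast NVMe device, not universal recommendations – benchmark your own drive first:

```sh
# (1) ARC and L2ARC hit/miss counters, straight from the kstats:
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses

# (2) RAM currently consumed by L2ARC headers:
sysctl kstat.zfs.misc.arcstats.l2_hdr_size

# (3) Raise the feed rate from the 8MB default (64MB/s is just an
# example figure) and scan deeper into the ARC lists:
sysctl vfs.zfs.l2arc_write_max=67108864
sysctl vfs.zfs.l2arc_write_boost=134217728
sysctl vfs.zfs.l2arc_headroom=4

# Linux / TrueNAS SCALE exposes the same knobs as module parameters, e.g.:
#   echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max
```

Note these sysctls revert at reboot; on CORE you’d add them under System -> Tunables to make them stick.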
[1] @jgreco I’m not going to go as far as going through ZFS release notes and look for the changes myself, but there are compelling internet sources that suggest you are wrong, including Jim Salter, who maintains the OpenZFS Development Roadmap (so he should know) and also writes for e.g. Ars Technica – including “[...] The issue of indexing L2ARC consuming too much system RAM was largely mitigated several years ago, when the L2ARC header (the part for each cached record that must be stored in RAM) was reduced from 180 bytes to 70 bytes.” (https://arstechnica.com/gadgets/202...get-a-persistent-ssd-read-cache-feature-soon/). But you seem convinced that this has never changed – what are your sources…?
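To put those numbers in perspective (my own back-of-envelope, assuming 128KB records): a 1TB L2ARC holds roughly 8 million records, so the headers cost about 8M x 180 bytes ≈ 1.4GB of RAM under the old layout versus 8M x 70 bytes ≈ 0.55GB now – proportionally more if your workload caches smaller records, less with larger ones.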
Edit: typo