ZFS Design: Upgrade recommendations from current setup (special vDev, L2ARC, SLOG)

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
Hello,

I have been running TrueNAS SCALE for many months with this pool design (see also my old thread):
Asrock Rack X470DU with 2700X and 4x16GB RAM ECC

4x 16TB Toshiba as striped mirror
2x Micron 7300 Pro 960GB mirrored as special vDev (metadata)
no L2ARC
no SLOG

With "sudo zpool list -v" I get the following picture.
First of all looks ok, but I observe that "only" 20GB of the 868GB are used on the special metadata device AND it's only showing 6 files at 1K????

Any suggestions?

sudo zpool list -v
[screenshot: output of sudo zpool list -v]


find . -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'
[screenshot: file-size histogram output]


sudo zdb -Lbbbs -U /data/zfs/zpool.cache storage
[screenshot: output of sudo zdb -Lbbbs]


On top of that, I'm not happy with the power consumption, which is around 70-80W... not bad, but not great either.

I bought a Gigabyte MC12-LE0 for cheap with a Ryzen 4650G and also bought 4x 32GB ECC RAM - in my test setup, this system idles at 30W (no VMs, no load) with a similar hardware setup (3x NVMe + 4x HDD).
I suspect that the Micron 7300 Pros are the bad guys here (I have 4, as I initially planned two mirrored pairs to use the system more heavily for virtualization).

Though, here is the idea:

Rebuild my 24/7 TNS server (after the release of Dragonfish):
  • Gigabyte MC12-LE0
  • 4650G
  • 4x 32GB ECC RAM
  • X710-DA2 NIC
  • 4x16TB HDDs
  • 256GB Intel 760p for OS

I will only run a few virtualization tasks:
Proxmox Backup Server (PBS) as a VM
optional: a Jellyfin server (I could also use my Proxmox hypervisor and only access the data from the server via NFS)

Open questions:
  • Continue to use special vDev for Metadata? If yes, I can get Intel P1600X Optane 58GB for 45 Euro :) (increase speed + lower power consumption)
  • L2ARC? gut feeling: I should be good with 128GB
  • SLOG: use another Intel P1600X Optane 58GB as SLOG?
  • Will it be a problem if I have to set up the PBS VM (weekly automatic backups of my hypervisor VMs/LXCs) on my HDD storage pool? I think I can't install VMs on the special vDev or the OS SSD - correct?

My X470D4U will become a backup server with TNS:
X470D4U
2700X
2x 16GB ECC RAM (or 4x 16GB)
X520-DA2
HP H220 HBA (SAS)
256GB SSD for OS
9x 4TB HDDs @ RAIDZ1 (I bought some cheap 4TB SAS HDDs)
Leftover Chenbro rackmount chassis
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Continue to use special vDev for Metadata? If yes, I can get Intel P1600X Optane 58GB for 45 Euro :) (increase speed + lower power consumption)
Special vdev requires redundancy. What's the actual benefit of the special vdev here?
If power consumption is a concern, do not use Optane; these drives are not designed with low power in mind.
L2ARC? gut feeling: I should be good with 128GB
Run arc_summary on your current system and get some hard facts…
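For example (assuming the arc_summary that ships with current OpenZFS; the section flags may differ between versions):

arc_summary -s arc       # overall ARC size and hit/miss counters
arc_summary -s archits   # hit breakdown, e.g. demand data vs. demand metadata

The demand metadata hit ratio tells you whether a special vdev or L2ARC would have anything left to do.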
SLOG: use another Intel P1600X Optane 58GB as SLOG?
Any sync writes in the workload?

My X470D4U will become a backup server with TNS:
X470D4U
2700X
2x 16GB ECC RAM (or 4x 16GB)
X520-DA2
HP H220 HBA (SAS)
256GB SSD for OS
9x 4TB HDDs @ RAIDZ1 (I bought some cheap 4TB SAS HDDs)
Leftover Chenbro rackmount chassis
To lower power consumption, get fewer, larger, drives and drop the HBA.
 

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
Thank you Etorix

What's the actual benefit of the special vdev here?
The special vDev for metadata is supposed to speed up my storage pool - and yes: redundancy is a must!

Run arc_summary on your current system and get some hard facts…
arc_summary: is "Total Hits" the relevant number?
Also: the system is still running on 64GB RAM
[screenshot: arc_summary output]


Any sync writes in the workload?
What's the best way to answer the question? Which command can I run?
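One thing I could try myself (assuming the OpenZFS kstats on Linux; the exact path may vary by version) is watching the ZIL commit counters while the workload runs, since sync writes go through the ZIL:

cat /proc/spl/kstat/zfs/zil   # zil_commit_count rising under load = sync writes
zfs get -r sync storage       # per-dataset sync policy (standard/always/disabled)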

To lower power consumption, get fewer, larger, drives and drop the HBA.
The last one is my backup server, which will only run once per week: I don't care about power consumption here - only for the 24/7 server :)
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
All "hits" are of relevance. With "only" 64 GB RAM, you're already serving 99.9% of metadata requests from ARC and can't really improve near perfection. That special vdev is sitting very, very idle once ARC has warmed up. And L2ARC is definetely NOT warranted.

Whether there are sync writes depends on the setup. iSCSI uses them. NFS defaults to sync, but this can be disabled for non-critical uses. SMB defaults to async.
My assumption is that "storage" has no sync writes; "software" might have sync writes for VMs, but this is another pair of 7300 Pro, right? There's not much to gain by adding a SLOG to PLP enterprise NVMe drives; a single SLOG might even be slower than the pair.
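If NFS turns out to be the only source of sync writes, they can be disabled per dataset (dataset name is just an example; disabling sync risks losing the last few seconds of writes on power loss):

zfs get sync storage/media           # "standard" honors sync requests
zfs set sync=disabled storage/media  # treat everything as async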
 

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
Thank you,

Exactly my assumption, and why the current ZFS design is built as it is:
  • I mainly use SMB -> async -> no SLOG
  • I have plenty of RAM for ARC -> no L2ARC
  • I will not use many VMs in the future - PBS is the only important one - therefore I will not use the mirrored NVMes for that in the next build
  • I heard many good things about the special vDev on the L1Techs forum/YT, but if I look at the utilization... Either I'm doing something wrong, or it's not as useful as I hoped...
I feel that I did something wrong with the ZFS record size and the "zfs set special_small_blocks" size…
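For reference, this is how I check what the datasets are actually set to ("storage" is my pool):

zfs get -r recordsize,special_small_blocks storage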
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
If there are large files in there, you may try a larger maximum block size, but that is unlikely to make a noticeable difference.

You have lots of RAM and virtually all metadata remains in ARC. The special vdev would likely make a difference with 16 GB RAM, or with a few tens of these 16 TB drives in the pool. Here you've literally applied the advice that "ZFS loves RAM", and ZFS indeed has enough RAM to work its magic on a not-so-large pool without any further niceties.
 

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
You are probably right, and I am overcomplicating things...
I just see that my initial setup was good, but not perfect for my use case, and I realize that it is painful to change afterwards. Therefore I'm exploring some options...

Anyhow, I just ordered 2x Intel Optane P1600X 118GB from Amazon. Either I use them as a special vDev, try them as a SLOG, or use them as a boot drive in my W11 workstation.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
A metadata vdev works great if you have many clients using SMB in parallel and potentially large directories. It definitely fixes the "Windows Explorer takes forever to display the contents" problem.

For a real-life example: a dozen employees using the TrueNAS as a central Time Machine server.
 

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
A metadata vdev works great if you have many clients using SMB in parallel and potentially large directories. It definitely fixes the "Windows Explorer takes forever to display the contents" problem.

For a real-life example: a dozen employees using the TrueNAS as a central Time Machine server.
Probably it’s just ONE user = me…

I probably need to experiment a little bit when I have my SAS HDDs.

Test setup with 4x 4TB HDDs and 2x 970 Evo NVMe:
  1. RAIDZ2 with special vDev
  2. Striped mirror with special vDev
  3. RAIDZ2
  4. Striped mirror
SMB file transfer between W11 workstation and server via a 10Gbit NIC.
Which command do I need to use to deactivate ARC?
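From what I've read, ARC can't be switched off completely, but caching can be restricted per dataset for benchmarking (the dataset name here is a placeholder):

zfs set primarycache=metadata testpool/bench  # cache metadata only, read data from disk
zfs set primarycache=none testpool/bench      # bypass ARC entirely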
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@DigitalMinimalist Have you tuned your record size and metadata (special) small block size? If things are left at default, then I am not surprised the special device doesn't get a lot of use. This is a manual process, and you need to understand what you are doing in order to make effective decisions.

For example, I have a dataset that is full of small files: approaching half a million files, with 2-3 or so files per folder. These files are almost all small. I set the small block size to cover most of the files and set the record size to the next increment upwards (do not set them the same). I have 215G of small files on the special vdev (out of 744G), which is the vast majority of the small files.

For that particular dataset I use a record size of 256K and a small block size of 128K. The folder is 289GB, with 215G of files on the special device.
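In ZFS terms that is simply (dataset name is an example):

zfs set recordsize=256K storage/smallfiles
zfs set special_small_blocks=128K storage/smallfiles  # keep it below recordsize, or every block lands on the special vdev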
 

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
Yes, I left the small block size at the default, which I assume (since today) is zero.

I just set it to 32K, as this corresponds to around 120GB. I also assume 32K will only apply to new files in the pool…
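For the record, that was set on the pool's root dataset so that child datasets inherit it:

zfs set special_small_blocks=32K storage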
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Correct - but, as you say, only for new files.
If you want to populate the special vdev, you will need to rewrite every file. You can do this by replicating the dataset to a new dataset (on the same pool), destroying the old one, and renaming the new one. You will need some downtime on the dataset, though.
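A rough sketch (names are placeholders; stop clients writing first, since anything written after the snapshot would be lost):

zfs snapshot storage/data@migrate
zfs send storage/data@migrate | zfs recv storage/data_new
zfs destroy -r storage/data
zfs rename storage/data_new storage/data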
 

DigitalMinimalist

Contributor
Joined
Jul 24, 2022
Messages
162
I'll probably wait for my refurbished SAS HDDs for my backup server: copy all content over and start from scratch.

The only remaining question is whether the special vDev gives me a significant advantage in my one-user scenario.

I should probably also consider switching to RAIDZ2 instead of a striped mirror.
My initial thought for a 4-drive setup was that I lose 50% anyway, and that a striped mirror has a bit more performance, resilvering is easier, and it's expandable if I ever need to add another mirror…
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
And the answer: it depends. In my use case I saw a significant advantage storing the small files on SSD rather than HDD. But it is quite an extreme case, I think.
 