request for comments on 8-disk layout

Arwen

MVP
...

- Instead of writing directly to the main pool consisting of HDDs, I can write to a pool consisting only of NVMe SSDs and then move the data to the main pool in the background. I just don't know how easy this would be operationally.
...
Yes, you can do this. It's all manual, meaning you would choose where to write the data, (NVMe SSDs in a dedicated pool). And then you can have a cronjob or background task notice the new files and move them over to the bulk storage pool.
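A minimal sketch of what such a mover could look like, assuming a fast pool mounted at /mnt/fast and the bulk pool at /mnt/tank (both paths are placeholders, and this naive version does not check whether a file is still being written):

#!/bin/sh
# Hypothetical mover script for a write-landing area on a fast NVMe pool.
# rsync copies everything to the bulk pool, then deletes the source files
# it successfully transferred; empty directories stay on the fast pool.
rsync -a --remove-source-files /mnt/fast/landing/ /mnt/tank/archive/

Scheduled from cron, e.g. every 15 minutes:

*/15 * * * * /root/mover.sh

In practice you would also want to skip files modified in the last few minutes, so an in-progress SMB copy is not moved out from under the client.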


This particular behavior has been talked about for years, (more than a decade?). People want a fancy fusion of high speed writes on smaller SSDs, which then get auto-migrated to bulk storage. (To make room for more high speed writes.) And they want the data security & redundancy of ZFS while doing that.

TrueNAS and ZFS simply don't support this behavior now. And I know of no plan in place to add it to either TrueNAS or OpenZFS.


Sun Microsystems & StorageTek both had something that did this automatically. I knew it by the name ASM-FS at StorageTek, but Sun called it SAM-QFS.
This had nothing to do with ZFS, (or TrueNAS).

SAM-FS / QFS / ASM-FS used multiple tiers, like high speed disk, (Fibre Channel back then), low speed bulk disk, (SATA), and tape. The file system looked normal and could be exported via NFS. Any writes would end up on the high speed disks. As the high speed disks filled up, the least recently used files would be migrated to low speed bulk disk, then even to tape.

Various options existed, like the number of copies, (2 copies on 2 different tapes before a file was considered completely written). Or only 2 tiers, disk & tape.

With today's technology, like PCIe 4.x NVMe SSDs & huge bulk storage disks, (like 20TB SATA), something like that would be a useful addition to layer on top of ZFS. But, as I said, no one has any plans to do so as far as I know.
 

metebalci

Dabbler
I realized, I think, that what I was trying to ask about and optimize was the txg size (dirty_data_max etc.). It is nice to tune this and see the effect while monitoring the txg sync events with dtrace.
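For reference, a sketch of the knobs and probe I mean, assuming TrueNAS CORE / FreeBSD sysctl names (on Linux the equivalent tunables live under /sys/module/zfs/parameters/); the 4 GiB value is only illustrative:

# Inspect the txg-related tunables.
sysctl vfs.zfs.dirty_data_max   # bytes of dirty data that force a txg sync
sysctl vfs.zfs.txg.timeout      # seconds between txg syncs when mostly idle

# Example: raise the dirty data cap to 4 GiB for testing (value in bytes).
sysctl vfs.zfs.dirty_data_max=4294967296

# Watch each txg sync as it starts; spa_sync() runs once per txg.
dtrace -n 'fbt::spa_sync:entry { printf("txg %d\n", arg1); }'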

There is something I don't understand. For testing, I have a 3x HDD pool (striped), and a 1x to 4x NVMe SSD pool (striped). When I write something (SMB, large file), I can correlate the speed I see to the layout, disk speeds, txg size etc. But when I read the same file back, I always see ~350MB/s, even though I think any of these layouts should have much better streaming read performance. What am I missing?
 

Arwen

MVP
Did you clear the ZFS ARC, (Adaptive Replacement Cache), after writing but before reading?

ZFS might be caching the file in memory. And 350MBps, (assuming you meant Mega-Bytes), would translate to approximately 2.8 Giga-bits per second. So I am guessing you have >1 Giga-bit Ethernet. That speed might be a limiting factor either on the NAS side or the client side.
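If it helps, a couple of ways to take the ARC out of the picture when benchmarking; the pool name tank and dataset tank/bench are just placeholders:

# Option 1: export and re-import the pool, which drops its cached data from the ARC.
zpool export tank
zpool import tank

# Option 2: have the dataset cache only metadata (not file data) while testing,
# then restore the default afterwards.
zfs set primarycache=metadata tank/bench
zfs set primarycache=all tank/bench

Watching the ARC hit counters before and after the read (on FreeBSD, kstat.zfs.misc.arcstats via sysctl) also shows whether the file is being served from memory.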
 

metebalci

Dabbler
Joined
Jan 18, 2023
Messages
28
Did you clear the ZFS ARC, (Adaptive Replacement Cache), after writing but before reading?

I didn't do anything, and that was actually another question I had in mind. My understanding is that data being written is not cached for later reads; is that the case or not?

ZFS might be caching the file in memory. And 350MBps, (assuming you meant Mega-Bytes), would translate to approximately 2.8 Giga-bits per second. So I am guessing you have >1 Giga-bit Ethernet. That speed might be a limiting factor either on the NAS side or the client side.

(Yes, I use B for byte and b for bit)

I have a 10G network and verified the speed; even with a single flow (iperf3) it is ~9Gbps. So the raw network is not the limiting factor, and I can write at >1GBps (Windows SMB file copy), so I don't understand the read speed. As it is the same for the HDD pool and the NVMe SSD pool, it must be something else, but what?
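One way I could narrow it down, with the file path and pool name as placeholders: read the file locally on the NAS (with the ARC cleared as described above), so SMB and the client are out of the loop, and watch the disks while doing it:

# Sequential read straight off the pool; dd prints a throughput summary at the end.
dd if=/mnt/tank/bench/testfile of=/dev/null bs=1M

# In another shell, per-vdev throughput once per second.
zpool iostat -v tank 1

If the local read is fast, the bottleneck is in the SMB/network path; if it is also ~350MB/s, it is in the pool or in ZFS itself (record size, fragmentation, etc.).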
 