@HoneyBadger No worries, your posts are great! No feeling of condescension, and I think you sell yourself short calling it rambling ;)
The "archival workflow" is fairly well understood and behaves nicely in ZFS. Dumping big files in, and deleting them rarely or never, tends to work great. Even when you're filling the pool up close to the maximum ("up to but not exactly 100%") capacity, the fill pattern still leaves you with a lot of contiguous free space, and deleting files in large chunks results in a large amount of space being freed at once.
Benchmarking this workflow is easy, but it's probably already well understood.
That is actually a large part of my data, which is video I collect but rarely delete. I know I should see a psychiatrist for that, but my current solution is cheaper :P
The benchmarking is mostly because no one has actually been able to give me any proper numbers on performance. Also just for fun, of course :P
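For the archival case a first sanity check doesn't even need a full benchmark suite; a quick Python sketch like this (sizes and the temp-file target are just placeholders, you'd point it at a file on the pool under test) already gives a rough sequential-write number:

```python
import os
import tempfile
import time

def sequential_write_mb_s(path, total_mb=64, block_mb=1):
    """Write total_mb of incompressible data in block_mb chunks, fsync,
    and return the achieved throughput in MB/s. Random data is used so
    ZFS compression (lz4 etc.) can't inflate the numbers."""
    block = os.urandom(block_mb * 1024 * 1024)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # time the actual flush to disk, not the page cache
    return total_mb / (time.monotonic() - start)

# Point this at a file on the pool you want to measure, e.g. /tank/bench.dat;
# a temp file is used here only so the sketch runs anywhere.
fd, target = tempfile.mkstemp()
os.close(fd)
print(f"{sequential_write_mb_s(target):.1f} MB/s")
os.unlink(target)
```

Obviously fio does all of this and much more, but it shows how little is needed to get a ballpark figure for the "dump big files in" workflow.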
As soon as you go to smaller granularity, or even worse start doing "update-in-place" of files or block devices, the nice contiguous free space gets covered with a finely chopped mix of those in-place writes, and you ruin the ability to sequentially read from the underlying vdevs. If you write a 1GB file and then start updating random 1M blocks in the middle of it, you'll end up with things out of order and have to seek around to read. Do that with ten 1GB files and it gets even worse. Do it with 100 50G .vmdk's worth of data on block storage, and you've basically asked your drives to deliver I/O at random across a 5T span of data.
Yes, that sounds about right.
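Just to make that concrete for myself, here's a little toy model of what copy-on-write rewrites do to read locality. The block counts are arbitrary and real ZFS allocation is far smarter than "append at the end", but the direction of the effect is the same:

```python
import random

def count_seeks(layout):
    """A 'seek' is any jump where the next logical block is not physically
    adjacent to the previous one. A freshly written file has zero seeks."""
    return sum(1 for a, b in zip(layout, layout[1:]) if b != a + 1)

# A 1 GB file written sequentially as 1024 x 1 MB blocks:
# logical block i sits at physical block i.
physical = list(range(1024))
assert count_seeks(physical) == 0  # fully contiguous

# Copy-on-write: rewriting a block relocates it to new free space
# instead of overwriting it in place (modeled here as a bump allocator).
rng = random.Random(42)
next_free = 1024
for _ in range(200):  # 200 random 1 MB updates in the middle of the file
    physical[rng.randrange(1024)] = next_free
    next_free += 1

print(count_seeks(physical))  # hundreds of seeks for one "sequential" read
```

Each relocated block forces a jump out to its new location and usually another jump back, which is exactly the "seek around to read" behaviour you describe.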
Now, this "steady state" can absolutely be benchmarked, but the question is "what is the value of that benchmark?"
Hmm. So yeah, good question; there are a number of reasons. First off, it's something I haven't done before, so it's interesting. Second, I have seen many misleading posts about performance in these cases (not intentionally misleading, of course): they either use a bad testing methodology or test on a clean zPool. Others complain about it not performing, which is often a case of misconfiguration, a full zPool, an old zPool with loads of fragmentation, or just plain stupidity.
Now, with a synthetic benchmark I could simulate my use cases, which would allow me to:
- determine the optimum setup
- determine a better metric than "50% or 80% is the max you should fill it" (advice that honestly gives me the shivers)
- plot performance degradation against fragmentation and fill rate
- create predictions of the performance degradation (or penalty) based on fill rate and fragmentation for certain use cases
- possibly provide a tool for interested parties in the community (if there are any; it's always good to give something back)
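As a first stab at structuring that, I'm thinking of a sweep along these lines. All the values are placeholders, and the part that actually ages the pool and runs the I/O tool is left out; this just enumerates the runs:

```python
from itertools import product

# Hypothetical sweep: each combination becomes one benchmark run.
fill_rates = [0.10, 0.50, 0.80, 0.90, 0.95]   # fraction of pool filled
churn_passes = [0, 1, 5, 10]                  # delete/rewrite cycles to age the pool
workloads = ["seq_read", "seq_write", "rand_read_4k", "rand_write_4k"]

def build_matrix():
    """Every (fill, churn, workload) combination. Iterating fill and churn
    in increasing order means a single pool can be aged progressively
    instead of being destroyed and rebuilt for every run."""
    return [
        {"fill": f, "churn": c, "workload": w}
        for f, c, w in product(fill_rates, churn_passes, workloads)
    ]

runs = build_matrix()
print(len(runs))  # 5 * 4 * 4 = 80 runs
```

Even this small grid is 80 runs, which is a good argument for automating the whole thing from the start.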
It will definitely tell us something we already know, that being "spinning disks suck at random I/O." But the question I would ask is "do you really need to hit that 5T span of data at performance-level-X, or do you realistically only need to hit 500G of it that fast?" Because that's where something like a huge ARC (with compression) and L2ARC devices start to come into play. With more hits served from RAM and SSD, your spindles suddenly have more free time to deliver the I/O requests that miss the cache. Maybe it's a VM datastore or NFS export: you're backing your VMs up nightly or weekly, so you will hit all of that 5T span, and you don't care too much that it takes a while as long as it finishes inside your backup window, but it can't tank the performance of the rest of your running VMs.
Agreed. I am going to need to look at what is best for my situation. I think I will have at least 4 zPools at this point:
- Archive like storage, Movies/Series/Binaries/Audio
--> Read is fairly important, but realistically it doesn't matter much other than that I want it to be fast for that once-in-a-blue-moon moment when I want to export it somewhere. It should at least be able to support 3x 4K compressed streams with decent quality audio, which should be peanuts in most configs
--> Write is not really important except for when I want to import a large library from somewhere else. So the only reason is that it would make me feel good to have high write speeds :P I mean, what else will I brag about to my friends ;)
- Document and photo storage; data protection is key here (this is backed up, but I'd prefer not to have the hassle of needing the backup!)
--> Mostly fairly small files, no real need for high speed other than bragging rights and the luxury of things going fast
- Download disk
--> Fairly high performance required; however, deleting and recreating the pool every few months is not an issue, and I would probably be able to automate this
- Block storage for VM's and assorted reasons
--> Not really clear on this yet, but it will probably be the last thing I add
For my workstations/desktops I will actually just add another SSD to the PC if I need performance, since that is probably cheaper.
The only true benchmark is you (or someone with the same workflow) actually using the storage. You can definitely make observations and extrapolations from someone else's experience, but it's difficult to try to "boil it down" to just a single number, graph, or report sheet. Bandwidth, latency, IOPS, the size of the working set, all of this will have to be taken into account. But at the same time it's important to have objective metrics, because what's "fast enough" for one person might be "intolerable" for another.
Agreed. So my long-term goal is to make a data lake where all this data from many users goes. With that you can start to mine the data for commonalities and give numbers that aren't in the vague realm of "fast enough" but look more like:
- zPool: 2x2TB mirror vdev
-> average speed: 100 MB/s read/write
-> 80% filled, 10% fragmentation: 90 MB/s read/write average
-> 80% filled, 90% fragmentation: 5 MB/s read/write average
This way users get actual numbers on which to base their choices.
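As a sketch of how the predictions could work, here's the idea using the made-up mirror numbers above and plain linear interpolation between measured fragmentation points. A real model would obviously need many more samples and a smarter fit:

```python
def predict_mb_s(samples, fill, frag):
    """Linearly interpolate throughput between the nearest measured
    fragmentation points at a given fill rate. `samples` maps
    (fill, fragmentation) -> MB/s, as collected from user submissions."""
    points = sorted((fr, v) for (fi, fr), v in samples.items() if fi == fill)
    if not points:
        raise ValueError(f"no samples at fill={fill}")
    # Clamp outside the measured range rather than extrapolating.
    if frag <= points[0][0]:
        return points[0][1]
    if frag >= points[-1][0]:
        return points[-1][1]
    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        if f0 <= frag <= f1:
            return v0 + (v1 - v0) * (frag - f0) / (f1 - f0)

# The example numbers from above for the 2x2TB mirror vdev at 80% filled.
samples = {(0.80, 0.10): 90.0, (0.80, 0.90): 5.0}
print(predict_mb_s(samples, 0.80, 0.50))  # halfway: 47.5 MB/s
```

With enough submissions per pool layout, a lookup like this could answer "what will my 80%-filled, 50%-fragmented mirror actually do?" instead of hand-waving.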
I'll see if I can manage to get something more coherent into text to help you out with some workflow and benchmark ideas, but I'd suggest checking the ground already trod by others with tools like VDbench, HCIbench, or diskspd for simulating "real world" setups in a scalable and programmatic scenario.
Nice, I was sure I wasn't the first one to think of this. I'll look into those!
Thnx!