Thank you for your answer, my friend! Let's go through it point by point.
When you get to the point of that many VMs, you need to have a serious conversation with them about providing you with capital to build a SAN that can support the workload. And at that point, hopefully you're into the world of hot-swappable NVMe U.2 bays.
I agree. But right now I need to start with what is given to me.
Here's a question for you: How quickly would you be able to connect to either the web UI or a console shell of the FreeNAS machine in case of an SLOG device failure?
From immediately to half an hour - it depends. But this is not a bank or a nuclear power station or anything mission-critical. They will forgive me a few hours of downtime, though I don't want it to happen anyway.
If you have a single SLOG and a large sync write workload, and there is a failure, sync writes will grind to a halt and you're very likely (almost guaranteed) to have all VMs on the affected storage grind to a halt and become unavailable, unresponsive, and possibly crash entirely. This is what a mirrored SLOG is designed to protect against.
Thank you. I have done this kind of project before, and in mission-critical environments, too. I know the pool I'm about to swim in.
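(For the record: if the pool ever starts out with a single log device, turning it into a mirror later is a one-liner from the shell. Pool and device names below are placeholders, not my real layout.)

    # attach a second SSD to an existing single log device (da4),
    # turning it into a mirrored SLOG
    zpool attach tank da4 da5

    # or, if there is no log device yet, add a mirrored one in one go
    zpool add tank log mirror da4 da5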
In a 2-drive SLOG mirror, ZFS will be waiting for the top-level log vdev to return "write complete" before it acknowledges back up the chain. And in order for that to return, you'll have to write the same data down two separate wires. So there's no parallelism to be gained there; in fact, you'll be limited by whichever of the two drives is slower.
Yes, that's why I'm talking about RAID 0+1 - so the writes go down 2 parallel wires before the success code is returned to the party that initiated the synchronous write request.
Yes, identical drives would, in theory, be just as fast - but if one of them is in the middle of garbage collection, TRIMming a block, or something else, you'll be waiting those extra few microseconds.
Good point, I will think about it. But note that I'm talking about 4 (four) SSDs in RAID 0+1, and I know how to schedule TRIM so that no garbage collection happens during working hours.
With a "striped" SLOG (or a 4-drive "stripe of mirrors") then it will still be waiting for the top-level log vdev - the difference being that there will be two vdevs underneath that can serve two separate sync writes.
Yes, and I think that is already a gain (though thanks - my idea of splitting the queue of sync writes into 4 queues was wrong; it will actually be split into 2 queues).
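To make it concrete, the layout I have in mind for the four SSDs is roughly this (pool name "tank" and devices da2..da5 are placeholders):

    # add two mirrored log vdevs; ZFS stripes sync writes across them,
    # so two separate sync writes can be serviced at the same time
    zpool add tank log mirror da2 da3 mirror da4 da5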
It does improve the effective throughput because you're now serving two requests at once, but it's not as simple as being "2x the speed."
Hmm. Now I am curious. With a single SSD, I get a write queue of length N. With RAID 0+1, the write queue is split into 2 queues of length N/2. Should the latency decrease, or not?
Short answer: yes, it will work. You may have to configure it manually through the command line, though.
Thank you - for me this is zero problem; I have been familiar with the UNIX command line since about 2.9BSD.
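Verifying it from the shell is just as easy - the log vdevs show up in their own section of the pool layout (the output below is an approximate sketch, not copied from a real box):

    zpool status tank
    # ...
    #   logs
    #     mirror-1   ONLINE   0  0  0
    #       da2      ONLINE   0  0  0
    #       da3      ONLINE   0  0  0
    #     mirror-2   ONLINE   0  0  0
    #       da4      ONLINE   0  0  0
    #       da5      ONLINE   0  0  0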
The main bottleneck is the write latency of the SLOG device itself - that is the biggest contributor.
Yes, I acknowledge this, and this is exactly what I'm researching now.
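For the research part: if the FreeBSD base under FreeNAS is new enough to have it, diskinfo can measure exactly this number per candidate SSD (device name is a placeholder, and the test writes to the device, so only run it on an empty or spare drive):

    # -w enables (destructive) write tests, -S runs the synchronous
    # write latency test that matters for an SLOG candidate
    diskinfo -wS /dev/da2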
Let me explain some technical details of the whole project. There are some people doing (and teaching, and learning) some heavy computer graphics; I neither know the exact details nor am I willing to share them. Right now I have about 30 users of this stuff waiting for it to start working. They decided to acquire a single shared resource, namely an AMD FirePro™ S9150 (or S9170?) mega-GPU, to put it inside the externally hosted ESXi server, and to use it from there. So the use case is:
- the user sits elsewhere, with whatever device he has, I don't care,
- the user starts his personal VM remotely inside ESXi,
- the user uploads some heavy files to his VM (I decided to store all user data not inside his VM's vmdk but on the FreeNAS instead) - the first workload case, but the light one, because the user's Internet connection is 100 Mbps or less,
- the user launches some software inside his VM which deals with these graphics files; I don't really care what that software is,
- the software inside the VM reads the data from the FreeNAS (this is easy), processes it inside the ESXi VM using the (shared) mega-GPU, and stores the processed heavy files back onto the FreeNAS - this is the hard part of the story, and I have no clue how compressible the users' data will be (0%, 30%, or 50% - no clue as of now),
- multiply this use case by the 30+ users already there, in parallel, with more to come.
The users want it to work fast and safely, and they want their data to be safe.
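That write-back step is exactly where the SLOG earns its keep: an NFS datastore mounted by ESXi issues its writes synchronously, so every "store the result back" operation waits on the log device. The relevant dataset knob (the dataset name here is a placeholder) is:

    # standard = honor the client's sync requests (the default);
    # ESXi over NFS asks for sync, so those writes go through the SLOG
    zfs get sync tank/vmstore

    # if the storage is served over iSCSI instead, sync=always forces
    # every write through the SLOG for the same safety guarantee
    zfs set sync=always tank/vmstore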
Your next bottleneck is the protocol (assuming equally fast devices, NVMe would beat SAS, which would beat SATA).
Right now, I am limited to what was given to me - a Supermicro chassis with an old X8 motherboard, 2 Xeons, 48 GB of RAM, and 3 LSI 2108 HBAs. NVMe? Maybe next time.
And finally your last bottleneck would be any ZFS overhead from SLOG vdev layout.
Huh, finally, we got there. Will the RAID 0+1 SLOG layout decrease the latency by a factor of (almost) two, or not?
Regarding your other thread's question: compression is applied to the data before it is written to the SLOG, so you will gain the "effective write speed" benefits there as well.
Wow that's great! Thank you!
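That also means the compression setting on the dataset is worth double-checking; lz4 is cheap enough to leave on even if the graphics files turn out to barely compress (dataset name is again a placeholder):

    # enable inline lz4 compression and check the achieved ratio later
    zfs set compression=lz4 tank/vmstore
    zfs get compression,compressratio tank/vmstore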
Another (theoretical) case - SLOG on 4 SSDs in a plain RAID 0 stripe. Will it be faster, or not?
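For completeness, that theoretical layout would look like this (same placeholder names as before - and note there is zero redundancy, so one dead SSD puts the in-flight sync writes at risk):

    # four independent log vdevs, striped, no mirroring
    zpool add tank log da2 da3 da4 da5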