Any shared lock/resource between zvols on different pools?

marmoset

Dabbler
Joined
Dec 18, 2020
Messages
27
I'm running into a performance problem and have reduced it to a simpler case than the one where I first saw it. The core issue is that if a zvol on one pool is maxed out on writes, a zvol on a different pool is affected.

I have two pools, bulk and nvme. bulk is a raidz2 of 8x6TB SAS spinning drives; nvme is a mirror of two 2TB NVMe drives.

The only thing on the nvme pool is a zvol that a VM is using.

I have a zvol on bulk that I'm importing into (dd over ssh) from a different server. The import maxes out the bulk drives (~200MB/s), and after a little while (less than a minute) any disk access in the VM becomes very slow. One example:

Code:
root@testing:~# time (touch asdfasdf ; rm asdfasdf)

real    0m17.363s
user    0m0.002s
sys     0m0.003s
root@testing:~#
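
To see when the stall starts relative to the transfer, the same touch/rm check can be repeated in a loop. This is only a sketch: `probe_latency` and `PROBE_INTERVAL` are made-up names for illustration, and it assumes GNU date (for `%N`).

```shell
# probe_latency DIR COUNT: time a touch+rm pair COUNT times inside DIR
# and print one elapsed-milliseconds value per line.
# The body runs in a subshell, so its variables stay local.
probe_latency() (
    dir=$1
    count=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
        touch "$dir/probe.$$" && rm -f "$dir/probe.$$"
        end=$(date +%s%N)
        echo $(( (end - start) / 1000000 ))
        sleep "${PROBE_INTERVAL:-1}"
        i=$((i + 1))
    done
)

# Example, run inside the VM while the import is going:
#   probe_latency /root 60
```

Healthy values should be a few milliseconds; the failure above would show up as values in the thousands.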


It's bad enough that doing much on the VM will cause the SATA controller (in the VM) to reset and the filesystem to be remounted read-only.

Operations directly in /mnt/nvme perform as expected, which is what led me to believe it is a zvol bottleneck.

This is on TrueNAS-SCALE-22.02-RC.1, but it also occurred in BETA.2 (and in Proxmox, fwiw).

Server is running everything as defaults, has 2x8/16 cores, 96G of memory.

Please let me know if there are any more details I can provide or tests I can run.

Edit: after rebooting that VM, it turns out the zvol was corrupted and the VM can't even boot, which explains why this happened to some of my VMs in the past week.

Thank you!
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The ARC is shared... that may include dirty data.

Have you set the slow pool to sync=always, standard, or disabled?
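
For reference, the current setting can be read with `zfs get sync bulk`, and OpenZFS accepts `standard`, `always`, and `disabled` as values. A minimal sketch of switching between them follows; `set_sync` and `DRY_RUN` are hypothetical helper names, not part of ZFS.

```shell
# set_sync DATASET VALUE: set the sync property after checking the value.
# With DRY_RUN=1 the zfs command is only printed, not executed.
set_sync() (
    dataset=$1
    value=$2
    case $value in
        standard|always|disabled) ;;
        *) echo "invalid sync value: $value" >&2; exit 1 ;;
    esac
    ${DRY_RUN:+echo} zfs set "sync=$value" "$dataset"
)

# Example:
#   zfs get sync bulk        # show the current value
#   set_sync bulk always     # force every write to be synchronous
```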
 

marmoset

I had not; it was on standard. I tried the other two with identical results.

It goes into the heavy IO state on the VM (99% iowait) after about 4GB of transfer, which seems weird.

Just to try it out, I also made a VM with the zvol on the slow pool and it had the same issue at the same time.
 

marmoset

Another data point that seems to indicate it's not ARC related (at least from my understanding, and I could easily be wrong). If I do:

Code:
dd if=/path/to/other_zvol bs=1M |ssh root@truenas dd of=/mnt/bulk/somefilename bs=1M

it does *not* impact the nvme pool, unlike this, which does:

Code:
dd if=/path/to/other_zvol bs=1M |ssh root@truenas dd of=/dev/zvol/bulk/disk-zvol

Also interesting: the first command (output to a file in the dataset instead of a zvol) doesn't hurt the VM with a zvol on the bulk pool either. Both just slow down a little (as you'd expect when writes are maxed out) instead of blocking for so long that things break.

If there's a better zfs resource place to ask about this LMK, and I'll try there, thanks.
 

morganL

4GB is about the size of the ZFS transaction group... After that, writes have to wait for the 4GB to flush to disk. Since it's a single vdev with slow drives, that will take a while.
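
As a rough sanity check, 4GB of buffered writes draining at the ~200MB/s quoted in the first post works out to about 20 seconds, the same order as the 17 second stall shown there. The sketch below does that arithmetic; `flush_seconds` is a made-up helper, and the module parameter path is where OpenZFS on Linux is assumed to expose its dirty-data limit.

```shell
# flush_seconds BYTES MB_PER_S: whole seconds needed to drain BYTES of
# dirty data at a sustained MB_PER_S (1 MB = 1000000 bytes).
flush_seconds() {
    echo $(( $1 / ($2 * 1000000) ))
}

# 4 GiB of dirty data against ~200 MB/s.
flush_seconds $(( 4 * 1024 * 1024 * 1024 )) 200    # prints 21

# If the ZFS module parameter is readable, repeat with the real limit.
param=/sys/module/zfs/parameters/zfs_dirty_data_max
if [ -r "$param" ]; then
    flush_seconds "$(cat "$param")" 200
fi
```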

I don't know for sure, but dd over ssh might behave differently from SMB and NFS. There is no feedback mechanism to slow writes down. It's just an unusual setup or use case, and I have not seen that issue before.
 

marmoset

FWIW I believe it is the same issue as here:


I was seeing the same time spent in osq_lock. Using oflag=direct with dd helped a bit, but I think I'm just going to have to give up on zvols.
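
For anyone wanting to try the same thing, here is roughly what the receiving side looks like with direct I/O. `recv_cmd` is a hypothetical helper that only builds the remote command string. Adding `iflag=fullblock` is my assumption rather than part of the original test: it makes GNU dd collect full 1M blocks from the pipe before each direct write.

```shell
# recv_cmd TARGET [direct]: print the dd command to run on the receiving host.
recv_cmd() {
    if [ "${2:-}" = direct ]; then
        # oflag=direct asks GNU dd to open TARGET with O_DIRECT.
        echo "dd of=$1 bs=1M iflag=fullblock oflag=direct"
    else
        echo "dd of=$1 bs=1M"
    fi
}

# Example (hosts and paths as in the earlier posts):
#   dd if=/path/to/other_zvol bs=1M | \
#       ssh root@truenas "$(recv_cmd /dev/zvol/bulk/disk-zvol direct)"
```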
 

morganL

The correct comparison would be iSCSI. With iSCSI there is flow control, so only about 10 I/Os per LUN/zvol are queued.

My guess is that the dd over SSH doesn't have that flow control. Why are you using that protocol to a zvol?
 