SOLVED VM running very slow, lots of swap_pager_getswapspace(##): failed messages

Joined
Mar 5, 2022
Messages
224
I have an Ubuntu VM that occasionally runs very slowly. It is set up with 4 CPUs, 4 threads each. When this happens, I get swap_pager_getswapspace(##): failed errors on the console (MANY of them!). (Why don't I get these messages in the alerts section?)

This is from top -> o -> res while this is going on (note that bhyve is going nuts!):
[screenshot: swap.jpg]



Just killed the VM from the TrueNAS GUI and waited a few minutes:

[screenshot: swap2.jpg]


Still getting tons of swap_pager_getswapspace(##): failed messages. Is there an issue with bhyve?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
We're going to need a bit more information on your system config, and the config of your VM. I do note that 91% of your swap space appears to be in use. So you're off into the realm of extreme memory starvation. The question is why?
 
We're going to need a bit more information on your system config, and the config of your VM. I do note that 91% of your swap space appears to be in use. So you're off into the realm of extreme memory starvation. The question is why?
What info do you need (and what am I missing from my signature)?
 
This is what top -> o -> res looks like after a reboot:
[screenshot: 1649469091211.png]
 

rvassar

What info do you need (and what am I missing from my signature)?

That's pretty complete. I'm thinking we need a bhyve expert to look at this. The bhyve process doesn't seem to be consuming the memory, yet the swap is clearly in use at critical threshold levels, and is freed by a reboot. There's some kind of bug lurking here...
 
That's pretty complete. I'm thinking we need a bhyve expert to look at this. The bhyve process doesn't seem to be consuming the memory, yet the swap is clearly in use at critical threshold levels, and is freed by a reboot. There's some kind of bug lurking here...
Just messaged you (question on your signature)
 
Just started to interact with the VM:
[screenshot: swap4.jpg]


Note that bhyve is still going crazy, but I am not getting any errors on the console.
 

rvassar

You're not chewing through swap space yet. That bottom line:

Swap: XXXX Total, XXXX Free

Treat it like a credit card balance. If it's unused, you're good. If it's maxed out, you're in trouble.
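As a rough illustration of that "balance" (the 4096M/368M figures below are assumptions chosen to match the ~91% usage noted earlier in the thread; the real values were only in a screenshot):

```python
# Sketch: estimate swap utilization from the Total/Free numbers on
# top's "Swap:" line. The inputs are illustrative, not the OP's data.
def swap_used_percent(total_mb: float, free_mb: float) -> float:
    """Percent of swap in use -- the 'credit card balance'."""
    return 100.0 * (total_mb - free_mb) / total_mb

print(round(swap_used_percent(4096, 368)))  # ~91% used: deep memory starvation
```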
 
Forgot to mention that when I boot, on the console screen, I get:
Code:
Solaris: WARNING: ignoring tunable zfs_arc_max (using 30064771072 instead)


At one time I had played around with zfs_arc_max, but never got anywhere, and everyone suggested removing it and letting the system tune itself. I did remove it:
[screenshot: 1649473254379.png]


But I continue to get the warning on boot. Is this a clue?
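One possible explanation, as a guess: the old value may still be set in one of the usual FreeBSD tunable locations rather than (or in addition to) the GUI tunables list. A lingering entry would look something like this (the paths are the stock FreeBSD ones, and the value shown is only a placeholder, not the OP's setting; note that on TrueNAS these files are normally managed from the GUI's tunables database):

```
# /boot/loader.conf -- a leftover boot-time setting would look like:
vfs.zfs.arc_max="12884901888"

# /etc/sysctl.conf -- or a leftover runtime setting like:
vfs.zfs.arc_max=12884901888
```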
 
I just shut the VM down from the TrueNAS GUI and this time, for whatever reason, the errors stopped filling my console. Of course, I am not running the VM, because it seems to definitely be the problem.

Any suggestions would be most welcome!
 
I am unable to start my VM (I have not tried rebooting TrueNAS yet). When I try to start the VM, I get the dreaded "ERROR: Not Enough Memory" error. If I ignore this, I get "libvirtError internal error: client socket is closed" and the VM does not start.
 

rvassar

There's some kind of memory contention problem here that is preventing your 32 GB system from allocating 4 GB for the VM. The warning about the zfs_arc_max tunable probably shouldn't be appearing, but it also should be treated as a limit, not a "go grab all this memory and lock it up" parameter. I would make an effort to figure out whether that tunable is truly back to the default and not some value that is forcing the system to guess incorrectly what it should be.

Failing that, I would probably set up a memory budget for all my jails and VMs, and set that tunable to something reasonable. It looks like the system is picking roughly 30 GB (30064771072 bytes) as the limit. Maybe set it to 25769803776 to limit the ARC to 24 GiB, then watch your ARC stats to make sure you haven't impacted ZFS filesystem performance.

But understand I'm just making an educated guess here. I fiddled with bhyve a few years back and decided to stick with ESXi for things that need full virtualization. The BSD jails work well, but bhyve itself isn't particularly robust when compared to virtualization solutions like ESXi, Linux KVM, Windows Hyper-V, etc.
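For what it's worth, the byte counts quoted above are exact GiB multiples (the auto-picked value is exactly 28 GiB, which is roughly 30 GB in decimal units). A quick arithmetic sanity check, using only the values from the post:

```python
# Sanity-check the ARC limit values quoted above (pure arithmetic).
GIB = 1024 ** 3  # bytes per GiB

auto_limit = 30064771072   # the value the system picked on its own
suggested = 25769803776    # the suggested lower cap

print(auto_limit // GIB)       # 28 -> the auto-picked limit is exactly 28 GiB
print(suggested // GIB)        # 24 -> the suggested cap is exactly 24 GiB
print(24 * GIB == suggested)   # True
```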
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I have an Ubuntu VM that occasionally runs very slowly. It is set up with 4 CPUs and 4 threads
I only poked at bhyve a few times (I prefer external hypervisors), but do you mean to say that you've set a config like below:

Code:
hw.vmm.topology.cores_per_package: 4
hw.vmm.topology.threads_per_core: 4


Because that's going to try to emulate a CPU that 1) doesn't exist and 2) will be trying to use sixteen threads (4 cores, 4 threads per core), which will heavily starve out your physical CPU.

Try setting it to 2+2 (two cores with HT) if you're going for four threads in-guest.
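In the same tunable notation quoted above, the suggested 2+2 layout would read as follows (a sketch only; on TrueNAS the topology is normally set through the VM's CPU settings in the GUI rather than by hand):

```
hw.vmm.topology.cores_per_package: 2
hw.vmm.topology.threads_per_core: 2
```

Two cores with two threads each still gives the guest four threads, without asking bhyve to emulate a 16-thread package.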
 
Try setting it to 2+2 (two cores with HT) if you're going for four threads in-guest.
Done and restarted the VM. Seems to be running OK so far (no performance hit on the VM anyway).

On another note, TrueNAS unexpectedly rebooted this morning... I looked at the crash logs and could not really understand it all, but I saw this, which gave me pause:
Code:
KDB: enter: panic


I can post the crash files, but am not sure if there is any sensitive data in them (suggestions?)
 

rvassar

That's a kernel panic. The interesting bits are probably a few lines above that point. I would not attempt to post the crash dump files for the reasons you've stated, but the logs should be OK to share. Was the VM running when you hit this?
 
The VM was probably running (if it was, it was just on and not really doing any work). Which logs should I post?
 
This is driving me crazy. It is so bad that it shuts down my VM and I am unable to start it without restarting the entire NAS...
 
So, FWIW, I upgraded to TrueNAS-12.0-U8.1 (I had to re-point my BIOS to my mirrored SSD drives - that gave me a bit of a fright) and I am not seeing these messages any more (yay!). I am going to close this thread and hope that everyone else experiencing this problem will be able to benefit from this.
 