SOLVED: FreeNAS 11.1-U2 Docker VM - NMI watchdog: BUG: soft lockup - CPU#0 stuck

gravely

Cadet
Joined
Jul 4, 2016
Messages
6
I'm running FreeNAS 11.1-U2 on an i3-4130 @ 3.4 GHz with 28 GB of memory and a 4x10TB RAIDZ2 GELI-encrypted volume. Tunables are enabled.

I'm intermittently hitting a soft lockup in the Rancher VM, which I installed through the UI per the docs. I've configured Rancher to mount my FreeNAS volumes over NFS in cloud-config.yml. I've assigned the VM all 4 CPUs (I know there are only 2 physical cores; I've tried giving it 1, 2, and 4 with no luck) and 16 GB of memory. I'm running 7 of the usual suspect Docker containers (CrashPlan Pro, Plex, Transmission, etc.) as well as the native Rancher proxy stack with Let's Encrypt and HAProxy. I really like Rancher's Prometheus stack, but it can be taxing on the system, so I've left it disabled while troubleshooting this problem.
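For reference, the NFS mounts in question look roughly like this when tested by hand from inside the RancherOS VM (the server IP and paths below are placeholders, not my actual values):

    # Manual equivalent of the cloud-config.yml NFS mount entry,
    # run inside the RancherOS VM; IP and export path are placeholders.
    sudo mkdir -p /mnt/docker-config
    sudo mount -t nfs -o nolock 192.168.1.10:/mnt/tank/docker-config /mnt/docker-config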

When I find that none of the containers are responding, I cu into the VM's serial console, confirm that it's another soft lockup, and power cycle the VM. I've tried doing less in Rancher, keeping the aforementioned Prometheus stack disabled and disabling CrashPlan Pro as well, with no luck.
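In case it helps anyone searching later, getting at the VM's serial console from the FreeNAS shell looks like this (the nmdm device number depends on the VM; check with ls /dev/nmdm*):

    # Attach to the bhyve VM's serial console; nmdm1 is an example.
    cu -l /dev/nmdm1B
    # Type ~. on a fresh line to disconnect.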

I'm unsure how to even troubleshoot which thread is locking the CPU in Rancher, what I can do to give the VM more resources, or how to get it to recover from this state on its own.

Asides, but possibly related: FreeNAS swaps more than I expected; with this much memory I would have thought it would never swap. Also, none of this was ever a problem for me under Corral, or under FreeNAS 11.0 when I managed my own Rancher installation in bhyve following advice from other posters on this forum. It only started after migrating my 11.0 Rancher container configs to 11.1.
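For reference, this is the kind of thing I've been using to watch swap on the FreeNAS side:

    # How much swap is in use right now on the FreeNAS host:
    swapinfo -h
    # Sort processes by resident memory to see what's being squeezed out:
    top -o res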

Thanks in advance for any help!
 

dlavigne

Guest
Anything in /var/log/messages around the time of the lockup?
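For example, something along these lines (and the guest's own kernel log is worth checking too):

    # On the FreeNAS host:
    grep -iA5 lockup /var/log/messages
    # Inside the VM, the soft-lockup stack trace lands in the kernel log:
    dmesg | grep -iA20 "soft lockup"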
 

Sasayaki

Explorer
Joined
Apr 20, 2014
Messages
86
Hey mate, I've had the exact same problem. It turns out that NFS doesn't support proper file locking, and that's the problem here.

The short and simple answer is that you'll need to mount your config files locally. I don't like it, because it means my precious configs aren't being backed up, but when I copied the configs from the NFS share to the local drive, everything worked fine.
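A minimal sketch of what I mean, with made-up paths and Plex as the example container:

    # Copy the config off the NFS share onto the VM's local disk
    # (both paths are placeholders):
    cp -a /mnt/nfs/docker-config/plex /opt/docker-config/plex

    # Point the container's bind mount at the local copy instead:
    docker run -d --name plex \
      -v /opt/docker-config/plex:/config \
      plexinc/pms-docker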
 

gravely

Cadet
Joined
Jul 4, 2016
Messages
6
Welp, I didn't have enough space on the Rancher VM's / volume, so I created a new zvol, attached it to the Rancher VM, created an ext4 partition on it, mounted it in cloud-config, and rsync'd my entire docker-config NFS export over to it. 15 hours in, there's no swap usage on my FreeNAS (now at 11.1-U4) and the Rancher VM seems OK so far.
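Roughly the sequence, with placeholder names, sizes, and device nodes (check dmesg in the VM for the actual disk name):

    # On the FreeNAS host: create the zvol (name and size are placeholders).
    zfs create -V 64G tank/rancher-config
    # Attach it to the Rancher VM as a disk device in the UI, then inside the VM:
    sudo mkfs.ext4 /dev/sdb
    sudo mkdir -p /mnt/docker-config
    sudo mount /dev/sdb /mnt/docker-config
    # Copy everything over from the old NFS export (placeholder paths):
    sudo rsync -aHAX /mnt/nfs-docker-config/ /mnt/docker-config/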

I'm hesitant to mark this as solved without a week or more like this, and in the meantime I'll start looking at log management options.

Cheers!
 

19norant

Dabbler
Joined
Dec 15, 2016
Messages
26
Sorry to resurrect a year-old thread, but I just wanted to chime in to add some value for future searchers, since this was the best thread I found for my issue.

I had the same issue with Docker on an Ubuntu 18.04 VM, but NFS wasn't in play in my case as it was described here.

I switched my disks from AHCI to VirtIO, and everything has been smooth for a few days since doing that. Hope this helps someone.
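If you want to confirm the guest actually picked up the VirtIO driver after the switch, from inside the VM:

    # virtio-blk disks show up as /dev/vd* instead of /dev/sd*:
    ls -l /dev/vd*
    lspci | grep -i virtio
    # Caveat: if /etc/fstab references /dev/sdX names, update it (or switch
    # to UUIDs) before changing the disk type, since the device names change.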
 