Bhyve with Ubuntu 19.04 - keeps locking up?

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
I did say "if you can connect via serial console ..", which IIRC for many linux distros means a post install fix before dumping the VNC device ... eg: something like https://www.hiroom2.com/2017/06/19/debian-9-grub2-and-linux-with-serial-console/
@KrisBee Thanks. Yesterday I was able to get the VM booting with a serial console, and it now has just a disk and a NIC as devices. However, I again had to poke the VM via serial (cu -l /dev/nmdmXXX) to wake it up before it would serve anything. Once poked via serial, it instantly came alive. This VM is running Ubuntu 18.04.3. It seems to have the same issue as before.
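
For reference, the rough serial workflow on the FreeNAS host looks like this (the nmdm device number is a placeholder and depends on the VM):
Code:
# list the null-modem console devices bhyve has created
ls /dev/nmdm*

# attach to the "B" end of the VM's console pair (bhyve typically holds the "A" end)
cu -l /dev/nmdm11B

# press Enter a couple of times for a login prompt;
# type ~. at the start of a line to disconnect from cu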

The disk has the following characteristics:
Code:
create_zvol: false
sectorsize: 0
type: VIRTIO
path: /dev/zvol/tank/rancheros-wgkzbe 


The NIC has the following characteristics:
Code:
nic_attach: igb0 
mac: 00:a0:98:17:c3:60 
type: E1000 
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@apwiggins When you say the "same issue", I take it this is only after your VM has been running for more than 12hrs. My small-scale FreeNAS box is mostly used for backups now, and on those occasions when it's on all day, I've still not seen this behaviour when running an Ubuntu 18.04.3 VM (no VNC, only ssh or serial console access) that does little more than serve a webpage on my LAN for Heimdall, with access for a couple of other Docker containers and Portainer.

Patrick above on this thread, who is making serious use of bhyve VMs in production, reports how elusive some bugs can be. I wonder how much development/bug-fixing effort bhyve gets. Perhaps you might get some additional help/clues here: https://lists.freebsd.org/pipermail/freebsd-virtualization/

I stick to using VirtIO for NIC devices where possible because of various past bug reports at bugs.freebsd.org.
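
For context, and not the exact command the FreeNAS middleware generates: at the bhyve level, the NIC "type" maps to the device emulation chosen in the -s slot argument, roughly like this (the slot number and tap name are placeholders):
Code:
# E1000: emulated Intel NIC, works without special guest drivers, but slower
bhyve ... -s 5:0,e1000,tap0 ...

# VirtIO: paravirtualised NIC, needs virtio drivers in the guest (built into Linux)
bhyve ... -s 5:0,virtio-net,tap0 ...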
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
@apwiggins When you say the "same issue", I take it this is only after your VM has been running for more than 12hrs. My small-scale FreeNAS box is mostly used for backups now, and on those occasions when it's on all day, I've still not seen this behaviour when running an Ubuntu 18.04.3 VM (no VNC, only ssh or serial console access) that does little more than serve a webpage on my LAN for Heimdall, with access for a couple of other Docker containers and Portainer.

Patrick above on this thread, who is making serious use of bhyve VMs in production, reports how elusive some bugs can be. I wonder how much development/bug-fixing effort bhyve gets. Perhaps you might get some additional help/clues here: https://lists.freebsd.org/pipermail/freebsd-virtualization/

I stick to using VirtIO for NIC devices where possible because of various past bug reports at bugs.freebsd.org.
@KrisBee Thanks again for helping along. Yes, "same issue" meaning that the VM seems to suspend again. I've also switched the NIC to use VirtIO per your suggestion and rebooted the VM, so we'll see how that works. With E1000 being the default NIC option, I hadn't looked at changing this item.
I've used lots of virtualization on Linux, so FreeNAS/FreeBSD with bhyve is new to me, and its instrumentation/logging is less than desirable (i.e., silent failure). This problem appeared for me in the 11.2 series with a mix of VMs created on 11.1 and 11.2.

What is frustrating is that I'm running 2 nearly identical Ubuntu 18.04.3 VMs. One is fine using defaults (and VNC) for running my GitLab instance, which runs solidly as expected, while the second one, for Docker orchestration (Portainer), keeps hitting this suspend issue. So this second one keeps diverging from the first as we've been experimenting (VNC, VirtIO, etc.).

I started investigating Portainer on Ubuntu for Docker orchestration as my Rancher VM instance was also hitting this suspend issue in 11.2, and I've been searching for a stable option. I really like the FreeNAS storage and have been a bit blindsided by the instability of its virtualization, which kinda crept up on me.

The server is a Lenovo TD350, Xeon E5-2620 v4, 48GB RAM, 3x 8TB IronWolf disks, which is pretty standard hardware.
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
I would imagine it's the limitations of bhyve and related issues that have led people to use ESXi/VMware with FreeNAS virtualised, using disks attached to passed-through HBAs. Others keep FreeNAS purely as a storage server and use a separate hypervisor of their choice, maybe Proxmox or XCP-ng, on another box. I don't make any serious use of VMs at home, but I'd have doubts about relying on bhyve if I did.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Well, I have seen lockups due to network interface emulation issues in the past. Never took the time to hunt them down, though. VirtIO works 100% reliably for me with Linux or FreeBSD guests, both network and storage. For Windows you need some extra incantations for VirtIO to work, but I use that, too.
Now that VirtIO storage finally works for Windows (11.3-RC1), I switched a VM at home from AHCI to VirtIO and will watch how that turns out.

Is there any kind of power management active in your VM? I admittedly did not put too much research into all the odds and ends: booting an Ubuntu install image, installing, and running has always worked for me since the days of FreeNAS Corral. No glitches *at all*, sorry.

Kind regards,
Patrick
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
Thanks for the help this week.

@KrisBee First signs of stability. The Portainer (Ubuntu 18.04.3) VM stayed up overnight. So the current state is: serial console, no VNC, and both the disk and NIC now using VirtIO.

@Patrick M. Hausen - I'm just using a stock Ubuntu 18.04.3 image, so no special power management tweaks. I had this same thought when I was chasing a similar issue in my RancherVM about 6-8 months ago.
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
For reference, notes on configuring Ubuntu 18.04 to boot in serial console mode:
Code:
## /etc/default/grub

GRUB_DEFAULT=0
#GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=2
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
### Change 1 - added the following string
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8"

# Uncomment to disable graphical terminal (grub-pc only)
### Change 2 - uncomment the following line
GRUB_TERMINAL=console


Then run sudo update-grub and reboot.
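
To sanity-check the result from inside the guest after the reboot (assuming a stock systemd-based Ubuntu), something like:
Code:
# the kernel command line should now contain console=ttyS0,115200n8
cat /proc/cmdline

# systemd should have spawned a getty on the serial port
systemctl status serial-getty@ttyS0.service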
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
In Ubuntu 20.04 booting via EFI, these are the steps to get a serial console post-install. Taken from https://github.com/ynkjm/ubuntu-serial-install/.

Edit /etc/default/grub as follows with sudo vi /etc/default/grub:

Code:
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_TERMINAL='serial console'
GRUB_CMDLINE_LINUX="console=hvc0 console=ttyS0,115200n8"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"


Update grub configuration: sudo update-grub
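
Depending on the image, you may also need to make sure a getty is running on the serial port. With systemd this is normally automatic once console=ttyS0 is on the kernel command line, but it can be forced explicitly:
Code:
sudo systemctl enable --now serial-getty@ttyS0.service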
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
In Ubuntu 20.04 booting via EFI, these are the steps to get a serial console post-install. Taken from https://github.com/ynkjm/ubuntu-serial-install/.

Edit /etc/default/grub as follows with sudo vi /etc/default/grub:

Code:
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_TERMINAL='serial console'
GRUB_CMDLINE_LINUX="console=hvc0 console=ttyS0,115200n8"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"


Update grub configuration: sudo update-grub
Thanks Yorick. Just needed that today!
 

coolnodje

Explorer
Joined
Jan 29, 2016
Messages
66
I'm also having this issue, after upgrading to an "Intel(R) Atom(TM) CPU C3758 @ 2.20GHz (8 cores)" with 64GB of RAM, so I have plenty to run some VMs.
My old board had an "Intel Avoton C2750 Octa-Core Processor" CPU with only 16GB of RAM, but running 2 VMs was more stable than now. They were just too slow because they started to swap due to the lack of RAM.

I upgraded the board and memory for better performance, but I hate that I have to restart my VMs almost daily...

So if anyone can help it would be really appreciated.
I was thinking it might be some BIOS setting or similar.

No NFS or SMB yet in the VM. Only Rancher and some nodes installed, with nothing on them yet, except monitoring.
I'm in the exact same hardware situation, except that my problems started only after I moved the VM zvol to a newly installed NVMe disk.
Before that, I had never heard of this bug, even though my VM would occasionally stall.

After moving the zvol of my Docker host VM, I started to encounter the issue. My initial research into the issue led me to also move the Docker containers' config files away from NFS shares to another zvol on the NVMe disk.
I also converted the VM's disk to VirtIO.
Everything is really fast now compared to when everything was on spinning disks, but the bug is not going away; on the contrary.
I now believe it has nothing to do with using NFS shares (nor VNC; VMs all use SPICE by default on TrueNAS 12 now).
Only the real data files are still accessed via NFS shares from TrueNAS, but that is hardly likely to pose an issue.

From my specific situation, I can't help thinking the disk speed is actually causing the problem. Of course, I have nothing technical to back this up.
I'd be heartbroken to move everything back to the old slow dataset, but right now I don't see any better option.

This thread is getting old but I'm still encountering the issue.
Where are we today? Is there a solution to this?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
@coolnodje What type are your virtual disk and NIC devices? You should run VirtIO for both.
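
(If you're not sure what the guest actually ended up with, a quick check from inside a Linux guest, assuming a fairly stock install:)
Code:
# VirtIO devices show up on the PCI bus
lspci | grep -i virtio

# and the matching drivers should be present (on some kernels they are
# built in rather than loaded as modules)
lsmod | grep -E 'virtio_net|virtio_blk|virtio_scsi'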
 

coolnodje

Explorer
Joined
Jan 29, 2016
Messages
66
I've changed them all to VirtIO after I started to encounter the issue.
It doesn't feel like it makes any difference unfortunately.
 

coolnodje

Explorer
Joined
Jan 29, 2016
Messages
66
I'm actually not too sure how the VNC button on the VM works now.
The last time I could see a bhyve command in middleware.log dates back to 2020-05.
I don't see any VNC process, nor any port that would match a "classic" VNC server.
But the GUI VM section has this VNC button, so... would it be worth a try to disable that and keep only a serial redirection?
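
A sketch of the kind of checks meant above (just an illustration; the exact port depends on how the VM was configured):
Code:
# show the full bhyve command lines, including any fbuf/VNC device arguments
ps auxww | grep bhyve

# list listening TCP sockets; classic VNC displays sit at 5900 and up
sockstat -4 -l | grep -E '59[0-9][0-9]'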

From the comments I gather bhyve is not reliable anyway. I'm not sure how much progress could have been made in two short years, but probably not much.
I gather the decision to create TrueNAS SCALE was probably influenced by FreeBSD's inability to provide a reliable hypervisor.

So is it time to migrate to SCALE, now that it's in beta, to be able to host services in containers?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I am at a complete loss. I personally run 4 VMs in production. At work it's 20. All in TrueNAS CORE, all 24x7, a mix of Windows 10, Server 2016, Ubuntu, CentOS.

Not a single problem and great performance. I would love to know what we do differently.
 

coolnodje

Explorer
Joined
Jan 29, 2016
Messages
66
I actually upgraded to Ubuntu 21.10 at the same time I installed the NVMe disk.
That could be something, though people seem to have had the issue with Ubuntu 18 already. But who knows.
 

spiceygas

Explorer
Joined
Jul 9, 2020
Messages
63
Not sure if this will help or not, but I had the same problem of Ubuntu VMs freezing and eventually figured out it's something with the Linux kernel version. By rolling back to kernel version 5.4.0, the problem went away. I have multiple Linux VMs, and that solved it for all of them.

Obviously, not the most appealing answer. "Just stop updating." But it worked. Maybe that information helps you, maybe not...

https://www.truenas.com/community/t...a-vm-with-vnc-stops-the-vm.87704/#post-639088
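
For anyone wanting to try the same rollback, a rough sketch for Ubuntu 20.04, where 5.4 is the GA kernel series (check the package names for your release before running anything):
Code:
# install the GA (non-HWE) kernel metapackage, which tracks 5.4 on 20.04
sudo apt install linux-image-generic linux-headers-generic

# if the HWE metapackage is installed, hold it so newer kernels stop arriving
sudo apt-mark hold linux-image-generic-hwe-20.04

# then reboot and pick the 5.4.0 kernel from the GRUB menu if it isn't the default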
 

coolnodje

Explorer
Joined
Jan 29, 2016
Messages
66
Thanks for your comment.
It supports my feeling that the Linux OS has something to do with the issue.
And the kernel being the source of it makes sense to me, looking at the very low-level warnings we get during the freezes.

In my case, updating to Ubuntu 21.10 just as I changed the storage to NVMe was really coincidental.
I had everything running fine with VNC and NFS shares before upgrading, so the source of the problem can only be Ubuntu (if I exclude the hardware update, which seems most likely to have caused it, but again, just a feeling).

I was very worried for a couple of days, looked for an alternative container host system, tried Debian, then PhotonOS, but didn't even have to go all the way down the migration path: after a couple of kernel updates on Ubuntu 21.10, I haven't seen the issue for quite a while.
Using 5.13.0-2 now.
 

coolnodje

Explorer
Joined
Jan 29, 2016
Messages
66
OK, never mind, it's still happening with this kernel.
Maybe I should try to go back to Ubuntu 20.04 and an earlier kernel.
 