SOLVED BHYVE VMs seem to be broken by upgrading to RC2 - (Spoiler: it wasn't)

endcycle

Dabbler
Joined
Sep 7, 2018
Messages
13
FINAL EDIT - it was absolutely one of my hard drives slowly dying. The fun part was that it was passing smartctl / SMART tests. As soon as I pulled it from the array and started running in degraded, everything started working smoothly once again. Replaced and rebuilt the array last week and all is well. Everything below.... is immaterial now. :)

Edit update: Rolling back to RC1 manually to see if that resolves things.
Edit update 2: Nope. No difference. Damn.
Edit update 3: Yeah, the problem is very likely a dying hard drive in my array causing things to run really slowly, which throws everything off. I'll know more as soon as my replacement (and second spare) arrive.

Background -

Was running 11.2 RC1 (and previously the beta) with no issues at all. Overall setup seems pretty stable still after upgrading to RC2.

The problem is with my hosted VMs in Bhyve. I have 4 -
Windows 7 pro (runs all sorts of fun utilities like Radarr etc)
Ubuntu 18.04 server running pihole
Ubuntu 18.04 server running docker / hosted whatever / testing system
Ubuntu 18.04 server running plex

Prior to the upgrade, I was happy with the performance. Everything was great. :) Saw no issues reported at all around VMs in the first few days around the RC2 release, took the plunge... and now none of the VMs are booting fully.

upload_2018-11-26_9-49-10.png


This is the console output of one of the VMs at boot, and it's fairly typical of all 3 Ubuntu VMs.

Am currently trying to install a fresh VM (again ubuntu 18.04 since that's onhand and relatively current), and it's just hanging after booting the ISO.

SO! Any ideas where to start troubleshooting? I'm a bit out of my depth here. :)


Edit to add - output of tail /var/log/messages:
Nov 26 09:57:32 freenas uwsgi: line 544, in _open
'_open', req)
File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.6/site-packages/raven/utils/http.py", line 46, in https_open
return self.do_open(ValidHTTPSConnection, req)
File "/usr/local/lib/python3.6/urllib/request.py", line 1320, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error timed out>
Nov 26 09:57:32 freenas uwsgi: [sentry.errors.uncaught:702] ['timeout: timed out', ' File "django/core/handlers/exception.py", line 42, in inner', ' File "django/core/handlers/base.py", line 244, in _legacy_get_response', ' File "freenasUI/freeadmin/middleware.py", line 296, in process_request', ' File "freenasUI/middleware/auth.py", line 8, in authenticate', ' File "freenasUI/middleware/client.py", line 20, in __enter__', ' File "middlewared/client/client.py", line 320, in __init__', ' File "middlewared/client/client.py", line 313, in __init__', ' File "middlewared/client/client.py", line 170, in connect', ' File "ws4py/client/__init__.py", line 215, in connect']
 
Last edited:

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
Trying looking for clues in the middleware log, e.g.
Code:
tail -f /var/log/middleward.log


Also, has the upgrade from RC1 to RC2 changed the device order for the NIC, disk devices , etc. attached to your individual VMs?
 

endcycle

Dabbler
Joined
Sep 7, 2018
Messages
13
sudo tail /var/log/middlewared.log -

[2018/11/26 17:48:06] (DEBUG) VMService.run():286 - LinuxTEST: wrmsr to register 0x140(0) on vcpu 1
[2018/11/26 17:48:06] (DEBUG) VMService.run():286 - LinuxTEST: rdmsr to register 0x4e on vcpu 1
[2018/11/26 17:48:06] (DEBUG) VMService.run():286 - LinuxTEST: wrmsr to register 0x4e(0x2) on vcpu 1
[2018/11/26 17:48:06] (DEBUG) VMService.run():286 - LinuxTEST: rdmsr to register 0x4e on vcpu 1
[2018/11/26 17:48:14] (DEBUG) VMService.run():286 - LinuxTEST: rdmsr to register 0x34 on vcpu 0
[2018/11/26 17:48:14] (DEBUG) VMService.run():286 - LinuxTEST: Unhandled ps2 mouse command 0xe1
[2018/11/26 17:48:15] (DEBUG) VMService.run():286 - LinuxTEST: Unhandled ps2 mouse command 0x0a
[2018/11/26 17:48:15] (DEBUG) VMService.run():286 - LinuxTEST: Unhandled ps2 mouse command 0x01
[2018/11/26 17:48:15] (DEBUG) VMService.run():286 - LinuxTEST: Unhandled ps2 mouse command 0x41
[2018/11/26 17:48:15] (DEBUG) VMService.run():286 - LinuxTEST: Unhandled ps2 mouse command 0x88

tail /var/log/middlewared.log.1 -

self._callback(*self._args)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 381, in start
resp = await self._request_handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_app.py", line 322, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 88, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 78, in impl
return await handler(request)

tail /var/log/middlewared.log.2 -

resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 88, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 78, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1100, in ws_handler
await connection.on_message(x)
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 286, in on_message
method(self, message)

tail /var/log/middlewared.log.3 -

return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 78, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1100, in ws_handler
await connection.on_message(x)
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 286, in on_message
method(self, message)
File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/auth.py", line 138, in update_token
token = self.authtokens.get_token_by_sessionid(app.sessionid)


Didn't seem to change any boot orders.

Attempted to create a new linux VM, using a livecd image to boot into the live desktop environment (worked previously for testing purposes) - won't boot to the desktop of the image. Just sitting on the ubuntu splash screen - 10 minutes and counting.

upload_2018-11-26_12-55-6.png


UPDATE:

Another dump of the middlewared.log file reveals this, which is different....

Oh interesting - just getting this now in the middlewared log -

[2018/10/28 11:38:26] (WARNING) middlewared._loop_monitor_thread():1136 - Task seems blocked: File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1287, in main
debug_level=args.debug_level,
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1197, in run
self.__loop.run_forever()
File "/usr/local/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/usr/local/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
handle._run()
File "/usr/local/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 381, in start
resp = await self._request_handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_app.py", line 322, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 88, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 78, in impl
return await handler(request)

[2018/10/28 11:38:28] (WARNING) middlewared._loop_monitor_thread():1136 - Task seems blocked: File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1287, in main
debug_level=args.debug_level,
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1197, in run
self.__loop.run_forever()
File "/usr/local/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/usr/local/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
handle._run()
File "/usr/local/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 381, in start
resp = await self._request_handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_app.py", line 322, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 88, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 78, in impl
return await handler(request)

[2018/10/28 11:38:30] (WARNING) middlewared._loop_monitor_thread():1136 - Task seems blocked: File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1287, in main
debug_level=args.debug_level,
File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1197, in run
self.__loop.run_forever()
File "/usr/local/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/usr/local/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
handle._run()
File "/usr/local/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 381, in start
resp = await self._request_handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_app.py", line 322, in _handle
resp = await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 88, in impl
return await handler(request)
File "/usr/local/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 78, in impl
return await handler(request)

[2018/10/28 11:44:23] (WARNING) middlewared._loop_monitor_thread():1136 - Task seems blocked: File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
 
Last edited:

tmueko

Explorer
Joined
Jun 5, 2012
Messages
82
We have a similar Problem with RC-2: thousands of this entry, 60-80 per second:

Code:
[2018/11/27 08:39:35] (DEBUG) VMService.run():286 - Win10: wrmsr to register 0x49(0x1) on vcpu 0
[2018/11/27 08:39:35] (DEBUG) VMService.run():286 - Win10: wrmsr to register 0x49(0x1) on vcpu 0
root@freenas:/mnt/Pool/VM/ISO # grep 'wrmsr to register 0x49(0x1) on vcpu 0' /var/log/middlewared.log | wc -l
   41458
root@freenas:/mnt/Pool/VM/ISO # grep 'wrmsr to register 0x49(0x1) on vcpu 0' /var/log/middlewared.log | grep '08:39:36' | wc -l
	  79
root@freenas:/mnt/Pool/VM/ISO # grep 'wrmsr to register 0x49(0x1) on vcpu 0' /var/log/middlewared.log | grep '08:39:37' | wc -l
	  83
root@freenas:/mnt/Pool/VM/ISO # grep 'wrmsr to register 0x49(0x1) on vcpu 0' /var/log/middlewared.log | grep '08:39:38' | wc -l
	  68

 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
Can't comment about a win10 vm, but a VM based on an ubuntu-18.04.1-server-amd64. iso works for me in RC2.

Test condtions:

iso : ubuntu-18.04.1-server-amd64
FreeNAS host: FreeNAS RC2 VM running on kvm/qemu on Linux ( supports nested virtualisation) with 4GB memory
guest bhyve VM: 10gb virtio disk, virtio NIC, vnc, 1gb memory.

Bare install and left idle for 2hrs. Starts OK on reboot.
 

endcycle

Dabbler
Joined
Sep 7, 2018
Messages
13
Another update... looks like I might have a slowly failing drive causing spectacularly slow performance. I think that's maybe causing all of my issues the more I look into it. Getting a replacement drive and will report back.

Ugh. Fun times. :)
 

anderstn

Dabbler
Joined
Oct 2, 2017
Messages
41
Just out of curiosity was there a correlation between the failing drive and the drive where zvools were installed or was it an entirely arbitrary drive? Also did this issue cause any soft reboots for you? I'm seeing lots of similar errors you see and I can't seem to find any other problems in the logs before Freenas either performs a soft re-boot or simply crashes requiring a hard restart.
 

endcycle

Dabbler
Joined
Sep 7, 2018
Messages
13
Hi Anderstn -

I used the smartctl tests to see what was happening, and noticed that the drive in question had a lot more CORRECTABLE errors than other drives. Pulled it out of the array (failed it out in the interface) and in the degraded state, the array started performing MUCH faster.

It was an entirely arbitrary drive FYI.
 

anderstn

Dabbler
Joined
Oct 2, 2017
Messages
41
Was the correctable errors within the "safe" boundaries established by the manufacturer?
 
Top