"libvirt.libvirtError: internal error: client socket is closed" when trying to start a VM

sepahewe

Cadet
Joined
Nov 11, 2021
Messages
2
Hi,

I have searched the forum but not found any similar issues.

My system:
  • MB: SM X11SPi-TF
  • CPU: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
  • RAM: 196 GB
  • Storage: 6x WD Ultrastar 550 18 TB SAS3, 2x Intel Optane P4801X 100 GB, 2x Samsung 970 EVO 2 TB
  • HDC: LSI 9340-8i 12G
  • NIC: 2x Intel X722 10 GbE, 1x Mellanox ConnectX-3 Pro
Software:
  • TrueNAS-SCALE-22.02-RC.2
I created a couple of Ubuntu VMs through the GUI. They started without any issues, so I checked "Autostart" and rebooted my server, and now they won't start. If I try to start them manually I get an error:
Code:
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 160, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1281, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1269, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1137, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 42, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 62, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 122, in start
    if self.domain.isActive():
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1709, in isActive
    raise libvirtError('virDomainIsActive() failed')
libvirt.libvirtError: internal error: client socket is closed


Looking through the logs with journalctl | grep libvirt, I found one instance of
Code:
libvirt: XML-RPC error : Cannot write data: Broken pipe

followed by a large amount of:
Code:
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed


I checked the middlewared source on GitHub and extracted the libvirt URI it uses: "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock". So I logged in over SSH and tried virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" sysinfo, and it worked:
Code:
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>American Megatrends Inc.</entry>
    <entry name='version'>3.5</entry>
    <entry name='date'>05/18/2021</entry>
    <entry name='release'>5.14</entry>
  </bios>
  <system>
    <entry name='manufacturer'>SuperMicro</entry>
    <entry name='product'>Super Server</entry>
    <entry name='version'>0123456789</entry>
    <entry name='serial'>0123456789</entry>
    <entry name='uuid'>00000000-0000-0000-0000-3cecef075479</entry>
    <entry name='sku'>To be filled by O.E.M.</entry>
    <entry name='family'>To be filled by O.E.M.</entry>
  </system>
  ....


So the socket works fine when used manually, but middlewared gets an error.
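
For reference, the same check can be reproduced from Python using the libvirt bindings that middlewared itself imports. A minimal sketch, assuming the libvirt-python package and the URI extracted above:
Code:
import libvirt

# The non-default socket URI that middlewared uses (extracted above).
URI = "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock"

try:
    conn = libvirt.open(URI)
    # List every defined domain and whether libvirt considers it active.
    for dom in conn.listAllDomains():
        print(dom.name(), "active" if dom.isActive() else "inactive")
    conn.close()
except libvirt.libvirtError as e:
    # A dead libvirtd or socket surfaces here as "client socket is closed".
    print("libvirt error:", e)

When the bug hits, I would expect this to raise the same "client socket is closed" error that middlewared reports.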

I haven't done any configuration from the CLI; everything has been through the GUI.

Has anyone seen anything similar?
Have I missed something?

//Many thanks
 

Attachments

  • debug-nas-20220125091649.tgz
    9.9 MB · Views: 146

wonkybot

Cadet
Joined
Feb 4, 2022
Messages
1
I am seeing more or less the same error when I manually start a VM. This started shortly after a failed attempt to install Linux Mint and/or Debian onto a VM; I have since forgotten the exact sequence. But I remember that one of the two VMs stopped responding, so I force-killed it, and since then the VMs will not start. I have not tried rebooting TrueNAS yet.
 

hexadecagram

Dabbler
Joined
Jul 15, 2016
Messages
32
I am also seeing this (albeit on TrueNAS CORE). The first time it happened was last night while I was working my way through the solution recommended to me in my most recent post: I shut down my VM, issued # ifconfig bridge0 destroy on the host, manually created a bridge in the WebGUI, and then tried to start the VM, at which point I saw the error. I then rebooted, recreated the bridge, and started the VM back up, and the error did not present itself again.

Just now, one of my VMs locked up, so I powered it off and tried to restart it and saw the error once again. As before, I rebooted with the VM set to autostart, and the error has not resurfaced.
 

Inimic

Cadet
Joined
Feb 20, 2022
Messages
1
I'm having a similar issue with my VM; it just started within the past 24 hours.

I recently upgraded to U8 (yesterday, 2/19) and now have VM stability issues (it was rock solid running under U7). The VM will now crash and become unresponsive after a few hours. When I attempt to restart it via the UI, it hangs and then reports "libvirt.libvirtError: internal error: client socket is closed". I have to reboot TrueNAS itself to resolve this, but the shutdown process hangs with "some process could not die: ps axl advised" and the box needs a power cycle.
 

sepahewe

Cadet
Joined
Nov 11, 2021
Messages
2
A quick update on my issue. Everything started working again after I updated to 22.02. My guess is that NAS-114087 fixed the issue.
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
A quick update on my issue. Everything started working again after I updated to 22.02. My guess is that NAS-114087 fixed the issue.
Nope, I'm on:
Version:
TrueNAS-SCALE-22.02.0.1

It would be nice to get an idea of what is going on at code line 121... domain? I hope my domain is active :smile:; the developer, apparently, does not.

Code:
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 121, in start
    if self.domain.isActive():
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1709, in isActive
    raise libvirtError('virDomainIsActive() failed')
libvirt.libvirtError: internal error: client socket is closed
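
For what it's worth, the traceback suggests the supervisor holds a long-lived libvirt connection, and self.domain is a handle bound to that connection. A hedged paraphrase of the code path around line 121 (illustrative names, not the actual middlewared source):
Code:
import libvirt

class Supervisor:
    def __init__(self, uri, vm_name):
        # Long-lived connection; self.domain is bound to it.
        self.connection = libvirt.open(uri)
        self.domain = self.connection.lookupByName(vm_name)

    def start(self):
        # If the socket behind self.connection has since closed, this call
        # raises libvirtError("internal error: client socket is closed")
        # even while the guest's QEMU process keeps running untouched.
        if self.domain.isActive():
            return
        self.domain.create()  # virDomainCreate() actually starts the VM

That would explain why guests keep running while the UI shows ERROR: it is the management connection that died, not the domain itself.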
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
Maybe it's because TrueNAS is resilvering. Who knows; not me.
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
At least here, stopping the resilver with a good old reboot solved the problem; the VM is up and running again.
So I assume that when the box is too busy, it can't get a connection to its own bridge... or something.
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
Hey, is anybody aware of this topic?
The issue still exists on the latest version.
What kind of logs are needed?
 
Last edited:

tirefire

Cadet
Joined
May 25, 2022
Messages
2
Version: TrueNAS-SCALE-22.02.1

I'm seeing this in my logs:
Code:
May 27 13:18:43 truenas middlewared[917]: libvirt: XML-RPC error : internal error: client socket is closed
May 27 13:18:43 truenas middlewared[917]: libvirt: XML-RPC error : internal error: client socket is closed

I have a VM that's running, but the VM state is "ERROR". I'm not sure what the problem is.

libvirtError: internal error: client socket is closed
Code:
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 175, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1257, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1261, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1129, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 39, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1172, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 62, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 121, in start
    if self.domain.isActive():
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1709, in isActive
    raise libvirtError('virDomainIsActive() failed')
libvirt.libvirtError: internal error: client socket is closed
 

tirefire

Cadet
Joined
May 25, 2022
Messages
2
At least here, stopping the resilver with a good old reboot solved the problem; the VM is up and running again.
So I assume that when the box is too busy, it can't get a connection to its own bridge... or something.
Rebooting seems to have "fixed" the problem; no recurrence for the last day or so. But the fact that it comes up at all is worrisome.
 
Joined
May 31, 2022
Messages
2
I have had a similar, if not the exact same, issue crop up twice now. It seems to occur at roughly 6-day intervals on my system: once today (why I'm here), and once on the 27th after close to 6 days of uptime. I do not have issues logging into the VMs themselves, just interacting with them in the web UI or running virsh list --all. Restarting libvirtd doesn't seem to do the trick. I rebooted the whole machine after safely shutting down all of the VMs; all of the machines set to autostart booted up just fine, and I could interact with the VM web UI again.

If it happens again (I expect it will, but nbd), I'll try to get as much info as I can. What info could I grab that would help fix this?
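
If it helps, here is a small sketch of the kind of capture I would run while the error is live; the unit name and journal flags are assumptions based on this thread and a stock systemd journal:
Code:
import subprocess

# Pull the last hour of middlewared output plus anything mentioning libvirt.
# --grep needs a systemd built with PCRE2 support (stock on SCALE's Debian base).
for cmd in (
    ["journalctl", "-u", "middlewared", "--since", "-1h", "--no-pager"],
    ["journalctl", "--grep", "libvirt", "--since", "-1h", "--no-pager"],
):
    print("#", " ".join(cmd))
    subprocess.run(cmd, check=False)

Attaching that output, plus a fresh debug bundle generated while the error is actually present, would probably be the most useful.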
 

engineerdj

Dabbler
Joined
Jan 17, 2022
Messages
21
I'm having a similar issue on TrueNAS-SCALE-22.02.1 that we just noticed last night.

One of our Windows Server 2019 VMs started exhibiting weird behavior.

Task Manager doesn't start, returns error -- The service cannot accept control messages at this time.
Services Manager doesn't start, returns error -- The service cannot accept control messages at this time.
Event Viewer doesn't start, returns error -- The service cannot accept control messages at this time.
Elevated CLI doesn't start, returns error -- The service cannot accept control messages at this time.
Shutdown/Reboot commands from the GUI do nothing, no error returned.

Checked the TrueNAS GUI and the VMs (multiple) are in an OFF/ERROR state, despite both VMs being up, accessible, and carrying on their respective functions. I just can't get into any of the administrative functions.



Attempting to start them returns the libvirtError.



Looks like the only solution I have is to gracefully shut down and reboot TrueNAS SCALE.
 

MadMungo

Dabbler
Joined
Jul 18, 2015
Messages
13
I'm having a similar issue on TrueNAS-SCALE-22.02.1 that we just noticed last night. [...] Looks like the only solution I have is to gracefully shut down and reboot TrueNAS SCALE.
I have the exact same thing happening to me. The VM is up and running normally despite the state saying it is off, but I cannot access it at all.
I get the same libvirtError as above if I try to start any of my other VMs or the running one.
 

Boh

Cadet
Joined
Apr 12, 2016
Messages
3
Same issue. I have 3 TrueNAS SCALE servers in total, 2 of which are used for VMs, and both of them give the same errors. VMs can run fine for a couple of days, then suddenly they're turned off and won't start. A reboot of the server "fixes" the problem, which is... less than desirable. This affects Windows, FreeBSD, and Linux VMs alike.
 

Boh

Cadet
Joined
Apr 12, 2016
Messages
3
To clarify: the VMs themselves are fine, up and running. It's the TrueNAS SCALE UI that seems to have the problem.
 
Joined
May 31, 2022
Messages
2
Just had this happen again. An anecdotal observation: I got further, uptime-wise, without any of my Windows VMs live; roughly 18 days before erroring out. All VMs are still up and functional, I just cannot manage them through the web UI or virsh:
Code:
root@truenas[/]# virsh list
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

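Worth noting about that virsh output: a bare virsh looks for the default socket at /var/run/libvirt/libvirt-sock, while TrueNAS SCALE runs libvirtd on /run/truenas_libvirt/libvirt-sock (the URI quoted earlier in this thread). So the "No such file or directory" part may be expected even when libvirtd is healthy. A quick hedged check of which socket actually exists:
Code:
import os

# Both paths appear earlier in this thread; which one exists tells you
# whether libvirtd is really down or virsh is just looking in the wrong place.
for path in (
    "/var/run/libvirt/libvirt-sock",       # virsh's default
    "/run/truenas_libvirt/libvirt-sock",   # the socket middlewared uses
):
    print(path, "exists" if os.path.exists(path) else "missing")

If the truenas_libvirt socket is present, it may be worth retrying with virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list --all.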
 

Boh

Cadet
Joined
Apr 12, 2016
Messages
3
A workaround is in place, then, in the form of a daily middlewared restart.
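
For anyone who wants to make that restart conditional rather than unconditionally daily, here is a rough sketch; it assumes the socket URI quoted earlier in this thread, the libvirt-python bindings, and that middlewared runs as a systemd unit named middlewared, none of which I have verified on every release:
Code:
import subprocess

import libvirt

# Assumed socket URI, taken from earlier in this thread.
URI = "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock"

def libvirt_alive(uri=URI):
    """Return True if libvirtd answers a cheap round-trip on its socket."""
    try:
        conn = libvirt.open(uri)
        conn.getVersion()  # any RPC will do; this one is lightweight
        conn.close()
        return True
    except libvirt.libvirtError:
        # Includes "internal error: client socket is closed"
        return False

# Run from cron: only bounce middlewared when libvirt stops answering.
if not libvirt_alive():
    subprocess.run(["systemctl", "restart", "middlewared"], check=False)

Run it from a cron job or periodic script; when libvirt is healthy it does nothing.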
 