"libvirt.libvirtError: internal error: client socket is closed" when trying to start a VM

sepahewe

Cadet
Joined
Nov 11, 2021
Messages
2
Hi,

I have searched the forum but not found any similar issues.

My system:
  • MB: SM X11SPi-TF
  • CPU: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
  • RAM: 196 GB
  • Storage: 6x WD Ultrastar 550 18 TB SAS3, 2x Intel Optane P4801X 100 GB, 2x Samsung 970 EVO 2 TB
  • HDC: LSI 9340-8i 12G
  • NIC: 2x Intel X722 10 GbE, 1x Mellanox ConnectX-3 Pro
Software:
  • TrueNAS-SCALE-22.02-RC.2
I created a couple of Ubuntu VMs through the GUI. They started without any issues, so I checked "Autostart" and rebooted my server, and now they won't start. If I try to start them manually I get an error:
Code:
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 160, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1281, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1269, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1137, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 42, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 62, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 122, in start
    if self.domain.isActive():
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1709, in isActive
    raise libvirtError('virDomainIsActive() failed')
libvirt.libvirtError: internal error: client socket is closed


Looking through the logs with journalctl | grep libvirt, I found one instance of
Code:
libvirt: XML-RPC error : Cannot write data: Broken pipe

followed by a large amount of:
Code:
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed
libvirt: XML-RPC error : internal error: client socket is closed


I checked the middlewared source on GitHub and extracted the libvirt URI it uses: "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock". So I logged in over SSH and tried virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" sysinfo, and it worked:
Code:
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>American Megatrends Inc.</entry>
    <entry name='version'>3.5</entry>
    <entry name='date'>05/18/2021</entry>
    <entry name='release'>5.14</entry>
  </bios>
  <system>
    <entry name='manufacturer'>SuperMicro</entry>
    <entry name='product'>Super Server</entry>
    <entry name='version'>0123456789</entry>
    <entry name='serial'>0123456789</entry>
    <entry name='uuid'>00000000-0000-0000-0000-3cecef075479</entry>
    <entry name='sku'>To be filled by O.E.M.</entry>
    <entry name='family'>To be filled by O.E.M.</entry>
  </system>
  ....


So the socket works fine when used manually, but middlewared gets an error.
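
For reference, the same check can be reproduced from Python using the libvirt bindings that middlewared itself imports. A minimal sketch, assuming the libvirt-python package and the URI extracted above:
Code:
import libvirt

# The non-default socket URI that middlewared uses (extracted above).
URI = "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock"

try:
    conn = libvirt.open(URI)
    # List every defined domain and whether libvirt considers it active.
    for dom in conn.listAllDomains():
        print(dom.name(), "active" if dom.isActive() else "inactive")
    conn.close()
except libvirt.libvirtError as e:
    # A dead libvirtd or socket surfaces here as "client socket is closed".
    print("libvirt error:", e)

When the bug hits, I would expect this to raise the same "client socket is closed" error that middlewared reports.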

I haven't done any configuration from the CLI; everything has been through the GUI.

Has anyone seen anything similar?
Have I missed something?

//Many thanks
 

Attachments

  • debug-nas-20220125091649.tgz
    9.9 MB · Views: 146

wonkybot

Cadet
Joined
Feb 4, 2022
Messages
1
I am seeing more or less the same error when I manually start a VM. This started shortly after a failed attempt to install Linux Mint and/or Debian onto a VM; I have since forgotten the exact sequence. But I remember that one of the two VMs stopped responding, so I force-killed it, and since then the VMs will not start. I have not tried rebooting TrueNAS yet.
 

hexadecagram

Dabbler
Joined
Jul 15, 2016
Messages
32
I am also seeing this (albeit on TrueNAS CORE). The first time it happened was last night while I was working my way through the solution recommended to me in my most recent post: I shut down my VM, issued # ifconfig bridge0 destroy on the host, manually created a bridge in the WebGUI, and then tried to start the VM, at which point I saw the error. I then rebooted, recreated the bridge, and started the VM back up, and the error did not present itself again.

Just now, one of my VMs locked up, so I powered it off and tried to restart it and saw the error once again. As before, I rebooted with the VM set to autostart, and the error has not resurfaced.
 

Inimic

Cadet
Joined
Feb 20, 2022
Messages
1
I'm having a similar issue with my VM; it just started within the past 24 hours.

I recently upgraded to U8 (yesterday, 2/19) and now have VM stability issues (it was rock solid running under U7). The VM will now crash and become unresponsive after a few hours. When I attempt to restart it via the UI, it hangs and then reports "libvirt.libvirtError: internal error: client socket is closed". I have to reboot TrueNAS itself to resolve this, but the shutdown process hangs with "some process could not die: ps axl advised" and the box needs a power cycle.
 

sepahewe

Cadet
Joined
Nov 11, 2021
Messages
2
A quick update on my issue. Everything started working again after I updated to 22.02. My guess is that NAS-114087 fixed the issue.
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
A quick update on my issue. Everything started working again after I updated to 22.02. My guess is that NAS-114087 fixed the issue.
Nope, I'm on:
Version:
TrueNAS-SCALE-22.02.0.1

It would be nice to get an idea of what is going on at code line 121... domain? I hope my domain is active :smile:; the developer, apparently, does not.

Code:
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 121, in start
    if self.domain.isActive():
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1709, in isActive
    raise libvirtError('virDomainIsActive() failed')
libvirt.libvirtError: internal error: client socket is closed
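
For what it's worth, the traceback suggests the supervisor holds a long-lived libvirt connection, and self.domain is a handle bound to that connection. A hedged paraphrase of the code path around line 121 (illustrative names, not the actual middlewared source):
Code:
import libvirt

class Supervisor:
    def __init__(self, uri, vm_name):
        # Long-lived connection; self.domain is bound to it.
        self.connection = libvirt.open(uri)
        self.domain = self.connection.lookupByName(vm_name)

    def start(self):
        # If the socket behind self.connection has since closed, this call
        # raises libvirtError("internal error: client socket is closed")
        # even while the guest's QEMU process keeps running untouched.
        if self.domain.isActive():
            return
        self.domain.create()  # virDomainCreate() actually starts the VM

That would explain why guests keep running while the UI shows ERROR: it is the management connection that died, not the domain itself.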
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
Maybe it's because TrueNAS is resilvering. Who knows; not me.
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
At least here, stopping the resilver with a good old reboot solved the problem; the VM is up and running again.
So I assume that when the box is too busy, it can't get a connection to its own bridge... or something.
 

olepinguin

Dabbler
Joined
Apr 10, 2022
Messages
10
Hey, is anybody aware of this topic?
The issue still exists on the latest version.
What kind of logs are needed?
 
Last edited:

tirefire

Cadet
Joined
May 25, 2022
Messages
2
Version: TrueNAS-SCALE-22.02.1

I'm seeing this in my logs:
Code:
May 27 13:18:43 truenas middlewared[917]: libvirt: XML-RPC error : internal error: client socket is closed
May 27 13:18:43 truenas middlewared[917]: libvirt: XML-RPC error : internal error: client socket is closed

I have a VM that's running, but the VM state is "ERROR". I'm not sure what the problem is.

libvirtError: internal error: client socket is closed
Code:
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 175, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1257, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1261, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1129, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 39, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1172, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 62, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor_base.py", line 121, in start
    if self.domain.isActive():
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1709, in isActive
    raise libvirtError('virDomainIsActive() failed')
libvirt.libvirtError: internal error: client socket is closed
 

tirefire

Cadet
Joined
May 25, 2022
Messages
2
At least here, stopping the resilver with a good old reboot solved the problem; the VM is up and running again.
So I assume that when the box is too busy, it can't get a connection to its own bridge... or something.
Rebooting seems to have "fixed" the problem; no recurrence for the last day or so. But the fact that it comes up at all is worrisome.
 
Joined
May 31, 2022
Messages
2
I have had a similar, if not the exact same, issue crop up twice now. It seems to occur at roughly 6-day intervals on my system: once today (why I'm here), and once on the 27th after close to 6 days of uptime. I do not have issues logging into the VMs themselves, just interacting with them in the web UI or running virsh list --all. Restarting libvirtd doesn't seem to do the trick. I rebooted the whole machine after safely shutting down all of the VMs; all of the machines set to autostart booted up just fine, and I could interact with the VM web UI again.

If it happens again (I expect it will, but nbd), I'll try to get as much info as I can. What info could I grab that would help fix this?
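
If it helps, here is a small sketch of the kind of capture I would run while the error is live; the unit name and journal flags are assumptions based on this thread and a stock systemd journal:
Code:
import subprocess

# Pull the last hour of middlewared output plus anything mentioning libvirt.
# --grep needs a systemd built with PCRE2 support (stock on SCALE's Debian base).
for cmd in (
    ["journalctl", "-u", "middlewared", "--since", "-1h", "--no-pager"],
    ["journalctl", "--grep", "libvirt", "--since", "-1h", "--no-pager"],
):
    print("#", " ".join(cmd))
    subprocess.run(cmd, check=False)

Attaching that output, plus a fresh debug bundle generated while the error is actually present, would probably be the most useful.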
 

engineerdj

Dabbler
Joined
Jan 17, 2022
Messages
21
I'm having a similar issue on TrueNAS-SCALE-22.02.1 that we just noticed last night.

One of our Windows Server 2019 VMs started exhibiting weird behavior.

Task Manager doesn't start, returns error -- The service cannot accept control messages at this time.
Services Manager doesn't start, returns error -- The service cannot accept control messages at this time.
Event Viewer doesn't start, returns error -- The service cannot accept control messages at this time.
Elevated CLI doesn't start, returns error -- The service cannot accept control messages at this time.
Shutdown/Reboot commands from the GUI do nothing, no error returned.

Checked the TrueNAS GUI and the VMs (multiple) are in an OFF/ERROR state, despite both VMs being up, accessible, and carrying on their respective functions. I just can't get into any of the administrative functions.



Attempting to start them returns the libvirtError.



Looks like the only solution I have is to gracefully shut down and reboot TrueNAS SCALE.
 

MadMungo

Dabbler
Joined
Jul 18, 2015
Messages
13
I'm having a similar issue on TrueNAS-SCALE-22.02.1 that we just noticed last night. [...] Looks like the only solution I have is to gracefully shut down and reboot TrueNAS SCALE.
I have the exact same thing happening to me. The VM is up and running normally despite the state saying it is off, but I cannot access it at all.
I get the same libvirtError as above if I try to start any of my other VMs or the running one.
 

Boh

Cadet
Joined
Apr 12, 2016
Messages
3
Same issue. I have 3 TrueNAS SCALE servers in total, 2 of which are used for VMs, and both of them give the same errors. VMs can run fine for a couple of days, then suddenly they're turned off and won't start. A reboot of the server "fixes" the problem, which is... less than desirable. This affects Windows, FreeBSD, and Linux VMs alike.
 

Boh

Cadet
Joined
Apr 12, 2016
Messages
3
To clarify: the VMs themselves are fine, up and running. It's the TrueNAS SCALE UI that seems to have the problem.
 
Joined
May 31, 2022
Messages
2
Just had this happen again. An anecdotal observation: I got further, uptime-wise, without any of my Windows VMs live; roughly 18 days before erroring out. All VMs are still up and functional, I just cannot manage them through the web UI or virsh:
Code:
root@truenas[/]# virsh list
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

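Worth noting about that virsh output: a bare virsh looks for the default socket at /var/run/libvirt/libvirt-sock, while TrueNAS SCALE runs libvirtd on /run/truenas_libvirt/libvirt-sock (the URI quoted earlier in this thread). So the "No such file or directory" part may be expected even when libvirtd is healthy. A quick hedged check of which socket actually exists:
Code:
import os

# Both paths appear earlier in this thread; which one exists tells you
# whether libvirtd is really down or virsh is just looking in the wrong place.
for path in (
    "/var/run/libvirt/libvirt-sock",       # virsh's default
    "/run/truenas_libvirt/libvirt-sock",   # the socket middlewared uses
):
    print(path, "exists" if os.path.exists(path) else "missing")

If the truenas_libvirt socket is present, it may be worth retrying with virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list --all.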
 

Boh

Cadet
Joined
Apr 12, 2016
Messages
3
A workaround is in place, then, in the form of a daily middlewared restart.
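
For anyone who wants to make that restart conditional rather than unconditionally daily, here is a rough sketch; it assumes the socket URI quoted earlier in this thread, the libvirt-python bindings, and that middlewared runs as a systemd unit named middlewared, none of which I have verified on every release:
Code:
import subprocess

import libvirt

# Assumed socket URI, taken from earlier in this thread.
URI = "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock"

def libvirt_alive(uri=URI):
    """Return True if libvirtd answers a cheap round-trip on its socket."""
    try:
        conn = libvirt.open(uri)
        conn.getVersion()  # any RPC will do; this one is lightweight
        conn.close()
        return True
    except libvirt.libvirtError:
        # Includes "internal error: client socket is closed"
        return False

# Run from cron: only bounce middlewared when libvirt stops answering.
if not libvirt_alive():
    subprocess.run(["systemctl", "restart", "middlewared"], check=False)

Run it from a cron job or periodic script; when libvirt is healthy it does nothing.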
 