qemu-ga won't run on Debian bullseye VM in 22.12.0

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
Hi, do I need to do something special to allow qemu-ga to run on my Debian bullseye VMs? I can't seem to get it to run.

JD
 

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
Thanks for the reply, but I'm not sure what you mean by nested virtualization. I just want the Debian VMs to shut down gracefully if the TrueNAS SCALE bare metal receives a shutdown command.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Thanks for the reply, but I'm not sure what you mean by nested virtualization. I just want the Debian VMs to shut down gracefully if the TrueNAS SCALE bare metal receives a shutdown command.

It's best to describe what steps you took and what the result was.
 

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
The VMs were created on my old 22.x system. I did a full replication of all the VMs onto the new 22.12 system and made VMs out of those replicated 'drives', but qemu-ga is not running on the transferred VMs, so they hard shut down when SCALE powers down instead of shutting down gracefully as they need to.
 

tprelog

Patron
Joined
Mar 2, 2016
Messages
291
It's best to describe what steps you took and what the result was.

I'm also facing issues getting the qemu-guest-agent running inside VMs. While my goal is to get this working in Home Assistant OS, I have also created a vanilla Debian VM for testing / troubleshooting.

There is also a thread in the Home Assistant forum which shares a solution from Reddit. I first tried the proposed solution (in my Home Assistant VM), but it failed to work on SCALE 22.12.0.

Because Home Assistant OS is also an "appliance OS", I created the Debian VM for further testing.

I've taken the steps mentioned below using the Debian VM. This is just a base install with SSH.

Basically, the proposed solution says we need to attach a special serial device to the VM, so I created a file with the following content:

Code:
<channel type='unix'>
   <source mode='bind' path='/var/lib/libvirt/qemu/f16x86_64.agent'/>
   <target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>


Then attached it to my VM
Code:
/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" attach-device 1_debian --file /mnt/nvme/vm/channel.xml --current


On SCALE 22.02.04 the virsh command is successful

Code:
Device attached successfully


On SCALE 22.12.0 the virsh command fails
Code:
error: Failed to attach device from /mnt/nvme/vm/channel.xml
error: internal error: no virtio-serial controllers are available




Only when the virsh command is successful can I SSH into the VM and start qemu-ga:

Code:
systemctl start qemu-guest-agent.service


Code:
systemctl status qemu-guest-agent.service
● qemu-guest-agent.service - QEMU Guest Agent
     Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; static)
     Active: active (running) since Thu 2022-12-29 07:57:32 EST; 55s ago
   Main PID: 444 (qemu-ga)
      Tasks: 2 (limit: 4671)
     Memory: 2.2M
        CPU: 5ms
     CGroup: /system.slice/qemu-guest-agent.service
             └─444 /usr/sbin/qemu-ga

Dec 29 07:57:32 debian systemd[1]: Started QEMU Guest Agent.
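
Once the agent is running, you can also confirm it responds from the SCALE host (a sketch, using the same socket path and domain name as my commands above):

Code:
/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" qemu-agent-command 1_debian '{"execute":"guest-ping"}'
# an empty {"return":{}} reply means the host and the guest agent can talk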
 

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
Thank you for confirming. I also read that it now needs some serial device configuration (qemu-ga relies on a special virtio serial port for the comms between the host and the guest, and if it's not there qemu-ga will not work), but I have no idea how or what to do. If you have tried the above and it fails on 22.12, there is no point in me trying the same thing. So on 22.12 we are stuck?
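
From what I've read, one way to check from inside the guest whether that virtio serial port is actually there is something like this (a sketch I haven't tried myself; the channel name is the one from the XML above):

Code:
ls /dev/virtio-ports/
# if the channel is attached, this should list org.qemu.guest_agent.0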
 

tprelog

Patron
Joined
Mar 2, 2016
Messages
291
So on 22.12 we are stuck?

Well, I think the obvious next step is filing something on Jira. But what to file, I'm not sure about... is this a bug or a feature request?

There has already been a feature request to expose additional VM customizations. Adding a channel for the guest-agent is mentioned as one use case. (Hint: consider casting a vote.)


If we can file this as a bug, I'll be happy to create a new ticket
 

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
I really have no idea if it's a bug or something not included yet; it worked on all my other VM hosts (non-TrueNAS) without any config necessary... But it's worrying that all my VMs will just die, rather than being shut down gracefully, if the host receives a powerdown command...
 

tprelog

Patron
Joined
Mar 2, 2016
Messages
291
Ok... lots of back and forth between releases... compare this / compare that... It probably would have been so much quicker for someone who actually knows what they are doing :tongue: Anyway, after a few hours of staring at this, here's where I'm at.

I dumped my VM logs from both versions of SCALE. Here is what I'm guessing is the important part:

On SCALE 22.02.04 I see these lines in my VM log (among others)
Code:
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \


However, these lines were missing from the VM logs on SCALE 22.12.0

Looking again at the error message "no virtio-serial controllers are available", I started connecting the dots.

Using the same approach mentioned on Reddit, I tried creating another XML file to try to define / attach a virtio-serial controller. The contents of my file are irrelevant anyway, because it fails with: error: Operation not supported: 'virtio-serial' controller cannot be hot plugged.

I'm not sure what the correct way is to add a virtio-serial controller on TrueNAS, so I had a really bad idea: why not dig around in the code to see if I could make it work?! Somehow I managed to get it working (sort of) without destroying my NAS.

Eventually I figured out that changing the display type from VNC to SPICE will create a virtio-serial controller on SCALE 22.12.0. With that, I can now get qemu-guest-agent running again, using this solution I mentioned above
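
A quick way to confirm whether a running VM actually has the controller and channel (a sketch, reusing the same socket path and domain name as my earlier commands):

Code:
/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" dumpxml 1_debian | grep -E "virtio-serial|guest_agent"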

This was the first (horrible) idea I tried (but it works using display type VNC)

Code:
nano /usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/domain_xml.py


Edit line 171 - add "not" as shown below:
Code:
    if not spice_server_available:


Save changes and exit editor, then restart middleware
Code:
systemctl restart middlewared.service


Now after I restart my VM and check the logs, it includes

Code:
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \


and the virsh command is now successful and I can start the qemu-ga inside my VM on SCALE 22.12.0
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Eventually I figured out that changing the display type from VNC to SPICE will create a virtio-serial controller on SCALE 22.12.0. With that, I can now get qemu-guest-agent running again, using this solution I mentioned above

Thanks for the great detective work.

I'd suggest reporting this as a bug... what you have learnt should make it easier to resolve, or at least to document the problem and solution better.
 

tprelog

Patron
Joined
Mar 2, 2016
Messages
291
I'd suggest reporting this as a bug



Along the way, I came across some Red Hat documentation from which I copied my file name and updated content. I don't think the file name matters so much, but agent.xml makes sense to me.

agent.xml
Code:
<channel type='unix'>
   <target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>


Based on a script from Reddit, here is the start-up script I ended up with. It worked fine to start my VM from the shell but took a few tries to get working as a Post Init script. Increasing the script timeout seems to be the solution - I ended up using a 60-second timeout.


start-haos.sh
Code:
#!/bin/bash

## Enter the alphanumeric name you used for your VM
vm='HomeAssistant'

## Set a full path to the agent xml file you created
agent_xml='/mnt/tank/vm/agent.xml'


set -euo pipefail

if [ ${EUID} -ne 0 ]; then
  echo "please run this script as root or using sudo"
elif [ ! -f "${agent_xml}" ]; then
  echo "agent_xml not found: ${agent_xml}"
else
  id=$(/usr/bin/midclt call vm.query | /usr/bin/jq -r '.[] | ( .id|tostring ) + ":" + .name' | /usr/bin/grep "${vm}" | /usr/bin/cut -f1 -d':')
  /usr/bin/midclt call vm.start "${id}"
  sleep 1
  name=$(/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list --name | /usr/bin/grep "${vm}")
  /usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" attach-device "${name}" "${agent_xml}"
fi
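
To try it, make the script executable and run it once from the shell before setting it up as a Post Init script (a sketch; the path is just where I happened to save it):

Code:
chmod +x /mnt/tank/vm/start-haos.sh
sudo /mnt/tank/vm/start-haos.sh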
 

tprelog

Patron
Joined
Mar 2, 2016
Messages
291
Here's a small update to the VM start script.

Same function as before, but this one will create a temporary QEMU guest agent XML file for you, attach it to the VM, then remove the temporary XML file. I think it's just a little easier to use this way. Feedback welcome.

1. Copy script to your NAS
2. Set the name of VM on line 4
3. Run script to start VM and attach guest agent

Code:
#!/bin/bash

## Set the name of your VM
vm='HomeAssistant'


if [ ${EUID} -ne 0 ]; then
  echo "Please run this script as root or using sudo"
  exit 1
fi

set -euo pipefail

# create a temporary XML file for the QEMU guest agent
guest_agent=$(mktemp -t agent.xml.XXXXX)
cat << END_XML > "${guest_agent}"
<channel type='unix'>
   <target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
END_XML

# get id and use it to start the VM
id=$(/usr/bin/midclt call vm.query | /usr/bin/jq -r '.[] | ( .id|tostring ) + ":" + .name' | /usr/bin/grep "${vm}" | /usr/bin/cut -f1 -d':')
/usr/bin/midclt call vm.start "${id}" &> /dev/null
sleep 1

# get VM name and use it to attach the QEMU guest agent
name=$(/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list --name | /usr/bin/grep "${vm}")
/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" attach-device "${name}" "${guest_agent}"

# remove the temporary XML file
rm "${guest_agent}"
 

rmr

Dabbler
Joined
Sep 8, 2021
Messages
17
Fantastic thread! On my TrueNAS 22.12, I fixed the problem by patching the middleware instead (apply the patch below, restart middlewared). This way, I don't have to select VMs or rely on restart scripts and I can still use VNC.

I also learned that TrueNAS supports the ptp_kvm module (see for example Time sync KVM guest on Debian for how to use it with chrony) for better time synchronization. But ONLY when the number of CPUs is set to 1 with multiple cores, rather than multiple CPUs with one core each.

Code:
--- /usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/domain_xml.py.orig    2022-12-13 06:32:23.000000000 -0600
+++ /usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/domain_xml.py    2022-12-31 22:29:48.212387984 -0600
@@ -168,14 +168,20 @@
         # default if not all headless servers like ubuntu etc require it to boot
         devices.append(create_element('video'))
 
-    if spice_server_available:
-        # We always add spicevmc channel device when a spice display device is available to allow users
-        # to install guest agents for improved vm experience
-        devices.append(create_element(
-            'channel', type='spicevmc', attribute_dict={
-                'children': [create_element('target', type='virtio', name='com.redhat.spice.0')]
-            }
-        ))
+    # We always add spicevmc channel device to allow users
+    # to install guest agents for improved vm experience
+    devices.append(create_element(
+        'channel', type='spicevmc', attribute_dict={
+            'children': [create_element('target', type='virtio', name='com.redhat.spice.0')]
+        }
+    ))
+    devices.append(create_element(
+        'channel', type='unix', attribute_dict={
+            'children': [create_element('source', mode='bind', path='/var/lib/libvirt/qemu/f16x86_64.agent'),
+                         create_element('target', type='virtio', name='org.qemu.guest_agent.0'),
+                        ]
+        }
+    ))
 
     devices.append(create_element('serial', type='pty'))
     return create_element('devices', attribute_dict={'children': devices})
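
For anyone curious about the ptp_kvm part, the guest-side setup looks roughly like this (a sketch, assuming Debian bullseye's chrony packaging with its /etc/chrony/conf.d include directory; adjust paths if yours differs):

Code:
# run inside the Debian guest
sudo modprobe ptp_kvm                                      # exposes /dev/ptp0 backed by the host clock
echo ptp_kvm | sudo tee /etc/modules-load.d/ptp_kvm.conf   # load the module on every boot
echo "refclock PHC /dev/ptp0 poll 2" | sudo tee /etc/chrony/conf.d/kvm-ptp.conf
sudo systemctl restart chrony
chronyc sources                                            # the PHC refclock should show up here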
 

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
Fantastic thread! On my TrueNAS 22.12, I fixed the problem by patching the middleware instead (apply the patch below, restart middlewared). This way, I don't have to select VMs or rely on restart scripts and I can still use VNC.

What do you mean you 'patched' the middleware, and how do I do that? Is it allowed to do such a thing, and does it have any side effects on future official updates?
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
What do you mean you 'patched' the middleware, and how do I do that? Is it allowed to do such a thing, and does it have any side effects on future official updates?
They basically mean that they simply modified the Python script that controls this behavior.

What they've provided is a "patch" (diff) file that contains only the differences between the original version of the file and their modified version, along with the line offsets where those changes go.

You can use it with the 'patch' command to apply those same changes to your copy of the file. First, you'd want to make a copy of the original as a backup.

As for if you can do this:

Doing this is certainly not supported, in the sense that if you ask for help with VM setup problems and it's discovered you made this change, those helping you might get quite annoyed and stop helping you, at least in an official capacity. However, this is a relatively minor patch and probably works fairly well (I will try it myself soon), given @rmr likely wouldn't use it if it completely bricked their VMs. So I wouldn't worry terribly about it.

As for how it affects updates:

I don't know exactly how SCALE implements updates, so I can't be sure. If the updates themselves make use of patches, such that existing files are modified into the new ones, then yes, this would break if that file gets changed in the update. If the updates simply copy in the new files, overwriting old ones, then no, it wouldn't be an issue. The way to be absolutely safe would be to restore the original backup before performing any updates down the road.

I will say though, if this is functionality you'd like to see in SCALE by default (which I figure it would be, given how annoying this issue is at the moment), I'd ask that you upvote the issue that @tprelog created on Jira about this, so that there's a better chance it gets addressed in a future update.
 

tprelog

Patron
Joined
Mar 2, 2016
Messages
291
As for how it affects updates

Future updates should just overwrite your manual changes.

If the issue is not addressed, you will need to apply the "patch" again after updates.
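
A quick way to check after an update whether the change is still in place is to grep for the string the patch adds (a sketch; if it prints nothing, reapply the patch and restart middlewared):

Code:
grep -n "guest_agent" /usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/domain_xml.py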
 

JohnnyD

Dabbler
Joined
Jan 6, 2022
Messages
43
I will say though, if this is functionality you'd like to see in SCALE by default (which I figure it would be, given how annoying this issue is at the moment), I'd ask that you upvote the issue that @tprelog created on Jira about this, so that there's a better chance it gets addressed in a future update.

Ok, I have upvoted the issue (as I started the thread LoL). If anyone has time to document how I make and apply this patch, that would be great. I am very new to SCALE, but not to Debian...
 

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
Future updates should just overwrite your manual changes.

If the issue is not addressed, you will need to apply the "patch" again after updates.
Ah yes, good point. I wasn't thinking about that originally.

Ok, I have upvoted the issue (as I started the thread LoL). If anyone has time to document how I make and apply this patch, that would be great. I am very new to SCALE, but not to Debian...
Whoops, sorry I missed that.

The patch utility is actually an old Unix tool that dates back a while, but I don't mind giving explicit directions.

I've taken the liberty of attaching the patch as a file to this post so that you can download it directly into your TrueNAS install without having to muck around with a share or something, and modified it a hair so that it can be applied without specifying the file path.

To be clear though for anyone passing by, I did not create this patch, it was made by @rmr.

If you're not logged in as root, you'll of course have to use sudo with the following commands.

Open up the shell and navigate to a directory where you want the patch file to be downloaded.

Download the patch file from this post:
Code:
curl https://www.truenas.com/community/attachments/middlware-qemu-ga-txt.62022/ --output middleware-qemu-ga.patch


Apply it to the file specified within the patch file, while also creating a backup of the original:
Code:
patch -b -d/ -p0 < middleware-qemu-ga.patch


Done. Delete the patch file if you want.

Now, for whatever reason, when applying the patch I'm getting:
patch unexpectedly ends in middle of line
Hunk #1 succeeded at 168 with fuzz 1.

The first message usually indicates a line-ending issue and the second generally means the patch offsets were slightly off; however, I've confirmed that the patch is correct visually, and by regenerating it myself after patching (the exact same patch was recreated), so I'm not sure what's up with that.
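
In case anyone else hits the first warning: it's often just a missing final newline in the patch file, which you can check and fix from the shell before applying (a sketch, using the filename from the curl command above):

Code:
tail -c 1 middleware-qemu-ga.patch | od -c    # the last byte should be \n
printf '\n' >> middleware-qemu-ga.patch       # append one if it is missing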

Regardless, I can confirm the patch applied successfully, though I haven't had a chance to set up my VM yet to test that the agent is working.
 

Attachments

  • middlware-qemu-ga.txt (1.6 KB)

oblivioncth

Explorer
Joined
Jul 13, 2022
Messages
71
I can confirm that the middleware patch worked for me, as I was able to safely shut down my Home Assistant OS VM via the UI without having added qemu-ga manually via a script.
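
If you'd rather test the graceful path from the shell than the UI, the agent-driven shutdown can also be triggered directly (a sketch; list --name shows the exact domain name, the one below is hypothetical):

Code:
/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list --name
/usr/bin/virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" shutdown 1_HomeAssistant --mode agent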

Really hope they add it as a toggle in a future release.
 