Unable to delete VM

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
So I have a Windows 10 VM that I created with an Nvidia GPU passed through to it. I now want to delete the VM; however, when I try to delete it the WebUI hangs and the system becomes unresponsive, requiring a hard reboot. I also attempted to remove all of the "devices" from the VM before deleting it, but trying to remove devices also causes the WebUI to hang. Is there a way to remove VMs manually through the shell or SSH?

Additionally, the devices page shows tons of PCI devices passed through for some reason... I did not add all of these devices.
 

Attachments

  • Screenshot 2022-07-20 231806.png (339.8 KB)

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
UPDATE: So I have tried using the CLI app via SSH to run the following command.
Code:
service vm delete 2

As soon as I run the command the system hangs. I also thought about first deleting all of the PCI devices from the VM via the CLI. I was able to delete a few, but as I worked down the list the system hung again.

Somehow it seems as if all of my system's PCI devices got assigned to this VM, and when I try to remove them it causes the system to crash. I did not add all of these PCI devices to the VM, only my Nvidia GPU. Does anyone have any recommendations on how to delete this VM?
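For reference, the middleware behind the WebUI (the same service `midclt` talks to) can also be driven from a script, so in principle the devices can be removed one call at a time before deleting the VM itself. This is only a sketch I have not verified against a wedged VM like mine; the `vm.device.query`/`vm.device.delete`/`vm.delete` method names are the SCALE middleware ones, and the `call` argument is whatever call function you have (e.g. `Client().call` from `middlewared.client` on the NAS):

```python
def purge_vm(call, vm_id):
    """Delete every device attached to vm_id, then the VM itself.

    `call` is a TrueNAS SCALE middleware call function, e.g.
    Client().call from middlewared.client (the library behind midclt).
    Sketch only -- untested against a half-deleted VM like this one.
    """
    # Query all devices belonging to this VM, then remove them one by one.
    for dev in call("vm.device.query", [["vm", "=", vm_id]]):
        call("vm.device.delete", dev["id"])
    # With no devices left, the VM record itself should be removable.
    call("vm.delete", vm_id)
```

On the NAS itself this would be invoked roughly as `purge_vm(Client().call, 2)`, with 2 being the VM id shown by `vm.query`.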
 

Attachments

  • Screenshot 2022-07-21 000851.png (215.9 KB)

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
UPDATE: So I have tried using the CLI app via SSH to run the following command.
Code:
service vm delete 2

As soon as I run the command the system hangs. I also thought about first deleting all of the PCI devices from the VM via the CLI. I was able to delete a few, but as I worked down the list the system hung again.

Somehow it seems as if all of my system's PCI devices got assigned to this VM, and when I try to remove them it causes the system to crash. I did not add all of these PCI devices to the VM, only my Nvidia GPU. Does anyone have any recommendations on how to delete this VM?
You'll need to provide a software version and a history... what worked, and when did it fail?
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
You'll need to provide a software version and a history... what worked, and when did it fail?
My machine is running SCALE 22.02.2.1. Hardware specs are in my signature if it matters.

Here's my timeline of events.
-I have the Plex app installed, using my GTX 1650 for HW transcoding.
-I wanted to experiment with running a Windows VM using the GTX 1650, so I removed the GPU from my Plex app. The Plex app still works with no GPU.
-I created a Windows 10 VM without the GPU and got everything set up, with the VirtIO drivers installed.
-I went to WebUI Settings>Advanced>Isolated GPUs, added the GTX 1650 to be isolated from the system, and rebooted.
-I then added the GPU as a PCI device on the Windows 10 VM.
-Booted the VM and installed the Nvidia drivers. Everything with the VM was working great.
-After a couple of weeks I no longer wanted to use the VM with the GPU, so I shut down the VM and removed the GPU as a PCI device.
-I went to WebUI Settings>Advanced>Isolated GPUs and removed the GTX 1650. Rebooted.
-I then re-assigned the GPU to my Plex app to use for transcoding again; it works great with no issues.
-After a few days I decided I no longer wanted the VM installed, so I tried to delete it. This is the first time I experienced a WebUI & total system hang.
-After I hard reset the system, I checked the devices page of that VM and realized it had somehow assigned all of my system's PCI devices to it, and when I try to delete any of them the system crashes.
-It looks like SCALE did attempt to remove the VM, at least partly, because the zVols I had created for the VM are gone.

Hopefully this timeline can give better context to my issue.

EDIT: If there are any relevant log files that would help understand this issue better I would be more than happy to provide them.
 
Last edited:

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
My machine is running SCALE 22.02.2.1. Hardware specs are in my signature if it matters.

Here's my timeline of events.
-I have the Plex app installed, using my GTX 1650 for HW transcoding.
-I wanted to experiment with running a Windows VM using the GTX 1650, so I removed the GPU from my Plex app. The Plex app still works with no GPU.
-I created a Windows 10 VM without the GPU and got everything set up, with the VirtIO drivers installed.
-I went to WebUI Settings>Advanced>Isolated GPUs, added the GTX 1650 to be isolated from the system, and rebooted.
-I then added the GPU as a PCI device on the Windows 10 VM.
-Booted the VM and installed the Nvidia drivers. Everything with the VM was working great.
-After a couple of weeks I no longer wanted to use the VM with the GPU, so I shut down the VM and removed the GPU as a PCI device.
-I went to WebUI Settings>Advanced>Isolated GPUs and removed the GTX 1650. Rebooted.
-I then re-assigned the GPU to my Plex app to use for transcoding again; it works great with no issues.
-After a few days I decided I no longer wanted the VM installed, so I tried to delete it. This is the first time I experienced a WebUI & total system hang.
-After I hard reset the system, I checked the devices page of that VM and realized it had somehow assigned all of my system's PCI devices to it, and when I try to delete any of them the system crashes.
-It looks like SCALE did attempt to remove the VM, at least partly, because the zVols I had created for the VM are gone.

Hopefully this timeline can give better context to my issue.

EDIT: If there are any relevant log files that would help understand this issue better I would be more than happy to provide them.

Thanks... it's helpful.
It seems likely there is a bug in the process of removing the VM and GPU.
If the VM had been deleted before the GPU was removed... it may have worked.

The unusual step was deleting the VM... when its GPU was no longer available.
If you could report a bug with this timeline... then it's possible the team can recreate and find the issue.
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
Thanks... it's helpful.
It seems likely there is a bug in the process of removing the VM and GPU.
If the VM had been deleted before the GPU was removed... it may have worked.

The unusual step was deleting the VM... when its GPU was no longer available.
If you could report a bug with this timeline... then it's possible the team can recreate and find the issue.
I will work on getting a Jira ticket submitted for this issue.

Question: if I wipe my boot drive, re-install SCALE, and restore my configuration file, will it load back all of the VM configurations as well? Or does the system config file not restore VM configs? If it restores VM configs, I'll just wipe the boot drives, start with a clean OS, and then import my pools.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I will work on getting a Jira ticket submitted for this issue.

Question: if I wipe my boot drive, re-install SCALE, and restore my configuration file, will it load back all of the VM configurations as well? Or does the system config file not restore VM configs? If it restores VM configs, I'll just wipe the boot drives, start with a clean OS, and then import my pools.

It will generally restore VM configs as well... however, it's hard to say what it will do in this case with a partially deleted VM.

Safest bet would be clean OS...
 

awil95

Dabbler
Joined
Apr 23, 2017
Messages
28
It will generally restore VM configs as well... however, it's hard to say what it will do in this case with a partially deleted VM.

Safest bet would be clean OS...
Thanks for the help! I have wiped my boot drives and am in the process of rebuilding my system.

For anyone else that ever encounters this issue below you will find a link to my Jira Ticket:
NAS-117289
 

emsicz

Explorer
Joined
Aug 12, 2021
Messages
78
Had a similar issue, described in NAS-117183. In my case, I created a VM on a locked dataset; the wizard didn't complain and went ahead with the creation. At the point where it tried creating the disk volume, it failed, leaving a partially created VM. I also had a bunch of PCI devices in that half-made VM, and I too could not remove the VM. I was able to flip the "start on boot" switch to the off position, reboot TrueNAS, then delete the VM (the operation will silently fail and the VM will stay in the "Virtualization" tab). Then reboot TrueNAS again, and the VM is gone.
 

datafreak

Cadet
Joined
Aug 8, 2022
Messages
1
I had the same issue, PCI devices I never added etc. If I tried to remove it via the web GUI the machine would crash.

Here is how I fixed it:

Booted TrueNAS, logged in as soon as I could, toggled 'Start on boot' to off, and rebooted again (might not be needed).

Opened a shell and queried the vm_vm table in the FreeNAS SQLite DB:

Code:
root@truenas[~]# sqlite3 /data/freenas-v1.db "select * from vm_vm"
1|HAL||2|6144|1|LOCAL|UEFI|2|1|90|HOST-MODEL||0|1|||d42606d7-ffc1-42b4-9e94-cf9d6605c192
2|bacon||2|4096|0|LOCAL|UEFI|2|1|90|HOST-PASSTHROUGH||0|1|||0f39b19d-d2cc-40c7-9fa1-43df6069be05

'bacon' was the offending machine name, with the id of '2', so I deleted every entry in the vm_device table with a vm_id of '2':

Code:
root@truenas[~]# sqlite3 /data/freenas-v1.db "delete from vm_device where vm_id = '2'"


I confirmed via the web GUI that there were no devices listed, proceeded to delete the VM, and presto, it vanished.
 