Is TrueNAS SCALE (e.g. 22.02.1) meant to 'gracefully' shutdown VM's?

tackyone

Dabbler
Joined
Jun 8, 2020
Messages
19
Hi,

I restarted TrueNAS SCALE a while ago and thought "Wow, those VMs shut down quickly...", but didn't think much of it.

Today I updated to 22.02.1, which obviously involves a restart, and again it happened really quickly.

This time I checked, and it appears the VMs get no notification and no 'graceful' shutdown when the box is restarted. They just have their plugs pulled, so all of them are running their various disk/database recovery routines when the system starts again.

Is this right?

Seems a bit of an oversight not to even have a warning before it shuts down, along the lines of "You have active VMs; you may want to shut these down before continuing, as restarting will just kill them".

Just a bit surprised really :(
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Bug Report?
I'd say that was a big one
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Neither does CORE, but you can set up a shutdown task. This is mine:
Code:
#! /bin/sh

PAUSE="10"

test -t 1 && echo "Sending SIGTERM to bhyve processes."
/usr/bin/killall -TERM bhyve

test -t 1 && echo -n "Waiting for processes to exit ... "
while /usr/bin/killall -0 bhyve >/dev/null 2>&1
do
    sleep "${PAUSE}"
done
test -t 1 && echo "done."

exit 0

I assume you can set up something similar in SCALE.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
This leaves me wondering whether SCALE gracefully stops containers, or does it just dump them as well?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
The entire container ecosystem is more or less standard Linux including the startup and shutdown. I would be really surprised if it didn't.

VMs are a different beast on both platforms. In commercial ones there is always a "guest tools" component running in the guest OS, which is mandatory for alive checks as well as graceful shutdown.

No idea what KVM does in this regard but there was an effort to implement guest tools for bhyve:

Kind regards,
Patrick
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I've been putting some thought into this. Dropping a VM when doing a reboot is absolutely unacceptable. Obviously kernel panics etc. are a case unto themselves, but I expect that if I reboot TrueNAS SCALE or CORE, a VM MUST be shut down cleanly (assuming it can be). It can never be 100% certain, but in a normal running state the VMs should shut down normally.

Otherwise this risks data corruption, which is the opposite of what TrueNAS is marketed for.

If a VM really is just powered off when the NAS is rebooted, this has to be a design flaw. I wouldn't mind a warning about running VMs on any attempt to shut down (from the GUI or, ideally, the command line too). It's not ideal, and I would prefer that the shutdown process first shut down the VMs, but I could accept a block on shutdowns while VMs are running.

What do others think?

How about a configuration option to provide a shutdown and reboot script, with that script being run by the GUI shutdown (and restart) option, which, let's face it, is probably how shutdowns are triggered now.

I've built an Ubuntu Server on TrueNAS SCALE as a VM. It doesn't do anything at the moment.

Virtual CPUs: 4
Cores: 1
Threads: 1
Memory Size: 16.00 GiB
Boot Loader Type: UEFI
System Clock: LOCAL
Display Port: 5900
Description:
Shutdown Timeout: 90 seconds

It's even got a shutdown timeout, whatever that means.

If I hit restart on the VM then it seems to shut down cleanly, but interestingly it doesn't restart.
If I hit poweroff then I guess that means drop it.
If I hit restart on TrueNAS (as opposed to the VM) then as far as I can tell there is no attempt to shut down the VM. It just gets dropped, with the corresponding risk of data corruption.
 

bingo1105

Cadet
Joined
May 4, 2022
Messages
8
It seems like guest VMs could be given an ACPI poweroff signal upon rebooting the server pretty easily. If the shutdown timeout value is exceeded, the server could forcibly shut down the guest and the reboot could continue as normal.
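A rough sketch of that idea, for illustration only: request a shutdown for every running guest, poll until they are gone or a timeout expires, then force off whatever is left. It assumes `virsh` is available on the host and that libvirt is reachable at the socket URI quoted elsewhere in this thread; the function names, `TIMEOUT` and `POLL` are made up for the example, and this is not how TrueNAS itself does it.

```shell
#!/bin/bash
# Sketch only: graceful guest shutdown with a timeout and a forced fallback.

# Connection URI as quoted elsewhere in this thread; adjust if yours differs.
URI="${URI:-qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock}"
TIMEOUT="${TIMEOUT:-90}"   # seconds to wait before forcing guests off (assumed value)
POLL="${POLL:-5}"          # seconds between checks (assumed value)

# Wrapper so the virsh invocation lives in one place.
vm_virsh() {
    virsh -c "$URI" "$@"
}

# Names of running guests, one per line, blank lines removed.
running_guests() {
    vm_virsh list --name | sed '/^$/d'
}

shutdown_all_guests() {
    local waited=0 vm

    # "virsh shutdown" only requests a shutdown and returns immediately.
    # Word splitting is relied on here, so guest names must not contain spaces.
    for vm in $(running_guests); do
        echo "Requesting shutdown of $vm"
        vm_virsh shutdown "$vm"
    done

    # Poll until no guest is running or the timeout expires.
    while [ -n "$(running_guests)" ] && [ "$waited" -lt "$TIMEOUT" ]; do
        sleep "$POLL"
        waited=$((waited + POLL))
    done

    # Anything still running gets the equivalent of a pulled plug.
    for vm in $(running_guests); do
        echo "Timeout reached, forcing off $vm"
        vm_virsh destroy "$vm"
    done
}

# Only act when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    shutdown_all_guests
fi
```

The guest still has to react to the ACPI request (or run a guest agent) for the graceful part to do anything; otherwise it simply waits out the timeout and gets forced off.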

When I was running an older CORE release, I was using ssh to send my Linux VM a shutdown command as part of the server shutdown script. It worked fine, but it was annoying that the server didn't just handle the process automatically. I thought that CORE started properly shutting down guest VMs as part of the 12.0 release, so I assumed SCALE would be doing this as well...
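For anyone wanting to try the ssh approach, a minimal sketch might look like the following. `VM_USER` and `VM_HOST` are placeholders rather than values from this thread, and it assumes key-based ssh login from the host plus passwordless `sudo` for `shutdown` inside the guest.

```shell
#!/bin/sh
# Sketch only: ask a Linux guest to shut itself down over ssh.
# VM_USER and VM_HOST are hypothetical placeholders.
VM_USER="${VM_USER:-admin}"
VM_HOST="${VM_HOST:-vm.example.lan}"

shutdown_guest_via_ssh() {
    # BatchMode avoids hanging on a password prompt during host shutdown;
    # ConnectTimeout bounds the wait if the guest is already down.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$VM_USER@$VM_HOST" \
        'sudo shutdown -h now'
}

# Only act when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    shutdown_guest_via_ssh || echo "ssh shutdown of $VM_HOST failed" >&2
fi
```

This only sends the request; the host-side script would still need to wait for the guest to finish before the host carries on shutting down.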
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
That's probably how it should work. It clearly doesn't.
I might set up a bhyve VM in CORE just to see what happens.
 

shadofall

Contributor
Joined
Jun 2, 2020
Messages
100
Normally KVM controls this behavior via /etc/default/libvirt-guests (paths vary by distro; this is the Debian path). No clue what KVM's default behavior is, but given that RHEL sets theirs to suspend in /etc/sysconfig/libvirt-guests (like I said, different paths for different distros), it's probably safe to guess that KVM neither gracefully shuts down nor suspends guests on host shutdown unless configured to do so.

Of course there are commands that can be passed, so it's entirely possible to script the behavior by other means.

So unless iX confirms they are handling it some other way, it's likely just killing the guests.

That said, just editing libvirt-guests definitely won't survive upgrades, and may not survive reboots.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Looking at the file you pointed to
Code:
# URIs to check for running guests
# example: URIS='default xen:///system vbox+tcp://host/system lxc:///system'
#URIS=default

# action taken on host boot
# - start   all guests which were running on shutdown are started on boot
#           regardless on their autostart settings
# - ignore  libvirt-guests init script won't start any guest on boot, however,
#           guests marked as autostart will still be automatically started by
#           libvirtd
#ON_BOOT=ignore

# Number of seconds to wait between each guest start. Set to 0 to allow
# parallel startup.
#START_DELAY=0

# action taken on host shutdown
# - suspend   all running guests are suspended using virsh managedsave
# - shutdown  all running guests are asked to shutdown. Please be careful with
#             this settings since there is no way to distinguish between a
#             guest which is stuck or ignores shutdown requests and a guest
#             which just needs a long time to shutdown. When setting
#             ON_SHUTDOWN=shutdown, you must also set SHUTDOWN_TIMEOUT to a
#             value suitable for your guests.
#ON_SHUTDOWN=shutdown

# Number of guests will be shutdown concurrently, taking effect when
# "ON_SHUTDOWN" is set to "shutdown". If Set to 0, guests will be shutdown one
# after another. Number of guests on shutdown at any time will not exceed number
# set in this variable.
#PARALLEL_SHUTDOWN=0

# Number of seconds we're willing to wait for a guest to shut down. If parallel
# shutdown is enabled, this timeout applies as a timeout for shutting down all
# guests on a single URI defined in the variable URIS. If this is 0, then there
# is no time out (use with caution, as guests might not respond to a shutdown
# request). The default value is 300 seconds (5 minutes).
#SHUTDOWN_TIMEOUT=300

# If non-zero, try to bypass the file system cache when saving and
# restoring guests, even though this may give slower operation for
# some file systems.
#BYPASS_CACHE=0

# If non-zero, try to sync guest time on domain resume. Be aware, that
# this requires guest agent with support for time synchronization
# running in the guest. By default, this functionality is turned off.
#SYNC_TIME=1
Looks as though iX haven't even considered the issue, so I changed some settings and rebooted. The changes DID survive a reboot. Now to see if they actually do anything.

I set ON_SHUTDOWN to shutdown in the default file, with a PARALLEL_SHUTDOWN of 5 (I only have 2 VMs).
My test Windows server is complaining about an improper shutdown, and on the second test the Windows server actually failed to boot the first time, offering to reboot or show advanced options. It did boot the second time.

Wonder if suspend would work (I suspect not). Nope, suspend doesn't work either.
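For reference, the change described above would look roughly like this in /etc/default/libvirt-guests. The `SHUTDOWN_TIMEOUT` line is an assumption: it is not mentioned in the post, but the file's own comments say it should be set alongside `ON_SHUTDOWN=shutdown`, and 300 is the documented default. As reported, these settings did not produce a clean guest shutdown on this SCALE release.

```shell
# /etc/default/libvirt-guests (Debian path)
# Ask running guests to shut down when the host goes down.
ON_SHUTDOWN=shutdown
# Shut down up to 5 guests at the same time.
PARALLEL_SHUTDOWN=5
# Seconds to wait for guests before giving up (assumed; the file's default).
SHUTDOWN_TIMEOUT=300
```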
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Somewhere in another thread I found
Code:
#!/bin/bash
#
# Shutdown all VM's
#
touch ~/started.script    # added
/usr/bin/clear
printf '%s\n' "VM's running: $(virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list | tail -n +3 | awk '{print $2}')"
echo "Shutdown all libvirt/QEMU/KVM guests"
for i in $(virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" list | tail -n +3 | awk '{print $2}')
do
    echo "Shutting down VM $i..."
    virsh -c "qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock" shutdown "$i"
done
sleep 30                  # added
touch ~/ended.script      # added
exit 0

I added the lines marked "# added".
If I run the script myself then it works: the Windows machines shut down correctly and the touched files appear in ~ (it's a crude test).

However, if I run the script as a shutdown task (System Settings / Advanced / "Init/Shutdown Script"), the touched files appear but the Windows server I am testing on does NOT shut down cleanly, and I wasn't certain the sleep 30 actually happened.

So I set sleep 240 and put a timeout of 300 on the script, which definitely put a delay in the restart process :)

But the Windows server still shows an incorrect shutdown.
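One way to narrow this down, sketched here under assumptions, is to have the shutdown task write timestamped log lines instead of just touching files, including whether `virsh` can still reach libvirt at that point. The log path and function names are made up for the example; the URI is the one used in the script above.

```shell
#!/bin/bash
# Sketch only: record what the shutdown task can actually see, with timestamps.

URI="${URI:-qemu+unix:///system?socket=/run/truenas_libvirt/libvirt-sock}"
LOG="${LOG:-$HOME/vm-shutdown.log}"   # hypothetical log location

# Append one timestamped line to the log.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG"
}

# Log the running guests, or the error text if libvirt cannot be reached.
log_running_guests() {
    local guests
    if guests=$(virsh -c "$URI" list --name 2>&1); then
        log "running guests: $(printf '%s' "$guests" | tr '\n' ' ')"
    else
        log "virsh failed: $guests"
    fi
}

# Only act when called with "run"; the optional second argument is the sleep in seconds.
if [ "${1:-}" = "run" ]; then
    log "shutdown task started"
    log_running_guests
    sleep "${2:-30}"
    log_running_guests
    log "shutdown task finished"
fi
```

If the log shows "virsh failed" at shutdown time, or the guest list is already empty, the task is probably running after the VMs or libvirt have been stopped, which would explain why it works by hand but not as a shutdown task. That is a guess, not something confirmed in this thread.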
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I can confirm that @Patrick M. Hausen's script DOES work at shutdown, and Windows shuts down cleanly.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Yes - I should have said on Core. Sorry
 

Yamon234

Dabbler
Joined
Oct 5, 2021
Messages
13
Came looking for this thread after my TrueNAS SCALE box shut down during a power outage (it has a UPS, so the system itself was shut down gracefully) and corrupted a game server I had running in a Windows VM, because VMs don't shut down gracefully. Fortunately I had a full backup from 3 weeks prior that I was able to restore some data from.

The ticket you submitted is now marked as resolved. I'm on the latest release, not the nightlies. Have you been able to test on the nightly builds whether they did indeed fix the issue?
 

amichelf

Dabbler
Joined
Apr 10, 2020
Messages
24
Hi,
on SCALE 22.02.2 this has been fixed. On a Windows VM with the VirtIO drivers installed I see proper shutdown messages in the event log when the SCALE box is shut down.
Information
Information 6/21/2022 8:15:59 PM User32 1074 None "The process C:\Windows\system32\winlogon.exe (DC) has initiated the power off of computer DC on behalf of user NT AUTHORITY\SYSTEM for the following reason: No title for this reason could be found
Reason Code: 0x500ff
Shutdown Type: power off
Comment: "
.....
Information 6/21/2022 8:21:58 PM EventLog 6009 None Microsoft (R) Windows (R) 10.00. 20348 Multiprocessor Free.

amichelf
 