13.0-U3.1 ENOMEM during VM start: best practice to manage memory for VMs?

bal0an (Explorer; joined Mar 2, 2012; 72 messages)
I recently ran into an issue where a VM would not start because of an ENOMEM error. The VM is stopped daily for a snapshot and backup, then started again. After a few days of this, the VM failed to start with the error below. I was able to force a start by enabling overcommit, as described in https://www.truenas.com/community/threads/force-vm-start-from-command-line.106408/. Restarting the TrueNAS server also makes the issue go away, but only for a few days.
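For reference, forcing the start amounts to passing the `overcommit` option to `vm.start` (the traceback below shows `vm.start` handing `options['overcommit']` on to `vm.init_guest_vmemory`). A minimal sketch of a start helper that tries a normal start first and only falls back to overcommit if that fails. The function name `start_vm_with_fallback` is hypothetical, and the sketch assumes `midclt` exits non-zero when the middleware call raises an error.

```shell
# Hypothetical helper: start a VM normally; only if that fails, retry with
# overcommit enabled, which skips the middleware's memory guarantee.
# Assumes midclt exits with a non-zero status when the call fails.
start_vm_with_fallback() {
    vm_id=$1
    if midclt call vm.start "$vm_id"; then
        return 0
    fi
    echo "vm.start $vm_id failed; retrying with overcommit" >&2
    midclt call vm.start "$vm_id" '{"overcommit": true}'
}
```

Usage: `start_vm_with_fallback "$VM_ID"`. Overcommit only hides the symptom: if the guest really does not fit, the host may start swapping or killing processes under memory pressure.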
My diagnosis: either I am running into a real memory shortage, or ZFS using more and more memory for its cache (the ARC) over time is normal behaviour; I can't tell which.
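One way to tell these two cases apart is to log the ARC size next to free memory right before each daily `vm.start` and watch the trend over several days. A minimal sketch, assuming the FreeBSD sysctls `kstat.zfs.misc.arcstats.size` (current ARC size in bytes), `vm.stats.vm.v_free_count` (free pages) and `hw.pagesize` are available, as on TrueNAS CORE; the function names `format_memory` and `log_memory` are hypothetical.

```shell
# Format byte counts as whole MiB for a one-line memory report.
format_memory() {
    arc_bytes=$1
    free_bytes=$2
    echo "ARC: $((arc_bytes / 1048576)) MiB, free: $((free_bytes / 1048576)) MiB"
}

# Read the current values on FreeBSD / TrueNAS CORE and print a timestamped report.
log_memory() {
    arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
    free_pages=$(sysctl -n vm.stats.vm.v_free_count)
    page_size=$(sysctl -n hw.pagesize)
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(format_memory "$arc" $((free_pages * page_size)))"
}
```

Calling `log_memory` just before the start step of the snapshot script would show whether free memory shrinks toward the guest's size while the ARC grows in the days before an ENOMEM.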
In https://www.truenas.com/community/threads/vm-memory-allocation.83743/ Patrick recommends limiting the ARC size to prevent a memory shortage for the VMs.
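To size such a limit, one rough approach is to start from physical RAM, subtract the memory configured for all VMs plus some headroom for the OS and middleware, and use the remainder as the ARC maximum. A sketch of that arithmetic; the function name `arc_max_bytes` and the example figures are assumptions, not TrueNAS recommendations. On TrueNAS CORE the result would typically be applied as a sysctl tunable (System > Tunables), assuming your release accepts `vfs.zfs.arc_max` (newer OpenZFS also exposes it as `vfs.zfs.arc.max`).

```shell
# Compute an ARC cap in bytes that leaves room for VM guests plus host headroom.
#   phys_bytes:   physical RAM in bytes (e.g. from `sysctl -n hw.physmem`)
#   vm_mib:       total memory configured for all VMs, in MiB
#   headroom_mib: memory to keep free for the host, in MiB
arc_max_bytes() {
    phys_bytes=$1
    vm_mib=$2
    headroom_mib=$3
    cap=$((phys_bytes - (vm_mib + headroom_mib) * 1048576))
    if [ "$cap" -le 0 ]; then
        echo "Error: VMs plus headroom exceed physical memory" >&2
        return 1
    fi
    echo "$cap"
}
```

Example: `arc_max_bytes "$(sysctl -n hw.physmem)" 8192 4096` for 8 GiB of guests and 4 GiB of headroom; on a 32 GiB machine this yields 21474836480 bytes (20 GiB).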
My question: is there a recommended best practice for reliably reserving memory for VMs?
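Short of a true reservation, a pre-flight check in the snapshot script could at least detect the situation before `vm.start` fails. A minimal sketch, assuming the guest's configured memory is the `memory` field (in MiB) returned by `vm.query` and using free pages as the available-memory estimate; this is not the middleware's own `init_guest_vmemory` logic, and the names `guest_fits`, `guest_memory_mib` and `free_mib` are hypothetical.

```shell
# Hypothetical pre-flight check (not the middleware's own logic).

# Succeed if a guest of guest_mib fits into avail_mib while keeping headroom_mib spare.
guest_fits() {
    guest_mib=$1
    avail_mib=$2
    headroom_mib=$3
    [ $((guest_mib + headroom_mib)) -le "$avail_mib" ]
}

# Configured memory (MiB) of the named VM, from the vm.query "memory" field; needs jq.
guest_memory_mib() {
    midclt call vm.query | jq --arg name "$1" '.[] | select(.name == $name) | .memory'
}

# Free memory in MiB on FreeBSD / TrueNAS CORE (free pages times page size).
free_mib() {
    echo $(( $(sysctl -n vm.stats.vm.v_free_count) * $(sysctl -n hw.pagesize) / 1048576 ))
}
```

For example, `guest_fits "$(guest_memory_mib mars)" "$(free_mib)" 512 || echo "warning: low free memory for mars"`, where 512 MiB of headroom is an arbitrary example. Free pages alone are a conservative estimate, since the ARC can usually give memory back under pressure, so a failed check is a warning sign rather than proof that `vm.start` will fail.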

Code:
root@nas1:/mnt/tank # /mnt/tank/take_vm_snapshot.sh mars ssd2/mars
2023-04-21 12:13:05 Taking snapshot of ssd2/mars for VM mars.
2023-04-21 12:13:05 vm mars has id 2.
2023-04-21 12:13:05 vm mars stopped
2023-04-21 12:13:05 Taking snapshot ssd2/mars@auto-20230421-1213
2023-04-21 12:13:05 Starting up VM mars
[ENOMEM] Cannot guarantee memory for guest mars
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 139, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1246, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1151, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1594, in start
    self.middleware.call_sync('vm.init_guest_vmemory', vm, options['overcommit'])
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1294, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1334, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1229, in init_guest_vmemory
    raise CallError(f'Cannot guarantee memory for guest {vm["name"]}', errno.ENOMEM)
middlewared.service_exception.CallError: [ENOMEM] Cannot guarantee memory for guest mars

2023-04-21 12:13:06 Done.


Code:
#!/bin/bash
# Stop a VM, snapshot its dataset, then start the VM again.
# Usage: take_vm_snapshot.sh <vm_name> <dataset>

# Print a message prefixed with the current timestamp.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*"
}

if [ $# -ne 2 ]; then
    echo "Usage: take_vm_snapshot.sh <vm_name> <dataset>" >&2
    exit 1
fi
VM_NAME=$1
DATASET=$2

# Abort early if the dataset does not exist.
if ! zfs list "$DATASET" >/dev/null 2>&1; then
    echo "Error: Dataset $DATASET not found. Aborting." >&2
    exit 1
fi
log "Taking snapshot of $DATASET for VM $VM_NAME."

# Look up the VM id by name; pass the name to jq as a variable
# instead of splicing it into the filter string.
VM_ID=$(midclt call vm.query | jq --arg name "$VM_NAME" '.[] | select(.name == $name) | .id')
if [ -z "$VM_ID" ]; then
    echo "Error: No VM found with name $VM_NAME. Aborting." >&2
    exit 1
fi
log "vm $VM_NAME has id $VM_ID."

# Current VM state as a plain string (jq -r strips the JSON quotes).
vm_state() {
    midclt call vm.status "$VM_ID" | jq -r '.state'
}

if [ "$(vm_state)" != "STOPPED" ]; then
    log "Shutting down VM $VM_NAME..."
    midclt call vm.stop "$VM_ID"
    while [ "$(vm_state)" != "STOPPED" ]; do
        log "Wait for vm $VM_NAME to terminate..."
        sleep 5
    done
fi
log "vm $VM_NAME stopped"

SNAPSHOT_NAME="$DATASET@auto-$(date '+%Y%m%d-%H%M')"
log "Taking snapshot $SNAPSHOT_NAME"
if ! zfs snapshot "$SNAPSHOT_NAME"; then
    log "Error: snapshot $SNAPSHOT_NAME failed."
fi

log "Starting up VM $VM_NAME"
# vm.start fails (e.g. with ENOMEM) if the middleware cannot guarantee guest memory;
# report that instead of printing "Done."
if ! midclt call vm.start "$VM_ID"; then
    log "Error: VM $VM_NAME failed to start."
    exit 1
fi
log "Done."
 