bal0an
I recently ran into an issue with a VM not starting due to an ENOMEM error. The VM is stopped daily for a snapshot and backup, then restarted. After a few days, the VM failed to restart with the error below. I was able to force a start by turning on overcommit, as described in https://www.truenas.com/community/threads/force-vm-start-from-command-line.106408/. The issue also disappears temporarily for a few days after restarting the TrueNAS server.
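For reference, a forced start from the shell might look like the sketch below. It assumes the `overcommit` option that appears in the `vm.start` frame of the traceback below, and that `midclt` accepts the options as a JSON string; `start_vm_overcommit` is a hypothetical helper name. Overcommit only skips the memory guarantee check, so it works around the error rather than fixing the underlying shortage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a VM by id while skipping the middleware's
# memory guarantee check. "overcommit" is the vm.start option seen in the
# traceback; the guest may still run short of RAM later.
start_vm_overcommit() {
    local vm_id=$1
    midclt call vm.start "$vm_id" '{"overcommit": true}'
}
```

For the VM in the log (`mars`, id 2) this would be `start_vm_overcommit 2`.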
My diagnosis: either I am running into a memory issue, or ZFS using more and more cache during runtime is normal behaviour; I can't say which.
In https://www.truenas.com/community/threads/vm-memory-allocation.83743/ Patrick recommends limiting the ARC size to prevent a memory shortage for the VMs.
My question: is there a recommended best practice for reliably reserving memory for VMs?
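To make Patrick's suggestion concrete, one way to pick an ARC limit is to subtract the memory assigned to VMs plus some headroom from physical RAM. The sketch below is a hypothetical sizing rule, not an official TrueNAS formula; `suggest_arc_max`, the headroom and the floor are illustrative assumptions. All values are in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical sizing rule for an ARC ceiling:
#   arc_max = physical RAM - total VM memory - headroom for OS/middleware,
# never going below a floor so ZFS keeps a usable cache.
# All arguments and the result are in bytes.
suggest_arc_max() {
    local physmem=$1 vm_mem=$2 headroom=$3 floor=$4
    local arc_max=$((physmem - vm_mem - headroom))
    if [ "$arc_max" -lt "$floor" ]; then
        arc_max=$floor
    fi
    echo "$arc_max"
}

GiB=$((1024 * 1024 * 1024))
# Example: 32 GiB RAM, 8 GiB of VMs, 4 GiB headroom, 2 GiB floor -> 20 GiB.
suggest_arc_max $((32 * GiB)) $((8 * GiB)) $((4 * GiB)) $((2 * GiB))
```

On TrueNAS CORE (FreeBSD), physical RAM and the current ARC size can be read with `sysctl -n hw.physmem` and `sysctl -n kstat.zfs.misc.arcstats.size`; the result would then go into an ARC limit tunable such as `vfs.zfs.arc_max`. The exact tunable name can differ between OpenZFS versions, so it is worth checking on the running release before relying on it.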
Code:
root@nas1:/mnt/tank # /mnt/tank/take_vm_snapshot.sh mars ssd2/mars
2023-04-21 12:13:05 Taking snapshot of ssd2/mars for VM mars.
2023-04-21 12:13:05 vm mars has id 2.
2023-04-21 12:13:05 vm mars stopped
2023-04-21 12:13:05 Taking snapshot ssd2/mars@auto-20230421-1213
2023-04-21 12:13:05 Starting up VM mars
[ENOMEM] Cannot guarantee memory for guest mars
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 139, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1246, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1151, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1594, in start
    self.middleware.call_sync('vm.init_guest_vmemory', vm, options['overcommit'])
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1294, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1334, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1229, in init_guest_vmemory
    raise CallError(f'Cannot guarantee memory for guest {vm["name"]}', errno.ENOMEM)
middlewared.service_exception.CallError: [ENOMEM] Cannot guarantee memory for guest mars
2023-04-21 12:13:06 Done.
Code:
#!/usr/bin/env bash
# Stop a VM, take a ZFS snapshot of its dataset, then start the VM again.

# Print a line prefixed with a timestamp.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*"
}

if [ $# -ne 2 ]; then
    echo "Usage: take_vm_snapshot <vm_name> <dataset>"
    exit 1
fi
VM_NAME=$1
DATASET=$2

if ! zfs list "$DATASET" > /dev/null 2>&1; then
    echo "Error: Dataset $DATASET not found. Aborting."
    exit 1
fi
log "Taking snapshot of $DATASET for VM $VM_NAME."

# Look up the VM id by name (jq --arg avoids quoting problems in the filter).
VM_ID=$(midclt call vm.query | jq --arg name "$VM_NAME" '.[] | select(.name == $name) | .id')
if [ -z "$VM_ID" ]; then
    echo "Error: No VM found with name $VM_NAME. Aborting."
    exit 1
fi
log "vm $VM_NAME has id $VM_ID."

# Current VM state as a bare string, e.g. STOPPED or RUNNING.
vm_state() {
    midclt call vm.status "$VM_ID" | jq -r '.state'
}

if [ "$(vm_state)" != "STOPPED" ]; then
    log "Shutting down VM $VM_NAME..."
    midclt call vm.stop "$VM_ID"
    while [ "$(vm_state)" != "STOPPED" ]; do
        log "Wait for vm $VM_NAME to terminate..."
        sleep 5
    done
fi
log "vm $VM_NAME stopped"

SNAPSHOT_NAME="$DATASET@auto-$(date '+%Y%m%d-%H%M')"
log "Taking snapshot $SNAPSHOT_NAME"
zfs snapshot "$SNAPSHOT_NAME"

log "Starting up VM $VM_NAME"
# Assumes midclt exits non-zero when the call fails (e.g. ENOMEM), so a
# failed start is reported instead of unconditionally printing "Done."
if ! midclt call vm.start "$VM_ID"; then
    log "Error: VM $VM_NAME failed to start."
    exit 1
fi
log "Done."
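The stop loop in the script polls `vm.status` every 5 seconds with no upper bound, so a VM that never reaches STOPPED would hang the daily job. A minimal sketch of a bounded wait follows; `wait_for_state` is a hypothetical helper name, and any timeout you pass is an illustrative choice, not a TrueNAS default.

```shell
#!/usr/bin/env bash
# Poll a command until it prints the wanted value or a timeout expires.
# Usage: wait_for_state <wanted> <timeout_seconds> <interval_seconds> <command...>
# Returns 0 on success, 1 on timeout, 2 if the interval is not positive.
wait_for_state() {
    local wanted=$1 timeout=$2 interval=$3
    shift 3
    [ "$interval" -gt 0 ] || return 2
    local waited=0
    while [ "$("$@")" != "$wanted" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

In the script it could replace the `while` loop, for example `wait_for_state STOPPED 300 5 sh -c "midclt call vm.status $VM_ID | jq -r .state"`, aborting before the snapshot if it returns non-zero.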