bal0an
I recently ran into an issue where a VM failed to start with an ENOMEM error. The VM is stopped daily for a snapshot and backup, then restarted. After a few days, the VM failed to restart with the error below. I could force a restart by turning on overcommit, as described in https://www.truenas.com/community/threads/force-vm-start-from-command-line.106408/. The issue also disappears for a few days after rebooting the TrueNAS server.
My diagnosis: either I am running into a memory issue, or it is normal behaviour for ZFS to use more and more cache over time. I can't say which.
In https://www.truenas.com/community/threads/vm-memory-allocation.83743/ Patrick recommends limiting the ARC size to prevent a memory shortage for the VMs.
My question: is there a recommended best practice for reliably reserving memory for VMs?
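To make the ARC limit idea concrete, here is a minimal sketch of how a cap could be derived from total RAM and the memory assigned to VMs. The function name `suggest_arc_max_bytes`, the 4 GiB default headroom, and the example sizes are all hypothetical and not a TrueNAS recommendation.

```shell
#!/bin/bash
# Sketch only: derive an ARC cap that leaves room for VM memory.
# The function name and the default headroom are assumptions for illustration.

# Usage: suggest_arc_max_bytes <total_ram_gib> <vm_ram_gib> [headroom_gib]
suggest_arc_max_bytes() {
    local total_gib=$1 vm_gib=$2 headroom_gib=${3:-4}
    # Whatever is left after VMs and OS headroom is what the ARC may use.
    local arc_gib=$(( total_gib - vm_gib - headroom_gib ))
    if [ "$arc_gib" -lt 1 ]; then
        echo "Error: no memory left for the ARC with these values." >&2
        return 1
    fi
    # Print the cap in bytes, the unit ARC size tunables expect.
    echo $(( arc_gib * 1024 * 1024 * 1024 ))
}
```

For a hypothetical host with 64 GiB of RAM and 16 GiB assigned to VMs, `suggest_arc_max_bytes 64 16` prints 47244640256 (44 GiB). On FreeBSD-based systems the resulting value is usually applied through the `vfs.zfs.arc_max` tunable; the exact name should be verified on the installed release.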
Code:
root@nas1:/mnt/tank # /mnt/tank/take_vm_snapshot.sh mars ssd2/mars
2023-04-21 12:13:05 Taking snapshot of ssd2/mars for VM mars.
2023-04-21 12:13:05 vm mars has id 2.
2023-04-21 12:13:05 vm mars stopped
2023-04-21 12:13:05 Taking snapshot ssd2/mars@auto-20230421-1213
2023-04-21 12:13:05 Starting up VM mars
[ENOMEM] Cannot guarantee memory for guest mars
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 139, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1246, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1151, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1594, in start
    self.middleware.call_sync('vm.init_guest_vmemory', vm, options['overcommit'])
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1294, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1334, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1229, in init_guest_vmemory
    raise CallError(f'Cannot guarantee memory for guest {vm["name"]}', errno.ENOMEM)
middlewared.service_exception.CallError: [ENOMEM] Cannot guarantee memory for guest mars
2023-04-21 12:13:06 Done.
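To find out whether ARC growth coincides with the ENOMEM failure, the snapshot script could log the memory state just before it calls `vm.start`. The sketch below assumes a FreeBSD-based TrueNAS CORE system (which the `/usr/local/lib/python3.9` paths in the traceback suggest) and the sysctl names used there. The function names `bytes_to_mib` and `log_memory_state` are hypothetical, and the sysctl names should be checked on the installed release.

```shell
#!/bin/bash
# Sketch only: log ARC size and free memory so failures can be correlated with ARC growth.
# Assumes FreeBSD sysctl names; helper names are made up for illustration.

# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

log_memory_state() {
    local arc_size arc_max free_pages page_size
    arc_size=$(sysctl -n kstat.zfs.misc.arcstats.size) || return 1   # current ARC size in bytes
    arc_max=$(sysctl -n vfs.zfs.arc_max) || return 1                 # configured cap, 0 means default
    free_pages=$(sysctl -n vm.stats.vm.v_free_count) || return 1     # free memory in pages
    page_size=$(sysctl -n hw.pagesize) || return 1                   # page size in bytes
    echo "$(date '+%Y-%m-%d %H:%M:%S') ARC size: $(bytes_to_mib "$arc_size") MiB," \
         "ARC max: $(bytes_to_mib "$arc_max") MiB," \
         "free: $(bytes_to_mib $(( free_pages * page_size ))) MiB"
}
```

Calling `log_memory_state` right before `midclt call vm.start` on every daily run would show whether the ARC is noticeably larger on the days the start fails.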
Code:
#!/bin/bash

if [ $# -ne 2 ]; then
    echo "Usage: take_vm_snapshot <vm_name> <dataset>"
    exit 1
fi

VM_NAME=$1
DATASET=$2

# Verify that the dataset exists before touching the VM.
if ! zfs list "$DATASET" > /dev/null; then
    echo "Error: Dataset $DATASET not found. Aborting."
    exit 1
fi

echo "$(date '+%Y-%m-%d %H:%M:%S') Taking snapshot of $DATASET for VM $VM_NAME."

# Look up the numeric VM id by name.
VM_ID=$(midclt call vm.query | jq --arg name "$VM_NAME" '.[] | select(.name == $name) | .id')
if [ -z "$VM_ID" ]; then
    echo "Error: No VM found with name $VM_NAME. Aborting."
    exit 1
fi
echo "$(date '+%Y-%m-%d %H:%M:%S') vm $VM_NAME has id $VM_ID."

# Stop the VM if it is running, then wait until it reports STOPPED.
if [ "$(midclt call vm.status "$VM_ID" | jq -r '.state')" != "STOPPED" ]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') Shutting down VM $VM_NAME..."
    midclt call vm.stop "$VM_ID"
    while [ "$(midclt call vm.status "$VM_ID" | jq -r '.state')" != "STOPPED" ]; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') Wait for vm $VM_NAME to terminate..."
        sleep 5
    done
fi
echo "$(date '+%Y-%m-%d %H:%M:%S') vm $VM_NAME stopped"

# Snapshot name has the form <dataset>@auto-YYYYMMDD-HHMM.
SNAPSHOT_NAME="$DATASET@auto-$(date '+%Y%m%d-%H%M')"
echo "$(date '+%Y-%m-%d %H:%M:%S') Taking snapshot $SNAPSHOT_NAME"
zfs snapshot "$SNAPSHOT_NAME"

echo "$(date '+%Y-%m-%d %H:%M:%S') Starting up VM $VM_NAME"
midclt call vm.start "$VM_ID"
echo "$(date '+%Y-%m-%d %H:%M:%S') Done."
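As a stopgap until the memory question is settled, the final start step could retry with overcommit only when the first attempt reports ENOMEM. This is a sketch, not a tested fix: the function name `start_vm_with_fallback` is hypothetical, and the `{"overcommit": true}` options argument is inferred from the `options['overcommit']` reference in the traceback and from the linked force-start thread.

```shell
#!/bin/bash
# Sketch only: start a VM and fall back to overcommit if the middleware reports ENOMEM.
# The options argument is an assumption based on the traceback, not verified API documentation.

start_vm_with_fallback() {
    local vm_id=$1
    local output status
    # Capture stdout and stderr so the error text can be inspected.
    output=$(midclt call vm.start "$vm_id" 2>&1)
    status=$?
    case "$output" in
        *ENOMEM*)
            echo "$(date '+%Y-%m-%d %H:%M:%S') ENOMEM on start, retrying with overcommit" >&2
            midclt call vm.start "$vm_id" '{"overcommit": true}'
            ;;
        *)
            # No ENOMEM: pass the output and exit status through unchanged.
            [ -n "$output" ] && echo "$output"
            return "$status"
            ;;
    esac
}
```

In the script, `start_vm_with_fallback "$VM_ID"` would replace the plain `midclt call vm.start` line. Overcommit only skips the memory guarantee, so it hides the shortage rather than resolving it.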