Nodefield
Cadet
- Joined
- Aug 6, 2020
- Messages
- 4
I'm running three nearly identical virtualization setups each with:
- two to four virtualization hosts (Proxmox VE 6.2)
- two FreeNAS as VM image storage servers (one dedicated for SSD pool, one for HDD pool with SSD caches, 128G RAM each)
- 10G networking dedicated for VM storage traffic only (MTU 9000)
- NFS (4.1)
- Simple config: no special hacks or unusual configurations (either in TrueNAS/FreeNAS or in the VM hypervisor)
- Mostly HP ProLiant hardware; with some SuperMicro servers; all JBOD disks
Recently I tested these virtualization clusters with TrueNAS Core 12.0-BETA. Initially everything looked fantastic: great performance, no immediate issues etc.
After seemingly running smoothly and reliably for 1-2 days on average, some virtual machines in each separate cluster (different locations, different hardware) started having random I/O issues and errors with their disks (qcow2 images mounted via NFSv4.1 from the TrueNAS Core servers).
I found that these qcow2 images had become internally corrupted. Some were beyond repair with `qemu-img`; others were repairable. Once it became clear that the issue was appearing in all three separate installations, I quickly reverted all storage servers to FreeNAS 11.3-U4.1. I then restored each affected qcow2 image from the most recent ZFS snapshot that held an uncorrupted copy. After that, the qcow2 corruption issues disappeared.
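For anyone who wants to check their own images, here is a minimal sketch of a read-only scan using `qemu-img check`. It only maps the exit codes documented for `qemu-img check` to a short label. The function names (`classify`, `check_image`) and the label strings are my own and purely illustrative, and I'm assuming the images are qcow2 and not in use by a running VM while they are checked.

```python
import subprocess
import sys

# Exit codes documented for `qemu-img check`; the label text is illustrative.
EXIT_MEANINGS = {
    0: "clean",
    1: "check could not complete (internal error)",
    2: "corrupted",
    3: "leaked clusters only (not corrupted)",
    63: "check not supported for this image format",
}


def classify(returncode):
    """Translate a `qemu-img check` exit code into a short label."""
    return EXIT_MEANINGS.get(returncode, f"unknown exit code {returncode}")


def check_image(path):
    """Run a read-only check on one image (no -r, so nothing is repaired)."""
    result = subprocess.run(
        ["qemu-img", "check", "-f", "qcow2", path],
        capture_output=True,
        text=True,
    )
    return classify(result.returncode)


if __name__ == "__main__":
    # Usage: python check_images.py /path/to/vm-100-disk-0.qcow2 ...
    for image in sys.argv[1:]:
        print(f"{image}: {check_image(image)}")
```

Because no repair flag is passed, this should not modify the images; anything reported as corrupted can then be compared against the copy in an older snapshot before deciding whether to repair or restore.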
I must point out that throughout all this, the TrueNAS 12 servers themselves seemed to run smoothly, with no apparent storage (or other) errors reported.
As I've rolled back to 11.3, I can't immediately debug or retest. Has anyone else experienced similar issues? Any thoughts from iXsystems?