Virtualized FreeNAS apparently crashing

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
OK, I too am having a similar situation, and I have been making some headway with this problem. Here is my config:
  • Host: Lenovo dx360 M4, 2 CPUs, 128 GB memory
and I added the following components to it, building a 16-bay 3.5" DAS out of an ATX computer case:

  • LSI 9206-16e
  • SFF-8644 to SFF-8088 cable (2)
  • SFF-8088 to SFF-8087 adapter
  • SFF-8087 to SATA breakout cable (2)
  • Hitachi HUH721010ALE600 10 TB drives (6)
  • Intel SSDSC2KB480G7 480 GB SSD (2)
  • External ATX case with 650 W PSU
Built the external DAS with the 6 drives, installed and configured TrueNAS 13.0-U5.1 with the LSI 9206 passed through to the guest, and started adding drives.

I added only the 10 TB drives, without the Intel SSDs, because I wanted to stress the disks alone and see how hard they would work. I would build a pool with the disks, either one 6-drive RAIDZ1 vdev or two 3-drive RAIDZ1 vdevs, which would give me 50 TB or 40 TB respectively. I also attached the SSD, so it was seen as a mounted device but not attached to any pool.
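The capacity numbers above are just (n - 1) data disks per RAIDZ1 vdev; a quick sketch of the arithmetic (raw capacity, before ZFS overhead):

```shell
#!/bin/sh
# Raw usable capacity for the two layouts with 10 TB drives: a RAIDZ1
# vdev stores data on (n - 1) of its n disks, the last one is parity.
echo "one 6-wide RAIDZ1 vdev:  $(( (6 - 1) * 10 )) TB"   # 50 TB
echo "two 3-wide RAIDZ1 vdevs: $(( 2 * (3 - 1) * 10 )) TB"   # 40 TB
```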

I would stress test the disks with the following commands:

Code:
fio --name=test --size=10g --rw=read --ioengine=posixaio --direct=1 --bs=1m
fio --name=test --size=40g --rw=read --ioengine=posixaio --direct=1 --bs=1m
fio --name=test --size=60g --rw=read --ioengine=posixaio --direct=1 --bs=1m
fio --name=test --size=10g --rw=write --ioengine=posixaio --direct=1 --bs=1m
fio --name=test --size=40g --rw=write --ioengine=posixaio --direct=1 --bs=1m
fio --name=test --size=60g --rw=write --ioengine=posixaio --direct=1 --bs=1m


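For repeat runs, the six commands above can be wrapped in a small loop. A sketch; the per-run job names and the fio-on-PATH fallback are my additions:

```shell
#!/bin/sh
# Run the full read/write x 10g/40g/60g fio matrix in one pass.
# If fio is not on PATH (e.g. on a test box), just print the command instead.
run() { if command -v fio >/dev/null 2>&1; then "$@"; else echo "$*"; fi; }
for rw in read write; do
  for size in 10g 40g 60g; do
    run fio --name="stress-${rw}-${size}" --size="$size" --rw="$rw" \
        --ioengine=posixaio --direct=1 --bs=1m
  done
done
```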
To say the system was stable would be a stretch: I would get read errors, write errors, checksum errors, and at times the VM would power off randomly.

Here is a snippet of my vmware.log:
Code:
2023-07-03T05:24:50.452Z| vmx| E105: PANIC: PCI passthru device 0000:2c:00.0 caused an IOMMU fault type 6 at address 0xc0000000.  Powering off the virtual machine.  If the problem persists please contact the device's vendor.A core file is available in "/vmfs/volumes/5f87216c-25a7c0ba-e357-40f2e9029b58/VB-TRUENAS02/vmx-zdump.001"


At first I thought it was a temperature issue, because the system would come up and then shut down after a while of testing, so I used this find to monitor the temperature of the LSI card. The LSI 9206-16e has two SAS controllers, so I monitored both of them: one was at 61 °C and the other at 95 °C.

When I stressed it, the temperature might have gone up to 102 °C on the second controller, but the VM was still crashing.
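The temperature monitoring above came from a linked find; here is a minimal polling sketch in the same spirit. The storcli64 tool and the /c0,/c1 controller addressing are assumptions on my part — many IT-mode SAS2 HBAs expose no temperature sensor at all, so substitute whatever tool the find actually names:

```shell
#!/bin/sh
# Hypothetical temperature poll for the two SAS controllers on a 9206-16e.
# Assumes storcli64 enumerates them as /c0 and /c1 and that the card reports
# a temperature; prints a fallback note when no reading is available.
SAMPLES="${SAMPLES:-3}"     # how many readings to take
INTERVAL="${INTERVAL:-1}"   # seconds between readings; raise for real use
i=0
while [ "$i" -lt "$SAMPLES" ]; do
  date
  for c in 0 1; do
    storcli64 "/c$c" show temperature 2>/dev/null || echo "controller $c: no reading"
  done
  sleep "$INTERVAL"
  i=$((i + 1))
done
```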

Next, I removed the SSD based upon this article; it was more stable, but still gave some errors.

Next, since this card has 4 SFF-8644 ports, I decided to move the cables from the upper 2 ports to the lower 2. With just the 6 10 TB drives and no SSD, it is rock solid.

Moved the cables to the 2 middle SFF-8644 ports and it crashed all the time, so on my card the 2nd-highest port is unusable. All the cables, adapters, and HDDs are fine.

The funny thing is that if I stress test the system continuously, the temperatures of the controllers drop to 81 °C and 56 °C. My guess is that the benchmark raises CPU usage, the system notices and increases the fan speed, which pulls more air through the chassis and over the LSI card. Once the testing stops, the temperature climbs back up.

Next, I am going to add the SSDs back in and see how they affect a stable system.
 