I/O error bug after upgrading FreeNAS 11 to TrueNAS 12 with QEMU LUN passthrough (RAID card: Dell H730P)

dianju

Cadet
Joined
Jan 13, 2023
Messages
6
env:
os: RHEL 9
host: Dell R730xd
raid card: Dell H730P (in RAID 6 mode)
virtual machine: RHEL 9 QEMU/KVM


problem config:
<disk type="block" device="lun">
  <driver name="qemu" type="raw"/>
  <source dev="/dev/disk/by-path/xxxx"/>
  <target dev="sdc" bus="scsi"/>
  <address type="drive" controller="0" bus="0" target="0" unit="2"/>
</disk>

error log:
freenas (da0:vtscsi0:0:0:2): CAM status: SCSI Status Error
freenas (da0:vtscsi0:0:0:2): SCSI status: Check Condition
freenas (da0:vtscsi0:0:0:2): SCSI sense: ABORTED COMMAND asc:0,6 (I/O process terminated)
freenas (da0:vtscsi0:0:0:2): Retrying command (per sense data)
freenas (da0:vtscsi0:0:0:2): READ(16). CDB:xxxxxx

root@freenas:~ # zpool status -v
……
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub canceled on XXXX

config:

NAME          STATE     READ WRITE CKSUM
xxxx          DEGRADED     0     0     0
  gptid/xxxx  DEGRADED   703   891     1
……
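
For anyone who wants to watch these counters over time, here is a minimal Python sketch that pulls the device rows with non-zero READ/WRITE/CKSUM values out of `zpool status` text. The function name `devices_with_errors` and the sample text are made up for illustration (the sample reuses the masked names from the output above), and the parser assumes plain integer counters; rows with abbreviated counts such as "1.2K" are skipped, not interpreted.

```python
# Sample text modelled on the zpool status output in this post (names are masked).
SAMPLE_STATUS = """\
scan: scrub canceled on XXXX

config:

NAME          STATE     READ WRITE CKSUM
xxxx          DEGRADED     0     0     0
  gptid/xxxx  DEGRADED   703   891     1
"""


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    bad = []
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows have exactly five columns; skip the header row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, state, *counters = fields
        try:
            read, write, cksum = (int(c) for c in counters)
        except ValueError:
            # Not a device row, or an abbreviated count this sketch does not parse.
            continue
        if read or write or cksum:
            bad.append((name, state, read, write, cksum))
    return bad


if __name__ == "__main__":
    for row in devices_with_errors(SAMPLE_STATUS):
        print(row)
```

In practice the text would come from running `zpool status` on the TrueNAS guest; the sketch only reads a string and never touches the pool.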


Steps to reproduce:
Upgrade FreeNAS 11.3-U5 to TrueNAS 12.0-U8.1, then to TrueNAS 13.0-U3.1.

description:
1. On FreeNAS 11.3-U5, everything works fine.
2. Upgrade to 12.0-U8.1.
3. Errors start being reported (see error log).
4. Roll back to FreeNAS 11.3-U5: everything works fine again.
5. Continue upgrading to TrueNAS 13.0-U3.1: the problem still exists.
6. Change the virtual machine configuration (switch from LUN passthrough to a SCSI disk device):
<disk type="block" device="disk">
  <driver name="qemu" type="raw" cache="none" io="native" discard="unmap"/>
  <source dev="/dev/disk/by-path/XXXX" index="1"/>
  <backingStore/>
  <target dev="sdb" bus="scsi"/>
  <alias name="scsi0-0-0-1"/>
  <address type="drive" controller="0" bus="0" target="0" unit="1"/>
</disk>
7. The error no longer reproduces.
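
One visible difference between the two configurations above is the device attribute on the disk element ("lun" versus "disk"). As a rough aid for auditing a guest, the Python sketch below lists every block-backed disk in a libvirt domain XML together with that attribute. The XML literal and the function names `block_disks` and `lun_passthrough_targets` are hypothetical, modelled on the fragments in this post; in practice the XML would come from `virsh dumpxml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML modelled on the two disk definitions in this post.
DOMAIN_XML = """
<domain type="kvm">
  <devices>
    <disk type="block" device="lun">
      <driver name="qemu" type="raw"/>
      <source dev="/dev/disk/by-path/xxxx"/>
      <target dev="sdc" bus="scsi"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="none" io="native" discard="unmap"/>
      <source dev="/dev/disk/by-path/XXXX"/>
      <target dev="sdb" bus="scsi"/>
    </disk>
  </devices>
</domain>
"""


def block_disks(domain_xml):
    """Return (target, device_mode, source_path) for every block-backed disk."""
    root = ET.fromstring(domain_xml)
    found = []
    for disk in root.iter("disk"):
        if disk.get("type") != "block":
            continue
        target = disk.find("target")
        source = disk.find("source")
        found.append((
            target.get("dev") if target is not None else None,
            disk.get("device", "disk"),  # libvirt treats a missing attribute as "disk"
            source.get("dev") if source is not None else None,
        ))
    return found


def lun_passthrough_targets(domain_xml):
    """Return the target names of disks configured with device="lun"."""
    return [target for target, mode, _ in block_disks(domain_xml) if mode == "lun"]


if __name__ == "__main__":
    for target, mode, source in block_disks(DOMAIN_XML):
        print(f"{target}: device={mode} source={source}")
```

This only reports how the disks are attached; it says nothing about whether either mode is safe for ZFS.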

issue: What is causing this problem, and can this bug be fixed in a future release?
I was afraid when I first saw this error.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
dianju said: "I was afraid when I first saw this error."
Well, you should be.

dianju said: "os:rhel 9"
Side note: this seems like a masochistic thing to do.

dianju said: "can this bug be fixed in a future release?"
I have good news and bad news for you.

The good news is it's entirely in your hands to fix this.
The bad news is that you need to redo your setup, as it is completely inadequate for use with ZFS.

dianju said: "dell h730p (in raid6 mode)"
^ Terrible choice, as amply documented on the forums and elsewhere.

dianju said: "problem config:"
<disk type="block" device="lun">
  <driver name="qemu" type="raw"/>
  <source dev="/dev/disk/by-path/xxxx"/>
  <target dev="sdc" bus="scsi"/>
  <address type="drive" controller="0" bus="0" target="0" unit="2"/>
</disk>
This is a textbook example of what you should not do.

Here are the steps you need to take:
  1. Pray that your backups are okay. Don't have backups? Not a good place to be, but it's not a hopeless situation, though you will have to find someone with more in-depth knowledge than me to help you out.
  2. Read the recommended reading at the end of this post.
  3. Replace the H730P with an HBA
    1. Is it a mini card? If so, get a Dell HBA330 mini.
    2. Is it a standard PCIe card? Any SAS HBA with an LSI SAS3008 controller and stock LSI firmware will do the trick. Or go the HBA330 mini route.
  4. Wipe all your current disks
  5. Pass through the HBA to the VM
  6. Set up a suitable ZFS pool, presumably RAIDZ2, since that's broadly equivalent to RAID6
If you want other storage not passed through and exclusively controlled by the VM, you have some tough choices ahead. You could do NVMe for said storage, I guess, using the PCIe slots. The 2.5" R730XD also supports U.2 on four of the front drive bays so that you're not stuck with M.2 and the like if you have that version. You could also look into setting up the rear drive bays, which you could then use on a separate controller or just plain SATA from the motherboard...

Recommended reading:
 

dianju

Thank you for taking the time to reply.
In fact, since I changed the VM settings from "lun mode" to "scsi disk dev", no missing files have been found so far, and no other problems have appeared for the time being.

And why is RHEL 9 a bad choice? I'm curious about it all.
 

Ericloewe

dianju said: "And why is RHEL 9 a bad choice? I'm curious about it all."
It's overall a painful, if not hostile, experience. If I wanted the Big Blue experience, I'd buy a mainframe. Except that I don't, because that's a terrible idea.
 

dianju

So in general, is this kind of virtual machine configuration bad and fraught with danger?
Will it leave hidden dangers for the future?
 

Ericloewe

It is not fraught with danger, it is irredeemably and completely broken.

It bypasses, to no benefit, many of ZFS' key features. It is known to be unreliable due to the intricacies of virtualization which do not play well with the design of ZFS.
It has caused many people to lose their data, plenty of them irretrievably.

You can virtualize, but you must do it correctly and the cardinal rule is "the controller must be passed through". No hardware RAID, no "disk pass-through", no virtual storage whatsoever (except for the OS itself, which is disposable and less stressful).
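
As a rough illustration of that rule, the Python sketch below inspects a libvirt domain XML and reports whether any PCI device is passed through and which virtual disks exist besides the boot disk. Everything named here is an assumption made for the example: the function `check_storage_layout`, the boot target `vda`, the image path, and the PCI address. Note that a PCI hostdev entry could just as well be a NIC or GPU, so a True result does not prove that an HBA is passed through.

```python
import xml.etree.ElementTree as ET

# Hypothetical guest definitions; the PCI address, paths, and device names are made up.
PASSTHROUGH_XML = """
<domain type="kvm">
  <devices>
    <disk type="file" device="disk">
      <source file="/var/lib/libvirt/images/truenas-boot.img"/>
      <target dev="vda" bus="virtio"/>
    </disk>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

LUN_XML = """
<domain type="kvm">
  <devices>
    <disk type="file" device="disk">
      <source file="/var/lib/libvirt/images/truenas-boot.img"/>
      <target dev="vda" bus="virtio"/>
    </disk>
    <disk type="block" device="lun">
      <source dev="/dev/disk/by-path/xxxx"/>
      <target dev="sdc" bus="scsi"/>
    </disk>
  </devices>
</domain>
"""


def check_storage_layout(domain_xml, boot_target="vda"):
    """Summarise PCI passthrough and non-boot virtual disks in a domain XML."""
    root = ET.fromstring(domain_xml)
    # Any PCI hostdev counts here; the sketch cannot tell an HBA from other devices.
    has_pci = any(h.get("type") == "pci" for h in root.iter("hostdev"))
    data_disks = []
    for disk in root.iter("disk"):
        if disk.get("device", "disk") in ("cdrom", "floppy"):
            continue  # installer media is not pool storage
        target = disk.find("target")
        dev = target.get("dev") if target is not None else None
        if dev != boot_target:
            data_disks.append(dev)
    return {"controller_passed_through": has_pci, "virtual_data_disks": data_disks}


if __name__ == "__main__":
    print(check_storage_layout(PASSTHROUGH_XML))
    print(check_storage_layout(LUN_XML))
```

A layout in the spirit of the rule above would report a passed-through controller and an empty list of virtual data disks; the LUN example reports the opposite.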
 

dianju

"It is irredeemably and completely broken"? What does this mean? I can still use those files normally, although I didn't check each one in turn. What do you suggest I should do now?
So far, ZFS has not reported any other errors.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
The QEMU/KVM vtscsi paravirtualization driver is known not to work correctly with TrueNAS. You should pass your disk controller through to the VM, and make sure that the disk controller is either a SATA AHCI, SCU, or LSI HBA with the appropriate firmware. Failure to follow this design guideline means that you are very likely to run into other problems in the future. That's written up in the resource you were already provided with,

 

dianju

Thank you for your reply.
I know exactly what to do now.
 