PCI passthrough consistently causes SMART errors on unrelated devices

potatosword

Dabbler
Joined
Dec 30, 2018
Messages
44
I have tried three different PCI devices (a USB card, the onboard USB controller, and a GPU), and I get SMART/checksum errors every time I start the Debian VM with any of them attached as a passthrough device. I find this confusing because the affected disks are not attached to those PCI devices in any way, shape, or form. The only relationship they have is that the disks are part of the pool that houses the Debian VM's storage. The errors stop once the system is restarted and the passthrough device is removed.

I have configured TrueNAS Core via the typical tunables (e.g. vmm_load and pptdevs), and the device does pass through to the Debian VM properly. However, I begin getting checksum/SMART errors on ada0 and ada1 immediately after starting the VM with a passthrough device attached.

* Pool Terrorbyte state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
* Device: /dev/ada1, not capable of SMART self-check.
* Device: /dev/ada0, not capable of SMART self-check.

ada0 and ada1 are both part of an encrypted six-drive RAIDZ2 pool. No other drives in the pool have this problem, and if I remove the PCI card, the errors go away and the pool is healthy. I let it sit for about three months in case something was wrong with the drives, but nothing happened. However, the moment I installed a new PCI device and set up passthrough, the errors resumed. So I'm pretty confident it has to do with PCI passthrough. Any ideas on what the hell is going on? This honestly makes no sense to me.
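For anyone reproducing this setup: the pptdevs tunable mentioned above tells FreeBSD which devices to reserve for bhyve passthrough, and the FreeBSD bhyve documentation writes each device as bus/slot/function (e.g. `pptdevs="2/0/0 1/2/6"`), while `pciconf -l` prints selectors such as `pci0:2:0:0`. The Python sketch below converts between the two. The helper names and example selectors are hypothetical, and it assumes every device sits in PCI domain 0.

```python
import re

# Hypothetical helper: matches a pciconf(8)-style selector such as
# "pci0:2:0:0" or "ppt0@pci0:1:2:6:" (domain:bus:slot:function).
_SELECTOR = re.compile(r"^(?:\w+@)?pci(\d+):(\d+):(\d+):(\d+):?$")


def selector_to_pptdev(selector: str) -> str:
    """Convert one pciconf selector to the bus/slot/function form used by pptdevs."""
    match = _SELECTOR.match(selector.strip())
    if match is None:
        raise ValueError(f"unrecognised PCI selector: {selector!r}")
    domain, bus, slot, function = (int(g) for g in match.groups())
    if domain != 0:
        # Assumption of this sketch: only PCI domain 0 is handled.
        raise ValueError("this sketch only handles PCI domain 0")
    return f"{bus}/{slot}/{function}"


def pptdevs_line(selectors: list[str]) -> str:
    """Build a loader.conf-style pptdevs assignment from several selectors."""
    devs = " ".join(selector_to_pptdev(s) for s in selectors)
    return f'pptdevs="{devs}"'


if __name__ == "__main__":
    # Example selectors are made up for illustration.
    print(pptdevs_line(["pci0:2:0:0", "ppt0@pci0:1:2:6:"]))  # pptdevs="2/0/0 1/2/6"
```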
 

potatosword

Last night I tried every PCI slot and card combination available to me. Nothing bad happens until I add the device to the VM as a passthrough device, after which I immediately get these checksum errors.
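One way to pin down the "immediately after" timing is to capture the CKSUM column of `zpool status` before and after starting the VM and compare the two. The Python sketch below does that on captured text. The helper names and the sample output are hypothetical, and it deliberately skips abbreviated counters such as `1.2K`.

```python
# Hypothetical helpers: compare per-device CKSUM counters between two
# captures of `zpool status` text (e.g. before and after starting the VM).


def cksum_counts(status_text: str) -> dict[str, int]:
    """Map each device/vdev name in the config table to its CKSUM count."""
    counts: dict[str, int] = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row of a pool's config table
            continue
        if not in_table:
            continue
        if not fields:
            in_table = False  # a blank line ends the config table
            continue
        # Only plain integers are parsed; abbreviated values like "1.2K" are skipped.
        if len(fields) >= 5 and fields[4].isdigit():
            counts[fields[0]] = int(fields[4])
    return counts


def new_cksum_errors(before: str, after: str) -> dict[str, int]:
    """Return devices whose CKSUM count increased, with the size of the increase."""
    old, new = cksum_counts(before), cksum_counts(after)
    return {dev: n - old.get(dev, 0) for dev, n in new.items() if n > old.get(dev, 0)}


if __name__ == "__main__":
    # Illustrative, made-up output; real device names on an encrypted
    # TrueNAS CORE pool will differ.
    table = """
  pool: Terrorbyte
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\tTerrorbyte  ONLINE       0     0     0
\t  raidz2-0  ONLINE       0     0     0
\t    ada0    ONLINE       0     0  {a0}
\t    ada1    ONLINE       0     0  {a1}

errors: No known data errors
"""
    before = table.format(a0=0, a1=0)
    after = table.format(a0=4, a1=2)
    print(new_cksum_errors(before, after))  # {'ada0': 4, 'ada1': 2}
```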
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Are you passing through the disk controller to the VM?

There are special considerations for running ZFS in a VM.

Other than that, I have no knowledge of virtualizing TrueNAS.
 

potatosword

Thanks for the reply! First, TrueNAS is not running virtualized (I will clarify my original post). Both TrueNAS and my ZFS pools are running natively on the hardware. I am trying to pass through a USB controller to a Debian VM. I personally wouldn't touch virtualizing my NAS with a ten-foot pole.

No disk controller passthrough is occurring, only the PCI USB card and, as an experiment, a GTX 1050. Just now I tried using the PCI USB card for my boot media and the onboard USB as my passthrough device, and as soon as I passed the onboard USB controller to the VM and started it, I got checksum errors again.

Here's everything I can think of, configuration-wise:
[Six configuration screenshots; images not preserved.]
 

potatosword

Is it possible there's something going on with legacy encryption here? That's sort of the only unusual thing about this pool. Maybe I'll try migrating the VM's storage onto another pool that isn't encrypted and see what happens.
 

Arwen

Sorry, I have no additional suggestions.
 

potatosword

I'm displeased to report that moving the VM to another pool did absolutely nothing. I guess I'm just not allowed to use PCI passthrough. I am fully out of ideas.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What version of TrueNAS are you running? I know I can't help, as I don't run VMs on TrueNAS; I run TrueNAS in an ESXi VM, which is a bit different. You only posted yesterday, and while I know we all hope for a fast response, sometimes the folks who know about this stuff are busy with life. I'd recommend looking for a few postings where people are running VMs in TrueNAS, sending one or two of them a direct message, and seeing if they will engage. Most people here will not respond unless they can help. I'm responding so that you will post the version of CORE you are running, and to give you a method to seek help. Also, if nothing happens in a few days, BUMP the thread. We do allow that in moderation.

Best of luck to you. Hope you find the solution.
 

potatosword

I understand, and I appreciate the input. I actually had a post a few months ago that was related to this error (though I didn't know it at the time). I'm running TrueNAS-13.0-U5.3, as I haven't updated to U6, which came out last week. I'm a bit old-fashioned in that I let updates mature for about a month before I dive in.

At this point I just want to understand the error more than I want to solve it, if that makes sense.
 

potatosword

I ended up migrating to SCALE, which seems to have resolved the issue, but now the Data Protection tab doesn't load. That's worthy of its own thread, though. Thanks, everyone.
 