All-NVMe pool shows DEGRADED, and disks are showing checksum errors

SGT_GUO

Dabbler
Joined
Sep 6, 2019
Messages
11
I recently switched from a Windows Server file server to TrueNAS. However, after I set up TrueNAS, it shows the pool as degraded and the disks as having checksum errors, while Dell iDRAC doesn't show any errors. When the server was running Windows Server, the disks were in a RAID 6 array on a Dell H755N RAID controller, and no errors were reported. The disks are now in Non-RAID mode, which is HBA mode for Dell disk controllers.
TrueNAS runs as a virtual machine in VMware ESXi, but the RAID controller is passed through to the VM, so I don't think that would be an issue.
I bought the server new last year, along with the disks, so I don't think the disks have failed. Could something else be causing this?
Here are some screenshots of the server disk info:

QQ截图20221114152520.png

QQ截图20221114152718.png


Here are my server specs:
Dell R750
CPU: Intel 8380 x2
Memory: 512 GB total, 128 GB allocated to the TrueNAS VM
RAID controller: Dell H755N
Drives: Dell 7.68 TB NVMe Read Intensive (I believe they are Intel P5500)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You're on the road to trouble and maybe you've already arrived:
 

SGT_GUO

With Dell controllers, you can choose RAID mode or Non-RAID mode. Non-RAID mode is IT firmware, which is HBA. I have used H755 RAID controllers with spinning disks and regular SSDs without any problems in TrueNAS.
 

sretalla

If you're saying you've flashed the controller into IT mode and you have passed through the PCIe device to the VM, great.

I don't understand what that second picture in your first post is... it doesn't seem to be a screen from TrueNAS, so it's a little strange to have a full view of all disks from ESXi or whatever if passthrough was done properly.
 

SGT_GUO

The second picture is from Dell iDRAC, which is Dell's version of IPMI. I cannot see the disks from ESXi.
 

sretalla

OK, so I'll just take your word for it that you have the right controller in the right mode...

Checksum errors indicate cabling, backplane or controller problems... since all disks are impacted, you might want to consider SSD firmware also... but the evidence is pointing pretty squarely at the controller from my point of view.

First check the firmware and cabling/backplane connections (think about anything you did recently that may have unseated something).
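A minimal Python sketch of the "all disks are impacted" check above: if the CKSUM column is nonzero on every disk in the pool, a shared component (controller, backplane, cabling, or firmware) looks more suspect than any individual drive. It assumes the usual NAME/STATE/READ/WRITE/CKSUM table printed by `zpool status`; the pool name `tank`, the device names `nvd0` to `nvd3`, and the counts in `SAMPLE` are hypothetical, not taken from this thread.

```python
"""Check whether checksum errors in `zpool status` output hit every disk.

Minimal sketch: assumes the usual NAME/STATE/READ/WRITE/CKSUM table in the
"config:" section. SAMPLE is hypothetical, not the original poster's output.
"""

SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        DEGRADED     0     0     0
\t  raidz2-0  DEGRADED     0     0     0
\t    nvd0    DEGRADED     0     0    12  too many errors
\t    nvd1    DEGRADED     0     0     7  too many errors
\t    nvd2    ONLINE       0     0     3
\t    nvd3    DEGRADED     0     0     9  too many errors

errors: No known data errors
"""


def leaf_cksum_counts(status_text: str) -> dict[str, str]:
    """Map each leaf device (a disk, not the pool or a vdev) to its raw CKSUM column."""
    rows = []  # (indent, name, cksum) for every table row that has counters
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            break  # a blank line ends the config table
        if len(fields) >= 5:  # skips headings like "logs" and spares without counters
            indent = len(line) - len(line.lstrip())
            rows.append((indent, fields[0], fields[4]))

    leaves = {}
    for i, (indent, name, cksum) in enumerate(rows):
        # A row is a leaf if the next row is not indented deeper than it.
        if i == len(rows) - 1 or rows[i + 1][0] <= indent:
            leaves[name] = cksum
    return leaves


def all_leaves_have_cksum_errors(status_text: str) -> bool:
    """True if every leaf device reports a nonzero CKSUM count."""
    counts = leaf_cksum_counts(status_text)
    # Large counts may be abbreviated (e.g. "1.2K"), so compare text, not numbers.
    return bool(counts) and all(c != "0" for c in counts.values())


if __name__ == "__main__":
    print(leaf_cksum_counts(SAMPLE))
    print(all_leaves_have_cksum_errors(SAMPLE))
```

On a real system you would feed it the text printed by `zpool status` for the affected pool. The check only narrows down where to look; it cannot tell a bad controller from bad cabling or drive firmware.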
 

SGT_GUO

I have opened a case with Dell support and asked them to confirm that NVMe Non-RAID mode is the same as SATA/SAS Non-RAID mode. If it is, I will check for loose cables or other parts.
 

JoeAtWork

Contributor
Joined
Aug 20, 2018
Messages
165
Try a different HBA, a generic LSI/Broadcom one, as I understand the Dell ones have low queue depths.
 

SGT_GUO

I really wish I could; however, this generation of Dell servers has the RAID controller attached directly to the backplane. It is not a standard PCIe card, and there are no cables or ports from the backplane.
 

dl9

Cadet
Joined
Jan 11, 2023
Messages
7

I think the problem stems from you thinking it is in HBA mode when it's not. I have a Dell T550 with the H755N and I don't see an option to put it in HBA mode. I do see the option you are talking about: the NVMe mode is Non-RAID, but the controller mode is still RAID.

Go to Device Settings > Raid Controller H755N > View Server Profile > Controller Management and check the controller mode. Mine says RAID.

If you go further in the menu to Advanced Controller Management > Manage Controller Mode, mine says controller mode RAID with no option to change it.

Did you find a solution for using ZFS with the H755N?
 