checksum errors on u.2 disks connected to nvme after upgrade

SAK · Mar 11, 2023

I recently upgraded a 5-disk z1 pool. The disks are all u.2 disks connected to nvme. I never had a single problem with the old disks. I replaced and resilvered the disks 1-by-1. After the last disk was replaced, I noticed errors and the pool was in the "unhealthy" state.

It reported corruption in 1 file, which I deleted and restored from backup. However, zpool status -v poolname does not report any other corrupt files. It says "No known data errors".

All disks report fine using smartctl -a /dev/nvme0 (through /dev/nvme4).

I was foolish and tried hot-swapping one of the first few disks. Hopefully I didn't fry anything. Is there any way to get more info on these errors? The errors did reduce after a reboot. I have already powered down, disconnected power, checked all cables for the disks. Everything seems fine there. I could do that again. I know another step is to disconnect the disks 1-by-1 and see if that makes a difference. Just wondering if there are other steps I can take before doing that, while these checksum errors are showing.

root@truenas[~]# zpool status -v nvmepool
pool: nvmepool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 01:19:47 with 0 errors on Sat Mar 11 13:09:19 2023
config:

NAME                                            STATE     READ WRITE CKSUM
nvmepool                                        ONLINE       0     0     0
  raidz1-0                                      ONLINE       0     0     0
    gptid/79503b1e-bed2-11ed-bf19-e8ea6a27b990  ONLINE       0     0    36
    gptid/b82f2aca-bee2-11ed-837b-e8ea6a27b990  ONLINE       0     0    36
    gptid/3cdbabfa-beb8-11ed-9ff7-e8ea6a27b990  ONLINE       0     0    36
    gptid/99196bee-bef3-11ed-8b1d-e8ea6a27b990  ONLINE       0     0    36
    gptid/6580146a-bc53-11ed-86cd-e8ea6a27b990  ONLINE       0     0    36

errors: No known data errors

souporman · Mar 13, 2023

Run zpool clear nvmepool, then scrub the pool. Everything's probably fine. That "one or more devices has experienced an unrecoverable error" message will go away when you clear, and you already dealt with the one corrupt file. Run long SMART tests on all the drives too, but chances are everything is fine.
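
In case it helps, here is a rough sketch of that clear-and-recheck sequence as a small shell script. The function names cksum_report and recheck_pool are made up for illustration. The script assumes an OpenZFS version new enough for zpool scrub -w (2.0 or later, as far as I know), and the smartctl line is left as a comment because NVMe self-tests depend on the installed smartmontools version. Treat it as a starting point, not a vetted procedure.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print the
# name and CKSUM count of every row whose checksum counter is non-zero.
cksum_report() {
    awk '
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }  # header row starts the device table
        in_config && NF == 0 { in_config = 0 }                   # blank line ends the table
        in_config && NF >= 5 && $5 != "0" { print $1, $5 }      # device name and checksum count
    '
}

# Hypothetical wrapper around the steps suggested in this thread.
# It is only defined here; nothing runs until you call it yourself.
recheck_pool() {
    pool=$1
    zpool clear "$pool"        # reset the READ/WRITE/CKSUM counters
    zpool scrub -w "$pool"     # start a scrub and wait for it to finish
    zpool status "$pool" | cksum_report   # list devices that picked up new checksum errors
    zpool events -v "$pool"    # verbose event log, useful for more detail on any new errors
    # Long SMART test per drive, if your smartctl supports NVMe self-tests:
    #   smartctl -t long /dev/nvme0
}

# Example call (pool name from this thread):
#   recheck_pool nvmepool
```

If cksum_report prints nothing after the scrub, the counters stayed at zero. If the same count shows up again on every disk at once, that would point more toward something shared (cabling, backplane, controller, RAM) than toward five bad drives, though that is a guess rather than a diagnosis.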
