I recently upgraded a 5-disk RAIDZ1 pool. The disks are all U.2 drives connected over NVMe. I never had a single problem with the old disks. I replaced and resilvered them one by one, and noticed checksum errors and the pool in an "unhealthy" state after the last disk was replaced.
It reported corruption in one file, which I deleted and restored from backup. However, it does not report any other corrupt files when I run zpool status -v poolname; it says "No known data errors".
All disks report fine using smartctl -a /dev/nvme0 (through nvme4).
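For completeness, something like this one-liner is how I spot-checked all five at once (assuming the controllers enumerate as nvme0 through nvme4 on your system; adjust the device names if not):

root@truenas[~]# for d in 0 1 2 3 4; do echo "== nvme$d =="; smartctl -a /dev/nvme$d | grep -iE 'critical|media|error'; done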
I was foolish and tried hot-swapping one of the first few disks; hopefully I didn't fry anything. Is there any way to get more info on these errors? The error counts did go down after a reboot. I have already powered down, disconnected power, and checked all the cables to the disks; everything seems fine there, but I could do that again. I know another step is to disconnect the disks one by one and see if that makes a difference. I'm just wondering if there are other steps I can take before doing that, while these checksum errors are showing (see the commands and zpool status output below).
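The only extra-detail command I know of so far is the ZFS event log, which should show which vdev each checksum error was charged to, though I'm not sure how much of it survives a reboot:

root@truenas[~]# zpool events -v nvmepool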
root@truenas[~]# zpool status -v nvmepool
  pool: nvmepool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 01:19:47 with 0 errors on Sat Mar 11 13:09:19 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        nvmepool                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/79503b1e-bed2-11ed-bf19-e8ea6a27b990  ONLINE       0     0    36
            gptid/b82f2aca-bee2-11ed-837b-e8ea6a27b990  ONLINE       0     0    36
            gptid/3cdbabfa-beb8-11ed-9ff7-e8ea6a27b990  ONLINE       0     0    36
            gptid/99196bee-bef3-11ed-8b1d-e8ea6a27b990  ONLINE       0     0    36
            gptid/6580146a-bc53-11ed-86cd-e8ea6a27b990  ONLINE       0     0    36

errors: No known data errors
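Following the "action" line above, my current plan (unless someone suggests something better first) is to reset the error counters and run another scrub to see whether the checksum errors come back:

root@truenas[~]# zpool clear nvmepool
root@truenas[~]# zpool scrub nvmepool
root@truenas[~]# zpool status -v nvmepool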