Hi there,
I'm on TrueNAS Scale 22.12.2, running on ESXi 7.0 U3.
For about two weeks I have been getting read, write and especially checksum errors on the pool all the time, and the counts keep growing.
At the beginning there were only 20-50 checksum errors; now there are 200k or more within a very short time.
I have already changed the controller twice, checked and replaced all cables, re-seated all drives, changed drive bays and even migrated the system via vMotion to another ESXi host, because I did not want to rule out a damaged backplane either. Unfortunately, the errors are also present on the second ESXi host.
I really have no clue what could cause this many errors on practically new drives; they were purchased in January this year. The controllers also seem to be fine.
The drives are 8 x ST12000NM002G, configured in RAID-Z2. The SMART data doesn't show anything suspicious, so the disks seem to be fine.
Specs ESXi host 01:
VMware ESXi, 7.0.3, 21686933
Supermicro X12SPL-F
Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
192GB DDR4 ECC RAM
Specs ESXi host 02:
VMware ESXi, 7.0.3, 21686933
Supermicro X11SRA-RF
Intel(R) Xeon(R) W-2150B CPU @ 3.00GHz
64GB DDR4 ECC RAM
HBAs tested:
LSI 9300-16i (original HBA)
2 HPE Smart Array H240 (backup HBAs)
All flashed / configured in IT mode on latest firmware.
The HBAs are in PCI passthrough mode to the TrueNAS VM.
I also still have an old Adaptec 71605 lying around but haven't tested it yet.
'zpool status tank01 -v' currently shows the following:
Code:
  pool: tank01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 49.7G in 00:12:14 with 2045 errors on Mon May 15 15:18:36 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank01                                    DEGRADED     0     0     0
          raidz2-0                                DEGRADED     0     0     0
            c751ab6b-6fe5-49e2-a637-c373611c4c77  DEGRADED     0     0 5.89K  too many errors
            9d5db49f-15b6-44e2-8d50-3768cd32304b  DEGRADED     0     0 5.90K  too many errors
            8fa742b0-bd8f-4423-a1d7-36e3a6d0781e  DEGRADED     0     0 5.90K  too many errors
            b490113d-cc59-4ccd-ab8e-8ca953f9820f  DEGRADED     0     0 5.75K  too many errors
            eed08a51-f1f3-4aa9-895a-9fef0aad0ecf  DEGRADED     0     0 5.58K  too many errors
            6a58ae25-6998-4ea1-af30-83e0f1546f66  DEGRADED     0     0 5.41K  too many errors
            7e31423d-2a9f-412c-a1e2-954a85708974  DEGRADED     0     0 5.72K  too many errors
            62e33e68-1443-406f-becc-60e486da5889  DEGRADED     0     0 5.75K  too many errors

errors: Permanent errors have been detected in the following files:

        tank01/Backup:<0x0>
        tank01/Backup:<0x1903>
        tank01/Backup:<0x1905>
        tank01/Backup:<0x1719>
        tank01/Backup:<0x1830>
        tank01/Backup:<0x1940>
        tank01/Backup:<0x1943>
        tank01/Backup:<0x1945>
        /mnt/tank01/Backup/wordpress_backups/file1.zip
        /mnt/tank01/Backup/wordpress_backups/file2.zip
        /mnt/tank01/Backup/wordpress_backups/file3.zip
        /mnt/tank01/Backup/wordpress_backups/file4.zip
        tank01/Backup:<0x1755>
        tank01/Backup:<0x195a>
        tank01/Backup:<0x18a5>
        tank01/Backup:<0x19ad>
        tank01/Backup:<0x17ae>
        tank01/Backup:<0x17af>
        tank01/Backup:<0x17b0>
        /mnt/tank01/Backup/wordpress_backups/file5.zip
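To see how quickly the counters grow between checks, the per-device CKSUM column can be pulled out of saved `zpool status` output with a small script. This is only a rough sketch: the function names `to_count` and `cksum_by_device` are made up for illustration, and it assumes the abbreviated counters (such as 5.89K) use a 1024-based K/M/G suffix, which I have not verified against this exact version. As far as I know, `zpool status -p` prints exact values and would avoid the suffix guess altogether.

```python
import re

# Assumption: abbreviated counters use 1024-based suffixes (K, M, G).
_SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# A config row looks like: NAME STATE READ WRITE CKSUM [optional note].
# The header row is skipped because "READ" is not numeric.
_ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>[\d.]+[KMG]?)\s+(?P<write>[\d.]+[KMG]?)\s+"
    r"(?P<cksum>[\d.]+[KMG]?)(?:\s|$)"
)


def to_count(text):
    """Convert a counter such as '12' or '5.89K' to an approximate integer."""
    match = re.fullmatch(r"([\d.]+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised counter: {text!r}")
    return round(float(match.group(1)) * _SUFFIX[match.group(2)])


def cksum_by_device(status_text):
    """Return {device name: checksum error count} for every config row."""
    counts = {}
    for line in status_text.splitlines():
        match = _ROW.match(line)
        if match:
            counts[match.group("name")] = to_count(match.group("cksum"))
    return counts
```

Running `cksum_by_device()` on two snapshots of the output taken some time apart and subtracting the values per device would show whether all eight disks climb at the same rate, which is what the numbers above suggest.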
I just can't imagine 8 practically new hard drives going belly up at the same time.
Does anyone have a hint that could get me going in the right direction?