Hello,
I have a Supermicro 24-bay chassis with a dual-pathed SAS2 Supermicro/LSI SAS2X36 backplane, running TrueNAS 12 U3. The backplane is connected to two LSI 9702-8i HBAs (one controls each path). The HBAs run FW 20.
In use are 24x 600GB 10k RPM enterprise drives, and one of them started to act strangely a couple of weeks ago.
I started to get errors like this:
"Pool Pool01 state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."
followed by:
"Multipath multipath/disk13 connection is not optimal. Please check disk cables."
The disk in question is seen by TrueNAS like this:
multipath/disk13 | DEGRADED | |
da13 | FAIL | 50014ee7aaabefcc |
da38 | ACTIVE | 50014ee7aaabefcc |
So right now, since the last reboot, "da13" has failed.
If I reboot, "da38" can fail while da13 is fine. It alternates: I can reboot 50 times and get different results for which path of this HDD fails. It seems to be random, but one path is always active and the other is always failed. Which one fails on the next reboot depends on the alignment of the stars and the moon. This was already the case in v12 U2, and the upgrade to U3 (which of course meant a reboot) caused da13 to fail this time (before the reboot, da38 had failed and da13 was good).
The GUI still shows da13 as degraded. On the dashboard, the pool is marked as "unhealthy" (in v12 U2, before the U3 upgrade reboot, it was called "degraded").
When I do a "zpool status", all disks are online and everything is ok but it does mention the "unrecoverable error":
Code:
nas01# zpool status
  pool: Pool01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 385G in 01:03:07 with 0 errors on Fri Apr 16 15:35:41 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool01                                          ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/199def9f-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1a91b995-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1a57856d-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1a72dc44-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1a923c8f-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1ac7ebfc-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1b12dc45-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1cdd56f2-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/1909a4a5-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/19be4b1d-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1b5b4c71-1c52-11eb-ad4b-a0369f19e510  ONLINE       3     0     0
            gptid/1b7d57e8-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1b6c8d69-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1bf27e07-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1c3b860c-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1c7422ff-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/1ca9f7d8-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1ee056a8-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1f1d6847-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1f2a0236-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1fb46ddb-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1fc89694-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1fe0f77d-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/20015769-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
        logs
          mirror-3                                      ONLINE       0     0     0
            gptid/1d5bded5-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0
            gptid/1d7f605a-1c52-11eb-ad4b-a0369f19e510  ONLINE       0     0     0

errors: No known data errors
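Since the one member with a non-zero counter is easy to miss in that output, here is a minimal Python sketch that picks out every row whose READ, WRITE or CKSUM count is not zero. The function name `devices_with_errors` and the `SAMPLE` text are made up for illustration (the sample rows are copied from the raidz2-1 part of the output above), and it assumes the counters are plain integers; zpool may abbreviate very large counts, which this pattern would skip.

```python
import re

# One config row of `zpool status`: NAME STATE READ WRITE CKSUM.
# Assumption: the three counters are plain integers.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for each row with a non-zero counter."""
    hits = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines, blank lines, "logs", etc.
        name = match.group(1)
        read, write, cksum = (int(match.group(i)) for i in (3, 4, 5))
        if read or write or cksum:
            hits.append((name, read, write, cksum))
    return hits


# Hypothetical input: a few rows taken from the output above.
SAMPLE = """\
  raidz2-1 ONLINE 0 0 0
    gptid/19be4b1d-1c52-11eb-ad4b-a0369f19e510 ONLINE 0 0 0
    gptid/1b5b4c71-1c52-11eb-ad4b-a0369f19e510 ONLINE 3 0 0
    gptid/1b7d57e8-1c52-11eb-ad4b-a0369f19e510 ONLINE 0 0 0
"""

if __name__ == "__main__":
    for name, read, write, cksum in devices_with_errors(SAMPLE):
        print(name, read, write, cksum)
```

In my case the only row it reports is gptid/1b5b4c71-1c52-11eb-ad4b-a0369f19e510 in raidz2-1, with 3 read errors.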
Is this HDD doing the funky chicken? Or is this an issue with software/firmware?