Mannekino (Patron; joined Nov 14, 2012; 332 messages)
Yesterday, during a reboot of my server, I saw some error messages on the IPMI console. Once FreeNAS had finished booting I saw that my primary data pool was degraded and one drive was missing. After a few minutes the drive reappeared, the pool resilvered in a few seconds, and it was healthy again.
Last night at 1:00 my bi-weekly scrub of this pool started, and around 3:34 I got an e-mail alert that the pool was degraded again. This is what the e-mail said:
Code:
New alerts:
* The volume data state is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
The scrub just finished after about 6.5 hours. Afterwards the pool did not resilver like it did earlier yesterday; it was still degraded. I took this screenshot during the scrub (when it was about 98% done). After the scrub finished I shut down the server and checked whether all the cables were properly seated. Everything was in order; all the cables were plugged in all the way. When the server was back online the pool was healthy again.
Screenshot during scrub
After shutting down, checking cables and turning back on again
Pool status right now
Code:
root@freenas[~]# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 356K in 0 days 06:32:00 with 0 errors on Mon Feb 18 07:32:01 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a18b468c-b464-11e8-9fd5-0025907457e1  ONLINE       0     0     0
            gptid/a2b7dae8-b464-11e8-9fd5-0025907457e1  ONLINE       0     0     0
            gptid/a3f67a48-b464-11e8-9fd5-0025907457e1  ONLINE       0     0     0
            gptid/a52943ef-b464-11e8-9fd5-0025907457e1  ONLINE       0     0     0
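Since the drive has dropped out twice and then come back with all counters at zero, it may help to check the status output automatically while the problem is being tracked down. The following is only a rough sketch, not anything FreeNAS ships: the function name `suspect_devices` and the regular expression are my own assumptions about the layout of the `config:` table shown above. It flags any row that is not ONLINE or whose READ/WRITE/CKSUM counters are not `0`.

```python
import re

# Assumed row layout of the `config:` table: NAME STATE READ WRITE CKSUM.
# Counters are kept as text because zpool may abbreviate large values.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def suspect_devices(status_text):
    """Return (name, state, read, write, cksum) for every row that is
    not ONLINE or has at least one counter other than "0"."""
    suspects = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header lines, pool/state/scan lines, blank lines
        name, state, read, write, cksum = match.groups()
        if state != "ONLINE" or any(c != "0" for c in (read, write, cksum)):
            suspects.append((name, state, read, write, cksum))
    return suspects


if __name__ == "__main__":
    import subprocess

    # Runs `zpool status` for the pool named in this post ("data").
    output = subprocess.run(
        ["zpool", "status", "data"], capture_output=True, text=True
    ).stdout
    for row in suspect_devices(output):
        print(row)
```

Run against the healthy output above, this would report nothing. It only looks at what ZFS reports and does not replace checking the SMART data of the individual drives.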
How should I proceed in troubleshooting this issue? I’ve been running bi-weekly scrubs of this pool and daily SMART short self-tests. I would also like to point to this post, where I added an exhaust fan for my SAS HBA and applied the common old-school 7 V mod to the fan to reduce the noise. One of the first things I did this morning was to disconnect the exhaust fan. I have already ordered a simple fan controller to solve this properly. Also, a couple of days ago, when I attached the Molex power connectors for the new exhaust fan, I moved the SATA cable that supplies power to my drives to another output. I don’t know if any of this is relevant, but I mention it anyway.
I’ve ordered a new drive just to be sure and I’m going to pick it up later today.