For the past week I have had the following problem. My pool works fine for a few days, and then it is marked as degraded with a message that a disk is faulty.
The device in question always shows up with 1-10 write errors and 100-300 read errors. Each time it is a different disk; I checked the device ID with smartctl.
The first two times I ran a long SMART test on the disk in question, which it passed with no problems. Since the test found nothing, I took the disk offline and back online, resilvered, and ran a scrub. The pool then works without problems for a day, and then the same thing happens again.
Now 2 different disks are showing this, and I'm sure they are not faulty:
Code:
NAME                                            STATE     READ WRITE CKSUM
Volume1                                         DEGRADED     0     0     0
  raidz2-0                                      DEGRADED     0     0     0
    gptid/97216be1-177a-11e9-8c11-6805ca843b8a  ONLINE       0     0     0
    gptid/9806778a-177a-11e9-8c11-6805ca843b8a  ONLINE       0     0     0
    gptid/98dac192-177a-11e9-8c11-6805ca843b8a  ONLINE       0     0     0
    gptid/99b4a619-177a-11e9-8c11-6805ca843b8a  FAULTED     10   298     0  too many errors
    gptid/3b44fea7-23f9-11e9-9f54-6805ca843b8a  ONLINE       0     0     0
    gptid/9b68622e-177a-11e9-8c11-6805ca843b8a  FAULTED      8   160     0  too many errors
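In case it helps anyone compare the counters between incidents, here is a minimal Python sketch that pulls the members with non-zero error counters out of `zpool status` text. The names `ZPOOL_ROW`, `devices_with_errors` and `ZPOOL_SAMPLE` are made up for this example, and the sample is a shortened copy of the status output from this post. It assumes plain integer counters with one device per line; it does not handle abbreviated counters such as `1.2K`.

```python
import re

# One device/vdev row of `zpool status`: name, state, then READ/WRITE/CKSUM counters.
ZPOOL_ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)"
    r"\s+(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
)

def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero counter."""
    hits = []
    for line in status_text.splitlines():
        m = ZPOOL_ROW.match(line)
        if not m:
            continue  # header, blank line, or anything that is not a device row
        counts = tuple(int(m.group(k)) for k in ("read", "write", "cksum"))
        if any(counts):
            hits.append((m.group("name"), m.group("state")) + counts)
    return hits

# Shortened copy of the status output from this post (assumption: one row per line).
ZPOOL_SAMPLE = """\
NAME STATE READ WRITE CKSUM
Volume1 DEGRADED 0 0 0
  raidz2-0 DEGRADED 0 0 0
    gptid/97216be1-177a-11e9-8c11-6805ca843b8a ONLINE 0 0 0
    gptid/99b4a619-177a-11e9-8c11-6805ca843b8a FAULTED 10 298 0 too many errors
    gptid/9b68622e-177a-11e9-8c11-6805ca843b8a FAULTED 8 160 0 too many errors
"""

for row in devices_with_errors(ZPOOL_SAMPLE):
    print(row)
```

The counters come out in the same column order as the header (READ, WRITE, CKSUM), so the rows can be compared directly from one incident to the next.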
Code:
<INTEL SSDSCKJB150G7 N2010121>  at scbus2 target 0 lun 0 (pass0,ada0)
<ATA ST10000VN0004-1Z SC60>     at scbus4 target 6 lun 0 (pass1,da0)
<ATA ST10000VN0004-1Z SC60>     at scbus4 target 7 lun 0 (pass2,da1)
<ATA ST10000VN0004-1Z SC60>     at scbus4 target 9 lun 0 (pass3,da2)
<ATA ST10000VN0004-1Z SC60>     at scbus4 target 10 lun 0 (pass4,da3)
<ATA ST10000VN0004-1Z SC60>     at scbus4 target 11 lun 0 (pass5,da4)
<ATA ST10000VN0004-1Z SC60>     at scbus4 target 12 lun 0 (pass6,da5)
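To make it easier to see which disks share a bus, here is a small Python sketch that groups the device names from the listing above by their `scbus` number. The names `DEVLIST_ROW`, `devices_by_bus` and `DEVLIST_SAMPLE` are hypothetical, and the sample is a shortened copy of the listing in this post. It only assumes the row shape shown there and is not a general parser.

```python
import re
from collections import defaultdict

# One device row: <identity> at scbusN target N lun N (dev,dev)
DEVLIST_ROW = re.compile(
    r"<(?P<ident>[^>]+)>\s+at\s+scbus(?P<bus>\d+)\s+target\s+(?P<target>\d+)"
    r"\s+lun\s+(?P<lun>\d+)\s+\((?P<devs>[^)]+)\)"
)

def devices_by_bus(devlist_text):
    """Group disk device names (ada0, da3, ...) by the bus number they are attached to."""
    buses = defaultdict(list)
    for m in DEVLIST_ROW.finditer(devlist_text):
        # Each row names a passthrough node (passN) and the disk node; keep the disk node.
        names = [d for d in m.group("devs").split(",") if not d.startswith("pass")]
        if names:
            buses[int(m.group("bus"))].append(names[0])
    return dict(buses)

# Shortened copy of the device listing from this post.
DEVLIST_SAMPLE = """\
<INTEL SSDSCKJB150G7 N2010121> at scbus2 target 0 lun 0 (pass0,ada0)
<ATA ST10000VN0004-1Z SC60> at scbus4 target 6 lun 0 (pass1,da0)
<ATA ST10000VN0004-1Z SC60> at scbus4 target 7 lun 0 (pass2,da1)
"""

print(devices_by_bus(DEVLIST_SAMPLE))
```

Because it uses `finditer` rather than matching line by line, it also works if the listing has been pasted as a single line.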
I'm running FreeNAS-11.3-U5.
This started happening after I went into my BIOS, disabled the C6 state (since I have a Zen+ CPU), and enabled ECC (which was set to Auto before); before this the system had been running for about 2 years without any problems. I had to turn C6 off because after the upgrade to 11.3-U5 my system would hang after running for a month, with nothing in the logs indicating what went wrong; the only thing I could do was cut the power. I never had this problem before the upgrade (I was on the last version with the old UI).
Could it be that I lost data after having a hard crash (system would not react to anything but a ping)?
Before I do anything else that could potentially harm my pool even more, I would like some feedback on how to proceed. My first thought was to set ECC back to Auto.
Yesterday I did the pool update that FreeNAS suggested.
I have 6 of the following disks:
Seagate IronWolf - ST10000VN0004
Rest of the system:
AMD Ryzen 5 2600
ASRock B450M Pro4
IBM ServeRAID M1015 SAS/SATA Controller for System x (flashed so that it is only used for the extra SATA ports)
Intel Gigabit CT Desktop Adapter
2x Kingston ValueRAM KVR24E17S8K2/16I (for a total of 32GB of unbuffered ECC RAM)
Intel DC S3520 M.2 150GB (has protection against power failures)
Cooler Master V Series V550
This is my first post, so I hope I didn't forget anything. I have searched the forum but could not find anything similar; I may lack the right terminology for an effective search.