negabinary
Dabbler
- Joined
- Jun 24, 2016
- Messages
- 11
I'm beginning to think that my server is haunted.
About a year ago, I had a problem where drives would randomly start showing read/write/cksum errors and then become UNAVAIL (all as reported by zpool status).
In trying to figure out where these errors were coming from, I replaced two of the four drives, added cooling, replaced the SATA cables, and switched to a SAS9211-8I SATA controller (from motherboard SATA). For unrelated reasons, I also upgraded the CPU. I wasn't sure which one of these things did the trick, but the system showed no disk errors whatsoever for the next nine months (up the whole time, never rebooted), and I thought the issue was finally solved.
However, I had to power cycle the server last week - and now the problem seems to be back. Initially, one drive started throwing write and checksum errors, then it went UNAVAIL. Then, another drive started throwing errors, and it went UNAVAIL. Then, a third drive started throwing errors. This is the same pattern I was dealing with last year. I shut down the server and replaced the first drive with a brand-new spare.
Four days later, the brand-new drive has now gone UNAVAIL with 5 write errors, and I'm about to scrap this whole machine and give up.
These drives sit around 30 degrees, even under load, and system load is fairly light in general. No SMART errors on the drives.
Specs are as follows:
- Supermicro X10SLL-F-O
- Xeon E3-1271v3
- 32GB Crucial DDR3L ECC RAM
- 4x WD Red 3TB
- SAS9211-8I SATA Controller
- FreeNAS 9.10-Stable
Has anyone ever experienced something like this before? I'm willing to try anything at this point.