Perplexed with Read/Write Errors

Status
Not open for further replies.

fx24

Dabbler
Joined
Sep 27, 2013
Messages
38
Case: Norco RPC-4224
PSU: EVGA SuperNOVA 850G2
Motherboard: Supermicro X10SL7-F
Memory: Crucial 16GB
CPU: Xeon E3-1276
Controllers: LSI 9201-16i, onboard LSI 2308
HDDs: 24x 6TB WD Reds
Storage Config: 4 vdevs of 6-drive RAIDZ2
OS: 9.3
Driver: 16
FW: 16


This all started 3 days ago during a scrub. I woke up to a faulted drive showing read and write errors but no checksum errors. SMART data didn't reveal errors on any drive. This being my first critical error with FreeNAS, I was scared and didn't do anything. I looked over a lot of posts and found someone in a similar situation who had rebooted, which made the critical pool status and faulted drive return to healthy.
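For anyone following along, the read/write/checksum counters mentioned above are the per-device columns in `zpool status`. Below is a minimal Python sketch of pulling out the rows with nonzero counters. The sample text is hypothetical (made-up pool name and gptids, not output from this server), and the helper name `devices_with_errors` is just for illustration.

```python
import re

# Hypothetical sample in the usual `zpool status` layout; NOT output from this system.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME                STATE     READ WRITE CKSUM
        tank                DEGRADED     0     0     0
          raidz2-0          DEGRADED     0     0     0
            gptid/aaaa-1    ONLINE       0     0     0
            gptid/bbbb-2    FAULTED     12    31     0  too many errors
          raidz2-1          ONLINE       0     0     0
            gptid/cccc-3    ONLINE       0     0     0
"""

# Indented row: name, upper-case state, then the three counter columns.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any nonzero counter."""
    flagged = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m or m.group(1) == "NAME":  # skip non-rows and the column header
            continue
        name, state, read, write, cksum = m.groups()
        # Counters stay as strings because zpool may abbreviate large values.
        if any(c != "0" for c in (read, write, cksum)):
            flagged.append((name, state, read, write, cksum))
    return flagged


if __name__ == "__main__":
    for row in devices_with_errors(SAMPLE):
        print(row)
```

Saving this output before any reboot keeps a record of which gptid faulted and with what counts.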

I went home that evening and rebooted, and the pool and drive came up fine, as if nothing had happened. I ran a long SMART test on that drive and rechecked the results; there were still no issues with any attribute. Two days went by, and the same thing happened again, except this time on a different drive. I can tell because the gptid is different. I am using 2 reverse breakout cables, and since I am not home I don't know whether the leads for the two affected drives come from the same cable, but I do know that the drives are on different backplane levels. However, they are in the same vdev.

When I get home tonight, I am going to power down the server and reseat all of the data cables and power connectors, but beyond that I am at a loss about what options I have left if the problem persists. These are fairly new drives, none older than a year. All of them were initially burned in with SMART conveyance and long tests plus badblocks, and then took TBs of live data without any errors.

Please help, FreeNAS gurus, for I am out of my depth with no room for error.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
"... had rebooted which made the critical pool status and faulted drive return to healthy."
Error counters are reset on reboot. That's why.

If SMART data is fine, I'd venture bad cables as the most likely option. Next up would be bad power, which is on the unlikely side, since that PSU is a good model.
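One way to double-check the "SMART is fine, suspect the cables" reasoning is to look at the raw values of a few specific attributes in `smartctl -A` output. The Python sketch below assumes the usual ATA attribute table layout and assumes that IDs 5, 197 and 198 point at the media while 199 (UDMA_CRC_Error_Count) points at the link, i.e. cabling or connectors. The sample text is hypothetical, not taken from the drives in this thread, and `nonzero_watched` is an illustrative name.

```python
# Hypothetical excerpt in the usual `smartctl -A` table layout; NOT from these drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       4
"""

# Assumption: which attributes to watch, and whether each hints at media or link trouble.
WATCHED = {5: "media", 197: "media", 198: "media", 199: "link"}


def nonzero_watched(smart_text):
    """Map attribute name -> (hint, raw value) for watched attributes with raw > 0."""
    hits = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, raw = int(fields[0]), fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            hits[fields[1]] = (WATCHED[attr_id], int(raw))
    return hits


if __name__ == "__main__":
    print(nonzero_watched(SAMPLE))
```

A nonzero "link" hit with clean "media" attributes would be consistent with a cabling or connector problem rather than a failing disk, though it is not proof either way.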
 

fx24

Dabbler
Joined
Sep 27, 2013
Messages
38
Ahh, ok. Well, now that I am home, I took a look. One of the drives is connected to a reverse breakout cable (Norco), and the other is on a cable connected to my 9201-16i. So two different makes and types of cable are affected, which leaves me very skeptical that the cables are at fault. I also doubt that it could be the power supply, but I can't think of any other culprit.
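The elimination above can be written down as a small "what do the affected drives share?" check. This is only a sketch in Python: the field names, the "row A"/"row B" backplane labels and the "vdev X" label are placeholders made up for illustration, with values paraphrased from the posts rather than read from the system.

```python
# Placeholder inventory for the two drives that faulted; labels are illustrative only.
FAULTED = [
    {
        "drive": "first faulted drive",
        "cable": "Norco reverse breakout",
        "backplane": "row A",
        "vdev": "vdev X",
        "psu": "EVGA SuperNOVA 850G2",
    },
    {
        "drive": "second faulted drive",
        "cable": "cable from LSI 9201-16i",
        "backplane": "row B",
        "vdev": "vdev X",
        "psu": "EVGA SuperNOVA 850G2",
    },
]


def shared_factors(records, ignore=("drive",)):
    """Return the fields (and values) that are identical across every record."""
    first = records[0]
    return {
        key: value
        for key, value in first.items()
        if key not in ignore and all(r.get(key) == value for r in records)
    }


if __name__ == "__main__":
    print(shared_factors(FAULTED))
```

With these placeholder values, only the vdev and the power supply come out as common to both drives, which matches the reasoning in the post: the cable and backplane level differ, so what remains shared is power and whatever the vdev's drives have in common.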
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It could be bad power without the PSU being directly at fault, though such scenarios are unlikely.

There are a couple more scenarios, but they boil down to "aliens!" or "Meh, stay vigilant and see if it happens again."

Make sure you have spares burned-in and ready to go, in case some drive does fail.
 

fx24

Dabbler
Joined
Sep 27, 2013
Messages
38
I have 2 spare drives that are burned in. I guess this is going to turn into wait-and-see, since it doesn't seem like drive failure? I just want to err on the safe side if there is a reasonable way to do so.
 