Hey everyone,
Before I start out, system info:
Build 11.1-U5
i3-4130T
16GB ECC RAM
ASROCK E3C226D2I Mobo
6x WD Red 3TB drives (well, one is now a 4TB, but I'll get to that) set up as 3 mirrored vDevs all in one large zpool.
Been running more or less problem-free since 2014 (weekly scrubs and regular short SMART tests), until now.
Okay, here goes...
So, last week I started getting CRITICAL alert emails from my box. When I checked, it was reporting pending unreadable sectors. I ordered a new drive (a 4TB Red) and was going to swap it in when one of my other drives just dropped out of its vDev and my pool started showing as degraded. So I (probably too hastily) replaced that drive instead.

Since then, and after the resilver, I've been getting some weird errors and things just aren't stable. I'm still getting the pending sectors error on ada2, but ada2 and ada3 (a mirrored pair) have also both dropped out of the pool twice (on different reboots), making my entire zpool unavailable. On top of that, ada5 is starting to throw a lot of AHCI errors and occasionally drops out of the pool.

I shut it down for a couple of days while I tried to wrap my head around what could be going on. I did a lot of reading in the forums, some of which led me to check cables and connections several times. Once, when ada5 dropped out, I found some *fantastic* advice that recommended force mounting. I didn't find out until later that this might have been a bad idea. I also found out during all of this that I should have been running regularly scheduled long SMART tests along the way.
I'm guessing things are going to get worse quickly, so I'm trying to save as much data as I can. (This is where I mention that my backups aren't what you'd call backups...) Troubleshooting-wise, though, I'm really not sure where to start in order to nail down what's actually going on. My original plan was to get as much data off as quickly as I can, then start small with the cables, then buy an HBA card to see whether it's the SATA ports on my mobo before just throwing new HDDs at the problem.

I'm attaching the output from smartctl -x, /var/log/messages, and zpool status -v. If y'all need me to post anything else, just let me know. It's late, so I'm sure there are things I've left out... Crossing my fingers, but trying to steel myself for what I think the answer probably is.