Pool degraded, drive showed checksum errors, now removed.

Status
Not open for further replies.
Joined
Oct 10, 2016
Messages
21
The pool became degraded, but all drives were still online. ada6 showed checksum errors but stayed online. I ran the SMART short and long tests against it and both passed. I didn't save the output of those commands though :(

Today I had to reboot the FreeNAS Mini and now the drive shows as removed. I've been receiving lots of alerts via email today as well.

My question - is there anything I can do or should I just order a new hard drive and swap it out?
[Attached image: GnZoR1L.png]


Some of the email alerts I got today.
Code:
Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Device: /dev/ada6, 21 Currently unreadable (pending) sectors
The volume Pool1 state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

Code:
Device: /dev/ada1, 2 Currently unreadable (pending) sectors
The volume Pool1 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Device: /dev/ada6, 21 Currently unreadable (pending) sectors

Code:
Device: /dev/ada1, 2 Currently unreadable (pending) sectors
The volume Pool1 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Device: /dev/ada6, unable to open device
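
For anyone who wants to track these alerts over time, here is a minimal Python sketch that pulls the pending-sector counts out of alert lines like the ones above. The function name `summarize_alerts` and the sample text are my own for illustration; the regexes assume the alert wording stays exactly as shown in these emails.

```python
import re
from collections import defaultdict

# Matches alert lines such as:
#   Device: /dev/ada6, 21 Currently unreadable (pending) sectors
PENDING_RE = re.compile(
    r"Device: (?P<dev>/dev/\w+), (?P<count>\d+) Currently unreadable \(pending\) sectors"
)
# Matches: Device: /dev/ada6, unable to open device
UNREACHABLE_RE = re.compile(r"Device: (?P<dev>/dev/\w+), unable to open device")


def summarize_alerts(text):
    """Return {device: {"pending": highest count seen, "unreachable": bool}}."""
    summary = defaultdict(lambda: {"pending": 0, "unreachable": False})
    for line in text.splitlines():
        m = PENDING_RE.search(line)
        if m:
            entry = summary[m.group("dev")]
            entry["pending"] = max(entry["pending"], int(m.group("count")))
            continue
        m = UNREACHABLE_RE.search(line)
        if m:
            summary[m.group("dev")]["unreachable"] = True
    return dict(summary)


# Sample lines copied from the alerts above.
ALERTS = """\
Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Device: /dev/ada6, 21 Currently unreadable (pending) sectors
Device: /dev/ada6, unable to open device
"""

if __name__ == "__main__":
    for dev, info in sorted(summarize_alerts(ALERTS).items()):
        print(dev, info)
```

A rising pending count on a device, or a device that can no longer be opened, is the kind of change worth acting on.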


Email alert from yesterday:
Code:
freenas.local kernel log messages:
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Error 5, Retries exhausted
> GEOM_ELI: g_eli_read_done() failed (error=5) ada6p1.eli[READ(offset=8253440, length=4096)]
> swap_pager: I/O error - pagein failed; blkno 2623460,size 4096, error 5
> vm_fault: pager read error, pid 89787 (pkg)
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Error 5, Retries exhausted
> GEOM_ELI: g_eli_read_done() failed (error=5) ada6p1.eli[READ(offset=8253440, length=4096)]
> swap_pager: I/O error - pagein failed; blkno 2623460,size 4096, error 5
> vm_fault: pager read error, pid 89787 (pkg)
> Failed to fully fault in a core file segment at VA 0x80065f000 with size 0x1000 to be written at offset 0xf000 for process pkg
> pid 89787 (pkg), uid 0: exited on signal 11 (core dumped)
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Error 5, Retries exhausted
> GEOM_ELI: g_eli_read_done() failed (error=5) ada6p1.eli[READ(offset=8253440, length=4096)]
> swap_pager: I/O error - pagein failed; blkno 2623460,size 4096, error 5
> vm_fault: pager read error, pid 89882 (pkg)
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Retrying command
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada6:ahcich13:0:0:0): RES: 41 40 78 3f 00 40 00 00 00 00 00
> (ada6:ahcich13:0:0:0): Error 5, Retries exhausted
> GEOM_ELI: g_eli_read_done() failed (error=5) ada6p1.eli[READ(offset=8253440, length=4096)]
> swap_pager: I/O error - pagein failed; blkno 2623460,size 4096, error 5
> vm_fault: pager read error, pid 89882 (pkg)
> Failed to fully fault in a core file segment at VA 0x80065f000 with size 0x1000 to be written at offset 0xf000 for process pkg
> pid 89882 (pkg), uid 0: exited on signal 11 (core dumped)
> (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 d0 87 37 40 ea 00 00 00 00 00
> (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
> (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
> (ada5:ahcich12:0:0:0): RES: 41 10 d0 87 37 40 ea 00 00 00 00
> (ada5:ahcich12:0:0:0): Retrying command
> (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 30 e0 b7 3c 40 ea 00 00 00 00 00
> (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
> (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
> (ada5:ahcich12:0:0:0): RES: 41 10 e0 b7 3c 40 ea 00 00 00 00
> (ada5:ahcich12:0:0:0): Retrying command
> (ada5:ahcich12:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
> (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
> (ada5:ahcich12:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
> (ada5:ahcich12:0:0:0): RES: 51 04 80 7f 52 40 ac 00 00 00 00
> (ada5:ahcich12:0:0:0): Retrying command
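
A log this repetitive is easier to read as a tally. The following is a rough Python sketch, not a FreeNAS tool, that counts failed commands per device and error type. The name `summarize` and the regexes are assumptions based only on the line format visible in this excerpt.

```python
import re
from collections import Counter

# First line of a failed command, e.g.
#   (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 ...
CMD_RE = re.compile(r"\((?P<dev>\w+):\w+:\d+:\d+:\d+\): (?P<cmd>[A-Z0-9_]+)\. ACB:")
# Status line, e.g.
#   (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
ERR_RE = re.compile(
    r"\((?P<dev>\w+):[^)]*\): ATA status: [0-9a-f]{2} \([^)]*\), "
    r"error: [0-9a-f]{2} \((?P<err>[A-Z ]+?)\s*\)"
)


def summarize(log):
    """Count (device, command, error) triples found in the log text."""
    counts = Counter()
    last_cmd = {}  # most recent command name seen for each device
    for line in log.splitlines():
        m = CMD_RE.search(line)
        if m:
            last_cmd[m.group("dev")] = m.group("cmd")
            continue
        m = ERR_RE.search(line)
        if m:
            dev = m.group("dev")
            counts[(dev, last_cmd.get(dev, "?"), m.group("err"))] += 1
    return counts


# A few lines taken from the log above.
SAMPLE = """\
> (ada6:ahcich13:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 3f 00 40 00 00 00 00 00 00
> (ada6:ahcich13:0:0:0): CAM status: ATA Status Error
> (ada6:ahcich13:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
> (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 d0 87 37 40 ea 00 00 00 00 00
> (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
> (ada5:ahcich12:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
> (ada5:ahcich12:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
"""

if __name__ == "__main__":
    for key, n in sorted(summarize(SAMPLE).items()):
        print(key, n)
```

Run against the full excerpt, a tally like this would show the repeated UNC read errors on ada6 and also the IDNF and ABRT errors on ada5 at the end of the log.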
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
From the email info you posted, I would bet that your FreeNAS locked up because it could not page data in from the failed drive ada6, and you had to reboot it because it was not responding. It shows the drive as removed because the drive is not responding. The drive is probably failing its internal self-test at power-on, so it never reports ready and the system marks it as removed. That drive, ada6, should be replaced first.
I would not replace them both at the same time, because the faulted drive ada1 is still working to some degree, but I would say they both need to be replaced. Once the system finishes resilvering after the ada6 replacement, I would replace ada1 next, and while you are getting drives, you might want to have another spare on hand.
How long have these drives been running, and what kind of drives are they?
 

Joined
Oct 10, 2016
Messages
21
Thank you for the quick replies!

This is a FreeNAS Mini I purchased from iXsystems in September 2016. It came with a 1-year hardware warranty from iX. The drives are 4TB Western Digital Reds.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,964
Set up an advance RMA with Western Digital for both drives. They should still be under warranty.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Warranty replacement drives will be great, but you should still have a spare on hand. It is not very unusual to have drives fail in their first year of service. I have a new server at work where I had three drives fail in the first year.
You might want to run a burn-in on the two replacements and on the new drive you get as a spare. It is a way of making sure they are working properly before you put them into the array.
Since you are already running your NAS and you probably don't want to take it offline to do the testing on the new drives before you put them in, you can connect the new drives to any other computer you have and make a 'fake' NAS temporarily to do the testing on the new drives.
Here are some useful scripts for monitoring your system including one for the disk burnin:
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/
Here is how to do the replacement:
https://forums.freenas.org/index.php?resources/replacing-a-failed-failing-disk.75/
Also, here is a handy guide to identifying your drives:
https://forums.freenas.org/index.php?resources/identify-your-drives-by-serial-number.64/
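
To give a rough idea of what a burn-in involves, here is a Python sketch that only builds the command list and runs nothing. It is not the script from the linked resource. It assumes a commonly described sequence of SMART self-tests around a destructive `badblocks` write pass, and that `smartctl` and `badblocks` are installed. The function name `burnin_plan` is made up for this example.

```python
import shlex


def burnin_plan(device):
    """Return the burn-in steps as argument lists. Nothing is executed."""
    return [
        # Quick self-test. It runs inside the drive, so wait for it to finish.
        ["smartctl", "-t", "short", device],
        # Full-surface self-test. This can take many hours on a large drive.
        ["smartctl", "-t", "long", device],
        # DESTRUCTIVE write-mode test: erases everything on the device.
        # -b 4096 sets the block size, -w selects write mode, -s shows progress.
        ["badblocks", "-b", "4096", "-ws", device],
        # Repeat the long self-test after the stress run.
        ["smartctl", "-t", "long", device],
        # Print the SMART attributes so they can be reviewed by hand.
        ["smartctl", "-A", device],
    ]


if __name__ == "__main__":
    # /dev/ada9 is a placeholder; substitute the new, empty drive.
    for step in burnin_plan("/dev/ada9"):
        print(shlex.join(step))
```

Printing the plan first makes it harder to point the destructive step at a drive that holds data. Only run it against a new, empty disk.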
 
Joined
Oct 10, 2016
Messages
21
WD website wouldn't allow me to choose Advanced RMA so I had to do a standard RMA. I submitted it for ada6 and when I get the new drive and it's done rebuilding I'll manually fail ada1 and RMA that one as well.

Thanks all for the help here.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,856
That is odd, not sure why they would not do an advanced RMA unless you just didn't want to use a credit card.

Also, did you run any SMART short or long tests? Looking at those results would give you a clear indication of which drive should be replaced first. And I'd highly recommend that you back up any important data now.
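
Since SMART output is easy to lose, here is a small Python sketch for saving it to a file each time. It assumes `smartctl` is on the PATH. The function name `save_smart_report` and the file naming scheme are my own choices, and the `run` parameter exists only so the function can be exercised without a real drive.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def save_smart_report(device, out_dir, run=subprocess.run):
    """Run `smartctl -a <device>` and write its output to a timestamped file."""
    # No check=True here: smartctl can exit non-zero while still printing
    # a useful report, for example when the drive has logged errors.
    result = run(["smartctl", "-a", device], capture_output=True, text=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"smart-{Path(device).name}-{stamp}.txt"
    path.write_text(result.stdout)
    return path


# Example call (the output directory is a placeholder):
#   save_smart_report("/dev/ada1", "/mnt/Pool1/smart-logs")
```

Keeping one file per run makes it possible to compare attribute values before and after a problem shows up.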
 