Keep getting warning on ZFS pool

Status
Not open for further replies.

BanditBBS

Dabbler
Joined
Mar 21, 2014
Messages
18
I keep getting this warning (twice now):
WARNING: The volume Volume_1 (ZFS) status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

But after looking at everything, the pool shows healthy and I see no errors on any disks. Is there a log somewhere where I can see what it is talking about?
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
Post the results of
Code:
zpool status -v
in code tags.
 

BanditBBS

Here ya go, looks like a couple of checksum errors. Should I really worry too much about that, and is that what is causing the warning in FreeNAS?

Code:
  pool: Volume_1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: none requested
config:
 
        NAME                                            STATE     READ WRITE CKSUM
        Volume_1                                        ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/b8758157-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/b8e36d92-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/b95c7372-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/b9d73ff7-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/ba63c43a-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/baf509af-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/bbd9be85-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/bcab5da5-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     1
            gptid/bd6b9d66-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/be2fd6bc-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/beef3cbd-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     1
 
errors: No known data errors
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Those with the "1" in the last column (CKSUM) have had checksum errors. Are you running ECC RAM? If not, stop and do a RAM test before reading or doing the rest of this:


I'm not sure how long you've had that pool, but you don't appear to have EVER done a scrub. It should be done at least monthly. So I'd do one: zpool scrub Volume_1

Other than that, I'd let the scrub run. If you get no additional errors beyond the 2 you have there, then I'd do a "zpool clear" to clear the errors and then just watch it closely.

I'd also do short and long SMART tests of all of your disks and check their SMART info to ensure everything is good.
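
The sequence above (scrub, re-check the status, run SMART tests, then clear the counters) could be sketched roughly as the shell helpers below. This is only a sketch under assumptions: the helper names (run, scrub_pool, smart_test, clear_errors) are made up for illustration, the disk names in the usage note are hypothetical, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start a scrub on the pool, then show its status (progress and error counters).
scrub_pool() {
    run zpool scrub "$1"
    run zpool status -v "$1"
}

# Start a short or long SMART self-test on each listed disk.
# $1 is the test type (short or long); the rest are device names.
smart_test() {
    kind=$1
    shift
    for disk in "$@"; do
        run smartctl -t "$kind" "/dev/$disk"
    done
}

# Reset the pool's error counters once the scrub and SMART results look clean.
clear_errors() {
    run zpool clear "$1"
}
```

For example, `scrub_pool Volume_1`, then `smart_test short ada7 ada10` and `smart_test long ada7 ada10` (ada7 and ada10 are placeholder device names), and finally `clear_errors Volume_1`. The SMART results can be read afterwards with `smartctl -a /dev/<disk>`. Set DRY_RUN=0 to actually execute the commands.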
 

BanditBBS

I am running 32GB of ECC RAM.

I'll make sure to schedule the SMART tests, and I am pretty sure I have scrubs scheduled, but I will verify. I cleared the first warning, so I know how to do that; I was just hoping for someone else's opinion, so thanks for giving me that!
 

cyberjock

Yeah, do a scrub. If the scrub is clean, do a short and a long SMART test. If they come back clean, just monitor for future errors. An occasional error isn't unusual, but errors shouldn't be frequent, and you should be suspicious any time you get them. ;)
 

BanditBBS

So, what do you consider clean? After the scrub I now have 10 and 13 CKSUM errors on those two drives.
Code:
  pool: Volume_1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 324K in 5h7m with 0 errors on Sun Mar 30 17:27:25 2014
config:
 
        NAME                                            STATE     READ WRITE CKSUM
        Volume_1                                        ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/b8758157-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/b8e36d92-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/b95c7372-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/b9d73ff7-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/ba63c43a-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/baf509af-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/bbd9be85-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/bcab5da5-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0    10
            gptid/bd6b9d66-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/be2fd6bc-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0     0
            gptid/beef3cbd-ad56-11e3-b538-90e2ba18f61a  ONLINE       0     0    13
 
errors: No known data errors
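
With eleven drives in the table it is easy to miss a non-zero counter. As a rough sketch (zpool_errors is a made-up helper name, and it assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout shown above), an awk filter can print only the devices whose counters are not all zero:

```shell
# Print every config line whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` output on stdin.
zpool_errors() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The trailing "errors:" line marks its end.
        /^errors:/ { in_config = 0 }
        in_config && NF >= 5 && ($3 + $4 + $5) > 0 {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

Usage would be `zpool status Volume_1 | zpool_errors`. Fed the post-scrub output above, it should list only the two gptid entries, with cksum=10 and cksum=13.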
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ahh, those are what we call flakeydrives. :) Hit them up one at a time with some SMART tests and see what SMART thinks, especially for a long test.
 

BanditBBS

One of them is an older drive, so that doesn't surprise me; the other is a brand-new WD Red, so BAH!

Is it safe to assume the zpool status command is listing the drives in order from ada0 through ada10?

I do appreciate the help, guys!
 

ser_rhaegar

To match gptids to devices, use:
Code:
glabel status
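
To pull out just one label, the output can be filtered. A minimal sketch, assuming glabel status prints its usual three columns (Name, Status, Components); gptid_to_dev is a made-up helper name:

```shell
# Look up the device behind a gptid label.
# Reads `glabel status` output on stdin; $1 is the label exactly as zpool status shows it.
gptid_to_dev() {
    # The last column (Components) holds the underlying partition or disk.
    awk -v label="$1" '$1 == label { print $NF }'
}
```

For example: `glabel status | gptid_to_dev gptid/bcab5da5-ad56-11e3-b538-90e2ba18f61a`. The result is usually a partition name rather than a whole disk, so drop the partition suffix before handing it to smartctl.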
 

cyberjock

"Is it safe to assume the zpool status command is listing the drives in order from ada0 through ada10?"

Nope. ser_rhaegar's command is the solution.
 

BanditBBS

Yeah, I used it and ran the short and long tests on the problem drives; all passed fine. So I guess I'll wait it out, and if it continues I'll just do a cross-ship RMA with WD and get them replaced.

Thanks for the lessons everyone...
 

cyberjock

I'd consider trying a different SATA cable just to rule out the cables.
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
I had the same issue with brand-new WD Red drives, and it turned out to be a bad SATA cable.
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
If you are unsure whether scrubs have been run, you can view the pool history by typing the following in the console (there should be a "zpool scrub poolname" entry about once a month):
Code:
zpool history |more
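
On a pool with a long history, paging through everything gets tedious. As a small sketch (scrub_dates is a made-up helper name, and it assumes history lines of the form "timestamp zpool scrub poolname"), the scrub entries can be filtered out like this:

```shell
# List the timestamps of past scrub commands.
# Reads `zpool history` output on stdin.
scrub_dates() {
    awk '$2 == "zpool" && $3 == "scrub" { print $1 }'
}
```

Usage: `zpool history Volume_1 | scrub_dates`. If nothing is printed, no scrub command has been logged for that pool.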
 

BanditBBS

Just to update everyone...

It ended up being the hot swap bays in the case I was using, so add that to the list of items to check. I ordered a new case and bam, the CRC errors have stopped.

I do appreciate the help; it got me looking in the right direction (away from the drives themselves).
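
For anyone chasing a similar problem: cable, backplane and hot swap bay faults often (though not always) also show up in SMART attribute 199, UDMA_CRC_Error_Count, on drives that report it. Nothing in this thread confirms that it did so here. A minimal sketch for reading that counter, assuming the usual `smartctl -A` attribute table with the raw value in the last column (crc_count is a made-up helper name):

```shell
# Print the raw UDMA_CRC_Error_Count (SMART attribute 199), if the drive reports it.
# Reads `smartctl -A` output on stdin.
crc_count() {
    awk '$1 == "199" { print $NF }'
}
```

Usage: `smartctl -A /dev/<disk> | crc_count`. A raw value that keeps climbing between checks points at the link to the drive rather than the drive itself.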
 