CAM Status, Medium Error, Unretryable?

Status
Not open for further replies.

BlueMagician

Explorer
Joined
Apr 24, 2015
Messages
56
Dear all,

I wonder if anyone could shed some light on an error I've received from my FreeNAS box overnight (during a scheduled Scrub):

Code:
(da5:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 01 0e a0 06 40 00 00 01 00 00 00 length 131072 SMID 152 terminated ioc 804b scsi 0 state 0 xfer 0
(da5:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 01 0e a0 06 40 00 00 01 00 00 00
(da5:mps0:0:6:0): CAM status: CCB request completed with an error
(da5:mps0:0:6:0): Retrying command
(da5:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 01 0e a0 05 40 00 00 01 00 00 00
(da5:mps0:0:6:0): CAM status: SCSI Status Error
(da5:mps0:0:6:0): SCSI status: Check Condition
(da5:mps0:0:6:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da5:mps0:0:6:0): Info: 0x10ea00540
(da5:mps0:0:6:0): Error 5, Unretryable error


da5 is one of the six WD Red drives that make up my RAIDZ2 vdev.

I ran zpool status that morning whilst the scrub was still in progress. It showed that 128KB of data had been repaired, and one of the drives in the status list had '(REPAIRED)' or some such tagged onto it.

I ran an extended (long) SMART test on da5 the following evening, and it completed with no errors.

Most interestingly, the drive is showing no pending or reallocated sectors, so does this mean that the read error was more likely a controller issue than a sector/disc problem?


The drive in question has about 3 months of warranty left, so I'm wondering whether I need to poke it harder to see if there's an underlying issue, or whether I should just forget this ever happened and move on.


Sanity check much appreciated! Thank you in advance,

S.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Did this error only occur once, or did it appear many times? I've seen something similar with one of my drives which turned out to be a bad cable.
 

BlueMagician

Explorer
Joined
Apr 24, 2015
Messages
56
m0nkey_ said: "Did this error only occur once, or did it appear many times? I've seen something similar with one of my drives which turned out to be a bad cable."
It has only happened once in recent times - although I couldn't "hand on heart" say that it hasn't cropped up once before.

My PERC card was new when I bought it, as were the ludicrously expensive LSI-branded 8087 cables. I know that's no guarantee of fault-free operation, but I do tend to over-buy on things like that, to try and avoid stuff like this happening!

I'd take the chassis apart, give all the connectors a good blast of compressed air and re-seat them -- but I fear that powering down all the drives may be more dangerous to the pool than this error.

Not sure. Thoughts appreciated though, thank you.

S.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410

BlueMagician

Explorer
Joined
Apr 24, 2015
Messages
56
Quote:
1. What model?
2. Is it IT-mode flashed?
As per the system spec in my signature, it's a Dell PERC H200, crossflashed to IT-mode P20 firmware.
The system has been running for about 2 years, and I've changed nothing recently except for updating from 9.10_U2 to 9.10_U3 about 2 weeks ago.

Thank you,
S.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
As @Jailer pointed out, this is the typical error message for cable problems.
Does your SMART output register anything >0 on ID#199?
Either way, I'd poke around and re-seat the SATA cables, then wait to see if the error returns.
 

BlueMagician

Explorer
Joined
Apr 24, 2015
Messages
56
Dice said: "Does your smart output register anything >0 on ID#199 ?"
No, all the usual-suspect SMART counters are zero - hence my assumption in my first post that this was more likely controller or bus related.
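
For anyone wanting to check the same counters without eyeballing the table, here is a rough Python sketch that scans the attribute table printed by smartctl -A and flags non-zero raw values. The watched IDs (5, 197, 198, 199), the helper names and the SAMPLE text are all illustrative assumptions; SAMPLE is made up to show the layout and is not output from the drive in this thread.

```python
# Attributes usually checked when a read error shows up (assumed list):
# reallocated, pending and offline-uncorrectable sectors, plus the
# interface CRC counter (ID 199) asked about above.
WATCHED = (5, 197, 198, 199)

# Made-up sample in the layout of `smartctl -A` output, NOT real data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for each table row found."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw = int(fields[9])
            except ValueError:
                continue  # skip rows whose raw value is not a plain integer
            attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def nonzero_watched(attrs):
    """Return only the watched attributes whose raw value is above zero."""
    return {i: attrs[i] for i in WATCHED if i in attrs and attrs[i][1] > 0}


if __name__ == "__main__":
    flagged = nonzero_watched(parse_attributes(SAMPLE))
    print(flagged if flagged else "all watched counters are zero")
```

To use it on a real drive, the SAMPLE text would be replaced with the captured output of smartctl -A for that device.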

I'm not sure whether to be happy that my drive is probably OK, or sad that the only way to possibly find out for sure is to risk a power-down to fiddle with potentially-dodgy/potentially-fine premium cables. Hmm.

S.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I'd keep an eye on it. Could've been a one-off. If it happens again, then investigate further.
 

BlueMagician

Explorer
Joined
Apr 24, 2015
Messages
56
Quote:
What is it about powering down that makes you so nervous?
Only the fact that the discs have been powered up for a year without interruption, and then another year before that. Just a bit of paranoia really.

EDIT: After a quick email search, my suspicion of having seen this error before was confirmed - albeit more recently than I first thought...

It seems that a near-identical error was flagged up a few weeks ago during a previous scrub.

Line for line, the error from last month is the same as this latest one, except that in the first line the SMID was 973 in one and 152 in the other. EVERY other detail of the error is the same.

That's two emails, a few weeks apart, both produced whilst performing a scrub - showing a single unretryable READ error from the same channel/disc, with identical error detail including all the long CDB numbers - except for that SMID.

I don't know the significance of the SMID number, but if this were truly a randomly failing part/connector/controller then I would expect a little more randomness to the failure?
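
The long CDB numbers can be decoded rather than compared by eye. Below is a rough Python sketch, assuming the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13) and 512-byte logical sectors. The sector size is an assumption, though it is consistent with the "length 131072" in the first log line. The name decode_read16 is made up for this example.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def decode_read16(cdb_hex):
    """Decode the LBA and block count from a READ(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # starting block address
        "blocks": int.from_bytes(cdb[10:14], "big"),  # number of blocks to read
    }


# The two CDBs exactly as they appear in the log from the first post.
first = decode_read16("88 00 00 00 00 01 0e a0 06 40 00 00 01 00 00 00")
retry = decode_read16("88 00 00 00 00 01 0e a0 05 40 00 00 01 00 00 00")

for name, cmd in (("first", first), ("retry", retry)):
    size = cmd["blocks"] * SECTOR_BYTES
    print(f"{name}: lba={cmd['lba']:#x} blocks={cmd['blocks']} bytes={size}")
```

For the retried command this gives the same address as the "Info: 0x10ea00540" line and a 256-block transfer, which at 512 bytes per sector is the 131072 bytes in the first log line and the 128KB that zpool status reported as repaired. As posted, the first CDB reads 06 40 where the retry reads 05 40, so the two decoded addresses sit exactly one 256-block transfer apart. If both emails decode to the same address, that would point at one fixed location on the disc, which is what the "not very random" observation above is getting at.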

S.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
It seems to me that if powering down is going to hurt the system, that's just a problem waiting to happen anyway. The only reason to delay would be to refresh your backup first.
 