Scrub reveals repairs two scrubs in a row

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
Hey guys, thanks in advance for the help. I have a rather large system (my first FreeNAS build) with 129 usable terabytes of storage. I am running ECC memory and have several RAIDZ2 vdevs. Everything ran smoothly for months, but recently my scrubs have had to repair some things. I have scrubs set up every 15 days. During the last scrub, 16 or so days ago, I noticed it had to repair what I believe was less than 200K on 2 drives in my most recently added RAIDZ2 vdev. I had also received an email alert:

Code:
>	   (da34:mps0:0:45:0): READ(10). CDB: 28 00 08 27 e2 10 00 00 e0 00 length 114688 SMID 256 terminated ioc 804b scsi 0 state 0 xfer 0
> (da34:mps0:0:45:0): READ(10). CDB: 28 00 08 27 e2 10 00 00 e0 00
> (da34:mps0:0:45:0): CAM status: CCB request completed with an error
> (da34:mps0:0:45:0): Retrying command
> (da34:mps0:0:45:0): READ(10). CDB: 28 00 08 27 e1 30 00 00 e0 00
> (da34:mps0:0:45:0): CAM status: SCSI Status Error
> (da34:mps0:0:45:0): SCSI status: Check Condition
> (da34:mps0:0:45:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da34:mps0:0:45:0): Info: 0x827e130
> (da34:mps0:0:45:0): Error 5, Unretryable error

-- End of security output --


I ran a long SMART test on this drive and saw nothing concerning (to the best of my judgement).

Now my latest scrub has repaired 432K and shows repairs on 3 drives. This has me concerned. I can provide further information as requested. Any tips on how to resolve this so that I no longer have to repair data every time I scrub would be great.
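
For anyone reading the log by hand, the READ(10) lines above can be unpacked to see which blocks were involved. Below is a minimal Python sketch, assuming 512-byte logical sectors (an assumption, although it agrees with the "length 114688" printed in the log). The helper name decode_rw10 is made up for illustration and is not part of any FreeNAS tool.

```python
def decode_rw10(cdb_hex, block_size=512):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    block_size is an assumption (512-byte logical sectors).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return {
        "op": "READ" if cdb[0] == 0x28 else "WRITE",
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


# The two CDBs from the da34 excerpt in the email above.
first = decode_rw10("28 00 08 27 e2 10 00 00 e0 00")
retry = decode_rw10("28 00 08 27 e1 30 00 00 e0 00")

print(first["op"], hex(first["lba"]), first["blocks"], first["bytes"])
print(retry["op"], hex(retry["lba"]), retry["blocks"], retry["bytes"])
```

Each request covers 224 blocks, which is 114688 bytes at 512 bytes per block, and the second CDB starts at 0x827e130, the same value the sense data reports in its Info field.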
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Good post but you left out all the important info. What FreeNAS version and hardware do you have? You have something wrong with your disk cables or controller. Include your controller firmware version also.

Sent from my Nexus 5X using Tapatalk
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
I knew I was forgetting the obvious...


FreeNAS-9.10-STABLE-201605240427

Intel Xeon E5-1620 on a Supermicro board (Supermicro X10SRL-F)
SAS9211-8i 8-port internal 6Gb/s SATA+SAS PCIe 2.0 card
846E16-R1200B cases

I need to spend some time looking up how to find the firmware version of my card (I don't know whether the FreeNAS GUI shows it); otherwise I know I can go into the controller menu on startup. When I first built the system, FreeNAS told me to upgrade to the latest LSI firmware, which I did at the time (earlier this year), and FreeNAS has not complained since.

Is this logic sound? I currently have 32 disks in my system. I believe both scrubs have shown errors on the same vdev, twice in a row now. Since I haven't seen errors on any of the other 4 vdevs (the vdev with errors is the 5th), would it be safe to assume that the issue is either in that particular part of the backplane or in those specific drives (I had them all shipped at once)?

My reasoning is that if it were the controller or the cabling to the backplanes, I'd see errors on other drives as well.

Thanks again for the assistance!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
dmesg | grep mps
Or just sas2flash -list

What chassis do you have and what backplane?

 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
I have the 846E16-R1200B with the BPN-SAS2-846EL1 backplane.

This morning, just as with the last scrub 16 days ago, I received this email. 16 days ago it was da32 and da34.

Code:
Apollo.local kernel log messages:
>	   (da33:mps0:0:44:0): READ(10). CDB: 28 00 2c 3e 33 00 00 00 e0 00 length 114688 SMID 491 terminated ioc 804b scsi 0 state 0 xfer 0
> (da33:mps0:0:44:0): READ(10). CDB: 28 00 2c 3e 33 00 00 00 e0 00
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
> (da33:mps0:0:44:0): Retrying command
> (da33:mps0:0:44:0): READ(10). CDB: 28 00 2c 3e 32 20 00 00 e0 00
> (da33:mps0:0:44:0): CAM status: SCSI Status Error
> (da33:mps0:0:44:0): SCSI status: Check Condition
> (da33:mps0:0:44:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da33:mps0:0:44:0): Info: 0x2c3e3220
> (da33:mps0:0:44:0): Error 5, Unretryable error
>	   (da33:mps0:0:44:0): READ(10). CDB: 28 00 34 30 2b b8 00 00 e0 00 length 114688 SMID 253 terminated ioc 804b scsi 0 state 0 xfer 0
> (da33:mps0:0:44:0): READ(10). CDB: 28 00 34 30 2b b8 00 00 e0 00
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
> (da33:mps0:0:44:0): Retrying command
> (da33:mps0:0:44:0): READ(10). CDB: 28 00 34 30 2a d8 00 00 e0 00
> (da33:mps0:0:44:0): CAM status: SCSI Status Error
> (da33:mps0:0:44:0): SCSI status: Check Condition
> (da33:mps0:0:44:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da33:mps0:0:44:0): Info: 0x34302ad8
> (da33:mps0:0:44:0): Error 5, Unretryable error
>	   (da32:mps0:0:43:0): READ(10). CDB: 28 00 80 2f 11 98 00 00 e0 00 length 114688 SMID 626 terminated ioc 804b scsi 0 state 0 xfer 0
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 80 2f 11 98 00 00 e0 00
> (da32:mps0:0:43:0): CAM status: CCB request completed with an error
> (da32:mps0:0:43:0): Retrying command
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 80 2f 10 b8 00 00 e0 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da32:mps0:0:43:0): Info: 0x802f10b8
> (da32:mps0:0:43:0): Error 5, Unretryable error
>	   (da35:mps0:0:46:0): READ(10). CDB: 28 00 a6 04 b0 e0 00 00 e0 00 length 114688 SMID 741 terminated ioc 804b scsi 0 state 0 xfer 0
> (da35:mps0:0:46:0): READ(10). CDB: 28 00 a6 04 b0 e0 00 00 e0 00
> (da35:mps0:0:46:0): CAM status: CCB request completed with an error
> (da35:mps0:0:46:0): Retrying command
> (da35:mps0:0:46:0): READ(10). CDB: 28 00 a6 04 b0 00 00 00 e0 00
> (da35:mps0:0:46:0): CAM status: SCSI Status Error
> (da35:mps0:0:46:0): SCSI status: Check Condition
> (da35:mps0:0:46:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da35:mps0:0:46:0): Info: 0xa604b000
> (da35:mps0:0:46:0): Error 5, Unretryable error
>	   (da32:mps0:0:43:0): READ(10). CDB: 28 00 e8 37 0f f8 00 00 e0 00 length 114688 SMID 488 terminated ioc 804b scsi 0 state 0 xfer 0
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 e8 37 0f f8 00 00 e0 00
> (da32:mps0:0:43:0): CAM status: CCB request completed with an error
> (da32:mps0:0:43:0): Retrying command
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 e8 37 0f 18 00 00 e0 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da32:mps0:0:43:0): Info: 0xe8370f18
> (da32:mps0:0:43:0): Error 5, Unretryable error

-- End of security output --


Now it's showing da32, da33, and da35, all in the same vdev.
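
With emails this long it is easy to miscount, so here is a rough Python sketch that tallies the "SCSI sense" lines per device and sense key. The regular expression is based only on the line format visible in the email above, and tally_sense is a hypothetical helper name; the sample text is just the five sense lines copied from that email.

```python
import re
from collections import Counter

# Matches lines such as:
# > (da33:mps0:0:44:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
SENSE_RE = re.compile(
    r"\((da\d+):mps\d+:\d+:\d+:\d+\): SCSI sense: ([A-Z ]+?) asc:"
)


def tally_sense(log_text):
    """Count sense-key occurrences per (device, sense key) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SENSE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = """\
> (da33:mps0:0:44:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da33:mps0:0:44:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da32:mps0:0:43:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da35:mps0:0:46:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da32:mps0:0:43:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
"""

counts = tally_sense(sample)
for (device, sense_key), n in sorted(counts.items()):
    print(device, sense_key, n)
```

Feeding it the whole email body instead of the sample would group other sense keys separately from MEDIUM ERROR, which helps when comparing one scrub's email against the next.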

What are everyone's thoughts on how likely it is that I've experienced data loss or corruption up to this point?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
It's just having a hard time reading the data over the data cable. No writing issues yet so no corruption. I would try a new SAS cable and make sure your drives are seated nice and tight. How do you have 30 drives in that chassis? It only holds 24.

 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
My apologies, I have 2 of those chassis. I have an external mini-SAS cable running from the second port on the LSI card to the second chassis. I actually use a few cables: one from the LSI card to an external mini-SAS PCI bracket, the external mini-SAS cable itself, then another PCI bracket on the second chassis with an internal mini-SAS cable to the backplane. 3 cables in total.

If the errors are just on reads, what is the scrub correcting? The scrub finished and reported no known data errors, so that's semi-good, I'd hope. Thanks again for the help!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Drives which are falling out of the array for a few seconds?

How are your HDs powered?
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
In light of the advice given thus far, I have reseated all of the drives in this particular RAIDZ vdev, bypassed the external mini-SAS PCI brackets and external cable, and run a SAS cable directly from the LSI card in the primary chassis to the secondary backplane. This removes the additional cables as points of failure. I am now running a new scrub of the pool. This troubleshooting may take some time, as my scrubs tend to take 60 hours and the repairs have always come closer to the end of the scrub.
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
As I said previously, I removed the additional paths for the SAS cables and went directly from port 2 on the LSI card to the secondary backplane, then ran a new scrub. This new scrub again resulted in repairs on 2 drives. I also received an email identical to the one above (read issues), again for drive da33. I have ordered a new mini-SAS cable, which arrives tomorrow. I will replace the cable and run a scrub again. If replacing the SAS cable still results in scrub repairs, then as far as I can tell only the following possibilities are left:

1. port issue on the lsi card
2. issue with the backplane
3. identical read issues on up to 3 drives with passing smart tests
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
I have replaced the sas cable and am scrubbing again. Scrub will finish tomorrow. If errors continue then the issue must be with either the backplane or hard drives.
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
It's all coming apart. Slowly but surely just falling apart.

Code:
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 b8 d3 02 88 00 00 08 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da32:mps0:0:43:0): Retrying command (per sense data)
>	   (da32:mps0:0:43:0): READ(10). CDB: 28 00 b8 d3 02 88 00 00 08 00 length 4096 SMID 861 terminated ioc 804b scsi 0 state c xfer 0
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 b8 d3 02 88 00 00 08 00
> (da32:mps0:0:43:0): CAM status: CCB request completed with an error
> (da32:mps0:0:43:0): Retrying command
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 b8 d3 02 88 00 00 08 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da32:mps0:0:43:0): Retrying command (per sense data)
>	   (da37:mps0:0:48:0): WRITE(10). CDB: 2a 00 00 40 03 98 00 00 08 00 length 4096 SMID 906 terminated ioc 804b scsi 0 state c xfer 0
>	   (da37:mps0:0:48:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f1 98 00 00 00 08 00 00 length 4096 SMID 161 terminated ioc 804b (da37:mps0:0:48:0): WRITE(10). CDB: 2a 00 00 40 03 98 00 00 08 00
> scsi 0 state c xfer 0
> (da37:mps0:0:48:0): CAM status: CCB request completed with an error
>	   (da37:mps0:0:48:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f3 98 00 00 00 08 00 00 length 4096 SMID 223 terminated ioc 804b (da37:scsi 0 state c xfer 0
> mps0:0:48:0): Retrying command
> (da37:mps0:0:48:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f1 98 00 00 00 08 00 00
> (da37:mps0:0:48:0): CAM status: CCB request completed with an error
> (da37:mps0:0:48:0): Retrying command
> (da37:mps0:0:48:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f3 98 00 00 00 08 00 00
> (da37:mps0:0:48:0): CAM status: CCB request completed with an error
> (da37:mps0:0:48:0): Retrying command
> (da37:mps0:0:48:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f3 98 00 00 00 08 00 00
> (da37:mps0:0:48:0): CAM status: SCSI Status Error
> (da37:mps0:0:48:0): SCSI status: Check Condition
> (da37:mps0:0:48:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da37:mps0:0:48:0): Retrying command (per sense data)
>	   (da34:mps0:0:45:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 35 08 00 00 00 08 00 00 length 4096 SMID 426 terminated ioc 804b scsi 0 state c xfer 0
> (da34:mps0:0:45:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 35 08 00 00 00 08 00 00
> (da34:mps0:0:45:0): CAM status: CCB request completed with an error
> (da34:mps0:0:45:0): Retrying command
> (da34:mps0:0:45:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 35 08 00 00 00 08 00 00
> (da34:mps0:0:45:0): CAM status: SCSI Status Error
> (da34:mps0:0:45:0): SCSI status: Check Condition
> (da34:mps0:0:45:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da34:mps0:0:45:0): Retrying command (per sense data)
> (da25:mps0:0:34:0): READ(16). CDB: 88 00 00 00 00 01 50 45 eb 48 00 00 00 08 00 00
> (da25:mps0:0:34:0): CAM status: SCSI Status Error
> (da25:mps0:0:34:0): SCSI status: Check Condition
> (da25:mps0:0:34:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da25:mps0:0:34:0): Retrying command (per sense data)
>	   (da30:mps0:0:41:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 91 60 00 00 00 08 00 00 length 4096 SMID 124 terminated ioc 804b scsi 0 state c xfer 4096
> (da30:mps0:0:41:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 91 60 00 00 00 08 00 00
> (da30:mps0:0:41:0): CAM status: CCB request completed with an error
> (da30:mps0:0:41:0): Retrying command
> (da30:mps0:0:41:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 91 60 00 00 00 08 00 00
> (da30:mps0:0:41:0): CAM status: SCSI Status Error
> (da30:mps0:0:41:0): SCSI status: Check Condition
> (da30:mps0:0:41:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da30:mps0:0:41:0): Retrying command (per sense data)
>	   (da28:mps0:0:37:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 91 a0 00 00 00 38 00 00 length 28672 SMID 689 terminated ioc 804b scsi 0 state c xfer 28672
> (da28:mps0:0:37:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 91 a0 00 00 00 38 00 00
> (da28:mps0:0:37:0): CAM status: CCB request completed with an error
> (da28:mps0:0:37:0): Retrying command
> (da28:mps0:0:37:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 91 a0 00 00 00 38 00 00
> (da28:mps0:0:37:0): CAM status: SCSI Status Error
> (da28:mps0:0:37:0): SCSI status: Check Condition
> (da28:mps0:0:37:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da28:mps0:0:37:0): Retrying command (per sense data)
>	   (da31:mps0:0:42:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 36 60 00 00 00 30 00 00 length 24576 SMID 497 terminated ioc 804b scsi 0 state c xfer 24576
> (da31:mps0:0:42:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 36 60 00 00 00 30 00 00
> (da31:mps0:0:42:0): CAM status: CCB request completed with an error
> (da31:mps0:0:42:0): Retrying command
> (da31:mps0:0:42:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 36 60 00 00 00 30 00 00
> (da31:mps0:0:42:0): CAM status: SCSI Status Error
> (da31:mps0:0:42:0): SCSI status: Check Condition
> (da31:mps0:0:42:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da31:mps0:0:42:0): Retrying command (per sense data)
>	   (da26:mps0:0:35:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 10 00 00 00 08 00 00 length 4096 SMID 624 terminated ioc 804b scsi 0 state c xfer 0
>	   (da25:mps0:0:34:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 10 00 00 00 58 00 00 length 45056 SMID 211 terminated ioc 804b(da26:mps0:0:35:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 10 00 00 00 08 00 00
>  scsi 0 state c xfer 45056
> (da26:mps0:0:35:0): CAM status: CCB request completed with an error
> (da25:mps0:0:34:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 10 00 00 00 58 00 00
> (da26:(da25:mps0:0:34:0): CAM status: CCB request completed with an error
> mps0:0:(da25:35:mps0:0:0): 34:Retrying command
> 0): Retrying command
> (da25:mps0:0:34:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 10 00 00 00 58 00 00
> (da26:mps0:0:35:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 10 00 00 00 08 00 00
> (da25:mps0:0:34:0): CAM status: SCSI Status Error
> (da26:mps0:0:35:0): CAM status: SCSI Status Error
> (da25:mps0:0:34:0): SCSI status: Check Condition
> (da26:mps0:0:35:0): SCSI status: Check Condition
> (da25:mps0:0:34:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da26:mps0:0:35:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da25:(da26:mps0:0:mps0:0:34:35:0): 0): Retrying command (per sense data)
> Retrying command (per sense data)
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 b8 d3 02 88 00 00 08 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da32:mps0:0:43:0): Retrying command (per sense data)
> (da32:mps0:0:43:0): READ(10). CDB: 28 00 b8 d3 02 88 00 00 08 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da32:mps0:0:43:0): Retrying command (per sense data)
>	   (da27:mps0:0:36:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 a8 00 00 00 28 00 00 length 20480 SMID 501 terminated ioc 804b scsi 0 state c xfer 0
> (da27:mps0:0:36:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 a8 00 00 00 28 00 00
> (da27:mps0:0:36:0): CAM status: CCB request completed with an error
> (da27:mps0:0:36:0): Retrying command
> (da27:mps0:0:36:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 92 a8 00 00 00 28 00 00
> (da27:mps0:0:36:0): CAM status: SCSI Status Error
> (da27:mps0:0:36:0): SCSI status: Check Condition
> (da27:mps0:0:36:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da27:mps0:0:36:0): Retrying command (per sense data)
>	   (da31:mps0:0:42:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 37 70 00 00 00 08 00 00 length 4096 SMID 778 terminated ioc 804b scsi 0 state c xfer 0
> (da31:mps0:0:42:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 37 70 00 00 00 08 00 00
> (da31:mps0:0:42:0): CAM status: CCB request completed with an error
> (da31:mps0:0:42:0): Retrying command
> (da31:mps0:0:42:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 37 70 00 00 00 08 00 00
> (da31:mps0:0:42:0): CAM status: SCSI Status Error
> (da31:mps0:0:42:0): SCSI status: Check Condition
> (da31:mps0:0:42:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da31:mps0:0:42:0): Retrying command (per sense data)
>	   (da27:mps0:0:36:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 93 08 00 00 00 28 00 00 length 20480 SMID 1003 terminated ioc 804b scsi 0 state c xfer 0
> (da27:mps0:0:36:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 93 08 00 00 00 28 00 00
> (da27:mps0:0:36:0): CAM status: CCB request completed with an error
> (da27:mps0:0:36:0): Retrying command
> (da27:mps0:0:36:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 93 08 00 00 00 28 00 00
> (da27:mps0:0:36:0): CAM status: SCSI Status Error
> (da27:mps0:0:36:0): SCSI status: Check Condition
> (da27:mps0:0:36:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da27:mps0:0:36:0): Retrying command (per sense data)
>	   (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bd d0 00 00 00 08 00 00 length 4096 SMID 859 terminated ioc 804b scsi 0 state c xfer 0
>	   (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bb d0 00 00 00 08 00 00 length 4096 SMID 691 terminated ioc 804b (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bd d0 00 00 00 08 00 00
> scsi 0 state c xfer 0
> (da29:mps0:0:38:0): CAM status: CCB request completed with an error
>	   (da29:mps0:0:38:0): WRITE(10). CDB: 2a 00 00 40 03 d0 00 00 08 00 length 4096 SMID 803 terminated ioc 804b scsi 0 state c xfe(da29:r 4096
> mps0:0:38:0): Retrying command
> (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bb d0 00 00 00 08 00 00
> (da29:mps0:0:38:0): CAM status: CCB request completed with an error
> (da29:mps0:0:38:0): Retrying command
> (da29:mps0:0:38:0): WRITE(10). CDB: 2a 00 00 40 03 d0 00 00 08 00
> (da29:mps0:0:38:0): CAM status: CCB request completed with an error
> (da29:mps0:0:38:0): Retrying command
> (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 d1 c0 bd d0 00 00 00 08 00 00
> (da29:mps0:0:38:0): CAM status: SCSI Status Error
> (da29:mps0:0:38:0): SCSI status: Check Condition
> (da29:mps0:0:38:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da29:mps0:0:38:0): Retrying command (per sense data)
>	   (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 39 50 00 00 00 08 00 00 length 4096 SMID 763 terminated ioc 804b scsi 0 state c xfer 4096
> (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 39 50 00 00 00 08 00 00
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
> (da33:mps0:0:44:0): Retrying command
> (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 39 50 00 00 00 08 00 00
> (da33:mps0:0:44:0): CAM status: SCSI Status Error
> (da33:mps0:0:44:0): SCSI status: Check Condition
> (da33:mps0:0:44:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da33:mps0:0:44:0): Retrying command (per sense data)
>	   (da34:mps0:0:45:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 39 28 00 00 00 10 00 00 length 8192 SMID 507 terminated ioc 804b scsi 0 state c xfer 0
> (da34:mps0:0:45:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 39 28 00 00 00 10 00 00
> (da34:mps0:0:45:0): CAM status: CCB request completed with an error
> (da34:mps0:0:45:0): Retrying command
> (da34:mps0:0:45:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 39 28 00 00 00 10 00 00
> (da34:mps0:0:45:0): CAM status: SCSI Status Error
> (da34:mps0:0:45:0): SCSI status: Check Condition
> (da34:mps0:0:45:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da34:mps0:0:45:0): Retrying command (per sense data)
>	   (da36:mps0:0:47:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 3a 40 00 00 00 18 00 00 length 12288 SMID 653 terminated ioc 804b scsi 0 state c xfer 0
> (da36:mps0:0:47:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 3a 40 00 00 00 18 00 00
> (da36:mps0:0:47:0): CAM status: CCB request completed with an error
> (da36:mps0:0:47:0): Retrying command
> (da36:mps0:0:47:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 3a 40 00 00 00 18 00 00
> (da36:mps0:0:47:0): CAM status: SCSI Status Error
> (da36:mps0:0:47:0): SCSI status: Check Condition
> (da36:mps0:0:47:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da36:mps0:0:47:0): Retrying command (per sense data)
>	   (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f3 f0 00 00 00 08 00 00 length 4096 SMID 143 terminated ioc 804b scsi 0 state c xfer 0
>	   (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f1 f0 00 00 00 08 00 00 length 4096 SMID 291 terminated ioc 804b (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f3 f0 00 00 00 08 00 00
> scsi 0 state c xfer 0
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
>	   (da33:mps0:0:44:0): WRITE(10). CDB: 2a 00 00 40 03 f0 00 00 08 00 length 4096 SMID 946 terminated ioc 804b scsi 0 state c xfe(da33:r 0
> mps0:0:	   (da33:mps0:0:44:0): WRITE(10). CDB: 2a 00 00 40 01 f0 00 00 08 00 length 4096 SMID 479 terminated ioc 804b scsi 0 state c xfe44:r 0
> 0): Retrying command
> (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f1 f0 00 00 00 08 00 00
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
> (da33:mps0:0:44:0): Retrying command
> (da33:mps0:0:44:0): WRITE(10). CDB: 2a 00 00 40 03 f0 00 00 08 00
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
> (da33:mps0:0:44:0): Retrying command
> (da33:mps0:0:44:0): WRITE(10). CDB: 2a 00 00 40 01 f0 00 00 08 00
> (da33:mps0:0:44:0): CAM status: CCB request completed with an error
> (da33:mps0:0:44:0): Retrying command
> (da33:mps0:0:44:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f3 f0 00 00 00 08 00 00
> (da33:mps0:0:44:0): CAM status: SCSI Status Error
> (da33:mps0:0:44:0): SCSI status: Check Condition
> (da33:mps0:0:44:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da33:mps0:0:44:0): Retrying command (per sense data)
>	   (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 98 70 00 00 00 40 00 00 length 32768 SMID 497 terminated ioc 804b scsi 0 state c xfer 0
> (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 98 70 00 00 00 40 00 00
> (da29:mps0:0:38:0): CAM status: CCB request completed with an error
> (da29:mps0:0:38:0): Retrying command
> (da29:mps0:0:38:0): WRITE(16). CDB: 8a 00 00 00 00 01 98 4a 98 70 00 00 00 40 00 00
> (da29:mps0:0:38:0): CAM status: SCSI Status Error
> (da29:mps0:0:38:0): SCSI status: Check Condition
> (da29:mps0:0:38:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da29:mps0:0:38:0): Retrying command (per sense data)
> (da37:mps0:0:48:0): READ(10). CDB: 28 00 b8 d3 02 80 00 00 08 00
> (da37:mps0:0:48:0): CAM status: SCSI Status Error
> (da37:mps0:0:48:0): SCSI status: Check Condition
> (da37:mps0:0:48:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da37:mps0:0:48:0): Retrying command (per sense data)
>	   (da32:mps0:0:43:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 3e b8 00 00 00 08 00 00 length 4096 SMID 300 terminated ioc 804b scsi 0 state c xfer 0
> (da32:mps0:0:43:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 3e b8 00 00 00 08 00 00
> (da32:mps0:0:43:0): CAM status: CCB request completed with an error
> (da32:mps0:0:43:0): Retrying command
> (da32:mps0:0:43:0): WRITE(16). CDB: 8a 00 00 00 00 01 28 4a 3e b8 00 00 00 08 00 00
> (da32:mps0:0:43:0): CAM status: SCSI Status Error
> (da32:mps0:0:43:0): SCSI status: Check Condition
> (da32:mps0:0:43:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da32:mps0:0:43:0): Retrying command (per sense data)
> ses1: da31,pass33: Element descriptor: 'Slot 03'
> ses1: da31,pass33: SAS Device Slot Element: 1 Phys at Slot 2
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da414e
> ses1: da32,pass34: Element descriptor: 'Slot 04'
> ses1: da32,pass34: SAS Device Slot Element: 1 Phys at Slot 3
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da414f
> ses1: da24,pass25: Element descriptor: 'Slot 05'
> ses1: da24,pass25: SAS Device Slot Element: 1 Phys at Slot 4
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4150
> ses1: da25,pass26: Element descriptor: 'Slot 06'
> ses1: da25,pass26: SAS Device Slot Element: 1 Phys at Slot 5
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4151
> ses1: da33,pass35: Element descriptor: 'Slot 09'
> ses1: da33,pass35: SAS Device Slot Element: 1 Phys at Slot 8
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4154
> ses1: da34,pass36: Element descriptor: 'Slot 10'
> ses1: da34,pass36: SAS Device Slot Element: 1 Phys at Slot 9
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4155
> ses1: da26,pass27: Element descriptor: 'Slot 11'
> ses1: da26,pass27: SAS Device Slot Element: 1 Phys at Slot 10
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4156
> ses1: da27,pass28: Element descriptor: 'Slot 12'
> ses1: da27,pass28: SAS Device Slot Element: 1 Phys at Slot 11
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4157
> ses1: da35,pass37: Element descriptor: 'Slot 15'
> ses1: da35,pass37: SAS Device Slot Element: 1 Phys at Slot 14
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da415a
> ses1: da36,pass38: Element descriptor: 'Slot 16'
> ses1: da36,pass38: SAS Device Slot Element: 1 Phys at Slot 15
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da415b
> ses1: da39,pass41: Element descriptor: 'Slot 17'
> ses1: da39,pass41: SAS Device Slot Element: 1 Phys at Slot 16
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da415c
> ses1: da28,pass29: Element descriptor: 'Slot 18'
> ses1: da28,pass29: SAS Device Slot Element: 1 Phys at Slot 17
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da415d
> ses1: da37,pass39: Element descriptor: 'Slot 21'
> ses1: da37,pass39: SAS Device Slot Element: 1 Phys at Slot 20
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4160
> ses1: da38,pass40: Element descriptor: 'Slot 22'
> ses1: da38,pass40: SAS Device Slot Element: 1 Phys at Slot 21
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4161
> ses1: da30,pass32: Element descriptor: 'Slot 23'
> ses1: da30,pass32: SAS Device Slot Element: 1 Phys at Slot 22
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4162
> ses1: da29,pass30: Element descriptor: 'Slot 24'
> ses1: da29,pass30: SAS Device Slot Element: 1 Phys at Slot 23
> ses1:  phy 0: SATA device
> ses1:  phy 0: parent 5003048001da417f addr 5003048001da4163

-- End of security output --
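
To turn the ses1 lines at the end of that email into a device-to-bay lookup (handy when pulling a specific disk), something like the following could work. This is only a sketch: the regular expression is based on the line format shown above, device_slots is a made-up helper name, and the sample holds a few lines copied from the email.

```python
import re

# Matches lines such as:
# > ses1: da32,pass34: Element descriptor: 'Slot 04'
SLOT_RE = re.compile(
    r"ses\d+: (da\d+),pass\d+: Element descriptor: '(Slot \d+)'"
)


def device_slots(log_text):
    """Map each daN device to the enclosure slot label reported for it."""
    return {m.group(1): m.group(2) for m in SLOT_RE.finditer(log_text)}


sample = """\
> ses1: da32,pass34: Element descriptor: 'Slot 04'
> ses1: da32,pass34: SAS Device Slot Element: 1 Phys at Slot 3
> ses1: da33,pass35: Element descriptor: 'Slot 09'
> ses1: da35,pass37: Element descriptor: 'Slot 15'
"""

slots = device_slots(sample)
for device in sorted(slots):
    print(device, "->", slots[device])
```

Note that in this output the label in the element descriptor is one higher than the number on the "Phys at Slot" line, so it is worth checking which numbering the chassis bays actually use before pulling a drive.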
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
The scrub should have ended early yesterday (testing the new SAS cable); however, once it reached 90% it began to slow down, to the point where it only moved 6 percent in 24 hours. I woke up this morning to a daily security run email that was so long and full of read errors that Gmail actually had to truncate it. I have now swapped out the chassis for one I had earmarked for a backup server. I will run a scrub again. At this point the only thing left would be the HBA card.
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
I'm at a complete loss... I've tried everything. To this day I have yet to have a clean scrub. Every 15 days like clockwork, regardless of my efforts, I receive emails with CCB read errors. I've changed SAS cables, changed cases (which changes backplanes), and changed LSI cards. All of the errors have remained in the very same RAIDZ2 vdev; no other vdevs are affected. These are Western Digital Red 6TB drives. The vdev consists of 8 drives. I've replaced 7 of them over the course of the past several months, and yet I continue to receive random errors. It's usually 3 or so drives, never the same 3 drives.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Backplane?

 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
Yeah buddy. I've tried 2 different cases and backplanes. I'm thinking about posting some SMART results here for the drives reporting read errors. The read errors remain so inconsistent among the 8 drives in the vdev: one day it's these 3, then another 3, then 4 drives.
 

trsupernothing

Explorer
Joined
Sep 5, 2013
Messages
65
These are yesterday's CCB read error contestants:
Code:
########## SMART status report for da35 drive (Western Digital Red: WD-WXxxxxx) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   197   197   021	Pre-fail  Always	   -	   9116
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   097   097   000	Old_age   Always	   -	   2402
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   32
194 Temperature_Celsius	 0x0022   108   102   000	Old_age   Always	   -	   44
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

No Errors Logged

Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline	   Completed without error	   00%	  2331		 -



########## SMART status report for da36 drive (Western Digital Red: WD-WXxxxxxxxxxxx) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   197   197   021	Pre-fail  Always	   -	   9141
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   11
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   098   098   000	Old_age   Always	   -	   1879
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   11
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   9
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   31
194 Temperature_Celsius	 0x0022   109   103   000	Old_age   Always	   -	   43
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

No Errors Logged

Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline	   Completed without error	   00%	  1808		 -



########## SMART status report for da38 drive (Western Digital Red: WD-Wxxxxxxxxxxxxx) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   100   253   021	Pre-fail  Always	   -	   0
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   5
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   099   099   000	Old_age   Always	   -	   730
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   5
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   3
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   16
194 Temperature_Celsius	 0x0022   112   099   000	Old_age   Always	   -	   40
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

No Errors Logged

Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline	   Completed without error	   00%	   659		 -
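With this many drives it gets tedious to eyeball every report, so here is a minimal sketch that scans a report in the format above and lists any watched attribute with a non-zero raw value. The choice of attribute IDs (5, 196, 197, 198, 199) is my own assumption about which counters are worth watching, and `nonzero_watched` is a made-up name. In the three reports above all of those raw values are 0, so it would flag nothing for them.

```python
# Attribute IDs to watch (an assumption, adjust as needed):
# 5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable,
# 199 UDMA_CRC_Error_Count
WATCHED_IDS = {5, 196, 197, 198, 199}


def nonzero_watched(report_text):
    """Return {attribute_name: raw_value} for watched attributes that are not 0."""
    flagged = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 whitespace-separated columns and start with the ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) != 0:
            flagged[name] = int(raw)
    return flagged


if __name__ == "__main__":
    import sys

    # Usage (assumed): pipe one drive's SMART report in on stdin.
    for name, raw in nonzero_watched(sys.stdin.read()).items():
        print(f"{name}: {raw}")
```

An empty result on every drive, like here, would fit with the drives themselves not logging any trouble, though it does not prove anything on its own.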
 