Multiple small repairs during Scrubs

joel3452 · Dec 11, 2016

I have been running FreeNAS for around 2 months without any known issues. This morning, during the scrub, I received this set of errors by email:

> (da2:mpr0:0:1:0): READ(10). CDB: 28 00 50 ae 34 30 00 00 e8 00 length 118784 SMID 712 terminated ioc 804b scsi 0 state 0 xfer 0
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 de e3 e1 28 00 00 00 28 00 00 length 20480 SMID 929 terminated ioc 804b sc(da2:mpr0:0:1:0): READ(10). CDB: 28 00 50 ae 34 30 00 00 e8 00
> si 0 state 0 xfer 0
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 de e3 e1 58 00 00 00 28 00 00 length 20480 SMID 212 terminated ioc 804b sc(da2:si 0 state 0 xfer 0
> mpr0:0: (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 de e4 3b 60 00 00 00 30 00 00 length 24576 SMID 773 terminated ioc 804b sc1:si 0 state 0 xfer 0
> 0): (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 ea 08 31 e0 00 00 00 08 00 00 length 4096 SMID 991 terminated ioc 804b scsRetrying command
> i 0 state 0 xfer 0
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 de e3 e1 28 00 00 00 28 00 00
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 ea 08 32 18 00 00 00 08 00 00 length 4096 SMID 433 terminated ioc 804b scs(da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> i 0 state 0 xfer 0
> (da2: (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 ea 07 b3 20 00 00 00 08 00 00 length 4096 SMID 568 terminated ioc 804b scsmpr0:0:i 0 state 0 xfer 0
> 1:0): (da2:mpr0:0:1:0): READ(10). CDB: 28 00 e7 d3 42 c8 00 00 28 00 length 20480 SMID 753 terminated ioc 804b scsi 0 state 0 xfer Retrying command
> 0
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 de e3 e1 58 00 00 00 28 00 00
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): Retrying command
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 de e4 3b 60 00 00 00 30 00 00
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): Retrying command
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 ea 08 31 e0 00 00 00 08 00 00
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): Retrying command
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 ea 08 32 18 00 00 00 08 00 00
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): Retrying command
> (da2:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 ea 07 b3 20 00 00 00 08 00 00
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): Retrying command
> (da2:mpr0:0:1:0): READ(10). CDB: 28 00 e7 d3 42 c8 00 00 28 00
> (da2:mpr0:0:1:0): CAM status: CCB request completed with an error
> (da2:mpr0:0:1:0): Retrying command
> (da2:mpr0:0:1:0): READ(10). CDB: 28 00 50 ae 33 50 00 00 e0 00
> (da2:mpr0:0:1:0): CAM status: SCSI Status Error
> (da2:mpr0:0:1:0): SCSI status: Check Condition
> (da2:mpr0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da2:mpr0:0:1:0): Info: 0x50ae3350
> (da2:mpr0:0:1:0): Error 5, Unretryable error

My Hardware is:
Motherboard: Supermicro X11SSH-CTF ($418.90)
CPU: Intel Xeon E3-1230 V5 3.4GHz Quad-Core Processor ($271.66 @ Newegg)
CPU Cooler: Cooler Master Hyper 212 EVO 82.9 CFM Sleeve Bearing CPU Cooler ($24.99 @ Newegg)
Memory: Crucial 32GB Kit (2 x 16GB) DDR4-2133 ECC
2* Storage: Sandisk Extreme Pro 240GB 2.5" Solid State Drive ($119.99 @ Newegg)
8* Storage: Western Digital Red 6TB 3.5" 5400RPM Internal Hard Drive ($234.52 @ Newegg)
Case: Fractal Design Define R5 (Black) ATX Mid Tower Case ($119.98 @ Newegg)
Power Supply: SeaSonic 660W 80+ Platinum Certified Fully-Modular ATX Power Supply ($104.99 @ Newegg)
Freenas Version: FreeNAS-9.10.1-U4 (ec9a7d3)

The system runs ESXi 6.0U2, with the motherboard's LSI controller flashed to IT mode and passed through to the FreeNAS VM. The FreeNAS VM is assigned 2 vCPUs and 20 GB of reserved memory.
My hunch is that something is wrong with hard drive da2. I stress-tested the drives for about 2 weeks before the install, and all of them showed perfect SMART data during that time. I don't know the easiest/best way to look at the current SMART data.
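
For anyone reading the kernel messages above: the hex bytes after `CDB:` are the SCSI command descriptor block of each failed read. The sketch below decodes the starting LBA and transfer size from a READ(10) or READ(16) CDB. It assumes the standard CDB layouts and 512-byte logical sectors; the function name `decode_read_cdb` is made up for illustration.

```python
def decode_read_cdb(cdb_hex, sector_size=512):
    """Decode a READ(10) or READ(16) CDB given as space-separated hex bytes.

    Returns (starting LBA, block count, byte length).
    sector_size=512 is an assumption about the drive's logical sector size.
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28 and len(cdb) == 10:           # READ(10)
        lba = int.from_bytes(cdb[2:6], "big")       # bytes 2-5: 32-bit LBA
        blocks = int.from_bytes(cdb[7:9], "big")    # bytes 7-8: transfer length
    elif opcode == 0x88 and len(cdb) == 16:         # READ(16)
        lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: 64-bit LBA
        blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    else:
        raise ValueError("not a READ(10) or READ(16) CDB")
    return lba, blocks, blocks * sector_size


# CDBs copied from the log above.
for cdb in (
    "28 00 50 ae 34 30 00 00 e8 00",
    "88 00 00 00 00 01 de e3 e1 28 00 00 00 28 00 00",
    "28 00 50 ae 33 50 00 00 e0 00",
):
    lba, blocks, length = decode_read_cdb(cdb)
    print(f"LBA {lba:#x}, {blocks} blocks, {length} bytes")
```

The byte lengths computed for the first two CDBs match the `length 118784` and `length 20480` fields in the log, and the LBA of the last one matches the `Info: 0x50ae3350` line reported with the MEDIUM ERROR.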

joel3452 · Dec 11, 2016

I logged into the shell and ran a zpool status:

Code:

[root@freenas ~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Dec 11 03:45:09 2016
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Sun Dec 11 00:00:02 2016
        20.7T scanned out of 27.6T at 529M/s, 3h48m to go
        108K repaired, 74.92% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/c375e1a1-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c42a5d2c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0  (repairing)
            gptid/c4d3f78e-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c576f47a-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c621429c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c6c8e874-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c771fa0c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c82043ca-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#

Dice · Dec 11, 2016

joel3452 said:
I don't know the easiest/best way to look at the current SMART data.

SSH into the box and type
smartctl -a /dev/da2

joel3452 · Dec 11, 2016

I logged in via SSH and pulled the smart data:

Code:

[root@freenas] ~# smartctl -a /dev/da2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD60EFRX-68L0BN1
Serial Number:	WD-WXC1HB4XS4YJ
LU WWN Device Id: 5 0014ee 2b83e4df0
Firmware Version: 82.00A82
User Capacity:	6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5700 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Dec 11 11:32:07 2016 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				( 6524) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 719) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x303d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   231   196   021	Pre-fail  Always	   -	   7425
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   30
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   099   099   000	Old_age   Always	   -	   984
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   27
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   25
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   18
194 Temperature_Celsius	 0x0022   120   115   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   100   253   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%		 0		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I didn't see anything unusual, although it looks like FreeNAS only schedules short tests by default, so I might need to schedule some full/long tests once the scrub is complete. Do any of you guys/gals see anything off in the SMART results?
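
When checking several drives, it can help to pull just the attributes that usually matter out of the `smartctl -a` text. The sketch below is one possible way to do that, assuming the ten-column attribute table shown above; the choice of watched IDs (5, 197, 198, 199) and the function names are illustrative, not an official list or API. Attribute 199 (UDMA_CRC_Error_Count) is the one most commonly associated with cabling problems.

```python
import re

# Attributes that usually deserve a closer look when their raw value is non-zero.
# This selection is an assumption made for the example.
WATCHED_IDS = {5, 197, 198, 199}


def parse_attributes(smartctl_text):
    """Return {attribute_id: (name, raw_value)} from `smartctl -a` text."""
    attributes = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows: numeric ID, name, hex flag, ..., raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = re.match(r"\d+", fields[9])
            if raw:
                attributes[int(fields[0])] = (fields[1], int(raw.group()))
    return attributes


def nonzero_watched(attributes):
    """Return the watched attributes whose raw value is not zero."""
    return {
        attr_id: attributes[attr_id]
        for attr_id in sorted(attributes.keys() & WATCHED_IDS)
        if attributes[attr_id][1] != 0
    }


# A few rows copied from the da2 output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       984
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

print(nonzero_watched(parse_attributes(SAMPLE)))  # empty dict: nothing flagged
```

For the da2 output posted above, nothing is flagged, which agrees with reading the table by eye.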

Dice · Dec 11, 2016

If you've rebooted the machine, da2 may be named something else now. The labels pointing to a specific drive do not necessarily survive reboots.
I'd recommend you check all drives using smartctl.

This, however:

joel3452 said:
# 1 Short offline Completed without error 00% 0

indicates that you have not successfully configured regular SMART tests.

There is a link in my signature.
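
On the point about device names changing across reboots: `zpool status` lists the pool members by gptid, so before running smartctl it can be useful to map each gptid back to its current daN device. The sketch below parses text in the three-column Name / Status / Components layout that `glabel status` prints on FreeBSD. The labels and device names in `SAMPLE` are placeholders, not taken from this system, and `label_to_device` is an illustrative name.

```python
import re


def label_to_device(glabel_text):
    """Map each gptid label to the whole-disk device it currently lives on."""
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            label, _status, component = fields
            # Strip a trailing partition suffix such as "p2" (da2p2 -> da2).
            mapping[label] = re.sub(r"p\d+$", "", component)
    return mapping


# Placeholder input in the layout of `glabel status`; every value is hypothetical.
SAMPLE = """\
                                      Name  Status  Components
gptid/00000000-0000-0000-0000-000000000001     N/A  da1p2
gptid/00000000-0000-0000-0000-000000000002     N/A  da2p2
"""

for label, device in label_to_device(SAMPLE).items():
    # Print the command to run for each pool member rather than executing it.
    print(f"{label}: smartctl -a /dev/{device}")
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; on the NAS itself the real `glabel status` output would replace `SAMPLE`.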

Ericloewe · Dec 11, 2016

joel3452 said:
I might need to schedule some full/long tests once the scrub is complete.

Definitely.

joel3452 · Dec 11, 2016

How does this new schedule look? If I read the guide/GUI correctly, this should run a short self-test every day at 22:00 and a long test every Sunday at 23:00.

Ericloewe · Dec 11, 2016

joel3452 said:
How does this new schedule look? If I read the guide/GUI correctly, this should run a short self-test every day at 22:00 and a long test every Sunday at 23:00.
View attachment 14906

That's about twice as frequent as is typical, but it should be fine. Don't forget to schedule scrubs, too - and don't overlap them.
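
One way to sanity-check the "don't overlap" advice is to treat each job as a weekly recurring window and test the windows against each other. The sketch below does that with minutes since Monday 00:00. The Sunday 23:00 start comes from the schedule described above and the 719-minute length from the drive's extended self-test polling time in the smartctl output; the scrub start times and the 31-hour scrub length are assumptions chosen for illustration.

```python
MINUTES_PER_WEEK = 7 * 24 * 60


def minute_of_week(day, hour, minute=0):
    """Minutes since Monday 00:00; day 0 is Monday, day 6 is Sunday."""
    return (day * 24 + hour) * 60 + minute


def windows_overlap(start_a, length_a, start_b, length_b):
    """True if two weekly recurring windows overlap, allowing wrap past Sunday."""
    a_to_b = (start_b - start_a) % MINUTES_PER_WEEK
    b_to_a = (start_a - start_b) % MINUTES_PER_WEEK
    return a_to_b < length_a or b_to_a < length_b


LONG_TEST_START = minute_of_week(6, 23)  # Sunday 23:00
LONG_TEST_LENGTH = 719                   # minutes, the drive's own estimate
SCRUB_LENGTH = 31 * 60                   # assumption: a scrub takes about 31 hours

# Hypothetical scrub starting Monday 22:00: clear of the long test.
print(windows_overlap(LONG_TEST_START, LONG_TEST_LENGTH,
                      minute_of_week(0, 22), SCRUB_LENGTH))  # False

# Hypothetical scrub starting Monday 02:00: the long test is still running.
print(windows_overlap(LONG_TEST_START, LONG_TEST_LENGTH,
                      minute_of_week(0, 2), SCRUB_LENGTH))   # True
```

With a daily short test in the mix, any scrub longer than a day will coincide with at least one short test, so the check is mainly useful for keeping long tests and scrubs apart.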

Dice · Dec 12, 2016

https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/

In case you didn't manage to find the intended link.

joel3452 · Dec 12, 2016

I did find the link and scheduled a long/extended test last night; I have pasted the smartctl output from the drive below. I also checked the output for the other 7 drives, in case the labeling was different, but all 8 drives showed "Completed without error" and 0 values for the important SMART data such as reallocated sectors, pending sectors, etc. So I cannot find a good reason why any corrections were needed. Granted, the scrub only "repaired" 108K of 21TB, but I imagine that if everything were working correctly there should be 0 repairs needed. Or do I misunderstand scrubs?

Code:

[root@freenas] ~# smartctl -a /dev/da2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD60EFRX-68L0BN1
Serial Number:	WD-WXC1HB4XS4YJ
LU WWN Device Id: 5 0014ee 2b83e4df0
Firmware Version: 82.00A82
User Capacity:	6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5700 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Mon Dec 12 16:26:04 2016 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				( 6524) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 719) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x303d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   231   196   021	Pre-fail  Always	   -	   7425
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   30
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   099   099   000	Old_age   Always	   -	   1013
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   27
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   25
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   19
194 Temperature_Celsius	 0x0022   120   115   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	  1009		 -
# 2  Short offline	   Completed without error	   00%	   995		 -
# 3  Short offline	   Completed without error	   00%		 0		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

joel3452 · Dec 20, 2016

Since last week I have had each disk in the array running a daily short test and a weekly long test. I also scheduled a weekly scrub. All of the disk tests, short and extended, have "Completed without error", and no drives have any pending sectors, seek errors, or offline-uncorrectable values. I am running a server-class motherboard with ECC memory that went through days of Memtest86 before install, and all drives were tested for several full read/write cycles before installation without any errors.

But this week's scrub (still in progress) is showing 324K repaired.
1. Is it normal for each scrub to repair small amounts of data?
2. Should I be concerned that this week's scrub is only running at 245 MB/s versus last week's 500+ MB/s? There is not a lot of activity on the pool right now that would be slowing it down.

I have posted the zpool status below:

Code:

[root@freenas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Dec 11 03:45:09 2016
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Dec 19 22:00:02 2016
        20.5T scanned out of 26.3T at 246M/s, 6h50m to go
        324K repaired, 78.01% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/c375e1a1-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c42a5d2c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c4d3f78e-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c576f47a-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0  (repairing)
            gptid/c621429c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c6c8e874-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c771fa0c-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0
            gptid/c82043ca-a141-11e6-8641-0cc47ac565d4  ONLINE       0     0     0

errors: No known data errors
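
To put the scrub numbers in proportion, the sketch below recomputes the time remaining and the repaired fraction from the figures in the two `zpool status` outputs in this thread. It assumes the T/M/K suffixes are binary units (powers of 1024) and works only from the rounded values that zpool displays, so small differences from zpool's own estimate are expected; the helper names are illustrative.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a zpool-style size such as '27.6T' or '529M' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def remaining_seconds(scanned, total, rate_per_second):
    """Seconds left at the current scan rate."""
    return (to_bytes(total) - to_bytes(scanned)) / to_bytes(rate_per_second)


def format_hm(seconds):
    """Format a duration as XhYm, rounded to the nearest minute."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h{minutes % 60}m"


def repaired_fraction(repaired, scanned):
    """Share of the scanned data that the scrub had to repair."""
    return to_bytes(repaired) / to_bytes(scanned)


# Dec 11 scrub: 20.7T of 27.6T at 529M/s, 108K repaired.
print(format_hm(remaining_seconds("20.7T", "27.6T", "529M")))  # 3h48m
print(repaired_fraction("108K", "20.7T"))

# Dec 19 scrub: 20.5T of 26.3T at 246M/s, 324K repaired.
print(format_hm(remaining_seconds("20.5T", "26.3T", "246M")))  # 6h52m
print(repaired_fraction("324K", "20.5T"))
```

The first estimate reproduces zpool's `3h48m to go`, and the second lands within a couple of minutes of its `6h50m`. In both scrubs the repaired data is on the order of a few bytes per billion scanned, which says the damage is tiny but not that it is harmless.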

SweetAndLow · Dec 20, 2016

Looks like maybe a bad cable or backplane.

joel3452 · Dec 21, 2016

So after the scrub completed I got the daily security run output (shown below), and it lists another "CAM status: CCB request completed with an error", but this time on da4. Since that is a different drive on the same SAS port/expander cable, a bad cable would explain that output. Is there a good place to buy SAS cables? I need an SFF-8643 to 4x SATA cable. The initial 2 sets are from a 3rd-party Amazon seller with a high rating. https://www.amazon.com/dp/B01GPDBHDY/?tag=ozlp-20

Code:

freenas.local kernel log messages:
>	   (da4:mpr0:0:3:0): READ(10). CDB: 28 00 5f d7 54 d8 00 00 e0 00 length 114688 SMID 1016 terminated ioc 804b scsi 0 state 0 xfer 0
> (da4:mpr0:0:3:0): READ(10). CDB: 28 00 5f d7 54 d8 00 00 e0 00
> (da4:mpr0:0:3:0): CAM status: CCB request completed with an error
> (da4:mpr0:0:3:0): Retrying command
> (da4:mpr0:0:3:0): READ(10). CDB: 28 00 5f d7 53 f8 00 00 e0 00
> (da4:mpr0:0:3:0): CAM status: SCSI Status Error
> (da4:mpr0:0:3:0): SCSI status: Check Condition
> (da4:mpr0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da4:mpr0:0:3:0): Info: 0x5fd753f8
> (da4:mpr0:0:3:0): Error 5, Unretryable error
>	   (da4:mpr0:0:3:0): READ(10). CDB: 28 00 d8 00 e8 c0 00 00 e0 00 length 114688 SMID 156 terminated ioc 804b scsi 0 state 0 xfer 0
> (da4:mpr0:0:3:0): READ(10). CDB: 28 00 d8 00 e8 c0 00 00 e0 00
> (da4:mpr0:0:3:0): CAM status: CCB request completed with an error
> (da4:mpr0:0:3:0): Retrying command
> (da4:mpr0:0:3:0): READ(10). CDB: 28 00 d8 00 e7 e0 00 00 e0 00
> (da4:mpr0:0:3:0): CAM status: SCSI Status Error
> (da4:mpr0:0:3:0): SCSI status: Check Condition
> (da4:mpr0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da4:mpr0:0:3:0): Info: 0xd800e7e0
> (da4:mpr0:0:3:0): Error 5, Unretryable error
>	   (da4:mpr0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 99 a1 58 28 00 00 00 e0 00 00 length 114688 SMID 764 terminated ioc 804b scsi 0 state 0 xfer 0
> (da4:mpr0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 99 a1 58 28 00 00 00 e0 00 00
> (da4:mpr0:0:3:0): CAM status: CCB request completed with an error
> (da4:mpr0:0:3:0): Retrying command
> (da4:mpr0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 99 a1 57 48 00 00 00 e0 00 00
> (da4:mpr0:0:3:0): CAM status: SCSI Status Error
> (da4:mpr0:0:3:0): SCSI status: Check Condition
> (da4:mpr0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da4:mpr0:0:3:0): Info: 0x199a15748
> (da4:mpr0:0:3:0): Error 5, Unretryable error

-- End of security output --

SweetAndLow · Dec 21, 2016

joel3452 said:

So after the scrub completed I got the daily security run output, and it lists another "CAM status: CCB request completed with an error", but this time on da4. Since that is a different drive on the same SAS port/expander cable, a bad cable would explain that output. Is there a good place to buy SAS cables? I need an SFF-8643 to 4x SATA cable. The initial 2 sets are from a 3rd-party Amazon seller with a high rating. https://www.amazon.com/dp/B01GPDBHDY/?tag=ozlp-20

What cable do you need? I think you need an SFF-8087 reverse breakout cable. I'm not familiar with the cable you described in your post.

Stux · Dec 21, 2016

joel3452 said:

So after the scrub completed I got the daily security run output, and it lists another "CAM status: CCB request completed with an error", but this time on da4. Since that is a different drive on the same SAS port/expander cable, a bad cable would explain that output. Is there a good place to buy SAS cables? I need an SFF-8643 to 4x SATA cable. The initial 2 sets are from a 3rd-party Amazon seller with a high rating. https://www.amazon.com/dp/B01GPDBHDY/?tag=ozlp-20

I believe that is the right cable.

joel3452 · Dec 21, 2016

Well, that is the cable I bought when I set up the server, but it felt somewhat cheap, and it is maybe (hopefully) what is causing the "CAM status: CCB request completed with an error" messages and the small (a few hundred KB) repairs during scrubs.

So I was hoping for a higher-quality version of that cable. I would prefer to keep the angled SATA ends and clips. What brand/store do most people use for their SAS breakout cables?

Stux · Dec 21, 2016

The other thing that could be causing it is a faulty PSU, or less-than-optimal power distribution.

tealcomp · Mar 4, 2017

Hi @joel3452, not to revive an old thread, but did you ever resolve this problem? I am running into a very similar problem myself (working to update my problem in a new thread).

Thanks,
-Dan

tobiasbp · Aug 28, 2017

joel3452 said:
Code:
Device Model: WDC WD60EFRX-68L0BN1

tealcomp said:
As promised, here is the output for smartctl -a from all drives, ada0-ada1, da0-da3

I think we have the same problem. After replacing almost everything in my machine, it turns out that the drive with the worst behaviour is a WD60EFRX-68L0BN1, just like yours. Other disks in my zpool, of type WD60EFRX-68MYMN1, do not exhibit the same behaviour (throwing SCSI errors).

tealcomp · Aug 28, 2017

I am glad to see this is not just a problem I am encountering. The problem is that the drives pass all of the Western Digital tests I can throw at them, and at this point I have replaced everything related to the storage except the CPU and memory. I have contacted WD about this, and their customer service leaves much to be desired. There are times the scrubs work without issue, whereas in this scrub cycle only one drive of the 8 in the RAIDZ2 array recorded a small repair of 88k. I really don't have much choice but to maintain another level of backups (which I do as a matter of routine) and, when the drives start to throw more concerning errors, to replace them (likely with another brand). Right now, I run routine short and long SMART tests and nothing concerning is coming back. What plans, if any, do you have in mind for yours?
