Help to confirm my drive's gone bad, please

Murac · Dec 28, 2017

Hello friends!

EDIT: I've updated the post to include the full SMART results for all drives and the dmesg output describing CAM errors from /dev/da3 and /dev/da4.

I'm currently running another long test on all the drives and will update when they finish.

dmesg | grep mps can be found here: https://gist.github.com/murac/07d1e798ea60968a6e822bf8635bb956
smartctl -x /dev/daX for all drives in order from da0 to da17 can be found here: https://gist.github.com/murac/0cccf40e4bc7a9adf11a18214ecb1246

I have a SUPERMICRO 4U 846E16-R1200B with 2x Xeon L5640. My zpool has 3 RAIDZ2 vdevs of 6 WD Red 4TB drives each.

It might be relevant to mention that I recently upgraded to 11.1 and decided to upgrade my pool at the same time.

One of my drives failed a SMART test last week but passed additional tests afterwards. While troubleshooting, I deleted about 4TB from my .recycle folder and then ran a scrub. I got an alert for an error, and the same drive that had failed the SMART test was resilvered. zpool status now says all is well but reports errors in the READ and WRITE columns for that drive.

From my reading around, Raw_Read_Error_Rate and Multi_Zone_Error_Rate can be indicators of a drive going bad. I've noticed these numbers beginning to creep up on a few of the other drives as well. Luckily most of these drives are still under warranty.
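
To compare these counters across drives, the raw values can be pulled out of the smartctl attribute table. The following is a minimal Python sketch, assuming the table layout that smartctl prints for these WD Reds. The `WATCHED` set and the function names are illustrative choices, not part of smartmontools, and a non-zero raw value is only a prompt to look closer, not proof of failure.

```python
# Attributes worth a second look; this selection is an assumption for
# illustration, not an official smartmontools list.
WATCHED = {
    "Raw_Read_Error_Rate",
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
    "Multi_Zone_Error_Rate",
}

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       53
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       4
"""


def parse_attributes(text):
    """Return {name: {"value", "worst", "thresh", "raw"}} for each table row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if not raw.isdigit():
            continue  # skip raw values printed in a vendor-specific format
        rows[fields[1]] = {
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(raw),
        }
    return rows


def nonzero_watched(rows):
    """Return the watched attributes whose raw value is above zero."""
    return {n: r["raw"] for n, r in rows.items() if n in WATCHED and r["raw"] > 0}


print(nonzero_watched(parse_attributes(SAMPLE)))
# {'Raw_Read_Error_Rate': 53, 'Multi_Zone_Error_Rate': 4}
```

Running the same function over the output for each of da0 to da17 would show which drives have counters that moved and by how much.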

So, is my /dev/da3 drive going bad, and is it best to just RMA it for a refurbished one now, or do I wait? After a full year of solid performance, could this be caused by something like a bad cable or controller?

Also, would it be wise to reorganize my vdevs so that drive ages are evenly distributed across them? I purchased and installed the drives for each 6-drive vdev around the same time, so drives of the same age are grouped together. I wonder if it would be worth the hassle of moving drives around and resilvering. Thoughts?

Thanks in advance for your time and help!

Here are some diagnostics:

Code:

########## ZPool status report for unimatrix_zero ##########

  pool: unimatrix_zero
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 4.44M in 0 days 00:00:45 with 0 errors on Thu Dec 28 10:55:10 2017
config:

   NAME                                            STATE     READ WRITE CKSUM
   unimatrix_zero                                  ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/6b1ff6e5-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
       gptid/6c495fa0-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
       gptid/6d6cb121-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
       gptid/6e8a65aa-7d76-11e6-b037-0025903595b8  ONLINE     114   176     0
       gptid/6fa9e749-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
       gptid/70d0abe4-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
     raidz2-1                                      ONLINE       0     0     0
       gptid/20f3c5c1-835d-11e6-9cd1-0025903595b8  ONLINE       0     0     0
       gptid/21b71d50-835d-11e6-9cd1-0025903595b8  ONLINE       0     0     0
       gptid/2278ede6-835d-11e6-9cd1-0025903595b8  ONLINE       0     0     0
       gptid/233cd3b3-835d-11e6-9cd1-0025903595b8  ONLINE       0     0     0
       gptid/24060f46-835d-11e6-9cd1-0025903595b8  ONLINE       0     0     0
       gptid/24d18aa6-835d-11e6-9cd1-0025903595b8  ONLINE       0     0     0
     raidz2-2                                      ONLINE       0     0     0
       gptid/fd72a841-3f3a-11e7-a5d2-0025903595b8  ONLINE       0     0     0
       gptid/ff9faac2-3f3a-11e7-a5d2-0025903595b8  ONLINE       0     0     0
       gptid/019be4ef-3f3b-11e7-a5d2-0025903595b8  ONLINE       0     0     0
       gptid/039e876d-3f3b-11e7-a5d2-0025903595b8  ONLINE       0     0     0
       gptid/0589dd2c-3f3b-11e7-a5d2-0025903595b8  ONLINE       0     0     0
       gptid/07ae2cde-3f3b-11e7-a5d2-0025903595b8  ONLINE       0     0     0

errors: No known data errors

My guess is that the problem drive is /dev/da3, /dev/da4, or both, so look there first.
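
To pick out which pool member carries the error counts, the config section of the zpool status report can be scanned for non-zero READ, WRITE or CKSUM columns. This is a rough Python sketch, assuming the five-column layout shown in the report; `devices_with_errors` is a made-up helper name, and it only handles plain integer counters (zpool may abbreviate very large counts, which this sketch skips).

```python
SAMPLE = """\
   NAME                                            STATE     READ WRITE CKSUM
   unimatrix_zero                                  ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/6d6cb121-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
       gptid/6e8a65aa-7d76-11e6-b037-0025903595b8  ONLINE     114   176     0
       gptid/6fa9e749-7d76-11e6-b037-0025903595b8  ONLINE       0     0     0
"""


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    flagged = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        name, counters = fields[0], fields[2:5]
        # The header row and prose lines have non-numeric text here; skip them.
        if not all(c.isdigit() for c in counters):
            continue
        read, write, cksum = (int(c) for c in counters)
        if read or write or cksum:
            flagged.append((name, read, write, cksum))
    return flagged


for name, read, write, cksum in devices_with_errors(SAMPLE):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

The result is a gptid label rather than a daX name. On FreeBSD, `glabel status` lists which partition each gptid label belongs to, which is one way to confirm that the flagged member really is da3.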

Because the output is too long, I'm including the full output for /dev/da3 only; the full output for all drives is in the gist linked above.

Code:

########## SMART status report for da3 drive (Western Digital Red: WD-WCC4E3TEFCJX) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   53
  3 Spin_Up_Time			0x0027   185   180   021	Pre-fail  Always	   -	   7725
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   85
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   085   085   000	Old_age   Always	   -	   10998
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   85
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   83
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   176
194 Temperature_Celsius	 0x0022   120   105   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   4

No Errors Logged

smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4E3TEFCJX
LU WWN Device Id: 5 0014ee 2b7eccc0b
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Thu Dec 28 23:12:25 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:	  ( 244)	Self-test routine in progress...
					40% of test remaining.
Total time to complete Offline 
data collection:		 (52380) seconds.
Offline data collection
capabilities:			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time:	 (   2) minutes.
Extended self-test routine
recommended polling time:	 ( 524) minutes.
Conveyance self-test routine
recommended polling time:	 (   5) minutes.
SCT capabilities:			(0x703d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate	 POSR-K   200   200   051	-	53
  3 Spin_Up_Time			POS--K   185   180   021	-	7725
  4 Start_Stop_Count		-O--CK   100   100   000	-	85
  5 Reallocated_Sector_Ct   PO--CK   200   200   140	-	0
  7 Seek_Error_Rate		 -OSR-K   200   200   000	-	0
  9 Power_On_Hours		  -O--CK   085   085   000	-	10998
 10 Spin_Retry_Count		-O--CK   100   253   000	-	0
 11 Calibration_Retry_Count -O--CK   100   253   000	-	0
 12 Power_Cycle_Count	   -O--CK   100   100   000	-	85
192 Power-Off_Retract_Count -O--CK   200   200   000	-	83
193 Load_Cycle_Count		-O--CK   200   200   000	-	176
194 Temperature_Celsius	 -O---K   120   105   000	-	32
196 Reallocated_Event_Count -O--CK   200   200   000	-	0
197 Current_Pending_Sector  -O--CK   200   200   000	-	0
198 Offline_Uncorrectable   ----CK   100   253   000	-	0
199 UDMA_CRC_Error_Count	-O--CK   200   200   000	-	0
200 Multi_Zone_Error_Rate   ---R--   200   200   000	-	4
							||||||_ K auto-keep
							|||||__ C event count
							||||___ R error rate
							|||____ S speed/performance
							||_____ O updated online
							|______ P prefailure warning

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x01		   SL  R/O	  1  Summary SMART error log
0x02		   SL  R/O	  5  Comprehensive SMART error log
0x03	   GPL	 R/O	  6  Ext. Comprehensive SMART error log
0x06		   SL  R/O	  1  SMART self-test log
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x10	   GPL	 R/O	  1  SATA NCQ Queued Error log
0x11	   GPL	 R/O	  1  SATA Phy Event Counters log
0x21	   GPL	 R/O	  1  Write stream error log
0x22	   GPL	 R/O	  1  Read stream error log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS	  16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS	   1  Device vendor specific log
0xb7	   GPL,SL  VS	  39  Device vendor specific log
0xbd	   GPL,SL  VS	   1  Device vendor specific log
0xc0	   GPL,SL  VS	   1  Device vendor specific log
0xc1	   GPL	 VS	  93  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 25 (device log contains only the most recent 24 errors)
	CR	 = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH	 = LBA High (was: Cylinder High) Register	]   LBA
	LM	 = LBA Mid (was: Cylinder Low) Register	  ] Register
	LL	 = LBA Low (was: Sector Number) Register	 ]
	DV	 = Device (was: Device/Head) Register
	DC	 = Device Control Register
	ER	 = Error register
	ST	 = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 25 [0] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 cb a8 40 00  Error: UNC at LBA = 0x0952cba8 = 156421032

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 38 00 00 09 52 d5 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 30 00 00 09 52 d4 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 28 00 00 09 52 d3 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 52 d2 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 18 00 00 09 52 d1 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED

Error 24 [23] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 33 d8 40 00  Error: WP at LBA = 0x095233d8 = 156382168

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 28 00 00 2c 00 52 88 40 00 14d+14:36:25.310  WRITE FPDMA QUEUED
  61 00 40 00 20 00 00 2c 00 52 c8 40 00 14d+14:36:25.310  WRITE FPDMA QUEUED
  60 00 80 00 18 00 00 09 52 35 d8 40 00 14d+14:36:25.310  READ FPDMA QUEUED
  60 01 00 00 10 00 00 09 52 34 d8 40 00 14d+14:36:25.309  READ FPDMA QUEUED
  60 01 00 00 08 00 00 09 52 33 d8 40 00 14d+14:36:25.309  READ FPDMA QUEUED

Error 23 [22] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 32 70 40 00  Error: WP at LBA = 0x09523270 = 156381808

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 30 00 00 2c 00 52 88 40 00 14d+14:36:21.953  WRITE FPDMA QUEUED
  61 00 40 00 00 00 00 2c 00 52 c8 40 00 14d+14:36:21.953  WRITE FPDMA QUEUED
  60 00 80 00 28 00 00 09 52 35 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 52 34 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED
  60 01 00 00 18 00 00 09 52 33 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED

Error 22 [21] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 29 c0 40 00  Error: UNC at LBA = 0x095029c0 = 156248512

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 58 00 01 d1 c0 bc 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 10 00 50 00 01 d1 c0 ba 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 10 00 48 00 00 00 40 02 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 80 00 40 00 00 09 50 2f 18 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 01 00 00 38 00 00 09 50 2e 18 40 00 14d+14:36:11.237  READ FPDMA QUEUED

Error 21 [20] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 27 f0 40 00  Error: UNC at LBA = 0x095027f0 = 156248048

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 70 00 01 d1 c0 bc 90 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 00 10 00 68 00 01 d1 c0 ba 90 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 00 10 00 60 00 00 00 40 02 90 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 00 80 00 58 00 00 09 50 2f 18 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 01 00 00 50 00 00 09 50 2e 18 40 00 14d+14:36:07.317  READ FPDMA QUEUED

Error 20 [19] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 24 30 40 00  Error: UNC at LBA = 0x09502430 = 156247088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 c8 00 40 00 00 09 50 2b 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 38 00 00 09 50 2a 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 30 00 00 09 50 29 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 28 00 00 09 50 28 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 50 27 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED

Error 19 [18] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 16 08 40 00  Error: UNC at LBA = 0x09501608 = 156243464

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 c0 00 08 00 00 09 50 16 d8 40 00 14d+14:35:59.821  READ FPDMA QUEUED
  60 01 00 00 00 00 00 09 50 15 d8 40 00 14d+14:35:59.821  READ FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 40 00 14d+14:35:59.821  FLUSH CACHE EXT
  ea 00 00 00 00 00 00 00 00 00 00 40 00 14d+14:35:59.821  FLUSH CACHE EXT
  ea 00 00 00 00 00 00 00 00 00 00 40 00 14d+14:35:59.821  FLUSH CACHE EXT

Error 18 [17] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 13 58 40 00  Error: WP at LBA = 0x09501358 = 156242776

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 48 00 00 2b ff f8 d0 40 00 14d+14:35:56.406  WRITE FPDMA QUEUED
  61 00 40 00 40 00 00 2b ff f8 90 40 00 14d+14:35:56.406  WRITE FPDMA QUEUED
  61 00 10 00 38 00 00 2b ff f8 80 40 00 14d+14:35:56.405  WRITE FPDMA QUEUED
  61 00 40 00 30 00 00 2b ff f8 40 40 00 14d+14:35:56.405  WRITE FPDMA QUEUED
  60 00 c0 00 28 00 00 09 50 16 d8 40 00 14d+14:35:56.405  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 10994		 -
# 2  Short offline	   Completed without error	   00%	 10930		 -
# 3  Extended offline	Completed without error	   00%	 10843		 -
# 4  Short offline	   Completed without error	   00%	 10763		 -
# 5  Extended offline	Completed without error	   00%	 10759		 -
# 6  Extended offline	Completed without error	   00%	 10714		 -
# 7  Extended offline	Completed: read failure	   90%	 10565		 32332392
# 8  Short offline	   Completed without error	   00%	 10495		 -
# 9  Short offline	   Completed without error	   00%	 10279		 -
#10  Extended offline	Completed without error	   00%	 10191		 -
#11  Short offline	   Completed without error	   00%	 10111		 -
#12  Short offline	   Completed without error	   00%	  9943		 -
#13  Extended offline	Completed without error	   00%	  9856		 -
#14  Short offline	   Completed without error	   00%	  9775		 -
#15  Short offline	   Completed without error	   00%	  9534		 -
#16  Extended offline	Completed without error	   00%	  9447		 -
#17  Short offline	   Completed without error	   00%	  9221		 -
#18  Extended offline	Completed without error	   00%	  9134		 -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 3

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   258 (0x0102)
SCT Support Level:				   1
Device State:						DST executing in background (3)
Current Temperature:					32 Celsius
Power Cycle Min/Max Temperature:	 26/35 Celsius
Lifetime	Min/Max Temperature:	 20/47 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:	 2
Temperature Sampling Period:		 1 minute
Temperature Logging Interval:		1 minute
Min/Max recommended Temperature:	  0/60 Celsius
Min/Max Temperature Limit:		   -41/85 Celsius
Temperature History Size (Index):	478 (374)

Index	Estimated Time   Temperature Celsius
 375	2017-12-28 15:15	32  *************
 ...	..( 20 skipped).	..  *************
 396	2017-12-28 15:36	32  *************
 397	2017-12-28 15:37	30  ***********
 ...	..(  4 skipped).	..  ***********
 402	2017-12-28 15:42	30  ***********
 403	2017-12-28 15:43	29  **********
 ...	..(208 skipped).	..  **********
 134	2017-12-28 19:12	29  **********
 135	2017-12-28 19:13	30  ***********
 ...	..(  3 skipped).	..  ***********
 139	2017-12-28 19:17	30  ***********
 140	2017-12-28 19:18	31  ************
 ...	..( 13 skipped).	..  ************
 154	2017-12-28 19:32	31  ************
 155	2017-12-28 19:33	32  *************
 ...	..(218 skipped).	..  *************
 374	2017-12-28 23:12	32  *************

SCT Error Recovery Control:
		   Read:	 70 (7.0 seconds)
		  Write:	 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x0001  2			0  Command failed due to ICRC error
0x0002  2			0  R_ERR response for data FIS
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0005  2			0  R_ERR response for non-data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS
0x0008  2			0  Device-to-host non-data FIS retries
0x0009  2			4  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2			5  Device-to-host register FISes sent due to a COMRESET
0x000b  2			0  CRC errors within host-to-device FIS
0x000f  2			0  R_ERR response for host-to-device data FIS, CRC
0x0012  2			0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4	  1306459  Vendor specific
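
The extended error log in the report above lists each failure as an `Error: <type> at LBA = 0x... = ...` line. As a hedged illustration, the Python sketch below extracts those lines and reports how far apart the failing addresses are. The regular expression assumes the exact wording smartctl printed here, the helper names are invented, and the 512-byte sector size is taken from the "Sector Sizes" line of the same report.

```python
import re

# Matches lines such as "Error: UNC at LBA = 0x0952cba8 = 156421032".
ERROR_RE = re.compile(r"Error: (\w+) at LBA = 0x([0-9a-fA-F]+) = (\d+)")

SAMPLE = """\
  40 -- 51 00 00 00 00 09 52 cb a8 40 00  Error: UNC at LBA = 0x0952cba8 = 156421032
  40 -- 51 00 00 00 00 09 52 33 d8 40 00  Error: WP at LBA = 0x095233d8 = 156382168
  40 -- 51 00 00 00 00 09 50 29 c0 40 00  Error: UNC at LBA = 0x095029c0 = 156248512
"""

LOGICAL_SECTOR_BYTES = 512  # from "Sector Sizes: 512 bytes logical"


def error_lbas(log_text):
    """Return a list of (error_type, lba) tuples found in the log text."""
    found = []
    for kind, hex_lba, dec_lba in ERROR_RE.findall(log_text):
        lba = int(hex_lba, 16)
        # The log prints the address twice; make sure both forms agree.
        if lba != int(dec_lba):
            raise ValueError(f"hex/decimal mismatch for LBA 0x{hex_lba}")
        found.append((kind, lba))
    return found


def lba_span_bytes(errors):
    """Return the distance in bytes between the lowest and highest failing LBA."""
    lbas = [lba for _, lba in errors]
    return (max(lbas) - min(lbas)) * LOGICAL_SECTOR_BYTES


errors = error_lbas(SAMPLE)
print(errors)
print(lba_span_bytes(errors) / 2**20, "MiB between lowest and highest error")
```

Failures packed into a narrow address range would hint at one damaged region of the platter, while failures scattered across the whole disk might point elsewhere. That reading is only a rule of thumb.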

Jailer · Dec 28, 2017

Do you have regular smart tests enabled or did you cut off the output? From what you posted da3 is the only drive that has had regular smart tests completed.

You need to check the smart status of all your drives as I don't think you've identified the problem drive yet.

Murac · Dec 28, 2017

Jailer said:
Do you have regular smart tests enabled or did you cut off the output?

I truncated the history for the other drives because I've never had a failure on them. They are all scheduled to run short and long tests multiple times a month. da3 is the only drive that's shown any real sign of trouble. Also, I just ran dmesg and found these CAM errors. It looks like /dev/da4 had a few errors too, but only 2 compared to 18 or so on /dev/da3.

The following is an excerpt; I created a gist with the full output showing all CAM errors on da3 and da4 here: https://gist.github.com/murac/07d1e798ea60968a6e822bf8635bb956

Code:

mps0: <Avago Technologies (LSI) SAS2008> port 0xe000-0xe0ff mem 0xfaf3c000-0xfaf3ffff,0xfaf40000-0xfaf7ffff irq 28 at device 0.0 on pci1
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: SAS Address for SATA device = 49665d61e0aed28b
mps0: SAS Address for SATA device = 4965655aeda8b793
mps0: SAS Address for SATA device = 4966636df996d385
mps0: SAS Address for SATA device = 4965655aeda1d195
mps0: SAS Address for SATA device = 4966636df9b1c88d
mps0: SAS Address for SATA device = 4966525de0a9d16e
mps0: SAS Address for SATA device = 4976666eed95bc92
mps0: SAS Address for SATA device = 497d6665faaabe73
mps0: SAS Address for SATA device = 49676160ef91c86d
mps0: SAS Address for SATA device = 497a5b56018fe083
mps0: SAS Address from SATA device = 49665d61e0aed28b
mps0: SAS Address from SATA device = 4965655aeda8b793
mps0: SAS Address from SATA device = 4966636df996d385
mps0: SAS Address from SATA device = 4965655aeda1d195
mps0: SAS Address from SATA device = 4966636df9b1c88d
mps0: SAS Address from SATA device = 4966525de0a9d16e
mps0: SAS Address from SATA device = 4976666eed95bc92
mps0: SAS Address from SATA device = 497d6665faaabe73
mps0: SAS Address from SATA device = 49676160ef91c86d
mps0: SAS Address from SATA device = 497a5b56018fe083
mps0: SAS Address for SATA device = 497d6156d7a3d982
mps0: SAS Address for SATA device = 497d6156d7b0db8b
mps0: SAS Address for SATA device = 4f64564fb8b6cf96
mps0: SAS Address for SATA device = 4f64564fb890ba94
mps0: SAS Address for SATA device = 4f655f64d6b8c096
mps0: SAS Address for SATA device = 4f665462cab0bc98
mps0: SAS Address for SATA device = 4f655f64d6b2ba8e
mps0: SAS Address for SATA device = 4f666160d6abbb71
mps0: SAS Address from SATA device = 497d6156d7a3d982
mps0: SAS Address from SATA device = 497d6156d7b0db8b
mps0: SAS Address from SATA device = 4f64564fb8b6cf96
mps0: SAS Address from SATA device = 4f64564fb890ba94
mps0: SAS Address from SATA device = 4f655f64d6b8c096
mps0: SAS Address from SATA device = 4f665462cab0bc98
mps0: SAS Address from SATA device = 4f655f64d6b2ba8e
mps0: SAS Address from SATA device = 4f666160d6abbb71
da0 at mps0 bus 0 scbus0 target 8 lun 0
da1 at mps0 bus 0 scbus0 target 9 lun 0
da4 at mps0 bus 0 scbus0 target 12 lun 0
da3 at mps0 bus 0 scbus0 target 11 lun 0
da5 at mps0 bus 0 scbus0 target 13 lun 0
da10 at mps0 bus 0 scbus0 target 18 lun 0
da9 at mps0 bus 0 scbus0 target 17 lun 0
da13 at mps0 bus 0 scbus0 target 22 lun 0
da6 at mps0 bus 0 scbus0 target 14 lun 0
da2 at mps0 bus 0 scbus0 target 10 lun 0
da11 at mps0 bus 0 scbus0 target 19 lun 0
da14 at mps0 bus 0 scbus0 target 23 lun 0
da15 at mps0 bus 0 scbus0 target 24 lun 0
da7 at mps0 bus 0 scbus0 target 15 lun 0
da8 at mps0 bus 0 scbus0 target 16 lun 0
da16 at mps0 bus 0 scbus0 target 25 lun 0
da12 at mps0 bus 0 scbus0 target 21 lun 0
da17 at mps0 bus 0 scbus0 target 26 lun 0
ses0 at mps0 bus 0 scbus0 target 20 lun 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5a 98 00 01 00 00 length 131072 SMID 152 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5b 98 00 01 00 00 length 131072 SMID 897 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5a 98 00 01 00 00
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 92 a3 f7 20 00 00 40 00 length 32768 SMID 460 terminated ioc 804b loginfo 31080000 sc(da3:mps0:0:11:0): CAM status: CCB request completed with an error
(da3:mps0:0:11:0): Retrying command
(da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5b 98 00 01 00 00
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5c 98 00 01 00 00 length 131072 SMID 551 terminated ioc 804b loginfo 31080000 s(da3:mps0:0:11:0): CAM status: CCB request completed with an error
(da3:mps0:0:csi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5d 98 00 01 00 00 length 131072 SMID 913 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5e 98 00 01 00 00 length 131072 SMID 237 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5f 98 00 01 00 00 length 131072 SMID 301 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 60 98 00 01 00 00 length 131072 SMID 396 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 61 98 00 01 00 00 length 131072 SMID 418 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 62 98 00 01 00 00 length 131072 SMID 569 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 63 98 00 01 00 00 length 131072 SMID 655 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 64 98 00 01 00 00 length 131072 SMID 761 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 92 a3 8e 60 00 00 38 00 length 28672 SMID 268 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 ba 2e 6a 78 00 00 00 40 00 00 length 32768 SMID 527 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 65 98 00 01 00 00 length 131072 SMID 807 terminated ioc 804b loginfo 31080000 s(da3:mps0:0:11:0): READ(10). CDB: 28 00 92 a3 f7 20 00 00 40 00
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 66 98 00 01 00 00 length 131072 SMID 341 terminated ioc 804b loginfo 31080000 s(da3:mps0:0:11:0): CAM status: CCB request completed with an error
	 (da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 67 98 00 00 b0 00 length 90112 SMID 176 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da3:mps0:0:11:0): Retrying command
(da3:mps0:0:11:0): READ(10). CDB: 28 00 01 ed 5c 98 00 01 00 00
(da3:mps0:0:11:0): CAM status: CCB request completed with an error
(da3:mps0:0:11:0): Retrying command
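
Each failed command in the dmesg excerpt above carries its CDB, which encodes the starting LBA and the block count of the read. The Python sketch below decodes the two CDB forms that appear in the excerpt, READ(10) and READ(16), using the standard SCSI field layout. The function name is made up for illustration, and the 512-byte logical sector size is taken from the smartctl report for this drive.

```python
LOGICAL_SECTOR_BYTES = 512  # from the drive's smartctl report


def decode_read_cdb(cdb_hex):
    """Decode a READ(10) or READ(16) CDB given as space-separated hex bytes.

    Returns (starting_lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28 and len(cdb) == 10:
        # READ(10): bytes 2-5 hold the LBA, bytes 7-8 the transfer length.
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    elif opcode == 0x88 and len(cdb) == 16:
        # READ(16): bytes 2-9 hold the LBA, bytes 10-13 the transfer length.
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    else:
        raise ValueError(f"unsupported CDB: {cdb_hex}")
    return lba, blocks


# CDBs copied from the dmesg excerpt.
for cdb_hex in (
    "28 00 01 ed 5a 98 00 01 00 00",
    "88 00 00 00 00 01 ba 2e 6a 78 00 00 00 40 00 00",
):
    lba, blocks = decode_read_cdb(cdb_hex)
    print(f"LBA {lba}, {blocks} blocks, {blocks * LOGICAL_SECTOR_BYTES} bytes")
```

For the first CDB this gives 256 blocks, or 131072 bytes, which matches the `length 131072` field that the driver logged on the same line. The decoded LBAs can then be compared against addresses reported elsewhere, such as the LBA_of_first_error column in the SMART self-test log.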

I also ran smartctl -x /dev/daX for /dev/da3 and /dev/da4, and they report:

/dev/da3

Code:

Error 25 [0] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 cb a8 40 00  Error: UNC at LBA = 0x0952cba8 = 156421032

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 38 00 00 09 52 d5 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 30 00 00 09 52 d4 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 28 00 00 09 52 d3 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 52 d2 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 18 00 00 09 52 d1 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED

Error 24 [23] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 33 d8 40 00  Error: WP at LBA = 0x095233d8 = 156382168

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 28 00 00 2c 00 52 88 40 00 14d+14:36:25.310  WRITE FPDMA QUEUED
  61 00 40 00 20 00 00 2c 00 52 c8 40 00 14d+14:36:25.310  WRITE FPDMA QUEUED
  60 00 80 00 18 00 00 09 52 35 d8 40 00 14d+14:36:25.310  READ FPDMA QUEUED
  60 01 00 00 10 00 00 09 52 34 d8 40 00 14d+14:36:25.309  READ FPDMA QUEUED
  60 01 00 00 08 00 00 09 52 33 d8 40 00 14d+14:36:25.309  READ FPDMA QUEUED

Error 23 [22] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 32 70 40 00  Error: WP at LBA = 0x09523270 = 156381808

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 30 00 00 2c 00 52 88 40 00 14d+14:36:21.953  WRITE FPDMA QUEUED
  61 00 40 00 00 00 00 2c 00 52 c8 40 00 14d+14:36:21.953  WRITE FPDMA QUEUED
  60 00 80 00 28 00 00 09 52 35 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 52 34 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED
  60 01 00 00 18 00 00 09 52 33 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED

Error 22 [21] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 29 c0 40 00  Error: UNC at LBA = 0x095029c0 = 156248512

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 58 00 01 d1 c0 bc 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 10 00 50 00 01 d1 c0 ba 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 10 00 48 00 00 00 40 02 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 80 00 40 00 00 09 50 2f 18 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 01 00 00 38 00 00 09 50 2e 18 40 00 14d+14:36:11.237  READ FPDMA QUEUED

Error 21 [20] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 27 f0 40 00  Error: UNC at LBA = 0x095027f0 = 156248048

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 70 00 01 d1 c0 bc 90 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 00 10 00 68 00 01 d1 c0 ba 90 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 00 10 00 60 00 00 00 40 02 90 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 00 80 00 58 00 00 09 50 2f 18 40 00 14d+14:36:07.317  READ FPDMA QUEUED
  60 01 00 00 50 00 00 09 50 2e 18 40 00 14d+14:36:07.317  READ FPDMA QUEUED

Error 20 [19] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 24 30 40 00  Error: UNC at LBA = 0x09502430 = 156247088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 c8 00 40 00 00 09 50 2b 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 38 00 00 09 50 2a 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 30 00 00 09 50 29 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 28 00 00 09 50 28 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 50 27 50 40 00 14d+14:36:03.575  READ FPDMA QUEUED

Error 19 [18] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 16 08 40 00  Error: UNC at LBA = 0x09501608 = 156243464

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 c0 00 08 00 00 09 50 16 d8 40 00 14d+14:35:59.821  READ FPDMA QUEUED
  60 01 00 00 00 00 00 09 50 15 d8 40 00 14d+14:35:59.821  READ FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 40 00 14d+14:35:59.821  FLUSH CACHE EXT
  ea 00 00 00 00 00 00 00 00 00 00 40 00 14d+14:35:59.821  FLUSH CACHE EXT
  ea 00 00 00 00 00 00 00 00 00 00 40 00 14d+14:35:59.821  FLUSH CACHE EXT

Error 18 [17] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 13 58 40 00  Error: WP at LBA = 0x09501358 = 156242776

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 48 00 00 2b ff f8 d0 40 00 14d+14:35:56.406  WRITE FPDMA QUEUED
  61 00 40 00 40 00 00 2b ff f8 90 40 00 14d+14:35:56.406  WRITE FPDMA QUEUED
  61 00 10 00 38 00 00 2b ff f8 80 40 00 14d+14:35:56.405  WRITE FPDMA QUEUED
  61 00 40 00 30 00 00 2b ff f8 40 40 00 14d+14:35:56.405  WRITE FPDMA QUEUED
  60 00 c0 00 28 00 00 09 50 16 d8 40 00 14d+14:35:56.405  READ FPDMA QUEUED

/dev/da4

Code:

Error 2 [1] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 13 36 b8 40 00  Error: UNC at LBA = 0x091336b8 = 152254136

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 58 00 01 d1 c0 bc 90 40 00 14d+14:35:04.769  READ FPDMA QUEUED
  60 00 10 00 10 00 01 d1 c0 ba 90 40 00 14d+14:35:04.769  READ FPDMA QUEUED
  60 00 10 00 50 00 00 00 40 02 90 40 00 14d+14:35:04.769  READ FPDMA QUEUED
  60 01 00 00 48 00 00 09 13 3c 40 40 00 14d+14:35:04.769  READ FPDMA QUEUED
  60 01 00 00 40 00 00 09 13 3b 40 40 00 14d+14:35:04.769  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 13 34 f8 40 00  Error: UNC at LBA = 0x091334f8 = 152253688

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 10 00 00 09 13 35 00 40 00 14d+14:35:01.001  READ FPDMA QUEUED
  60 01 00 00 08 00 00 09 13 34 00 40 00 14d+14:35:01.000  READ FPDMA QUEUED
  60 01 00 00 00 00 00 09 13 33 00 40 00 14d+14:35:01.000  READ FPDMA QUEUED
  60 00 c0 00 08 00 00 09 13 09 00 40 00 14d+14:35:00.999  READ FPDMA QUEUED
  60 01 00 00 00 00 00 09 13 08 00 40 00 14d+14:35:00.999  READ FPDMA QUEUED
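
A long error log like the two above is easier to compare across drives once it is reduced to a list of failing LBAs. The following is a minimal Python sketch, not part of the original post. It assumes the pasted text keeps smartctl's "Error: <type> at LBA = 0x... = ..." summary line intact. The names `ERROR_LINE`, `error_lbas` and `SAMPLE` are made up for illustration, and the sample lines are copied from the da3 and da4 output above.

```python
import re

# Matches the summary line smartctl prints for each logged ATA error, e.g.
#   "... Error: UNC at LBA = 0x0952cba8 = 156421032"
ERROR_LINE = re.compile(r"Error:\s+(\w+)\s+at LBA = (0x[0-9a-fA-F]+) = (\d+)")


def error_lbas(smart_text):
    """Return (error_type, lba) pairs in the order they appear in the log."""
    found = []
    for match in ERROR_LINE.finditer(smart_text):
        kind, hex_lba, dec_lba = match.groups()
        lba = int(hex_lba, 16)
        # smartctl prints the LBA in hex and decimal; drop entries where the
        # two disagree, since that suggests the paste was garbled.
        if lba == int(dec_lba):
            found.append((kind, lba))
    return found


# Three summary lines taken from the output above (two from da3, one from da4).
SAMPLE = """\
  40 -- 51 00 00 00 00 09 52 cb a8 40 00  Error: UNC at LBA = 0x0952cba8 = 156421032
  40 -- 51 00 00 00 00 09 52 33 d8 40 00  Error: WP at LBA = 0x095233d8 = 156382168
  40 -- 51 00 00 00 00 09 13 36 b8 40 00  Error: UNC at LBA = 0x091336b8 = 152254136
"""

if __name__ == "__main__":
    for kind, lba in error_lbas(SAMPLE):
        print(kind, lba)
```

Feeding it the full `smartctl -x` text for each drive would show whether the failing LBAs cluster in one region or are spread across the disk.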

Jailer · Dec 28, 2017

Post the entire output of the drives' SMART data if you want others to help you. Truncating the results doesn't help anyone who is looking at your thread and trying to interpret them.

Murac · Dec 28, 2017

You're right, my apologies.

dmesg | grep mpr output can be found here: https://gist.github.com/murac/07d1e798ea60968a6e822bf8635bb956
smartctl -x /dev/daX output for all drives in order from da0 to da17 can be found here: https://gist.github.com/murac/0cccf40e4bc7a9adf11a18214ecb1246

rs225 · Dec 28, 2017

I would say RMA da3 now. If it is taking ZFS read/write errors, it is a problem. I don't think cable or controller would be a factor here. Your other errors on other drives are probably naturally occurring, but I would consider whether vibration could be contributing. Also whether da3 and da4 have something in common (like proximity), since they coincidentally are well elevated in errors compared to the others.

Johnnie Black · Dec 29, 2017

IMO both da3 and da4 should be replaced: both had read errors in dmesg and show double-digit raw read errors. The other disks with single-digit raw read errors should be fine for now, but it's never a good sign. Keep an eye on them and check that the attribute doesn't increase much more.
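
The single-digit versus double-digit rule of thumb above could be sketched in Python roughly as follows. This is an illustration only: the cut-off of 10 is an assumption read from "double digits" in this reply, not a vendor threshold, and the function names are hypothetical. It also assumes the attribute table has smartctl's usual layout with RAW_VALUE as the last column, which holds for attribute 1 on the drives in this thread.

```python
def raw_read_errors(attr_text):
    """Return the RAW_VALUE of attribute 1 (Raw_Read_Error_Rate), or None."""
    for line in attr_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "1" and fields[1] == "Raw_Read_Error_Rate":
            # RAW_VALUE is the last whitespace-separated column in the table.
            return int(fields[-1])
    return None


def verdict(raw_value, replace_at=10):
    """Classify a raw read error count.

    replace_at=10 is an assumed cut-off for "double digits".
    """
    if raw_value is None:
        return "unknown"
    if raw_value >= replace_at:
        return "replace"
    if raw_value > 0:
        return "watch"
    return "ok"


# Attribute line in the format smartctl prints; values are those reported for da3.
DA3 = "  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       53"

if __name__ == "__main__":
    print(verdict(raw_read_errors(DA3)))
```

Running the same check on a schedule and comparing against the previous value would cover the "keep an eye on them" part.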

tvsjr · Dec 29, 2017

da3 and da4 both need to go... bad news is, there's some chance that they are both in the same vdev. I would replace da3 first (you do know which drive serial number is in which bay, yes?), allow resilvering to complete, then do da4.

You should also review your SMART test configuration. Short tests should run daily, long tests every 14 days.

Ericloewe · Dec 29, 2017

If you have two spare ports, you can resilver both of them in place simultaneously. If you have one spare port, you can resilver the worst offender in-place, first.

Murac · Dec 30, 2017

Thanks everyone for the verification. It looks like da3 failed the last long test, and da4, while it passed, now has 57 raw read errors and more CAM errors. I'm sending them both out for RMA. Hopefully nothing else goes wrong in the interim.

Code:

########## SMART status report for da3 drive (Western Digital Red: WD-WCC4E3TEFCJX) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   53
  3 Spin_Up_Time			0x0027   185   180   021	Pre-fail  Always	   -	   7725
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   85
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   085   085   000	Old_age   Always	   -	   11039
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   85
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   83
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   177
194 Temperature_Celsius	 0x0022   120   105   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   11

No Errors Logged

smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4E3TEFCJX
LU WWN Device Id: 5 0014ee 2b7eccc0b
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sat Dec 30 15:38:12 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				   was never started.
				   Auto Offline Data Collection: Disabled.
Self-test execution status:	  ( 113)	The previous self-test completed having
				   the read element of the test failed.
Total time to complete Offline
data collection:		 (52380) seconds.
Offline data collection
capabilities:			  (0x7b) SMART execute Offline immediate.
				   Auto Offline data collection on/off support.
				   Suspend Offline collection upon new
				   command.
				   Offline surface scan supported.
				   Self-test supported.
				   Conveyance Self-test supported.
				   Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
				   power-saving mode.
				   Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
				   General Purpose Logging supported.
Short self-test routine
recommended polling time:	  (   2) minutes.
Extended self-test routine
recommended polling time:	  ( 524) minutes.
Conveyance self-test routine
recommended polling time:	  (   5) minutes.
SCT capabilities:			(0x703d)	SCT Status supported.
				   SCT Error Recovery Control supported.
				   SCT Feature Control supported.
				   SCT Data Table supported.

ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate	 POSR-K   200   200   051	-	53
  3 Spin_Up_Time			POS--K   185   180   021	-	7725
  4 Start_Stop_Count		-O--CK   100   100   000	-	85
  5 Reallocated_Sector_Ct   PO--CK   200   200   140	-	0
  7 Seek_Error_Rate		 -OSR-K   200   200   000	-	0
  9 Power_On_Hours		  -O--CK   085   085   000	-	11039
 10 Spin_Retry_Count		-O--CK   100   253   000	-	0
 11 Calibration_Retry_Count -O--CK   100   253   000	-	0
 12 Power_Cycle_Count	   -O--CK   100   100   000	-	85
192 Power-Off_Retract_Count -O--CK   200   200   000	-	83
193 Load_Cycle_Count		-O--CK   200   200   000	-	177
194 Temperature_Celsius	 -O---K   120   105   000	-	32
196 Reallocated_Event_Count -O--CK   200   200   000	-	0
197 Current_Pending_Sector  -O--CK   200   200   000	-	0
198 Offline_Uncorrectable   ----CK   100   253   000	-	0
199 UDMA_CRC_Error_Count	-O--CK   200   200   000	-	0
200 Multi_Zone_Error_Rate   ---R--   200   200   000	-	11
						   ||||||_ K auto-keep
						   |||||__ C event count
						   ||||___ R error rate
						   |||____ S speed/performance
						   ||_____ O updated online
						   |______ P prefailure warning

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x01		   SL  R/O	  1  Summary SMART error log
0x02		   SL  R/O	  5  Comprehensive SMART error log
0x03	   GPL	 R/O	  6  Ext. Comprehensive SMART error log
0x06		   SL  R/O	  1  SMART self-test log
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x10	   GPL	 R/O	  1  SATA NCQ Queued Error log
0x11	   GPL	 R/O	  1  SATA Phy Event Counters log
0x21	   GPL	 R/O	  1  Write stream error log
0x22	   GPL	 R/O	  1  Read stream error log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS	  16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS	   1  Device vendor specific log
0xb7	   GPL,SL  VS	  39  Device vendor specific log
0xbd	   GPL,SL  VS	   1  Device vendor specific log
0xc0	   GPL,SL  VS	   1  Device vendor specific log
0xc1	   GPL	 VS	  93  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 29 (device log contains only the most recent 24 errors)
   CR	 = Command Register
   FEATR  = Features Register
   COUNT  = Count (was: Sector Count) Register
   LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
   LH	 = LBA High (was: Cylinder High) Register	]   LBA
   LM	 = LBA Mid (was: Cylinder Low) Register	  ] Register
   LL	 = LBA Low (was: Sector Number) Register	 ]
   DV	 = Device (was: Device/Head) Register
   DC	 = Device Control Register
   ER	 = Error register
   ST	 = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 29 [4] occurred at disk power-on lifetime: 11024 hours (459 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 ed 68 38 40 00  Error: UNC at LBA = 0x01ed6838 = 32335928

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 08 00 00 01 ed 68 08 40 00 16d+04:39:59.486  READ FPDMA QUEUED
  60 00 40 00 00 00 00 01 ed 67 c8 40 00 16d+04:39:59.484  READ FPDMA QUEUED
  60 00 40 00 08 00 00 01 ec e7 d8 40 00 16d+04:39:59.480  READ FPDMA QUEUED
  60 00 40 00 00 00 00 01 ec e7 98 40 00 16d+04:39:59.470  READ FPDMA QUEUED
  61 00 08 00 00 00 00 0e d7 58 a0 40 00 16d+04:39:59.467  WRITE FPDMA QUEUED

Error 28 [3] occurred at disk power-on lifetime: 11020 hours (459 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 c2 b4 28 40 00  Error: UNC at LBA = 0x00c2b428 = 12760104

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 68 00 01 d1 c0 bc 90 40 00 16d+00:36:37.846  READ FPDMA QUEUED
  60 00 10 00 60 00 01 d1 c0 ba 90 40 00 16d+00:36:37.846  READ FPDMA QUEUED
  60 00 10 00 58 00 00 00 40 02 90 40 00 16d+00:36:37.846  READ FPDMA QUEUED
  60 01 00 00 50 00 00 00 c2 ba e0 40 00 16d+00:36:37.846  READ FPDMA QUEUED
  60 01 00 00 48 00 00 00 c2 b9 e0 40 00 16d+00:36:37.846  READ FPDMA QUEUED

Error 27 [2] occurred at disk power-on lifetime: 11020 hours (459 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 c2 b4 28 40 00  Error: UNC at LBA = 0x00c2b428 = 12760104

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 00 00 00 00 c2 b4 20 40 00 16d+00:36:34.488  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 b3 a0 40 00 16d+00:36:34.477  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 b3 20 40 00 16d+00:36:34.451  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 b2 a0 40 00 16d+00:36:34.440  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 b2 60 40 00 16d+00:36:34.428  READ FPDMA QUEUED

Error 26 [1] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 c2 a6 60 40 00  Error: UNC at LBA = 0x00c2a660 = 12756576

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 08 00 00 00 c2 a5 58 40 00 15d+23:34:11.512  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 a6 58 40 00 15d+23:34:11.506  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 a5 18 40 00 15d+23:34:11.495  READ FPDMA QUEUED
  60 00 40 00 00 00 00 00 c2 a6 18 40 00 15d+23:34:11.487  READ FPDMA QUEUED
  60 00 40 00 08 00 00 00 c2 a6 98 40 00 15d+23:34:11.484  READ FPDMA QUEUED

Error 25 [0] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 cb a8 40 00  Error: UNC at LBA = 0x0952cba8 = 156421032

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 38 00 00 09 52 d5 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 30 00 00 09 52 d4 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 28 00 00 09 52 d3 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 52 d2 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED
  60 01 00 00 18 00 00 09 52 d1 40 40 00 14d+14:36:36.432  READ FPDMA QUEUED

Error 24 [23] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 33 d8 40 00  Error: WP at LBA = 0x095233d8 = 156382168

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 28 00 00 2c 00 52 88 40 00 14d+14:36:25.310  WRITE FPDMA QUEUED
  61 00 40 00 20 00 00 2c 00 52 c8 40 00 14d+14:36:25.310  WRITE FPDMA QUEUED
  60 00 80 00 18 00 00 09 52 35 d8 40 00 14d+14:36:25.310  READ FPDMA QUEUED
  60 01 00 00 10 00 00 09 52 34 d8 40 00 14d+14:36:25.309  READ FPDMA QUEUED
  60 01 00 00 08 00 00 09 52 33 d8 40 00 14d+14:36:25.309  READ FPDMA QUEUED

Error 23 [22] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 52 32 70 40 00  Error: WP at LBA = 0x09523270 = 156381808

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 30 00 00 2c 00 52 88 40 00 14d+14:36:21.953  WRITE FPDMA QUEUED
  61 00 40 00 00 00 00 2c 00 52 c8 40 00 14d+14:36:21.953  WRITE FPDMA QUEUED
  60 00 80 00 28 00 00 09 52 35 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED
  60 01 00 00 20 00 00 09 52 34 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED
  60 01 00 00 18 00 00 09 52 33 d8 40 00 14d+14:36:21.556  READ FPDMA QUEUED

Error 22 [21] occurred at disk power-on lifetime: 10986 hours (457 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 09 50 29 c0 40 00  Error: UNC at LBA = 0x095029c0 = 156248512

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 58 00 01 d1 c0 bc 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 10 00 50 00 01 d1 c0 ba 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 10 00 48 00 00 00 40 02 90 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 00 80 00 40 00 00 09 50 2f 18 40 00 14d+14:36:11.237  READ FPDMA QUEUED
  60 01 00 00 38 00 00 09 50 2e 18 40 00 14d+14:36:11.237  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed: read failure	   10%	 11019		 156238792
# 2  Short offline	   Completed without error	   00%	 10994		 -
# 3  Short offline	   Completed without error	   00%	 10930		 -
# 4  Extended offline	Completed without error	   00%	 10843		 -
# 5  Short offline	   Completed without error	   00%	 10763		 -
# 6  Extended offline	Completed without error	   00%	 10759		 -
# 7  Extended offline	Completed without error	   00%	 10714		 -
# 8  Extended offline	Completed: read failure	   90%	 10565		 32332392
# 9  Short offline	   Completed without error	   00%	 10495		 -
#10  Short offline	   Completed without error	   00%	 10279		 -
#11  Extended offline	Completed without error	   00%	 10191		 -
#12  Short offline	   Completed without error	   00%	 10111		 -
#13  Short offline	   Completed without error	   00%	  9943		 -
#14  Extended offline	Completed without error	   00%	  9856		 -
#15  Short offline	   Completed without error	   00%	  9775		 -
#16  Short offline	   Completed without error	   00%	  9534		 -
#17  Extended offline	Completed without error	   00%	  9447		 -
#18  Short offline	   Completed without error	   00%	  9221		 -
1 of 2 failed self-tests are outdated by newer successful extended offline self-test # 4

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1		0		0  Not_testing
   2		0		0  Not_testing
   3		0		0  Not_testing
   4		0		0  Not_testing
   5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   258 (0x0102)
SCT Support Level:				   1
Device State:						Active (0)
Current Temperature:					32 Celsius
Power Cycle Min/Max Temperature:	 26/35 Celsius
Lifetime	Min/Max Temperature:	 20/47 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:	 2
Temperature Sampling Period:		 1 minute
Temperature Logging Interval:		1 minute
Min/Max recommended Temperature:	  0/60 Celsius
Min/Max Temperature Limit:		   -41/85 Celsius
Temperature History Size (Index):	478 (406)

Index	Estimated Time   Temperature Celsius
 407	2017-12-30 07:41	32  *************
 ...	..(329 skipped).	..  *************
 259	2017-12-30 13:11	32  *************
 260	2017-12-30 13:12	33  **************
 ...	..( 63 skipped).	..  **************
 324	2017-12-30 14:16	33  **************
 325	2017-12-30 14:17	32  *************
 ...	..( 80 skipped).	..  *************
 406	2017-12-30 15:38	32  *************

SCT Error Recovery Control:
		  Read:	 70 (7.0 seconds)
		 Write:	 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x0001  2			0  Command failed due to ICRC error
0x0002  2			0  R_ERR response for data FIS
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0005  2			0  R_ERR response for non-data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS
0x0008  2			0  Device-to-host non-data FIS retries
0x0009  2			4  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2			5  Device-to-host register FISes sent due to a COMRESET
0x000b  2			0  CRC errors within host-to-device FIS
0x000f  2			0  R_ERR response for host-to-device data FIS, CRC
0x0012  2			0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4	  1451861  Vendor specific
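
The self-test log in the report above is where the failed long tests show up. Here is a small Python sketch for pulling out just the failed rows, assuming the log rows keep the "# N  <kind> offline  <status>  NN%  <hours>  <LBA>" layout seen above. The names `ROW`, `failed_self_tests` and `SAMPLE` are hypothetical, and treating any status containing "failure" as a failed test is a simplification.

```python
import re

# One row of the self-test log, e.g.
#   "# 1  Extended offline  Completed: read failure  10%  11019  156238792"
ROW = re.compile(
    r"^#\s*(\d+)\s+(Short|Extended|Conveyance|Selective) offline\s+"
    r"(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def failed_self_tests(log_text):
    """Return (test_number, kind, lifetime_hours, first_error_lba) per failed test."""
    failures = []
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        num, kind, status, _remaining, hours, lba = m.groups()
        # Simplification: only statuses that mention "failure" count as failed.
        if "failure" not in status:
            continue
        first_lba = int(lba) if lba.isdigit() else None
        failures.append((int(num), kind, int(hours), first_lba))
    return failures


# Three rows copied from the da3 self-test log above.
SAMPLE = """\
# 1  Extended offline    Completed: read failure       10%     11019         156238792
# 2  Short offline       Completed without error       00%     10994         -
# 8  Extended offline    Completed: read failure       90%     10565         32332392
"""

if __name__ == "__main__":
    for row in failed_self_tests(SAMPLE):
        print(row)
```

For da3 this picks out the two extended tests that ended in a read failure, at 11019 and 10565 power-on hours.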

Murac · Dec 30, 2017

Code:

########## SMART status report for da4 drive (Western Digital Red: WD-WCC4E4RXRSAP) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   57
  3 Spin_Up_Time			0x0027   182   177   021	Pre-fail  Always	   -	   7875
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   85
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   085   085   000	Old_age   Always	   -	   11039
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   85
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   83
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   175
194 Temperature_Celsius	 0x0022   120   106   000	Old_age   Always	   -	   32
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   40

No Errors Logged

smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4E4RXRSAP
LU WWN Device Id: 5 0014ee 2b7fd20cf
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sat Dec 30 15:38:13 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				   was never started.
				   Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
				   without error or no self-test has ever
				   been run.
Total time to complete Offline
data collection:		 (52560) seconds.
Offline data collection
capabilities:			  (0x7b) SMART execute Offline immediate.
				   Auto Offline data collection on/off support.
				   Suspend Offline collection upon new
				   command.
				   Offline surface scan supported.
				   Self-test supported.
				   Conveyance Self-test supported.
				   Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
				   power-saving mode.
				   Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
				   General Purpose Logging supported.
Short self-test routine
recommended polling time:	  (   2) minutes.
Extended self-test routine
recommended polling time:	  ( 526) minutes.
Conveyance self-test routine
recommended polling time:	  (   5) minutes.
SCT capabilities:			(0x703d)	SCT Status supported.
				   SCT Error Recovery Control supported.
				   SCT Feature Control supported.
				   SCT Data Table supported.

ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate	 POSR-K   200   200   051	-	57
  3 Spin_Up_Time			POS--K   182   177   021	-	7875
  4 Start_Stop_Count		-O--CK   100   100   000	-	85
  5 Reallocated_Sector_Ct   PO--CK   200   200   140	-	0
  7 Seek_Error_Rate		 -OSR-K   200   200   000	-	0
  9 Power_On_Hours		  -O--CK   085   085   000	-	11039
 10 Spin_Retry_Count		-O--CK   100   253   000	-	0
 11 Calibration_Retry_Count -O--CK   100   253   000	-	0
 12 Power_Cycle_Count	   -O--CK   100   100   000	-	85
192 Power-Off_Retract_Count -O--CK   200   200   000	-	83
193 Load_Cycle_Count		-O--CK   200   200   000	-	175
194 Temperature_Celsius	 -O---K   120   106   000	-	32
196 Reallocated_Event_Count -O--CK   200   200   000	-	0
197 Current_Pending_Sector  -O--CK   200   200   000	-	0
198 Offline_Uncorrectable   ----CK   100   253   000	-	0
199 UDMA_CRC_Error_Count	-O--CK   200   200   000	-	0
200 Multi_Zone_Error_Rate   ---R--   200   200   000	-	40
						   ||||||_ K auto-keep
						   |||||__ C event count
						   ||||___ R error rate
						   |||____ S speed/performance
						   ||_____ O updated online
						   |______ P prefailure warning

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x01		   SL  R/O	  1  Summary SMART error log
0x02		   SL  R/O	  5  Comprehensive SMART error log
0x03	   GPL	 R/O	  6  Ext. Comprehensive SMART error log
0x06		   SL  R/O	  1  SMART self-test log
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x10	   GPL	 R/O	  1  SATA NCQ Queued Error log
0x11	   GPL	 R/O	  1  SATA Phy Event Counters log
0x21	   GPL	 R/O	  1  Write stream error log
0x22	   GPL	 R/O	  1  Read stream error log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS	  16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS	   1  Device vendor specific log
0xb7	   GPL,SL  VS	  39  Device vendor specific log
0xbd	   GPL,SL  VS	   1  Device vendor specific log
0xc0	   GPL,SL  VS	   1  Device vendor specific log
0xc1	   GPL	 VS	  93  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 26 (device log contains only the most recent 24 errors)
   CR	 = Command Register
   FEATR  = Features Register
   COUNT  = Count (was: Sector Count) Register
   LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
   LH	 = LBA High (was: Cylinder High) Register	]   LBA
   LM	 = LBA Mid (was: Cylinder Low) Register	  ] Register
   LL	 = LBA Low (was: Sector Number) Register	 ]
   DV	 = Device (was: Device/Head) Register
   DC	 = Device Control Register
   ER	 = Error register
   ST	 = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 26 [1] occurred at disk power-on lifetime: 11026 hours (459 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 39 18 40 00  Error: UNC at LBA = 0x01cd3918 = 30226712

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 08 00 00 01 cd 3d 40 40 00 16d+06:28:27.539  READ FPDMA QUEUED
  60 00 40 00 48 00 00 01 cd 3d 80 40 00 16d+06:28:27.525  READ FPDMA QUEUED
  61 00 08 00 40 00 00 0e ef 78 08 40 00 16d+06:28:27.525  WRITE FPDMA QUEUED
  60 00 40 00 00 00 00 01 cd 3d 00 40 00 16d+06:28:27.525  READ FPDMA QUEUED
  60 01 00 00 38 00 00 01 cd 3c 00 40 00 16d+06:28:27.525  READ FPDMA QUEUED

Error 25 [0] occurred at disk power-on lifetime: 11025 hours (459 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 c6 08 a0 40 00  Error: UNC at LBA = 0x00c608a0 = 12978336

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 00 00 00 00 c6 0a f8 40 00 16d+05:59:28.961  READ FPDMA QUEUED
  60 01 00 00 20 00 00 00 c6 09 60 40 00 16d+05:59:28.855  READ FPDMA QUEUED
  60 01 00 00 18 00 00 00 c6 08 60 40 00 16d+05:59:28.854  READ FPDMA QUEUED
  60 01 00 00 00 00 00 00 c6 07 60 40 00 16d+05:59:28.854  READ FPDMA QUEUED
  60 01 00 00 28 00 00 00 c6 06 60 40 00 16d+05:59:28.853  READ FPDMA QUEUED

Error 24 [23] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 44 e8 40 00  Error: UNC at LBA = 0x01cd44e8 = 30229736

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 c0 00 30 00 00 01 cd 44 40 40 00 15d+23:55:20.996  READ FPDMA QUEUED
  60 01 00 00 28 00 00 01 cd 43 40 40 00 15d+23:55:20.996  READ FPDMA QUEUED
  60 01 00 00 20 00 00 01 cd 42 40 40 00 15d+23:55:20.996  READ FPDMA QUEUED
  60 01 00 00 18 00 00 01 cd 41 40 40 00 15d+23:55:20.995  READ FPDMA QUEUED
  60 01 00 00 10 00 00 01 cd 40 40 40 00 15d+23:55:20.995  READ FPDMA QUEUED

Error 23 [22] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 36 f0 40 00  Error: WP at LBA = 0x01cd36f0 = 30226160

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 00 00 00 67 5d 07 78 40 00 15d+23:55:19.491  WRITE FPDMA QUEUED
  60 00 40 00 50 00 00 01 cd 3d 40 40 00 15d+23:55:16.244  READ FPDMA QUEUED
  60 00 40 00 48 00 00 01 cd 3d 80 40 00 15d+23:55:16.244  READ FPDMA QUEUED
  60 00 40 00 40 00 00 01 cd 3d 00 40 00 15d+23:55:16.244  READ FPDMA QUEUED
  60 01 00 00 38 00 00 01 cd 3c 00 40 00 15d+23:55:16.244  READ FPDMA QUEUED

Error 22 [21] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 32 30 40 00  Error: WP at LBA = 0x01cd3230 = 30224944

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 18 00 00 65 99 97 60 40 00 15d+23:55:11.875  WRITE FPDMA QUEUED
  60 01 00 00 68 00 00 01 cd 34 40 40 00 15d+23:55:11.875  READ FPDMA QUEUED
  60 01 00 00 10 00 00 01 cd 33 40 40 00 15d+23:55:11.875  READ FPDMA QUEUED
  60 01 00 00 08 00 00 01 cd 32 40 40 00 15d+23:55:11.874  READ FPDMA QUEUED
  60 01 00 00 00 00 00 01 cd 31 40 40 00 15d+23:55:11.874  READ FPDMA QUEUED

Error 21 [20] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 24 28 40 00  Error: WP at LBA = 0x01cd2428 = 30221352

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 01 00 00 10 00 00 67 5c ae c8 40 00 15d+23:55:04.800  WRITE FPDMA QUEUED
  61 00 88 00 08 00 00 67 5c ae 40 40 00 15d+23:55:04.800  WRITE FPDMA QUEUED
  60 00 10 00 00 00 01 d1 c0 bc 90 40 00 15d+23:55:04.378  READ FPDMA QUEUED
  60 00 10 00 68 00 01 d1 c0 ba 90 40 00 15d+23:55:04.378  READ FPDMA QUEUED
  60 00 10 00 60 00 00 00 40 02 90 40 00 15d+23:55:04.378  READ FPDMA QUEUED

Error 20 [19] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 22 a0 40 00  Error: WP at LBA = 0x01cd22a0 = 30220960

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 01 00 00 58 00 00 67 5c a5 40 40 00 15d+23:55:00.937  WRITE FPDMA QUEUED
  61 01 00 00 50 00 00 67 5c a4 40 40 00 15d+23:55:00.936  WRITE FPDMA QUEUED
  61 01 00 00 48 00 00 67 5c a3 40 40 00 15d+23:55:00.936  WRITE FPDMA QUEUED
  61 01 00 00 40 00 00 67 5c a2 40 40 00 15d+23:55:00.935  WRITE FPDMA QUEUED
  61 01 00 00 38 00 00 67 5c a1 40 40 00 15d+23:55:00.935  WRITE FPDMA QUEUED

Error 19 [18] occurred at disk power-on lifetime: 11019 hours (459 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 01 cd 1a e8 40 00  Error: WP at LBA = 0x01cd1ae8 = 30218984

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 00 00 00 67 5c 83 30 40 00 15d+23:54:59.268  WRITE FPDMA QUEUED
  60 00 40 00 30 00 00 01 cd 1f 40 40 00 15d+23:54:56.021  READ FPDMA QUEUED
  60 00 40 00 28 00 00 01 cd 20 40 40 00 15d+23:54:56.021  READ FPDMA QUEUED
  60 01 00 00 20 00 00 01 cd 1d 80 40 00 15d+23:54:56.021  READ FPDMA QUEUED
  60 01 00 00 18 00 00 01 cd 1c 80 40 00 15d+23:54:56.021  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	 11014		 -
# 2  Short offline	   Completed without error	   00%	 10994		 -
# 3  Short offline	   Completed without error	   00%	 10930		 -
# 4  Extended offline	Completed without error	   00%	 10843		 -
# 5  Short offline	   Completed without error	   00%	 10763		 -
# 6  Extended offline	Completed without error	   00%	 10575		 -
# 7  Short offline	   Completed without error	   00%	 10495		 -
# 8  Short offline	   Completed without error	   00%	 10279		 -
# 9  Extended offline	Completed without error	   00%	 10191		 -
#10  Short offline	   Completed without error	   00%	 10111		 -
#11  Short offline	   Completed without error	   00%	  9943		 -
#12  Extended offline	Completed without error	   00%	  9856		 -
#13  Short offline	   Completed without error	   00%	  9775		 -
#14  Short offline	   Completed without error	   00%	  9534		 -
#15  Extended offline	Completed without error	   00%	  9447		 -
#16  Short offline	   Completed without error	   00%	  9221		 -
#17  Extended offline	Completed without error	   00%	  9134		 -
#18  Short offline	   Completed without error	   00%	  9053		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1		0		0  Not_testing
   2		0		0  Not_testing
   3		0		0  Not_testing
   4		0		0  Not_testing
   5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   258 (0x0102)
SCT Support Level:				   1
Device State:						Active (0)
Current Temperature:					32 Celsius
Power Cycle Min/Max Temperature:	 26/34 Celsius
Lifetime	Min/Max Temperature:	  3/46 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:	 2
Temperature Sampling Period:		 1 minute
Temperature Logging Interval:		1 minute
Min/Max recommended Temperature:	  0/60 Celsius
Min/Max Temperature Limit:		   -41/85 Celsius
Temperature History Size (Index):	478 (407)

Index	Estimated Time   Temperature Celsius
 408	2017-12-30 07:41	32  *************
 ...	..( 20 skipped).	..  *************
 429	2017-12-30 08:02	32  *************
 430	2017-12-30 08:03	31  ************
 ...	..( 22 skipped).	..  ************
 453	2017-12-30 08:26	31  ************
 454	2017-12-30 08:27	32  *************
 ...	..(398 skipped).	..  *************
 375	2017-12-30 15:06	32  *************
 376	2017-12-30 15:07	31  ************
 ...	..(  4 skipped).	..  ************
 381	2017-12-30 15:12	31  ************
 382	2017-12-30 15:13	32  *************
 ...	..( 24 skipped).	..  *************
 407	2017-12-30 15:38	32  *************

SCT Error Recovery Control:
		  Read:	 70 (7.0 seconds)
		 Write:	 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x0001  2			0  Command failed due to ICRC error
0x0002  2			0  R_ERR response for data FIS
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0005  2			0  R_ERR response for non-data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS
0x0008  2			0  Device-to-host non-data FIS retries
0x0009  2			4  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2			5  Device-to-host register FISes sent due to a COMRESET
0x000b  2			0  CRC errors within host-to-device FIS
0x000f  2			0  R_ERR response for host-to-device data FIS, CRC
0x0012  2			0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4	  1451856  Vendor specific
