SOLVED ...error resulting in data corruption. Applications may be affected.


dnilgreb

Contributor
Joined
Mar 29, 2016
Messages
168
I am running FreeNAS 9.10.2-U6 with 32GB RAM (non-ECC).
So, I got this error:
The volume volume1 (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

So I ran zpool status -v and got this output:

Code:
[root@NAS01] ~# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Thu Aug  3 03:46:56 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da1p2       ONLINE       0     0     0

errors: No known data errors

  pool: volume1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
		corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
		entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 96K in 6h0m with 3 errors on Sun Aug 27 06:00:11 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume1                                         ONLINE       0     0     3
          raidz2-0                                      ONLINE       0     0     6
            gptid/5f2b7283-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     1
            gptid/07895578-cce5-11e4-9dde-5404a697b96e  ONLINE       0     0     1
            gptid/6070d53b-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     1
            gptid/614a8b8d-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     1
            gptid/6221dcba-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     1
            gptid/62fd2bc2-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

		/mnt/volume1/Backup/Time-Machine/MacBookAir.sparsebundle/bands/16145
		/mnt/volume1/Backup/Time-Machine/MacBookAir.sparsebundle/bands/16815


I should probably delete the reported files, but how can I determine if a drive is failing, and if so, which one?
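One way to read the CKSUM column above without eyeballing it is to pull the non-zero rows out of the zpool status text. The following is only a rough sketch: the function name checksum_errors and the abbreviated SAMPLE text are made up for illustration, and it assumes the plain five-column layout (NAME STATE READ WRITE CKSUM) with integer counters, which is what the output above shows.

```python
def checksum_errors(status_text):
    """Return {name: cksum} for config rows whose CKSUM count is non-zero.

    Assumes the plain five-column layout (NAME STATE READ WRITE CKSUM)
    with integer counters; rows that do not fit that shape are skipped.
    """
    flagged = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:1] == ["NAME"]:
            in_table = True          # header row opens a pool's config table
            continue
        if not in_table:
            continue
        if not fields:
            in_table = False         # blank line closes the table
            continue
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            name, cksum = fields[0], int(fields[4])
            if cksum > 0:
                flagged[name] = cksum
    return flagged


# Abbreviated from the zpool status -v output in this post.
SAMPLE = """\
  pool: volume1
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume1                                         ONLINE       0     0     3
          raidz2-0                                      ONLINE       0     0     6
            gptid/5f2b7283-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     1
            gptid/62fd2bc2-9739-11e4-8b4e-5404a697b96e  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
"""

for name, count in checksum_errors(SAMPLE).items():
    print(name, count)
```

In the full output above, five of the six disks report one checksum error each, so the counters alone do not single out one drive.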
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
dnilgreb said: "with 32GB RAM (non-ECC)."
It could be the memory or, less likely, all of the disks. A full memory check and SMART analysis would be a good starting point.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
It's more likely a memory error that caused the corruption, if there is any.

I'd suggest rescrubbing after testing your memory. The errors might go away.
 

dnilgreb

Thanks for the response. What is the best way to test the memory? Do I have to create a memtest86 stick, or is there a way to do this inside FreeNAS?
Isn't there a risk involved in running a scrub if you have corrupt data or disks?
Is SMART analysis simply running a long SMART test, or something different?
 

styno

Yes, create a memtest86 stick and let it do its work. Scrubs are there to detect and fix corruption within the ZFS ecosystem (and thus counter drive errors as well).
It is important to run them only *after* you have made sure that the memory is not faulty; this is why ECC is recommended.
SMART analysis would involve scheduled periodic short and long tests as well as investigating the smartctl output from every drive.

But again, this is most likely caused by a faulty memory stick, so check that first.
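As a rough illustration of "investigating the smartctl output", here is a small sketch that reads the attribute table printed by smartctl -A and reports a few attributes whose raw value is not zero. The helper names and the choice of attribute IDs (5, 197, 198, 199) are assumptions made for this example, not something prescribed in this thread; the sample rows are in the style of WD Red output.

```python
# Attribute IDs picked for this sketch (an assumption, not a rule):
# 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector,
# 198 Offline_Uncorrectable, 199 UDMA_CRC_Error_Count
WATCHED_IDS = (5, 197, 198, 199)


def parse_attributes(text):
    """Map attribute ID -> (name, raw_value) from a `smartctl -A` table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


def nonzero_watched(attrs, watched=WATCHED_IDS):
    """Return the watched attributes whose raw value is a non-zero integer."""
    return {
        attr_id: attrs[attr_id]
        for attr_id in watched
        if attr_id in attrs
        and attrs[attr_id][1].isdigit()
        and int(attrs[attr_id][1]) > 0
    }


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       30186
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

attrs = parse_attributes(SAMPLE)
print(nonzero_watched(attrs) or "no watched attribute has a non-zero raw value")
```

An empty result here only means these particular counters are zero; it does not rule out memory or cabling problems.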
 

dnilgreb

OK. But if I run the memtest, I need to run it for a long time, right? Maybe a week or two?
I'll create a stick tomorrow when I'm at work, but what can I do in the meantime? I am running a long SMART test on the first of every month, and a short one every day at 6am. Can the output from that be useful?
I'll be happy to post output, but what should I post?
The output of smartctl -a /dev/ada0 is huge. Is there something in particular that's interesting, or should I post the whole thing? Or maybe something different entirely?
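One thing the scheduled-test output can show is how recently each drive last logged a self-test. The sketch below is only an illustration under stated assumptions: the names parse_selftests and hours_since_last_selftest are hypothetical, the regular expression assumes the self-test log layout that smartctl -a prints for these drives, and it ignores any wraparound of the LifeTime(hours) counter. The sample figures are taken from the ada0 output posted in this thread.

```python
import re

# One self-test log row: "# 1  Short offline  Completed without error  00%  24980  -"
SELFTEST_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftests(text):
    """Return [(num, description, status, lifetime_hours), ...] in log order."""
    tests = []
    for line in text.splitlines():
        match = SELFTEST_ROW.match(line.strip())
        if match:
            num, desc, status, _remaining, hours, _lba = match.groups()
            tests.append((int(num), desc, status, int(hours)))
    return tests


def hours_since_last_selftest(text):
    """Power_On_Hours minus the LifeTime of the newest logged test, or None."""
    power_on = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "9" and fields[1] == "Power_On_Hours":
            power_on = int(fields[9])
    tests = parse_selftests(text)
    if power_on is None or not tests:
        return None
    return power_on - tests[0][3]   # the first row is the most recent test


SAMPLE = """\
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       30186
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     24980         -
# 2  Short offline       Completed without error       00%     24814         -
# 4  Extended offline    Completed without error       00%     24483         -
"""

print(hours_since_last_selftest(SAMPLE))
```

A large gap between the power-on hours and the newest logged test would suggest that the scheduled tests are not being logged for that drive, which is worth checking alongside the attribute values.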
 

styno

You can start with smartctl -A /dev/yourdrives
 

dnilgreb

Here is smartctl -A /dev/ada0. Should I post for all drives?

Code:
[root@NAS01] ~# smartctl -A /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   174   172   021	Pre-fail  Always	   -	   8275
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   68
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   059   059   000	Old_age   Always	   -	   30186
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   68
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   27
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   1917
194 Temperature_Celsius	 0x0022   108   101   000	Old_age   Always	   -	   44
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
dnilgreb said: "Here is smartctl -A /dev/ada0. Should I post for all drives?"
Yes, but with the -a in lowercase: smartctl -a /dev/ada0 etc.
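To collect the full report for every drive in one go, something along these lines could work. It is a sketch, not a tested FreeNAS recipe: the helper names are made up, the device names ada0 to ada5 are the ones used in this thread, and collect_reports has to run as root on a machine where smartctl is installed.

```python
import subprocess


def smartctl_commands(devices, flag="-a"):
    """Build one `smartctl <flag> /dev/<name>` argument list per device."""
    return [["smartctl", flag, f"/dev/{name}"] for name in devices]


def collect_reports(devices):
    """Run smartctl for each device and return {device_path: report_text}.

    check=False because smartctl uses its exit status as a bit mask and can
    return non-zero even when it printed a complete report.
    """
    reports = {}
    for argv in smartctl_commands(devices):
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        reports[argv[-1]] = result.stdout
    return reports


# Device names as used in this thread.
DEVICES = [f"ada{i}" for i in range(6)]

for argv in smartctl_commands(DEVICES):
    print(" ".join(argv))
```

Printing the commands first makes it easy to confirm the device list before actually running them.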
 

dnilgreb

Ok. Here goes:

1:
Code:
[root@NAS01] ~# smartctl -a /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4E0898115
LU WWN Device Id: 5 0014ee 25f0fc553
Firmware Version: 80.00A80
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Aug 27 13:59:48 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(54300) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 543) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   174   172   021	Pre-fail  Always	   -	   8275
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   68
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   059   059   000	Old_age   Always	   -	   30186
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   68
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   27
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   1917
194 Temperature_Celsius	 0x0022   108   101   000	Old_age   Always	   -	   44
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 24980		 -
# 2  Short offline	   Completed without error	   00%	 24814		 -
# 3  Short offline	   Completed without error	   00%	 24646		 -
# 4  Extended offline	Completed without error	   00%	 24483		 -
# 5  Short offline	   Completed without error	   00%	 24310		 -
# 6  Short offline	   Completed without error	   00%	 24142		 -
# 7  Short offline	   Completed without error	   00%	 23975		 -
# 8  Short offline	   Completed without error	   00%	 23807		 -
# 9  Extended offline	Completed without error	   00%	 23740		 -
#10  Short offline	   Completed without error	   00%	 23639		 -
#11  Short offline	   Completed without error	   00%	 23471		 -
#12  Short offline	   Completed without error	   00%	 23303		 -
#13  Short offline	   Completed without error	   00%	 23136		 -
#14  Extended offline	Completed without error	   00%	 23021		 -
#15  Short offline	   Completed without error	   00%	 22968		 -
#16  Short offline	   Completed without error	   00%	 22799		 -
#17  Short offline	   Completed without error	   00%	 22631		 -
#18  Short offline	   Completed without error	   00%	 22463		 -
#19  Short offline	   Completed without error	   00%	 22295		 -
#20  Extended offline	Completed without error	   00%	 22277		 -
#21  Short offline	   Completed without error	   00%	 22128		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


2:
Code:
[root@NAS01] ~# smartctl -a /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4E7TEH493
LU WWN Device Id: 5 0014ee 2611a3018
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Aug 27 14:01:17 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(51540) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 515) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   180   177   021	Pre-fail  Always	   -	   8000
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   56
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   072   072   000	Old_age   Always	   -	   21080
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   56
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   26
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   231
194 Temperature_Celsius	 0x0022   106   098   000	Old_age   Always	   -	   46
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   100   253   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


3:
Code:
[root@NAS01] ~# smartctl -a /dev/ada2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4EHRKCHEZ
LU WWN Device Id: 5 0014ee 26054b502
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Aug 27 14:01:48 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(52320) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 523) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   177   175   021	Pre-fail  Always	   -	   8133
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   60
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   069   069   000	Old_age   Always	   -	   22782
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   60
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   25
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   248
194 Temperature_Celsius	 0x0022   108   101   000	Old_age   Always	   -	   44
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 22774		 -
# 2  Short offline	   Completed without error	   00%	 22607		 -
# 3  Short offline	   Completed without error	   00%	 22439		 -
# 4  Short offline	   Completed without error	   00%	 22271		 -
# 5  Extended offline	Completed without error	   00%	 22156		 -
# 6  Short offline	   Completed without error	   00%	 22103		 -
# 7  Short offline	   Completed without error	   00%	 21935		 -
# 8  Short offline	   Completed without error	   00%	 21768		 -
# 9  Short offline	   Completed without error	   00%	 21600		 -
#10  Short offline	   Completed without error	   00%	 21433		 -
#11  Extended offline	Completed without error	   00%	 21414		 -
#12  Short offline	   Completed without error	   00%	 21265		 -
#13  Short offline	   Completed without error	   00%	 21097		 -
#14  Short offline	   Completed without error	   00%	 20929		 -
#15  Short offline	   Completed without error	   00%	 20761		 -
#16  Extended offline	Completed without error	   00%	 20694		 -
#17  Short offline	   Completed without error	   00%	 20593		 -
#18  Short offline	   Completed without error	   00%	 20426		 -
#19  Short offline	   Completed without error	   00%	 20258		 -
#20  Short offline	   Completed without error	   00%	 20090		 -
#21  Extended offline	Completed without error	   00%	 19951		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

dnilgreb

4:
Code:
[root@NAS01] ~# smartctl -a /dev/ada3
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4EHRKCCTL
LU WWN Device Id: 5 0014ee 20aff63df
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Aug 27 14:02:29 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(50880) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 509) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   179   178   021	Pre-fail  Always	   -	   8050
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   60
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   069   069   000	Old_age   Always	   -	   22783
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   60
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   25
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   246
194 Temperature_Celsius	 0x0022   110   103   000	Old_age   Always	   -	   42
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 22775		 -
# 2  Short offline	   Completed without error	   00%	 22607		 -
# 3  Short offline	   Completed without error	   00%	 22439		 -
# 4  Short offline	   Completed without error	   00%	 22271		 -
# 5  Extended offline	Completed without error	   00%	 22156		 -
# 6  Short offline	   Completed without error	   00%	 22104		 -
# 7  Short offline	   Completed without error	   00%	 21936		 -
# 8  Short offline	   Completed without error	   00%	 21768		 -
# 9  Short offline	   Completed without error	   00%	 21600		 -
#10  Short offline	   Completed without error	   00%	 21433		 -
#11  Extended offline	Completed without error	   00%	 21414		 -
#12  Short offline	   Completed without error	   00%	 21265		 -
#13  Short offline	   Completed without error	   00%	 21097		 -
#14  Short offline	   Completed without error	   00%	 20930		 -
#15  Short offline	   Completed without error	   00%	 20762		 -
#16  Extended offline	Completed without error	   00%	 20694		 -
#17  Short offline	   Completed without error	   00%	 20594		 -
#18  Short offline	   Completed without error	   00%	 20426		 -
#19  Short offline	   Completed without error	   00%	 20258		 -
#20  Short offline	   Completed without error	   00%	 20090		 -
#21  Extended offline	Completed without error	   00%	 19951		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


5:
Code:
[root@NAS01] ~# smartctl -a /dev/ada4
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4E7KA75KH
LU WWN Device Id: 5 0014ee 2b5b52c0d
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Aug 27 14:03:10 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(55260) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 552) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   175   174   021	Pre-fail  Always	   -	   8208
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   60
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   069   069   000	Old_age   Always	   -	   22783
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   60
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   25
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   246
194 Temperature_Celsius	 0x0022   105   097   000	Old_age   Always	   -	   47
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 22775		 -
# 2  Short offline	   Completed without error	   00%	 22607		 -
# 3  Short offline	   Completed without error	   00%	 22439		 -
# 4  Short offline	   Completed without error	   00%	 22272		 -
# 5  Extended offline	Completed without error	   00%	 22157		 -
# 6  Short offline	   Completed without error	   00%	 22104		 -
# 7  Short offline	   Completed without error	   00%	 21936		 -
# 8  Short offline	   Completed without error	   00%	 21768		 -
# 9  Short offline	   Completed without error	   00%	 21600		 -
#10  Short offline	   Completed without error	   00%	 21433		 -
#11  Extended offline	Completed without error	   00%	 21415		 -
#12  Short offline	   Completed without error	   00%	 21265		 -
#13  Short offline	   Completed without error	   00%	 21097		 -
#14  Short offline	   Completed without error	   00%	 20930		 -
#15  Short offline	   Completed without error	   00%	 20762		 -
#16  Extended offline	Completed without error	   00%	 20695		 -
#17  Short offline	   Completed without error	   00%	 20594		 -
#18  Short offline	   Completed without error	   00%	 20426		 -
#19  Short offline	   Completed without error	   00%	 20258		 -
#20  Short offline	   Completed without error	   00%	 20091		 -
#21  Extended offline	Completed without error	   00%	 19952		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


6:
Code:
[root@NAS01] ~# smartctl -a /dev/ada5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD40EFRX-68WT0N0
Serial Number:	WD-WCC4ECAVN2NJ
LU WWN Device Id: 5 0014ee 20aff73da
Firmware Version: 82.00A82
User Capacity:	4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Aug 27 14:04:02 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(51840) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 518) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0
  3 Spin_Up_Time			0x0027   180   179   021	Pre-fail  Always	   -	   8000
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   59
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   069   069   000	Old_age   Always	   -	   22782
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   59
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   25
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   244
194 Temperature_Celsius	 0x0022   106   098   000	Old_age   Always	   -	   46
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 17577		 -
# 2  Short offline	   Completed without error	   00%	 17410		 -
# 3  Short offline	   Completed without error	   00%	 17242		 -
# 4  Extended offline	Completed without error	   00%	 17079		 -
# 5  Short offline	   Completed without error	   00%	 16907		 -
# 6  Short offline	   Completed without error	   00%	 16739		 -
# 7  Short offline	   Completed without error	   00%	 16571		 -
# 8  Short offline	   Completed without error	   00%	 16403		 -
# 9  Extended offline	Completed without error	   00%	 16336		 -
#10  Short offline	   Completed without error	   00%	 16236		 -
#11  Short offline	   Completed without error	   00%	 16068		 -
#12  Short offline	   Completed without error	   00%	 15900		 -
#13  Short offline	   Completed without error	   00%	 15732		 -
#14  Extended offline	Completed without error	   00%	 15617		 -
#15  Short offline	   Completed without error	   00%	 15564		 -
#16  Short offline	   Completed without error	   00%	 15395		 -
#17  Short offline	   Completed without error	   00%	 15228		 -
#18  Short offline	   Completed without error	   00%	 15060		 -
#19  Short offline	   Completed without error	   00%	 14892		 -
#20  Extended offline	Completed without error	   00%	 14873		 -
#21  Short offline	   Completed without error	   00%	 14724		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
But if I run the memtest, I need to run it for a long time, right? Maybe a week or two?
Overnight should be sufficient to show a problem if it exists.

Your drives look ok other than you don't have smart tests enabled for ada1.

Please provide a complete list of your hardware for proper troubleshooting.
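
For anyone wanting to repeat this check on their own drives, here is a rough sketch of how the `smartctl -a` output can be boiled down to the values that usually matter. It assumes the attribute names appear exactly as in the WD Red output above (other vendors may name them differently), and `smart_summary` is a made-up helper name, not part of smartmontools.

```shell
# smart_summary: read `smartctl -a` output on stdin and print the raw values
# usually checked for drive health, plus the age of the newest self-test.
# Assumption: attribute names match the WD Red output shown in this thread.
smart_summary() {
    awk '
        # Attribute table rows: field 2 is the name, the last field is RAW_VALUE.
        $2 == "Reallocated_Sector_Ct"  { realloc = $NF }
        $2 == "Power_On_Hours"         { poh = $NF }
        $2 == "Current_Pending_Sector" { pending = $NF }
        $2 == "Offline_Uncorrectable"  { offl = $NF }
        $2 == "UDMA_CRC_Error_Count"   { crc = $NF }
        # Newest self-test log entry is row "# 1"; LifeTime(hours) is the
        # second-to-last field on that row.
        $1 == "#" && $2 == 1           { last = $(NF - 1) }
        END {
            printf "reallocated=%s pending=%s offline_uncorrectable=%s crc_errors=%s\n", realloc, pending, offl, crc
            if (poh != "" && last != "")
                printf "hours_since_last_selftest=%d\n", poh - last
            else
                print "hours_since_last_selftest=unknown"
        }
    '
}

# Usage (run as root on the NAS):
#   smartctl -a /dev/ada5 | smart_summary
```

Non-zero reallocated, pending or uncorrectable counts would point at a disk; a large gap since the last self-test suggests scheduled tests are not running on that drive. As an example of the arithmetic, the ada5 listing above shows Power_On_Hours 22782 and a newest self-test at 17577 hours, a gap of 5205 hours.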
 

dnilgreb

Contributor
Joined
Mar 29, 2016
Messages
168
Interesting that SMART isn't enabled for ada1. It is in the FreeNAS GUI. How can I enable it, and how can I make sure it is enabled?

On a side note, I talked to a friend of mine today, and he is having the exact same problem I am. Like me, he is running FreeNAS 9.10.2-U6 and WD Red disks.
He upgraded to U6 from U5 about a week before I did, and the problem happened to him a week before I had it. Is there ANY chance that the U6 upgrade is causing it?
And if so, where should I go? Revert to U5, or onward to 11?

Will get the memtest going asap.
 

dnilgreb

Contributor
Joined
Mar 29, 2016
Messages
168
OK, so I ran memtest. For all of 90 seconds. That gave me 5031 errors. I guess I am getting some new RAM. Is that the solution, or could it be something else? Some other tests to run before running to the store?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
OK, so I ran memtest. For all of 90 seconds. That gave me 5031 errors. I guess I am getting some new RAM. Is that the solution, or could it be something else? Some other tests to run before running to the store?

You need to troubleshoot your RAM a bit more.

Try to work out which slot or DIMM is defective.
 

dnilgreb

Contributor
Joined
Mar 29, 2016
Messages
168
Of course. Why didn't I think of that? I'll try to get that done tonight.
I am curious about this, though:
Your drives look ok other than you don't have smart tests enabled for ada1.

Since the GUI says it's enabled for all drives, how can I ensure that I get SMART enabled for all drives?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Highlight them all and click save again.
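
The GUI setting can also be cross-checked from the shell. A small sketch, assuming the "SMART support is:" lines look like the ones in the `smartctl` output earlier in this thread; `smart_enabled` is a hypothetical helper name. Note that this only confirms SMART itself is switched on for the drive, not that periodic self-tests are scheduled.

```shell
# smart_enabled: read `smartctl -i` (or `smartctl -a`) output on stdin and
# succeed only if the drive reports "SMART support is: Enabled".
smart_enabled() {
    grep -q '^SMART support is:[[:space:]]*Enabled'
}

# Usage (run as root on the NAS; device names are the ones from this thread):
#   for d in ada0 ada1 ada2 ada3 ada4 ada5; do
#       if smartctl -i "/dev/$d" | smart_enabled; then
#           echo "$d: SMART enabled"
#       else
#           echo "$d: SMART not enabled"
#       fi
#   done
#
# smartctl can also switch SMART on and start a test by hand:
#   smartctl -s on /dev/ada1
#   smartctl -t short /dev/ada1
```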
 

dnilgreb

Contributor
Joined
Mar 29, 2016
Messages
168
Highlight them all and click save again.
I'm very sorry, but I don't understand what you mean by this. What do you want me to do?
 

dnilgreb

Contributor
Joined
Mar 29, 2016
Messages
168
Got it!
Will take a look when I have FreeNAS up and running again.

I've run memtest on all 4 of my sticks individually now, and found the bad one. The three remaining sticks have been running since last night, and when I left home this morning the test had reached 7 hours with 0 errors. So that's promising.

What is the best/correct procedure once I get a new memory stick? My plan is to insert it and run memtest for a day or so. Then what?
I guess I will delete the files reported as corrupt. Can I do something to make sure everything is in order before I scrub the volume?
After scrubbing, it should be safe to clear the errors, correct?
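
To make the steps in this question concrete, here is a hedged sketch. The helper pulls the list of affected files out of `zpool status -v` so it can be reviewed before anything is deleted; `zpool_error_files` is a made-up name. The command sequence in the comments is a commonly used order (remove or restore the files, scrub, re-check, then clear), given here as an assumption and not as advice taken from this thread.

```shell
# zpool_error_files: read `zpool status -v` output on stdin and print the
# entries listed under "Permanent errors have been detected".
zpool_error_files() {
    awk '
        /^errors: Permanent errors have been detected/ { in_list = 1; next }
        # A following "pool:" stanza ends the list.
        in_list && /^[[:space:]]*pool:/ { in_list = 0 }
        # Strip leading whitespace and print the entry (paths may contain spaces).
        in_list && NF > 0 { sub(/^[[:space:]]+/, ""); print }
    '
}

# Possible sequence once the RAM has tested clean (pool name from this thread):
#   zpool status -v volume1 | zpool_error_files   # review the list first
#   zpool scrub volume1                           # re-verify every block
#   zpool status -v volume1                       # wait for the scrub, check again
#   zpool clear volume1                           # reset the error counters
```

The reported files may stay in the error list until a scrub has completed after they were removed or restored.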
 