Hard drive smart error

virusbcn · Nov 14, 2018

I think one of my 4 WD Red 2 TB disks is gone... I have a ZFS1 raid, and I got an alert from FreeNAS, so I ran a long SMART test and this is the output:

Code:

root@freenas:~ # smartctl -a /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD20EFRX-68EUZN0
Serial Number:	WD-WCC4M5RA5R3X
LU WWN Device Id: 5 0014ee 20d3e6357
Firmware Version: 82.00A82
User Capacity:	2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5400 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:	Tue Nov 13 20:54:38 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  ( 121) The previous self-test completed having
										the read element of the test failed.
Total time to complete Offline
data collection:				(27720) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 280) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x703d) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   328
  3 Spin_Up_Time			0x0027   174   174   021	Pre-fail  Always	   -	   4300
  4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   15
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   071   071   000	Old_age   Always	   -	   21321
 10 Spin_Retry_Count		0x0032   100   253   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0032   100   253   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   15
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   13
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   684
194 Temperature_Celsius	 0x0022   123   114   000	Old_age   Always	   -	   24
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   5
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
ATA Error Count: 19 (device log contains only the most recent five errors)
		CR = Command Register [HEX]
		FR = Features Register [HEX]
		SC = Sector Count Register [HEX]
		SN = Sector Number Register [HEX]
		CL = Cylinder Low Register [HEX]
		CH = Cylinder High Register [HEX]
		DH = Device/Head Register [HEX]
		DC = Device Command Register [HEX]
		ER = Error register [HEX]
		ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 19 occurred at disk power-on lifetime: 21088 hours (878 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f0 00 01 44  Error: UNC at LBA = 0x040100f0 = 67174640

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 e0 00 01 44 08   5d+06:50:54.887  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:51.490  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:48.094  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:44.720  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:41.345  READ DMA

Error 18 occurred at disk power-on lifetime: 21088 hours (878 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f0 00 01 44  Error: UNC at LBA = 0x040100f0 = 67174640

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 e0 00 01 44 08   5d+06:50:51.490  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:48.094  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:44.720  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:41.345  READ DMA

Error 17 occurred at disk power-on lifetime: 21088 hours (878 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f0 00 01 44  Error: UNC at LBA = 0x040100f0 = 67174640

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 e0 00 01 44 08   5d+06:50:48.094  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:44.720  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:41.345  READ DMA

Error 16 occurred at disk power-on lifetime: 21088 hours (878 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f0 00 01 44  Error: UNC at LBA = 0x040100f0 = 67174640

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 e0 00 01 44 08   5d+06:50:44.720  READ DMA
  c8 00 00 e0 00 01 44 08   5d+06:50:41.345  READ DMA

Error 15 occurred at disk power-on lifetime: 21088 hours (878 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f0 00 01 44  Error: UNC at LBA = 0x040100f0 = 67174640

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 e0 00 01 44 08   5d+06:50:41.345  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed: read failure	   90%	 21286		 23153333
# 2  Short offline	   Aborted by host			   90%	   117		 -
# 3  Short offline	   Completed without error	   00%	   101		 -
# 4  Extended offline	Completed without error	   00%	   101		 -
# 5  Short offline	   Completed without error	   00%		91		 -
# 6  Extended offline	Completed without error	   00%		91		 -
# 7  Short offline	   Completed without error	   00%		81		 -
# 8  Extended offline	Completed without error	   00%		81		 -
# 9  Short offline	   Completed without error	   00%		70		 -
#10  Extended offline	Completed without error	   00%		70		 -
#11  Short offline	   Completed without error	   00%		60		 -
#12  Extended offline	Completed without error	   00%		60		 -
#13  Short offline	   Completed without error	   00%		50		 -
#14  Extended offline	Completed without error	   00%		50		 -
#15  Short offline	   Completed without error	   00%		40		 -
#16  Extended offline	Completed without error	   00%		40		 -
#17  Short offline	   Completed without error	   00%		30		 -
#18  Extended offline	Completed without error	   00%		30		 -
#19  Short offline	   Completed without error	   00%		20		 -
#20  Extended offline	Completed without error	   00%		20		 -
#21  Short offline	   Completed without error	   00%		 5		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@freenas:~ #

Maybe after 3 years my HDDs are wearing out, and I'm thinking about changing all the disks. If I change them one at a time, waiting for the pool to rebuild after each one, can I go from 2 TB disks to 6 or 8 TB and end up with a bigger pool? My idea is to take the opportunity to swap them one by one for bigger ones and so get more space without losing the data in the pool. Would that be possible?

Johnnie Black · Nov 14, 2018

virusbcn said:
I think one of my 4 WD Red 2 TB disks is gone...

You are correct.
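The figures behind that verdict are in the posted output: 5 pending sectors, repeated UNC read errors, and an extended self-test that ended in a read failure. As a rough sketch of how such a report could be screened automatically, the snippet below parses a few attribute rows copied from the smartctl output above and lists the ones with a non-zero raw value. The watch list and both function names are assumptions made for illustration, not part of smartmontools.

```python
# Attribute rows copied from the smartctl output in the first post.
SMART_ATTRIBUTES = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       5
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

# Hypothetical watch list: attributes whose raw value is expected to stay at 0.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text):
    """Return {attribute_name: raw_value} for each attribute row in text."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row has 10 columns and starts with a numeric ID.
        # Only plain integer raw values are handled in this sketch.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def suspicious(values):
    """Return the watched attributes whose raw value is above zero."""
    return sorted(name for name in WATCHED if values.get(name, 0) > 0)


attrs = parse_attributes(SMART_ATTRIBUTES)
print(suspicious(attrs))
```

With the rows above, only Current_Pending_Sector is reported, which matches the alert described in the first post.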

sretalla · Nov 14, 2018

virusbcn said:
can I go from 2 TB disks to 6 or 8 TB and end up with a bigger pool? My idea is to take the opportunity to swap them one by one for bigger ones and so get more space without losing the data in the pool. Would that be possible?

Yes, it works like that: replace all the disks in the pool with larger ones, one at a time, and at the end your pool will grow to the full size of the new disks. (Of course, the new disks should all be the same size as each other, or you'll waste the additional space of the larger ones.)

Make sure you wait for the resilver to finish in each replacement before replacing the next one.
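The capacity rule described here can be sketched in a few lines. This is a simplified model, assuming a single RAIDZ1 vdev in which every member counts as the size of the smallest disk and one disk's worth of space goes to parity; the function name is made up for illustration and filesystem overhead is ignored.

```python
def raidz1_usable_tb(disk_sizes_tb):
    """Approximate usable capacity of one RAIDZ1 vdev, in TB.

    Every member is treated as if it were the size of the smallest disk,
    and one disk's worth of space is used for parity.
    """
    if len(disk_sizes_tb) < 2:
        raise ValueError("RAIDZ1 needs at least two disks")
    return (len(disk_sizes_tb) - 1) * min(disk_sizes_tb)


# Replace four 2 TB disks with 6 TB disks, one slot at a time.
disks = [2, 2, 2, 2]
for slot in range(len(disks)):
    disks[slot] = 6  # swap one disk, then wait for the resilver to finish
    print(disks, "->", raidz1_usable_tb(disks), "TB usable")
```

In this model the usable size stays at 6 TB until the last 2 TB disk has been replaced, and only then jumps to 18 TB. A mix such as two 6 TB and two 8 TB disks would also give 18 TB, which is the wasted space mentioned above.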

Chris Moore · Nov 14, 2018

virusbcn said:
I think one of my 4 WD Red 2 TB disks is gone... I have a ZFS1 raid, and I got an alert from FreeNAS, so I ran a long SMART test and this is the output:

When you have a long block of text to post, especially formatted text like this smartctl output, please use code tags. That way the text is shown in a fixed-width block, as in the listing in the first post, and it is much easier to read.

Chris Moore · Nov 14, 2018

virusbcn said:
I have a ZFS1 raid

Another problem is that you have RAIDz1, not ZFS1, and you are not supposed to be using RAIDz1 with such large disks. Please review this documentation:

Why not to use RAID-5 or RAIDz1
https://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

Slideshow explaining VDev, zpool, ZIL and L2ARC
https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/

Terminology and Abbreviations Primer
https://forums.freenas.org/index.php?threads/terminology-and-abbreviations-primer.28174/

virusbcn · Nov 15, 2018

Thanks to all, and to cyberjock for writing this interesting doc about FreeNAS ;)

After reading it I have more questions than I did at first.

I have a "little" HP DL160 G6 with a Xeon and 16 GB RAM, and space for only 4 x 3.5" HDDs. One disk is faulty, and this is my NAS for backing up my VMs and files. I would rather not delete everything and build it again with ZFS, and I think that with only 4 disks maybe it is not the best file system...

If you were in my position, what would you do?
Replace the disks one by one until all are changed, stay on RAIDZ1, and have a bigger volume at the end,
or copy everything off, delete the pool, and create a new pool with ZFS and 4 new disks?

Honestly I prefer the first option. Maybe some time from now I will build another NAS and use ZFS there, since from what I have seen in the doc I can keep adding disks as needed...

sretalla · Nov 15, 2018

If you have the time and you are OK with the risk of no redundancy while you are doing the resilvering then option one is fine.

The wisdom of the forum is to avoid RAIDZ1 for disks over 2TB, but that assumes that risking data loss is unacceptable (seems maybe not so for you).

virusbcn · Nov 15, 2018

Right now I have 4 x 2 TB WD Red, and I am thinking about putting in 4 x 6 TB or 8 TB WD Red... ?

sretalla · Nov 15, 2018

Larger disks in RAIDZ1 mean a longer resilver, which increases the risk of a second failure during the process (= pool dead).

It's your choice on how to balance the risk.

Bigger disks = higher risk of pool loss during resilvering.
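The argument in the ZDNet article linked earlier in the thread can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes an unrecoverable read error (URE) rate of 1 per 10^14 bits, which is a commonly quoted datasheet figure and an assumption here, that errors are independent, and that every surviving disk is read in full. ZFS only reads allocated data during a resilver, so for a partly filled pool this overstates the exposure. The function name is made up for illustration.

```python
import math


def p_read_error(disk_tb, surviving_disks, ure_per_bit=1e-14):
    """Probability of at least one URE while reading every surviving disk in full.

    disk_tb:         size of each disk in TB (10**12 bytes)
    surviving_disks: number of disks that must be read during the resilver
    ure_per_bit:     assumed unrecoverable read error rate per bit read
    """
    bits_read = disk_tb * 1e12 * 8 * surviving_disks
    # 1 - (1 - p)**bits, computed with log1p/expm1 to avoid rounding loss.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Four-disk RAIDZ1: three disks are read to rebuild the fourth.
for size_tb in (2, 4, 6, 8):
    print(f"{size_tb} TB disks: {p_read_error(size_tb, 3):.0%}")
```

Under these assumptions the estimate rises steadily with disk size, which is the trade-off being described: the same pool layout carries more risk per resilver as the disks get bigger.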

virusbcn · Nov 16, 2018

In the end I bought 4 WD Red 4 TB drives to replace all the disks, one by one. They cost the same as the 2 TB model.
It is the smallest size on the market today.


Chris Moore · Nov 18, 2018

The exposure to risk is what we try to minimize. If this system is a backup, not the primary data, it is not as large a risk factor. In the end, you are the one that must decide how much risk you will accept. I have a RAIDz1 pool of four 6TB drives for a backup. It has worked fine for me but it might be a problem when the drives get old. It takes about eight hours to resilver a drive in that pool with about 11TB of data and 3TB of free space. It is getting pretty full and I guess I will need to do something about that soon.

virusbcn · Nov 19, 2018

Thank you, Chris.
Today I'm going to change the first disk. I understand that it's simple:
Locate the defective disk.
Shut down the server.
Take out the damaged disk and put in a new one.
Does the pool automatically include the new disk, or do I have to add it manually?

sretalla · Nov 19, 2018

You will need to do a replace (can be done from the GUI).

danb35 · Nov 19, 2018

sretalla said:
(can be done from the GUI)

...and, in fact, should be done from the GUI. The manual has complete instructions.

Chris Moore · Nov 19, 2018

virusbcn said:
Does the pool automatically include the new disk, or do I have to add it manually?

@danb35 wrote an illustrated guide. Do check that if you have any questions:
https://forums.freenas.org/index.php?resources/replacing-a-failed-failing-disk.75/

virusbcn · Nov 20, 2018

Thank you Chris, it's a perfect guide. I looked at your useful links and they are perfect as a study guide.

I'll summarize what I did, as a guide for the next person:

With 4 x 2 TB WD Red, I had a red alert that one disk had many pending sectors. Go to Storage -> Volume -> select the first volume -> Volume Status (button at the bottom) -> see which disk has failed.
Turn off the NAS, take out the bad disk and insert the good one.
Turn it on, go to Storage -> Volume -> select the first volume -> Volume Status (button at the bottom) -> select the failed disk -> select Replace at the bottom and choose the new disk. FreeNAS then resilvers the RAIDZ1; 4 hours later my RAID was online and green.
And most important, my RAID stayed online the whole time and I lost nothing.

It's like a RAID card but with different menus and options. Thank you FreeNAS ;-)
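Since each replacement has to wait for the previous resilver to finish, a small check on the text of `zpool status` can help when doing this four times in a row. The sketch below is only an illustration: the sample text is made up (pool name `tank` and the dates are hypothetical, not taken from this system), the function name is an assumption, and the exact wording of the `scan:` line can differ between ZFS versions.

```python
def resilver_in_progress(zpool_status_text):
    """Return True if the 'scan:' line reports a resilver that is still running."""
    for line in zpool_status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:"):
            return "resilver in progress" in stripped
    # No scan line at all: nothing is being resilvered.
    return False


# Hypothetical excerpts in the general shape of `zpool status` output.
RUNNING = """\
  pool: tank
 state: DEGRADED
  scan: resilver in progress since Mon Nov 19 10:00:00 2018
"""

FINISHED = """\
  pool: tank
 state: ONLINE
  scan: resilvered 1.20T in 0 days 04:00:00 with 0 errors on Mon Nov 19 14:00:00 2018
"""

print(resilver_in_progress(RUNNING))
print(resilver_in_progress(FINISHED))
```

On a real system the text would come from running `zpool status` against the pool; the replacement itself is still best done from the GUI, as advised above.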

virusbcn · Nov 20, 2018

I have just one more question...

When I have changed all the disks from 2 TB to 4 TB, will the size of my pool increase automatically?

Chris Moore · Nov 20, 2018

virusbcn said:
I have just one more question...

When I have changed all the disks from 2 TB to 4 TB, will the size of my pool increase automatically?

Yes. I have done that a couple of times. First to go from 1TB to 2TB then later I did it again to go from 2TB to 4TB. Works perfectly, as advertised.

virusbcn · Nov 20, 2018

Chris Moore said:
Yes. I have done that a couple of times. First to go from 1TB to 2TB then later I did it again to go from 2TB to 4TB. Works perfectly, as advertised.

This is new to me. I am counting the days until I have changed all the disks and can see this!
