Is a drive (or several drives) failing? Help!

Status
Not open for further replies.

Itay1778

Patron · Joined: Jan 29, 2018 · Messages: 269
Hi, everyone,
For the last few days I have kept getting the same email from my FreeNAS system saying there is a problem with the drives. But when I look, there are no alerts at all: the alert indicator says "OK", and the storage list shows the pool as "HEALTHY". That is strange, because I get these emails every day, and if a drive (or several drives) is failing, I need to know.

This is the email:
Code:
Checking status of gmirror(8) devices:
        Name    Status  Components
mirror/swap0  DEGRADED  ada3p1 (ACTIVE)
mirror/swap1  COMPLETE  ada2p1 (ACTIVE)
                        ada1p1 (ACTIVE)

- End of daily output -
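The email appears to be FreeBSD's daily periodic status report (hence "End of daily output"), which includes a gmirror(8) check. That check looks at GEOM mirrors, not at ZFS, which would explain why the GUI still shows the pool as healthy. The minimal Python sketch below parses that section, assuming the layout shown in the email (each mirror row starts at the left margin, extra components are indented). The function names are hypothetical, not part of FreeNAS. Run against the email text, it reports only mirror/swap0, which is DEGRADED with a single active component, while mirror/swap1 is COMPLETE with two.

```python
import re

# The gmirror section of the daily email above, with the mirror names
# written in FreeBSD's usual "mirror/swapN" form.
SAMPLE = """\
Checking status of gmirror(8) devices:
        Name    Status  Components
mirror/swap0  DEGRADED  ada3p1 (ACTIVE)
mirror/swap1  COMPLETE  ada2p1 (ACTIVE)
                        ada1p1 (ACTIVE)

- End of daily output -
"""

# One component entry, e.g. "ada3p1 (ACTIVE)".
COMPONENT = re.compile(r"(\S+) \((\w+)\)")


def parse_gmirror_status(text):
    """Return {mirror: (status, [(component, state), ...])}."""
    mirrors = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the banner, the column header and the footer.
        if not stripped or stripped.startswith(("Checking", "Name", "-")):
            continue
        if not line[0].isspace():
            # A mirror row: "<name> <status> <first component>".
            name, status, rest = stripped.split(None, 2)
            mirrors[name] = (status, COMPONENT.findall(rest))
            current = name
        elif current is not None:
            # An indented row lists one more component of the same mirror.
            mirrors[current][1].extend(COMPONENT.findall(stripped))
    return mirrors


def problem_mirrors(mirrors):
    """Mirrors whose status is anything other than COMPLETE."""
    return {name: info for name, info in mirrors.items() if info[0] != "COMPLETE"}


if __name__ == "__main__":
    for name, (status, parts) in problem_mirrors(parse_gmirror_status(SAMPLE)).items():
        print(f"{name}: {status}, {len(parts)} component(s): {parts}")
```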

I also noticed something odd: the email refers to ada3p1 and ada2p1, but in the storage list everything is ada3p2 or ada2p2. I hope you understand what I mean: everything in this email ends in p1, and everything in the storage list ends in p2.
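The suffix is the GPT partition index on the same physical disk: ada3p1 and ada3p2 are partitions 1 and 2 of disk ada3. Assuming the usual FreeNAS data-disk layout, where partition 1 is swap and partition 2 is the ZFS pool member, a small Python sketch can split and label such names. The regular expression and the role table are illustrative assumptions, not FreeNAS code.

```python
import re

# FreeBSD GPT partition names look like <disk>p<index>, e.g. ada3p1.
PARTITION = re.compile(r"^([a-z]+\d+)p(\d+)$")

# Assumed FreeNAS data-disk layout: partition 1 is swap, partition 2 is ZFS.
ROLE_BY_INDEX = {1: "swap", 2: "ZFS data"}


def split_partition(name):
    """Split a name such as 'ada3p2' into ('ada3', 2)."""
    match = PARTITION.match(name)
    if match is None:
        raise ValueError(f"not a GPT partition name: {name!r}")
    return match.group(1), int(match.group(2))


def describe(name):
    """Label a partition with its disk, index and assumed role."""
    disk, index = split_partition(name)
    role = ROLE_BY_INDEX.get(index, "unknown")
    return f"{name}: partition {index} of disk {disk} ({role})"


if __name__ == "__main__":
    for name in ("ada3p1", "ada3p2", "ada2p1", "ada2p2"):
        print(describe(name))
```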

I also ran smartctl -t long on all drives and everything looks good (or I can't read the output as well as I thought).

Anyway, this is the output for ada3; if you want the other drives, just say so and I will add them.
Code:
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Caviar Blue (SATA)
Device Model:	 WDC WD2500AAKS-00VSA0
Serial Number:	WD-WMART0780853
LU WWN Device Id: 5 0014ee 055da8d21
Firmware Version: 01.01B01
User Capacity:	250,058,268,160 bytes [250 GB]
Sector Size:	  512 bytes logical/physical
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
Local Time is:	Mon Jul 23 23:43:34 2018 IDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
										was suspended by an interrupting command from host.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				( 4980) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		(  61) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x303f) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   200   200   051	Pre-fail  Always	   -	   129
  3 Spin_Up_Time			0x0003   177   176   021	Pre-fail  Always	   -	   2133
  4 Start_Stop_Count		0x0032   099   099   000	Old_age   Always	   -	   1033
  5 Reallocated_Sector_Ct   0x0033   194   194   140	Pre-fail  Always	   -	   43
  7 Seek_Error_Rate		 0x000e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   040   040   000	Old_age   Always	   -	   44261
 10 Spin_Retry_Count		0x0012   100   100   000	Old_age   Always	   -	   0
 11 Calibration_Retry_Count 0x0012   100   100   000	Old_age   Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   560
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   145
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   560
194 Temperature_Celsius	 0x0022   105   085   000	Old_age   Always	   -	   38
196 Reallocated_Event_Count 0x0032   198   198   000	Old_age   Always	   -	   2
197 Current_Pending_Sector  0x0012   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0010   200   200   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	 44138		 -
# 2  Short offline	   Completed without error	   00%	 44050		 -
# 3  Extended offline	Completed without error	   00%	 44048		 -
# 4  Short offline	   Completed without error	   00%	 43438		 -
# 5  Extended offline	Completed without error	   00%	 43437		 -
# 6  Short offline	   Completed without error	   00%	 42695		 -
# 7  Extended offline	Completed without error	   00%	 42694		 -
# 8  Short offline	   Completed without error	   00%	 41983		 -
# 9  Extended offline	Completed without error	   00%	 41982		 -
#10  Short offline	   Completed without error	   00%	 41265		 -
#11  Extended offline	Completed without error	   00%	 41264		 -
#12  Extended offline	Completed without error	   00%	 41087		 -
#13  Conveyance offline  Completed: handling damage??  90%	 41073		 -
#14  Short offline	   Completed without error	   00%	 41072		 -
#15  Extended offline	Completed without error	   00%	 41070		 -
#16  Extended offline	Aborted by host			   90%	 41069		 -
#17  Conveyance offline  Completed: handling damage??  90%	 41002		 -
#18  Short offline	   Completed without error	   00%	 41000		 -
#19  Conveyance offline  Completed: handling damage??  90%	 40907		 -
#20  Short offline	   Completed without error	   00%	 40906		 -
#21  Conveyance offline  Completed: handling damage??  90%	 40836		 -
4 of 4 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
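A PASSED overall assessment and clean self-tests do not by themselves mean every attribute is zero. A rough reading heuristic, common in community guides rather than an official rule, is to check whether any normalized VALUE has dropped to its THRESH (meaning the attribute has failed) and whether the raw counts of reallocation-related attributes are non-zero. The Python sketch below applies that heuristic to a few rows copied from the table above. The watch list (IDs 5, 196, 197, 198) is an assumption, and the parser assumes smartctl's default column layout with an integer at the start of RAW_VALUE. For this ada3 output it flags Reallocated_Sector_Ct (raw 43) and Reallocated_Event_Count (raw 2), reports no attribute at or below its threshold, and converts 44261 power-on hours to roughly 5.1 years.

```python
# A few rows copied from the ada3 attribute table above.
SMART_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   194   194   140    Pre-fail  Always       -       43
  9 Power_On_Hours          0x0032   040   040   000    Old_age   Always       -       44261
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# Assumed watch list: attributes whose raw count is usually expected to be 0.
WATCH = (5, 196, 197, 198)


def parse_attributes(table):
    """Map attribute ID -> dict of the columns used below."""
    attrs = {}
    for line in table.splitlines():
        # RAW_VALUE is the 10th column and may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header line or anything that is not an attribute row
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": int(fields[9].split()[0]),
        }
    return attrs


def failing(attrs):
    """Attributes whose normalized value has reached a non-zero threshold."""
    return [a["name"] for a in attrs.values()
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


def watch_warnings(attrs):
    """Watched attributes with a non-zero raw count."""
    return [f'{attrs[i]["name"]} raw={attrs[i]["raw"]}'
            for i in WATCH if i in attrs and attrs[i]["raw"] > 0]


if __name__ == "__main__":
    attrs = parse_attributes(SMART_TABLE)
    print("failing:", failing(attrs))
    print("watch:", watch_warnings(attrs))
    print("power-on years:", round(attrs[9]["raw"] / (24 * 365), 1))
```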

If I need to add information or anything else, just say. Although I have a backup, I can't afford to lose drives right now!

Thanks
Itay
 

melloa

Wizard · Joined: May 22, 2016 · Messages: 1,749
Post results for:

Code:
zpool list


and

Code:
zpool status


please.
 

Itay1778

Patron · Joined: Jan 29, 2018 · Messages: 269

melloa

Wizard · Joined: May 22, 2016 · Messages: 1,749
From the GUI, click on shell, and type the commands.

[Screenshot: upload_2018-7-23_17-49-17.png]
 

Itay1778

Patron · Joined: Jan 29, 2018 · Messages: 269

Here is the zpool list output:

Code:
NAME		   SIZE  ALLOC   FREE  EXPANDSZ   FRAG	CAP  DEDUP  HEALTH  ALTROOT
NAS			920G   610G   310G		 -	10%	66%  1.00x  ONLINE  /mnt
freenas-boot   149G  4.32G   145G		 -	  -	 2%  1.00x  ONLINE  -



Here is the zpool status output:

Code:
pool: NAS
 state: ONLINE
  scan: resilvered 1.82G in 0 days 00:00:56 with 0 errors on Mon Jul 16 21:11:38 2018
config:

		NAME											STATE	 READ WRITE CKSUM
		NAS											 ONLINE	   0	 0	 0
		  raidz1-0									  ONLINE	   0	 0	 0
			gptid/69e4db41-082d-11e8-9445-e069952bbf5c  ONLINE	   0	 0	 0
			gptid/6b07b177-082d-11e8-9445-e069952bbf5c  ONLINE	   0	 0	 0
			gptid/6c0f35c9-082d-11e8-9445-e069952bbf5c  ONLINE	   0	 0	 0
			ada4p2									  ONLINE	   0	 0	 0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:26 with 0 errors on Sat Jul 21 03:46:26 2018
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  ada0p2	ONLINE	   0	 0	 0

errors: No known data errors
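The data pool's health is reported separately from the swap mirror. As a sketch, assuming the column layout shown above, the Python below scans the config table of zpool status for any device that is not ONLINE or has a non-zero READ, WRITE or CKSUM counter. The state set covers only the common vdev states, and the function names are hypothetical. Applied to the NAS pool output above, it finds six rows and no problems, which agrees with the GUI's "HEALTHY".

```python
# The NAS pool section of the zpool status output above.
ZPOOL_STATUS = """\
  pool: NAS
 state: ONLINE
  scan: resilvered 1.82G in 0 days 00:00:56 with 0 errors on Mon Jul 16 21:11:38 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS                                             ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/69e4db41-082d-11e8-9445-e069952bbf5c  ONLINE       0     0     0
            gptid/6b07b177-082d-11e8-9445-e069952bbf5c  ONLINE       0     0     0
            gptid/6c0f35c9-082d-11e8-9445-e069952bbf5c  ONLINE       0     0     0
            ada4p2                                      ONLINE       0     0     0

errors: No known data errors
"""

# Common vdev states (not an exhaustive list).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def vdev_rows(text):
    """Return (name, state, read, write, cksum) for each config-table row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A device row has a known state in the second column; counters are
        # kept as strings because zpool may print values such as "1.2K".
        if len(fields) >= 5 and fields[1] in STATES:
            rows.append(tuple(fields[:5]))
    return rows


def problems(rows):
    """Rows that are not ONLINE or have any non-zero error counter."""
    return [r for r in rows if r[1] != "ONLINE" or any(c != "0" for c in r[2:])]


if __name__ == "__main__":
    rows = vdev_rows(ZPOOL_STATUS)
    print(len(rows), "devices checked, problems:", problems(rows))
```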
 

Arwen

MVP · Joined: May 17, 2014 · Messages: 3,611
Please give us the version of FreeNAS.

The above message indicates something that is NOT a ZFS pool. It appears to be the swap partitions on the data disks. And since it's a "gmirror", I need the version of FreeNAS: older versions of FreeNAS did not support mirrored swap, only striped.
 

Itay1778

Patron · Joined: Jan 29, 2018 · Messages: 269

My version of FreeNAS is 11.1-U5.
When I set up my RAID (yes, I know it isn't accurate to call it RAID, but...), if I remember correctly I chose RAID 1 so that I could lose one drive without losing any files.
 

Itay1778

Patron · Joined: Jan 29, 2018 · Messages: 269
Hey, does anyone know how to solve this? Or what it is? How bad is it?
 

Arwen

MVP · Joined: May 17, 2014 · Messages: 3,611
It looks like your data pool is healthy.

And the thing reporting a problem is a gmirrored swap. I don't know how to deal with that, sorry. But it has not failed yet; only half of the mirror is having trouble.
 