SOLVED Failed drive not showing in GUI. Unable to view disk.

Status
Not open for further replies.

sitelux

Cadet
Joined
Dec 3, 2017
Messages
9
Hi, I have a system that originally used 4x 2 TB Seagate Green disks. I replaced 2 of the disks earlier, one with a similar disk and one with a 4 TB WD Red. I've bought new 4 TB WD Red disks and was going to replace the rest, but haven't got there yet.

Anyhow, yesterday I was transferring a 20 GB file to the NAS and it stopped responding. I could ping it and log in via SSH, but the prompt was very delayed. I managed to reboot it after a while, but it never came back up. After hooking up a monitor and a keyboard, I found it was stuck during boot (BIOS) because of a SMART error on ada1.

I managed to bypass it, boot the system and run a short SMART test, which confirms the scenario. The disk is failing the SMART test, and all console output on the FreeNAS box shows "CAM status: ATA Status Error", followed by "Retrying command" and finally "Error 5, Retries exhausted".

The zpool status shows the following:

[root@freenas] ~# zpool status -v
  pool: Storage
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 63K in 8h13m with 0 errors on Mon Sep 24 11:14:44 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/18b34e2d-f5b2-11e0-ac5f-f46d0473b90e  ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/193498d0-f5b2-11e0-ac5f-f46d0473b90e  ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/f2e933b4-f5e2-11e6-85ed-f46d0473b90e  ONLINE       0     0     0  block size: 512B configured, 4096B native
            gptid/a1aee53b-7f64-11e6-9493-f46d0473b90e  ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

So it doesn't seem like the NAS has "got the bad news" yet. Also, when I open the web GUI I'm unable to view the disks; the field is left blank. How am I supposed to set the disk to offline status? I have a new disk to replace it with. I run FreeNAS 9.2.1.7.

freenas.png


The short SMART test on ada1 shows this:
Code:

[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Barracuda Green (AF)
Device Model:	 ST2000DL003-9VT166
Serial Number:	5YD5G3JZ
LU WWN Device Id: 5 000c50 03e7725b9
Firmware Version: CC32
User Capacity:	2,000,398,934,016 bytes [2.00 TB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	5900 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sun Sep 30 13:57:24 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
										was completed without error.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (  73) The previous self-test completed having
										a test element that failed and the test
										element that failed is not known.
Total time to complete Offline
data collection:				(  623) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   1) minutes.
Extended self-test routine
recommended polling time:		( 334) minutes.
Conveyance self-test routine
recommended polling time:		(   2) minutes.
SCT capabilities:			  (0x30b7) SCT Status supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   072   059   006	Pre-fail  Always	   -	   108592
  3 Spin_Up_Time			0x0003   092   092   000	Pre-fail  Always	   -	   0
  4 Start_Stop_Count		0x0032   100   100   020	Old_age   Always	   -	   66
  5 Reallocated_Sector_Ct   0x0033   002   002   036	Pre-fail  Always   FAILING_NOW 64496
  7 Seek_Error_Rate		 0x000f   050   049   030	Pre-fail  Always	   -	   7194241396767
  9 Power_On_Hours		  0x0032   050   012   000	Old_age   Always	   -	   44642
 10 Spin_Retry_Count		0x0013   100   100   097	Pre-fail  Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   020	Old_age   Always	   -	   57
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0
184 End-to-End_Error		0x0032   100   100   099	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   001   001   000	Old_age   Always	   -	   17973
188 Command_Timeout		 0x0032   100   086   000	Old_age   Always	   -	   270587068495
189 High_Fly_Writes		 0x003a   100   100   000	Old_age   Always	   -	   0
190 Airflow_Temperature_Cel 0x0022   059   048   045	Old_age   Always	   -	   41 (Min/Max 24/41)
191 G-Sense_Error_Rate	  0x0032   100   100   000	Old_age   Always	   -	   0
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   18
193 Load_Cycle_Count		0x0032   100   100   000	Old_age   Always	   -	   1259
194 Temperature_Celsius	 0x0022   041   052   000	Old_age   Always	   -	   41 (0 22 0 0 0)
195 Hardware_ECC_Recovered  0x001a   036   003   000	Old_age   Always	   -	   108592
197 Current_Pending_Sector  0x0012   097   097   000	Old_age   Always	   -	   312
198 Offline_Uncorrectable   0x0010   097   097   000	Old_age   Offline	  -	   312
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
240 Head_Flying_Hours	   0x0000   100   253   000	Old_age   Offline	  -	   276501404616774
241 Total_LBAs_Written	  0x0000   100   253   000	Old_age   Offline	  -	   2836505282
242 Total_LBAs_Read		 0x0000   100   253   000	Old_age   Offline	  -	   1246484989

SMART Error Log Version: 1
ATA Error Count: 546 (device log contains only the most recent five errors)
		CR = Command Register [HEX]
		FR = Features Register [HEX]
		SC = Sector Count Register [HEX]
		SN = Sector Number Register [HEX]
		CL = Cylinder Low Register [HEX]
		CH = Cylinder High Register [HEX]
		DH = Device/Head Register [HEX]
		DC = Device Command Register [HEX]
		ER = Error register [HEX]
		ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 546 occurred at disk power-on lifetime: 44642 hours (1860 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 55 ff ff ff 4f 00	  00:55:26.634  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:26.634  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00	  00:55:26.591  READ LOG EXT
  60 00 55 ff ff ff 4f 00	  00:55:14.250  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:14.250  READ FPDMA QUEUED

Error 545 occurred at disk power-on lifetime: 44642 hours (1860 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 55 ff ff ff 4f 00	  00:55:14.250  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:14.250  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00	  00:55:14.207  READ LOG EXT
  60 00 55 ff ff ff 4f 00	  00:55:06.109  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:06.108  READ FPDMA QUEUED

Error 544 occurred at disk power-on lifetime: 44642 hours (1860 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 55 ff ff ff 4f 00	  00:55:06.109  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:06.108  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00	  00:55:06.066  READ LOG EXT
  60 00 55 ff ff ff 4f 00	  00:55:01.927  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:01.927  READ FPDMA QUEUED

Error 543 occurred at disk power-on lifetime: 44642 hours (1860 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 55 ff ff ff 4f 00	  00:55:01.927  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:55:01.927  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00	  00:55:01.854  READ LOG EXT
  60 00 55 ff ff ff 4f 00	  00:54:55.927  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:54:55.927  READ FPDMA QUEUED

Error 542 occurred at disk power-on lifetime: 44642 hours (1860 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 55 ff ff ff 4f 00	  00:54:55.927  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00	  00:54:55.927  READ FPDMA QUEUED
  60 00 10 ff ff ff 4f 00	  00:54:55.927  READ FPDMA QUEUED
  60 00 10 ff ff ff 4f 00	  00:54:55.927  READ FPDMA QUEUED
  60 00 10 90 02 40 40 00	  00:54:55.927  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed: unknown failure	90%	 44641		 0

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
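
For anyone skimming a report like the one above: the row that makes smartctl say FAILED is the one whose normalized VALUE has dropped to or below THRESH (attribute 5 here). Below is a minimal sketch for picking such rows out automatically. It assumes the standard smartctl attribute table layout and an sh-compatible shell (the FreeNAS root prompt is csh by default, so run it under sh). The function name smart_failing is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list SMART attributes whose normalized VALUE is at or
# below THRESH, or that smartctl itself flags in the WHEN_FAILED column.
# Reads `smartctl -A` style output on stdin.
smart_failing() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 columns.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; thresh = $6 + 0
            # A THRESH of 000 can never trip, so only WHEN_FAILED counts there.
            if ((thresh > 0 && value <= thresh) || $9 != "-")
                printf "%s %s value=%d thresh=%d raw=%s\n", $1, $2, value, thresh, $10
        }
    '
}

# On the NAS itself this would be:  smartctl -A /dev/ada1 | smart_failing
# Here a few rows copied from the report above stand in for the live disk.
smart_failing <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   072   059   006    Pre-fail  Always       -       108592
  5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail  Always   FAILING_NOW 64496
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       17973
197 Current_Pending_Sector  0x0012   097   097   000    Old_age   Always       -       312
EOF
```

With those four rows it prints only `5 Reallocated_Sector_Ct value=2 thresh=36 raw=64496`. The raw counts in attributes 187, 197 and 198 are also worth a look by eye, since their threshold of 000 means they never trip on their own.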

 

jlpellet

Patron
Joined
Mar 21, 2012
Messages
287
I'd try shutting down the system & removing power & data cables from ada1 then boot & see if the system finds the pool with a missing disk. It may be that the dying disk is hanging the system. Good luck.
 

sitelux

Cadet
Joined
Dec 3, 2017
Messages
9
Hi,
So I've detached the broken disk and booted. The system is now degraded, but I still have no way to put the disk offline. The view disk part of the web page is still empty as shown above.

Can I do this via CLI or is there a way to fix the GUI?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

sitelux

Cadet
Joined
Dec 3, 2017
Messages
9
Thanks for the reply. I know I should upgrade, but I run an old AMD system on which the upgrade is not recommended, so I would need new hardware first.

Anyhow, View Volumes doesn't seem to be the place for this in my version.
Is there a guide on how to replace disks from the command line, or a way to fix the web GUI?

viewvol.png
viewvol_.png
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Well, yes, you can always replace at the CLI using zpool replace. See the instructions in this resource to properly partition the new disk and find its gptid to add to the pool.
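
As a rough sketch of what the CLI route amounts to (not a substitute for the linked resource), the script below only prints the commands and runs nothing. The pool name Storage comes from the zpool status output earlier in the thread. Everything else is an assumption: that the new disk shows up as ada1, that you want the usual FreeNAS layout of a 2 GiB swap partition followed by a ZFS partition, and the placeholders OLD_GPTID and NEW_GPTID, which you have to look up yourself. The function name plan_replace is made up for this example.

```shell
#!/bin/sh
# Dry run only: prints the commands for review, executes nothing.
# Assumed, not taken from the thread: device name ada1, the 2 GiB swap +
# ZFS partition layout, and the OLD_GPTID / NEW_GPTID placeholders.
plan_replace() {
    pool=$1; old=$2; disk=$3
    cat <<EOF
zpool offline $pool $old
gpart create -s gpt $disk
gpart add -a 4k -t freebsd-swap -s 2g $disk
gpart add -a 4k -t freebsd-zfs $disk
glabel status | grep ${disk}p2
zpool replace $pool $old gptid/NEW_GPTID
zpool status $pool
EOF
}

plan_replace Storage gptid/OLD_GPTID ada1
```

The glabel line is where you read off the gptid of the new ZFS partition to use in place of NEW_GPTID. If the failed disk has already been pulled, zpool status should list it by a numeric ID instead of its gptid, and that ID can be given to zpool replace as the old device. Check each command against your own system before running anything, since gpart on the wrong disk is destructive.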

You should not, under any circumstances, touch the volume manager in the process of replacing a failed/failing disk.
 

sitelux

Cadet
Joined
Dec 3, 2017
Messages
9
Hi, and thank you for the help.
I was able to offline the disk using the guide, but found it rather cumbersome to add the new disk to the pool. Hopefully a more detailed guide can be made, as I guess I'm not the only one who could use it.

Concerning the web GUI issue: I found that using Firefox solved the problem. I was unable to view several tabs in the GUI with both Chrome and Opera, but Firefox displayed them all. I don't know if this is fixed in later releases, but for people like me who run an older version it's worth mentioning in case they have the same problem.

Again, thanks for supporting me!
 