Help with failed Drive

Status
Not open for further replies.

netman06

Dabbler
Joined
Sep 13, 2012
Messages
21
Hello,

System is running FreeNAS

Build FreeNAS-9.2.1.3-RELEASE-x64 (dc0c46b)

Drive information is below for your review.


I have been running FreeNAS for many years and have not had this happen to me.

I had a failed drive and was in the process of replacing it when, on reboot, another drive failed.

I was able to figure out which drive I needed to replace for the first one.

Now, this is where I need help.

I have since added another new drive, but when I select it and click the Replace button, its Member Disk drop-down is blank.

I have googled and searched but have not found any article that fits my situation.

What is the best way to fix this issue so that I can get my FreeNAS healthy again?

Please let me know if I need to add any other information and/or command output.

Thanks in advance for the help.

upload_2016-7-8_8-12-58.png


zpool status -v
  pool: Volume1
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 7h19m with 0 errors on Thu Jul 7 04:52:27 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume1                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/d4b98d69-10da-11e6-a911-902b34839325  ONLINE       0     0     0
            8901019939338115914                         UNAVAIL      0     0     0  was /dev/gptid/ad66532a-1ddb-11e4-a557-902b34839325
            gptid/a48da8db-e61d-11e4-b7dc-902b34839325  ONLINE       0     0     0
            gptid/7f68673a-bc24-11e3-8749-902b34839325  ONLINE       0     0     0
            gptid/ab74893e-0910-11e4-8b0d-902b34839325  ONLINE       0     0     0
            gptid/669e839b-431b-11e6-94fe-902b34839325  ONLINE       0     0     0
            gptid/8f8afff2-05f3-11e2-afc3-902b34839325  ONLINE       0     0     0
            gptid/2cd16ae8-e54d-11e4-b046-902b34839325  ONLINE       0     0     0

errors: No known data errors


camcontrol devlist
<WDC WD30EFRX-68EUZN0 82.00A82> at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD30EFRX-68EUZN0 82.00A82> at scbus1 target 0 lun 0 (pass1,ada1)
<ST3000DM001-1CH166 CC29> at scbus3 target 0 lun 0 (ada2,pass2)
<ST3000DM001-9YN166 CC4C> at scbus4 target 0 lun 0 (ada3,pass3)
<ST3000DM001-1CH166 CC29> at scbus5 target 0 lun 0 (ada4,pass4)
<WDC WD30EFRX-68EUZN0 82.00A82> at scbus6 target 0 lun 0 (ada5,pass5)
<ST3000DM001-9YN166 CC4C> at scbus7 target 0 lun 0 (ada6,pass6)
<ST3000DM001-1CH166 CC29> at scbus7 target 1 lun 0 (ada7,pass7)
<Generic STORAGE DEVICE 0250> at scbus10 target 0 lun 0 (pass8,da0)


glabel status
Name Status Components
gptid/d4b98d69-10da-11e6-a911-902b34839325 N/A ada0p2
gptid/a48da8db-e61d-11e4-b7dc-902b34839325 N/A ada2p2
gptid/7f68673a-bc24-11e3-8749-902b34839325 N/A ada3p2
gptid/ab74893e-0910-11e4-8b0d-902b34839325 N/A ada4p2
gptid/669e839b-431b-11e6-94fe-902b34839325 N/A ada5p2
gptid/8f8afff2-05f3-11e2-afc3-902b34839325 N/A ada6p2
gptid/2cb70f24-e54d-11e4-b046-902b34839325 N/A ada7p1
gptid/2cd16ae8-e54d-11e4-b046-902b34839325 N/A ada7p2
ufs/FreeNASs3 N/A da0s3
ufs/FreeNASs4 N/A da0s4
ufs/FreeNASs1a N/A da0s1a


gpart show
=> 34 5860533101 ada0 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338696 2 freebsd-zfs (2.7T)
5860533128 7 - free - (3.5k)

=> 34 5860533101 ada2 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338696 2 freebsd-zfs (2.7T)
5860533128 7 - free - (3.5k)

=> 34 5860533101 ada3 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338703 2 freebsd-zfs (2.7T)

=> 34 5860533101 ada4 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338696 2 freebsd-zfs (2.7T)
5860533128 7 - free - (3.5k)

=> 34 5860533101 ada5 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338696 2 freebsd-zfs (2.7T)
5860533128 7 - free - (3.5k)

=> 34 5860533101 ada6 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338703 2 freebsd-zfs (2.7T)

=> 34 5860533101 ada7 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338696 2 freebsd-zfs (2.7T)
5860533128 7 - free - (3.5k)

=> 63 15568833 da0 MBR (7.4G)
63 1930257 1 freebsd [active] (942M)
1930320 63 - free - (31k)
1930383 1930257 2 freebsd (942M)
3860640 3024 3 freebsd (1.5M)
3863664 41328 4 freebsd (20M)
3904992 11663904 - free - (5.6G)

=> 0 1930257 da0s1 BSD (942M)
0 16 - free - (8.0k)
16 1930241 1 !0 (942M)

Segmentation fault (core dumped)

gpart status
Name Status Components
ada0p1 OK ada0
ada0p2 OK ada0
ada2p1 OK ada2
ada2p2 OK ada2
ada3p1 OK ada3
ada3p2 OK ada3
ada4p1 OK ada4
ada4p2 OK ada4
ada5p1 OK ada5
ada5p2 OK ada5
ada6p1 OK ada6
ada6p2 OK ada6
ada7p1 OK ada7
ada7p2 OK ada7
da0s1 OK da0
da0s2 OK da0
da0s3 OK da0
da0s4 OK da0
da0s1a OK da0s1
ada1p1 N/A ada1
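
One way to cross-check the listings above is to compare `camcontrol devlist` with `glabel status`: a disk that shows up in the first but owns no label in the second has no labelled partition a pool member could be pointing at. The following is only a minimal Python sketch, assuming both outputs were saved as plain text; the function names are made up for illustration and are not part of FreeNAS.

```python
import re

def disks_from_camcontrol(text):
    """Return disk names (e.g. 'ada1') from saved `camcontrol devlist` output."""
    disks = []
    for line in text.splitlines():
        # Each device line ends with "(adaN,passN)" or "(passN,adaN)".
        m = re.search(r"\(([^)]*)\)\s*$", line)
        if not m:
            continue
        for name in m.group(1).split(","):
            name = name.strip()
            if name and not name.startswith("pass"):
                disks.append(name)
    return disks

def labelled_disks(glabel_text):
    """Return the set of disks that own at least one labelled partition."""
    disks = set()
    for line in glabel_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] != "Name":
            # Strip the partition suffix: ada0p2 -> ada0, da0s1a -> da0.
            m = re.match(r"([a-z]+\d+)", parts[2])
            if m:
                disks.add(m.group(1))
    return disks

def unlabelled(camcontrol_text, glabel_text):
    """Disks the system sees that have no entry in `glabel status`."""
    have = labelled_disks(glabel_text)
    return [d for d in disks_from_camcontrol(camcontrol_text) if d not in have]
```

Fed the two listings from this post, `unlabelled()` returns `['ada1']`, which matches the one disk that is missing from `gpart show` and reported as N/A in `gpart status`.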
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
8901019939338115914 UNAVAIL 0 0 0 was /dev/gptid/ad66532a-1ddb-11e4-a557-902b34839325
This should be your old failed drive and you need to remove it via the GUI first.

Also, let me make sure I understand you ...
You had a hard drive failure, you replaced the hard drive following the user guide, and you resilvered the new drive. Did the resilvering complete?

If you cannot get the GUI to remove the old drive entry, you can try to remove the entire volume and then auto-import the volume; this should clean up any issues. You can leave in all your drives, but I assume you have removed all your failed drives by now.

Hum... two failed drives at the same time, good thing you had a RAIDZ2.
 

joeschmuck

Any particular reason you are running such an old version?
I'll bet because it was an "If it works then don't mess with it" type thing, especially if it's only being used as a NAS. I still have copies of FreeNAS 8 so if I were in a hard spot, I could run a NAS on a system with only 4GB RAM and slower CPU, if all I wanted were a NAS with no special features.
 

netman06

Hi joeschmuck,

I removed the first failed drive without issue. Once the resilvering completed overnight, I removed the other failed drive the next night, but this is where I might have gotten into trouble.

Normally when a drive fails, FreeNAS offlines it, so I'm clear to replace the drive, and then I follow the User Guide on drive replacement.

I always have the option to select the new drive as a Member Drive, but this time, with the 2nd new drive, it did not work like it normally does: the Member Drive list for adding a new drive was blank.

So I traced all of my drives down. I keep good documentation on Evernote in the cloud on all of my drives and FreeNAS server details.

With all of that information, I removed the 2nd failed drive using the GUI, but I still had this one listed in the zpool status -v output.

This should be the new drive:
8901019939338115914 UNAVAIL 0 0 0 was /dev/gptid/ad66532a-1ddb-11e4-a557-902b34839325

From the gpart status output, you can see that I do not have an ada1; this is the drive that I'm trying to replace.

Bottom line: now that I only have healthy drives in the system, how can I make 8901019939338115914 the ada1 drive? This would put things back the way they were before both drive failures.

I just know that you always want to replace drives using the GUI and not the CLI, I guess unless you are a pro on FreeNAS. But in my situation, I'm really stuck.

I've been running this system for many years and have never had an issue like this where I cannot replace a drive.

Now, if I remove the entire volume and then auto-import it, will my data stay and not get deleted and/or corrupted?

I tried again to replace the drive using the GUI, and I only had the Replace button; after clicking it, Member Drive was blank in the drop-down box.

upload_2016-7-9_7-21-7.png


upload_2016-7-9_7-21-49.png


So is there any other command I can run that would help us determine what is going on with this ada1 drive?

Thanks again for your help,
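
For reference, the long number in the `zpool status -v` line is the GUID that ZFS prints in place of a member whose device it cannot open, and the path after `was` is where that member last lived. A small Python sketch, assuming the status output has been saved as text, that pulls out every member in a problem state; `missing_members` and the state list are illustrative choices, not a FreeNAS tool.

```python
# Leaf states that mean a member is not serving data. The pool and the
# raidz2 vdev themselves show DEGRADED, which is deliberately not listed
# here, so those summary rows are skipped.
PROBLEM_STATES = {"UNAVAIL", "FAULTED", "OFFLINE", "REMOVED"}

def missing_members(zpool_status_text):
    """Return (name_or_guid, state, previous_path) for each problem member."""
    found = []
    for line in zpool_status_text.splitlines():
        fields = line.split()
        # Member rows look like: NAME STATE READ WRITE CKSUM [was PATH]
        if len(fields) >= 5 and fields[1] in PROBLEM_STATES:
            previous = None
            if "was" in fields:
                i = fields.index("was")
                if i + 1 < len(fields):
                    previous = fields[i + 1]
            found.append((fields[0], fields[1], previous))
    return found
```

On the output in this thread it yields a single entry: the GUID 8901019939338115914, state UNAVAIL, previously at /dev/gptid/ad66532a-1ddb-11e4-a557-902b34839325.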
 

netman06

Any particular reason you are running such an old version?

Hi Mirfster,

I know that I'm running an older version. As an IT guy, I always have to be running the latest and greatest at work, but my personal opinion is that you do not always have to be running the latest.

I have irreplaceable data on my system. Some of it goes into the cloud as a backup, but most does not, as it would take a year to upload it.

So I like to take upgrading slowly, but if you know of a bug or feature that is in this version, then yes, I will upgrade.

I've only done it once and was really scared of losing data. What I do not know is this: if you upgrade and something happens, and you have a configuration backup, can you use it to get up and running with your original drives?

I would be more inclined to upgrade if I knew that I could still use the backup configuration with either a newer version or my old version and, most importantly, my drives with all of my data.

Thanks for taking a look at my post and pointing out that this version is old, as I'd like to follow the advice of experts on this system.

Take Care,
 

joeschmuck

Now, if I remove the entire volume and then auto-import it, will my data stay and not get deleted and/or corrupted?
Removing the volume and then Auto-Import is "basically" what happens when you shut down the system and then power it back up (okay, it unmounts and then imports). So you would not lose any data performing this task, but leave all the drives installed, of course, when you do this. If you do not have a current backup of your configuration, make one. Worst case, the Auto-Import fails to work (which I doubt); if that happens, you can just restore your configuration file and the drives will be put back in.

You currently have a RAIDZ2 with 7 drives active; you need to do something before 2 more drives fail and you have no data.

And I always advise people to have any critical data backed up if at all possible; there is no difference here.

Last thing, and it may sound stupid, but I'm not there so I have to ask... The "new" drive, is it really new or was it used? And are you sure it's the correct drive, and that you didn't accidentally install the failed drive again? That would be fine if that were the case, just so your problem was fixed, and it's okay to feel stupid; we all make stupid mistakes. It's just that something like this would be the best outcome to explain why things happened the way they did.
 

netman06

Hi joeschmuck,

Yes, both drives are brand new WD Red 3TB drives. I learned the hard and long way that standard Seagates died a lot, got smart after reading this forum, and switched to only using NAS drives.

I'll make a config file backup and then look up the procedure for removing a volume and Auto-Importing it to see if this cleans my system up.

Will let you know the status.

Thanks again for the quick response. I agree that I do not want to be down any longer than I need to be, given the possibility of another drive failure.

I always have 3-4 brand new drives in my computer room at all times.

Take Care,
 

netman06

So the process that I will use is this:

1) Select the volume, then select Detach Volume.

2) In the Detach Volume dialog box:
Mark the disks as new (destroy data): Not Checked

Also delete the share's configuration: Not Checked

Click on the Yes button

3) After this has been successful, I will proceed to Auto-Import.
On step 1 of 2:
Encrypted ZFS volume? : No: Skip to import
Click on the OK button

Next, on step 2 of 2:
Auto Import Volume, select my existing ZFS volume from the drop-down list.
Next, click on the OK button.

How do these steps look, and are they correct?

If so, this is the process that I will use.

Also, all drives in the system are new, verified by serial numbers.

Hopefully FreeNAS will take in the new drive, or I can manually add it to the system.

Thanks,
 

joeschmuck

That procedure is correct.

You made my heart skip a beat when I read "Mark the disks as new (destroy data)", then I saw "Not Checked" and my heart returned to normal. Yea, do not destroy your data, please.

Let me know how the operation turns out.

And if the volume isn't listed in the drop-down, the quick way to take that step back is to restore your configuration file. Then we can take the next step. That next step may be to back up all your data elsewhere first as a safety precaution. I really hope removing the pool and bringing it back will solve it.
 

netman06

Hi joeschmuck,

I was able to perform this process without any issues, but it looks like the disk is in the same state.

When I try to Replace it, Member Disk is blank.

Here are two screenshots showing both View Disks and View Volumes.

upload_2016-7-10_7-37-53.png


upload_2016-7-10_7-38-8.png


If I can run any other CLI commands to help us figure out what is going on with this disk, let me know.

Thanks,
 

netman06

I found this while checking things.

(pass1:ahcich1:0:0:0): SMART. ACB: b0 d5 09 4f c2 40 00 00 00 00 01 00
(pass1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Read SMART Selective Self-test Log failed: Input/output error

What does this mean on a brand new drive? Also, if there is a way to confirm and double-check that no FreeNAS data is on this drive, then maybe I can wipe it.

I do have this option available in the GUI.

[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4NXXXXXXXXX
LU WWN Device Id: 5 0014ee 20ce3ecbb
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Jul 10 08:11:54 2016 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (39540) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 397) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 253 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 1
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 83
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 85
194 Temperature_Celsius 0x0022 121 118 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


(pass1:ahcich1:0:0:0): SMART. ACB: b0 d5 09 4f c2 40 00 00 00 00 01 00
(pass1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Read SMART Selective Self-test Log failed: Input/output error

[root@freenas] ~#
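
When reading a long `smartctl -a` dump like the one above, the raw values of a few attributes are usually checked first: 5, 197 and 198 track reallocated or pending sectors, and 199 counts CRC errors on the drive's interface. The Python sketch below assumes the output was saved as text and uses a hypothetical watch list; it is a reading aid, not a health verdict.

```python
# Attribute IDs picked for this sketch; the selection is an assumption,
# not an official threshold list.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}

def smart_raw_values(smartctl_text):
    """Map attribute ID -> integer RAW_VALUE from the attribute table."""
    values = {}
    for line in smartctl_text.splitlines():
        f = line.split()
        # Rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x"):
            try:
                values[int(f[0])] = int(f[9])
            except ValueError:
                # Some drives report composite raw values; skip those.
                pass
    return values

def nonzero_watched(smartctl_text):
    """Return the watched attributes whose raw value is above zero."""
    raw = smart_raw_values(smartctl_text)
    return {i: raw[i] for i in WATCHED if raw.get(i, 0) > 0}
```

For the attribute table in this post, `nonzero_watched()` returns an empty dict: all four watched raw values are 0, so the CAM CRC error printed at the end is not reflected in the drive's own counters.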
 

joeschmuck

All I can recommend next is to remove the ada1 drive (the new drive) from the system and see what it looks like. Maybe you do need to use a different drive. Your problem is odd for sure. You could try using the CLI, but only if you are prepared to put your data at some risk. A single typo could be very bad.

I don't recall if you did this but here is something else you could try...

1) Shutdown FreeNAS server
2) Remove your USB Flash boot device.
3) Install a clean USB Flash drive.
4) Install FreeNAS 9.3 (latest version of 9.3)
5) Auto-Import your pool and see what happens.
6) DO NOT UPGRADE YOUR POOL (when you see the Alert flashing) because you will not be able to roll back to 9.2 if you do.
 