Drives unavailable

Status
Not open for further replies.

Bishop

Cadet
Joined
Jan 18, 2017
Messages
2
Hello,

I'm new to FreeNAS as of a few days ago, so if I'm out of line with this posting, please let me know.

I've run into a situation where a few drives across two pools have become unavailable:

Version: FreeNAS-9.10-STABLE-201605021851 (35c85f7)

[root@freenas] ~# zpool status -x
pool: VM1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 0h0m with 0 errors on Wed Jan 18 08:07:47 2017
config:

NAME                                            STATE     READ WRITE CKSUM
VM1                                             DEGRADED     0     0     0
  raidz2-0                                      DEGRADED     0     0     0
    6156782431948319902                         UNAVAIL      0     0     0  was /dev/gptid/2b9b8cbd-31cf-11e6-9f12-00e0ed313542
    gptid/2bc708be-31cf-11e6-9f12-00e0ed313542  ONLINE       0     0     0
    gptid/2bf8b9f4-31cf-11e6-9f12-00e0ed313542  ONLINE       0     0     0
    gptid/2c25a81a-31cf-11e6-9f12-00e0ed313542  ONLINE       0     0     0
    gptid/2c528c71-31cf-11e6-9f12-00e0ed313542  ONLINE       0     0     0
    gptid/2c83049b-31cf-11e6-9f12-00e0ed313542  ONLINE       0     0     0

errors: No known data errors

pool: VM3
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 0h9m with 0 errors on Thu Jan 19 08:44:48 2017
config:

NAME                                            STATE     READ WRITE CKSUM
VM3                                             DEGRADED     0     0     0
  raidz2-0                                      DEGRADED     0     0     0
    gptid/b72520a4-31d3-11e6-be72-00e0ed313542  ONLINE       0     0     0
    18386910683362558414                        UNAVAIL      0     0     0  was /dev/gptid/b75143d5-31d3-11e6-be72-00e0ed313542
    gptid/b7866bdc-31d3-11e6-be72-00e0ed313542  ONLINE       0     0     0
    gptid/b7b3de18-31d3-11e6-be72-00e0ed313542  ONLINE       0     0     0
    gptid/b7e182ea-31d3-11e6-be72-00e0ed313542  ONLINE       0     0     0
    1042060917291809891                         UNAVAIL      0     0     0  was /dev/gptid/b816616b-31d3-11e6-be72-00e0ed313542

errors: No known data errors

It appears that the drives in question were pulled and then possibly replaced, which makes the GUI useless at this point: the option to offline is not present, and the replace drop-down offers no choices.

I've gone to the command line and tried the zpool online/import/replace commands to no avail.

Again, I do apologize for the noob question, but I'm kind of stuck and could use a hand. Any assistance or guidance would be greatly appreciated.
 

Attachments

  • Volume Status.PNG (31.9 KB)
  • VM1.PNG (12.9 KB)
  • VM3.PNG (13.3 KB)

PCanada

Cadet
Joined
Jan 17, 2017
Messages
6
Edit: Welcome to the forums btw!

Please provide the output of the following commands from the CLI, each individually wrapped in [CODE]"Paste Stuff Here"[/CODE] tags to preserve indentation for ease of reading.
  • dmidecode
    This will provide some understanding of what your hardware is.
  • camcontrol devlist
    This will return a list of devices seen by your system
  • glabel status
    This will report the gptid of the disks seen by your system
I presume that you have verified that this problem is not related to a loose or disconnected data cable/port or power connection, and that a system reboot has not resolved the problem (even temporarily).
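For what it's worth, a small helper along these lines could gather all three outputs into one file for pasting. This is just a hypothetical sketch: the output file name and the loop are my own; only `dmidecode`, `camcontrol devlist`, and `glabel status` come from the list above.

```shell
#!/bin/sh
# Collect the requested diagnostics into one file for posting.
# Hypothetical helper: the output file name is arbitrary.
out=diagnostics.txt
: > "$out"
for cmd in 'dmidecode' 'camcontrol devlist' 'glabel status'; do
    printf '== %s ==\n' "$cmd" >> "$out"
    # Skip commands that do not exist on this system rather than failing.
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
        $cmd >> "$out" 2>&1
    else
        printf '(%s not found)\n' "${cmd%% *}" >> "$out"
    fi
done
```

Then the whole of diagnostics.txt can be dropped into a single [CODE] block.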
 

Bishop

Cadet
Joined
Jan 18, 2017
Messages
2
Hello,

Thank you for the response. I kind of inherited this system as of the other day.

Before your response I did the following:
1. Tried to bring the drives back online thinking that at least at that point I could do something with them

VM3
zpool online VM3 1042060917291809891
zpool online VM3 18386910683362558414
VM1
zpool online VM1 6156782431948319902

[root@freenas] /dev# zpool online VM3 1042060917291809891
warning: device '1042060917291809891' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

[root@freenas] /dev# zpool online VM3 18386910683362558414
warning: device '18386910683362558414' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

[root@freenas] /dev# zpool online VM1 6156782431948319902
warning: device '6156782431948319902' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present


2. Did a Scrub of each pool - VM1 and VM3
3. Rebooted the system
4. Came back in the same state as before


With regard to the commands that you requested:
camcontrol devlist

[root@freenas] ~# camcontrol devlist
<LSI SAS2X36 0e12> at scbus0 target 8 lun 0 (ses0,pass0)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 9 lun 0 (pass1)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 10 lun 0 (pass2)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 11 lun 0 (pass3)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 12 lun 0 (pass4)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 13 lun 0 (pass5)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 14 lun 0 (pass6)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 15 lun 0 (pass7)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 16 lun 0 (pass8)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 17 lun 0 (pass9)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 18 lun 0 (pass10)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 19 lun 0 (pass11)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 20 lun 0 (pass12)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 21 lun 0 (pass13)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 22 lun 0 (pass14)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 23 lun 0 (pass15)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 24 lun 0 (pass16)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 25 lun 0 (pass17)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 26 lun 0 (pass18)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 27 lun 0 (pass19)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 28 lun 0 (pass20)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 29 lun 0 (pass21)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 30 lun 0 (pass22)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 31 lun 0 (pass23)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 32 lun 0 (pass24)
<Forti2GB USB DISK 2.0 PMAP> at scbus9 target 0 lun 0 (da0,pass25)

glabel status

[root@freenas] ~# glabel status
Name Status Components
gptid/2bc708be-31cf-11e6-9f12-00e0ed313542 N/A mfid0p2
gptid/2bf8b9f4-31cf-11e6-9f12-00e0ed313542 N/A mfid1p2
gptid/2c25a81a-31cf-11e6-9f12-00e0ed313542 N/A mfid2p2
gptid/2c528c71-31cf-11e6-9f12-00e0ed313542 N/A mfid3p2
gptid/2c83049b-31cf-11e6-9f12-00e0ed313542 N/A mfid4p2
gptid/d0bc4da8-1b89-11e6-9547-00e0ed313542 N/A mfid5p2
gptid/d0f461ef-1b89-11e6-9547-00e0ed313542 N/A mfid6p2
gptid/d1375b44-1b89-11e6-9547-00e0ed313542 N/A mfid7p2
gptid/d175b15c-1b89-11e6-9547-00e0ed313542 N/A mfid8p2
gptid/d1b392ee-1b89-11e6-9547-00e0ed313542 N/A mfid9p2
gptid/d1f2a410-1b89-11e6-9547-00e0ed313542 N/A mfid10p2
gptid/edcef0a6-1b89-11e6-9547-00e0ed313542 N/A mfid11p2
gptid/ee0b35a4-1b89-11e6-9547-00e0ed313542 N/A mfid12p2
gptid/ee51868b-1b89-11e6-9547-00e0ed313542 N/A mfid13p2
gptid/ee8febb7-1b89-11e6-9547-00e0ed313542 N/A mfid14p2
gptid/eeceaf6c-1b89-11e6-9547-00e0ed313542 N/A mfid15p2
gptid/ef0fcc4e-1b89-11e6-9547-00e0ed313542 N/A mfid16p2
gptid/b72520a4-31d3-11e6-be72-00e0ed313542 N/A mfid17p2
gptid/b7866bdc-31d3-11e6-be72-00e0ed313542 N/A mfid18p2
gptid/b7b3de18-31d3-11e6-be72-00e0ed313542 N/A mfid19p2
gptid/b7e182ea-31d3-11e6-be72-00e0ed313542 N/A mfid20p2
gptid/c38814f4-1180-11e6-a663-00e0ed313542 N/A da0p1

dmidecode

See attached file
 

Attachments

  • DMIDECODE.txt (45.1 KB)

PCanada

Cadet
Joined
Jan 17, 2017
Messages
6
I will get back to this shortly, as I have a sensitive client matter to resolve...

I find it curious that camcontrol finds 24 disks while glabel reports 20; I don't recall having seen that behavior before...
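One way to see that mismatch at a glance is to count each listing. A sketch, with a couple of sample lines inlined in place of the real output (on the live system you would pipe the actual commands in):

```shell
#!/bin/sh
# Count physical disks in camcontrol output vs. labeled data partitions
# in glabel output. Sample lines stand in for the real command output.
camcontrol_out='<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 9 lun 0 (pass1)
<ATA WDC WD2000FYYZ-0 1K03> at scbus0 target 10 lun 0 (pass2)'
glabel_out='gptid/2bc708be-31cf-11e6-9f12-00e0ed313542 N/A mfid0p2'
disks=$(printf '%s\n' "$camcontrol_out" | grep -c 'ATA WDC')
labels=$(printf '%s\n' "$glabel_out" | grep -c 'p2$')
echo "camcontrol disks: $disks, glabel data partitions: $labels"
# prints: camcontrol disks: 2, glabel data partitions: 1
```

On the live box the same idea would be `camcontrol devlist | grep -c 'ATA WDC'` against `glabel status | grep -c 'p2$'`; a gap between the two counts points at disks that no longer have a labeled data partition.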
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Why are you attempting something in the shell? The guide to replacing drives is in the documentation. The drives are in a faulted state for a reason; there is no point trying to online something the ZFS pool has kicked out.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@Bishop Please post the output of "dmesg".

What I'd suggest you do is locate each drive by its serial number, then replace one of the failing drives in VM3 first (because you have two failed drives in a RAIDZ2, and one more failure will result in data loss). Follow the instructions (link) @m0nkey_ provided above. You can resilver multiple drives at the same time; however, I would recommend only a single drive each in VM1 and VM3. Once the first VM3 drive is done, replace the second VM3 drive. Hopefully I was clear here.

If the drives that were kicked out are the new drives, they will need to be wiped before they can be used in the system again.
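If the GUI replace keeps coming up empty and you do end up at the CLI, the guids to act on can be read straight out of the `zpool status` output posted above. A sketch, with the relevant VM3 lines inlined in place of a live pipe (the final `zpool replace` is shown only as a comment with placeholder names, since the new disk's gptid depends on your system):

```shell
#!/bin/sh
# Pull the guid of each UNAVAIL member out of zpool status output.
# Sample lines from the VM3 listing above stand in for a live pipe.
status='18386910683362558414 UNAVAIL 0 0 0 was /dev/gptid/b75143d5-31d3-11e6-be72-00e0ed313542
gptid/b7866bdc-31d3-11e6-be72-00e0ed313542 ONLINE 0 0 0
1042060917291809891 UNAVAIL 0 0 0 was /dev/gptid/b816616b-31d3-11e6-be72-00e0ed313542'
printf '%s\n' "$status" | awk '/UNAVAIL/ {print $1}'
# prints:
# 18386910683362558414
# 1042060917291809891
#
# Then, one drive at a time (placeholder new-disk gptid):
#   zpool replace VM3 <guid> gptid/<new-partition-gptid>
```

That said, per the advice above, the documented GUI procedure is the safer route; this is only for when the GUI genuinely offers no replace candidates.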
 

dcevansiii

Dabbler
Joined
Sep 9, 2013
Messages
22
Hello,

Thank you for the response. I kind of inherited this system as of the other day

I think this might be your major problem: you might not be familiar with how ZFS/FreeNAS works. (Then again, you might be, but your responses about trying to "online" an offline drive tell me otherwise.) You might want to spend some time learning how it works.

My suggestions/thoughts:

1. Follow most of joeschmuck's advice.
2. Do NOT resilver more than one disk at a time within a pool. I believe that is a very bad thing and stresses the drives. (I had an experience on OpenSolaris where the whole thing went unresponsive for hours when I tried doing that; it did resilver in the end, but it was very touch and go. Or maybe that was from running zpool status during a resilver, I forget which.)
3. Use the GUI.
4. Ask more questions if you are unsure.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I wonder if that LSI device shown with 'camcontrol devlist' is in the correct IT mode? That output looks funny to me. The disks don't seem to have individual device names. I'm sure @monkey and @joeschmuck would be able to tell though.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I wonder if that LSI device shown with 'camcontrol devlist' is in the correct IT mode? That output looks funny to me. The disks don't seem to have individual device names. I'm sure @monkey and @joeschmuck would be able to tell though.
Thanks for the vote of confidence, but LSI controllers are not a strong topic for me.

2. Do NOT resilver more than one disk at a time within a pool.
Hopefully the advice I gave doesn't say that. I think I said only one drive per pool, starting with one drive each in VM3 and VM1 (that would be one drive per pool), and after the first VM3 drive was resilvered, replacing the last VM3 drive.

4. Ask more questions if you are unsure.
Absolutely, please ask if you have any doubt. VM3 is at high risk of data loss, and if you pull the wrong drive, the data could be gone.
 

PCanada

Cadet
Joined
Jan 17, 2017
Messages
6
And this is why I decided to finally join the community and participate!

Thanks to @joeschmuck, @Glorious1, @dcevansiii and @m0nkey_: while I'm thinking root-cause analysis, you guys are thinking problem resolution, which may very well reveal the root cause in the process...

With multiple senior members now advising off-lining and replacing the suspect disks, I am not sure I have anything more to add at the moment, and I look forward to hearing "all is well".
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I wonder if that LSI device shown with 'camcontrol devlist' is in the correct IT mode? That output looks funny to me. The disks don't seem to have individual device names. I'm sure @monkey and @joeschmuck would be able to tell though.
They are using a RAID card; the mfi driver is what gave it away. This is the third post today with someone using RAID, and I'm tired of explaining why not to use it.


 

dcevansiii

Dabbler
Joined
Sep 9, 2013
Messages
22
They are using a RAID card; the mfi driver is what gave it away. This is the third post today with someone using RAID, and I'm tired of explaining why not to use it.

Remember that he got this system dropped in his lap and didn't design it; he's trying to pick up the pieces. But yeah... RAID cards with ZFS are bad, mkay?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I would suggest backing up the data and rebuilding the system properly. This will require a quick learning curve, but that is the way it is right now. If you can flash the RAID controller to IT mode, you should be able to use the exact same hardware.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
@Bishop, please use [CODE][/CODE] tags whenever posting console output.
 