Pool Unavail

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
Hi,

Today I awoke to find that a couple of my HDD LEDs were not lit. I logged into the WebGUI and there was an alert; clicking on it showed what is in the attached "alert system.jpg".

I then opened PuTTY and SSH'd into FreeNAS. Running 'zpool status' displayed:

Code:
 
pool: storhome
state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: scrub in progress since Sun Mar 20 00:00:34 2016
        20.9T scanned out of 22.3T at 483M/s, 0h51m to go
        633M repaired, 93.64% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        storhome                                        UNAVAIL    299     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/025ad4a7-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/05b954c8-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/088a7c14-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0b1bcdd9-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0ce2341f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0dee94c5-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0f02e248-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1024c1c3-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1132ffd2-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1248648c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/13823817-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/14909c2a-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/15a41e76-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/16af9451-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/17bea8f6-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/18d49cce-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/19fae165-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1b1677ec-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1c32263f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1d52d97c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/1f0a26ed-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/20e9d6be-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/22dcc6ce-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (repairing)
            gptid/24cf09f1-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/26ca4f68-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/28b2ca23-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/2bc1c1fb-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/2eac7941-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/31394999-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/341c0baf-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-3                                      UNAVAIL    606     8     0
            7425134643272534971                         UNAVAIL     39     2     0  was /dev/gptid/373073d1-8055-11e3-960f-000c293f2342
            7030332361354863589                         UNAVAIL     39     2     0  was /dev/gptid/395bb564-8055-11e3-960f-000c293f2342
            gptid/3b74838c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (repairing)
            2302982127250804425                         UNAVAIL     41     4     0  was /dev/gptid/44f648f7-8055-11e3-960f-000c293f2342
            9302872677343048229                         UNAVAIL     33     4     0  was /dev/gptid/4899c712-8055-11e3-960f-000c293f2342
            6093478878733219772                         UNAVAIL      7     5     0  was /dev/gptid/4b26b426-8055-11e3-960f-000c293f2342
            7610573395036278912                         UNAVAIL      5     4     0  was /dev/gptid/4d85ade5-8055-11e3-960f-000c293f2342
            gptid/4fc8164f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (repairing)
            1699868845844476237                         UNAVAIL     39     3     0  was /dev/gptid/52e40d92-8055-11e3-960f-000c293f2342
            8413629583474340662                         UNAVAIL     39     1     0  was /dev/gptid/558d179e-8055-11e3-960f-000c293f2342
        logs
          gptid/56b8d096-8055-11e3-960f-000c293f2342    ONLINE       0     0     0
        cache
          gptid/56ff3296-8055-11e3-960f-000c293f2342    ONLINE       0     0     0

errors: 11824834 data errors, use '-v' for a list


I intend to wait for the scrub to finish, then power the system down and back up to see whether the HDDs currently showing UNAVAIL come back ONLINE.
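
If the disks do show up again after the reboot, my rough plan from the shell is along these lines (the gptid below is just one of the missing members from the output above, not something I have run yet):

Code:
# check whether the devices are visible to the OS again
camcontrol devlist

# try to bring a missing member back into the pool, per the zpool action hint
zpool online storhome gptid/373073d1-8055-11e3-960f-000c293f2342

# re-check the pool state and the error list
zpool status -v storhome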

My concern is with the 8 drives in raidz2-3 showing as UNAVAIL, and the data stored on them.

I would like to ask for guidance and any other recommendations, please. Extra information has been attached.

Regards,
cabishop.
 

Attachments

  • alert system.jpg (25.4 KB)
  • system info.jpg (32.6 KB)
  • dmesg.txt (63.5 KB)

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
Hmm, well, the scrub did complete. I shut down FreeNAS from the WebGUI, logged into ESXi and powered the host down. I cleaned the drive caddies, reseated the HDDs, powered everything back up, logged in over SSH and ran zpool status:

Code:
Welcome to FreeNAS
[root@freenas] ~# zpool status
  pool: storhome
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Mar 20 15:08:01 2016
        163G scanned out of 22.4T at 1.10G/s, 5h45m to go
        69.4M resilvered, 0.71% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        storhome                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/025ad4a7-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/05b954c8-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/088a7c14-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0b1bcdd9-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0ce2341f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0dee94c5-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0f02e248-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1024c1c3-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1132ffd2-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1248648c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/13823817-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/14909c2a-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/15a41e76-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/16af9451-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/17bea8f6-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/18d49cce-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/19fae165-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1b1677ec-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1c32263f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1d52d97c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/1f0a26ed-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/20e9d6be-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/22dcc6ce-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/24cf09f1-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/26ca4f68-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/28b2ca23-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/2bc1c1fb-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/2eac7941-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/31394999-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/341c0baf-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-3                                      ONLINE       0     0     0
            gptid/373073d1-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/395bb564-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/3b74838c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/44f648f7-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/4899c712-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/4b26b426-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/4d85ade5-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/4fc8164f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/52e40d92-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
            gptid/558d179e-8055-11e3-960f-000c293f2342  ONLINE       0     0     0  (resilvering)
        logs
          gptid/56b8d096-8055-11e3-960f-000c293f2342    ONLINE       0     0     0
        cache
          gptid/56ff3296-8055-11e3-960f-000c293f2342    ONLINE       0     0     0

errors: 14088368 data errors, use '-v' for a list
[root@freenas] ~#


Now pondering....

Regards,
cabishop.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Now pondering....
Does FreeNAS have direct access to those drives? Since you are running in a VM, your situation may be outside the normal scope... Please post your complete system configuration as well as how the drives are being accessed.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I don't like this:
Code:
errors: 14088368 data errors, use '-v' for a list


I think that with 14M data errors this pool is toast, unfortunately.
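
If you want to see exactly what is affected before deciding anything, dump the list to a file, because it will be huge with that many errors:

Code:
# -v lists every file with unrecoverable errors; redirect it rather than scroll it
zpool status -v storhome > /tmp/storhome_errors.txt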
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I agree. Post all your system specs, drive controllers, etc., and how the system is run and configured on ESXi.
 

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
Hi,

Even virtualised, FreeNAS does have direct access to the HDDs, as passthrough mode is enabled in ESXi. The HBA is configured for IT mode.
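
For anyone curious, a quick sanity check of the passthrough from inside the FreeNAS VM looks roughly like this (commands only, output omitted):

Code:
# disks should appear as raw da devices behind the LSI (mps) driver,
# not as VMware virtual disks
camcontrol devlist

# the HBA itself should show up on the mps driver in the boot messages
dmesg | grep -i mps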

Enclosure 1:

Boot: 8GB USB (ESXi 5.5.0)
CPU: Intel Xeon E3-1230 v2
RAM: 32GB ECC
MOBO: Supermicro X9SCI/X9SCA
HBA: LSI 2004 (9211-8i from memory - connected to SAS Expander)
SAS: HP SAS Expander (connected to SAS backplane)
HDD: 24x WD 3.0TB Green
SSD1: 120GB (logs)
SSD2: 120GB (cache)
Case: Norco RPC-4224 (w/ 120mm Fan Wall Bracket)

Enclosure 2 (connected via SFF-8088 port):

SAS: Chenbro CK23601 (connected to SAS backplane)
HDD: 16x WD 3.0TB Green
Case: Norco RPC-4224 (w/ 120mm Fan Wall Bracket)

This enclosure is connected to the HP SAS Expander's external port via SFF-8088. The Chenbro uses a UEK bracket kit (w/ power control) http://www.chenbro.com/upload/website/0ec98bec7b81e5505d5dc653943610af.jpg so it has no motherboard or CPU requirements.

The resilvering process did complete:

Code:
[root@freenas] ~# zpool status
  pool: storhome
state: ONLINE
  scan: resilvered 3.11G in 2h43m with 0 errors on Sun Mar 20 17:51:49 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        storhome                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/025ad4a7-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/05b954c8-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/088a7c14-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0b1bcdd9-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0ce2341f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0dee94c5-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/0f02e248-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1024c1c3-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1132ffd2-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1248648c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/13823817-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/14909c2a-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/15a41e76-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/16af9451-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/17bea8f6-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/18d49cce-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/19fae165-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1b1677ec-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1c32263f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/1d52d97c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/1f0a26ed-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/20e9d6be-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/22dcc6ce-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/24cf09f1-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/26ca4f68-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/28b2ca23-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/2bc1c1fb-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/2eac7941-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/31394999-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/341c0baf-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
          raidz2-3                                      ONLINE       0     0     0
            gptid/373073d1-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/395bb564-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/3b74838c-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/44f648f7-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/4899c712-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/4b26b426-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/4d85ade5-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/4fc8164f-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/52e40d92-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
            gptid/558d179e-8055-11e3-960f-000c293f2342  ONLINE       0     0     0
        logs
          gptid/56b8d096-8055-11e3-960f-000c293f2342    ONLINE       0     0     0
        cache
          gptid/56ff3296-8055-11e3-960f-000c293f2342    ONLINE       0     0     0

errors: No known data errors
[root@freenas] ~#


Under the WebGUI > Alert System it is all green, healthy and OK. I think it's time for some individual SMART tests on the HDDs. I'm leaning towards the Norco SAS backplane being the cause, since the drives come back ONLINE once you reseat them.

The error count before versus after the resilver is a little concerning, which I'm hoping the SMART tests will shed some light on. WD Reds would be nice; however, they did not exist when this was built.
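
To put some numbers behind that concern, the plan is to pull the raw counters per drive and compare them, along these lines (da0 just as an example device; I'll repeat it per drive):

Code:
# the attributes I care about: reallocated, pending and offline-uncorrectable sectors
smartctl -A /dev/da0 | egrep 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'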

Regards,
cabishop.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
And the drives that were being resilvered in your previous posts are part of which enclosure? Do you think you can isolate the problem to a cable issue?
 

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
I will need to look into which enclosure the raidz2-3 vdev lives in. Perhaps this is a good time to add labels so I can match a serial number to a drive name. I don't think it is a cable problem; the enclosures are rack mounted (32U on wheels) and rarely moved. The number of drives that went UNAVAIL also tells me it's not cable related; if only 4 had gone UNAVAIL, perhaps it would be cabling.

The HDD Standby is 'Always On' and the Advanced Power Management is 'Disabled'.

Labels and HDD smart tests will be a tomorrow job.
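
The labelling itself should just be a matter of walking the gptid -> da device -> serial chain, roughly like this (da0 as an example; I'll script it properly tomorrow):

Code:
# map each gptid used in the pool to its da device
glabel status

# then pull the serial number for that device to write on the caddy label
smartctl -i /dev/da0 | grep -i serial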
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
Thank you, @gpsguy, for mentioning the scripts; that saved some time. All I had to do was enable "Access for less secure apps" in Google to get mail working properly in my FreeNAS version.

I used both the SAS and SATA versions of the scripts due to my environment; the output is attached. From going through them, it appears that I need to replace 4 drives. Some information would not populate, and errors were displayed when running ./sata_report.sh.

SATA Report: da3 da23 da27
SAS Report: da3 da16 da23 da27

Would others concur based on these reports? Your views would be appreciated.

Regards,
cabishop.
 

Attachments

  • sas_report.txt (209.1 KB)
  • sata_report.txt (86.3 KB)
  • sata_report_error.txt (3.4 KB)

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
After a very quick glance over the results:
Replace: da3, da23, da25.
Keep an eye on: da14, da15; they might fail a long test.
Check cables/interface for: da27, da28, da29, da30, da34, da35, da36.

Additionally, you need to run a long test. You haven't done a single one to date, and it will likely show more failures.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I would move da14 to the Replace list, and add da16 to the Keep an eye on list (assuming it doesn't fail a long test).
 

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
Thank you for recommendations.

Labels are on, and thus far I have replaced and resilvered da3, da23 and da25 with new spares. Thank god for hardware that supports hot-plugging :)

I have scheduled a long test starting at 4 am today, and monthly from now on. I'll check da14 and da16 after this test. I only have one new spare left... more drives to purchase :cool:
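
For anyone following along, I did the replacements through the WebGUI volume status screen, which takes care of the partitioning; the underlying ZFS operation is essentially the following (the gptids here are placeholders, not my real ones):

Code:
# swap the failed member for the new disk, then watch the resilver
zpool replace storhome gptid/OLD-GPTID gptid/NEW-GPTID
zpool status storhome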

Thanks,
cabishop.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Thank god for hardware that supports hot-plugging
Just be careful about that; hopefully it's a nice tight fit, because I've seen hot-pluggable become Spark-O-Matic, then Melt-O-Matic. It is always best if you can power down the drive being replaced.
 

cabishop

Cadet
Joined
Mar 19, 2016
Messages
7
Just be careful about that; hopefully it's a nice tight fit, because I've seen hot-pluggable become Spark-O-Matic, then Melt-O-Matic. It is always best if you can power down the drive being replaced.

All is fine.

This long test, however, is another hurdle. It has been two days, I have tried multiple configurations through the WebGUI, and it just won't start the test.

I wish there were more information, whether it be docs, screenshots or videos, on S.M.A.R.T. monitoring and testing setups, instead of on other less significant areas of functionality. I could not find a single video on the FreeNAS YouTube channel!

Regards,
cabishop.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
This long test, however, is another hurdle. It has been two days, I have tried multiple configurations through the WebGUI, and it just won't start the test.
What test? If you are talking about a SMART test, you really need to pay attention to the settings, as they can screw you up. But from the shell you can start a SMART test with "smartctl -t long /dev/da3", and do that for every drive you want to test. You do have a lot of drives, but if you just want to get things going, that is the way to do it.
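
If you want to kick them all off in one go from the shell, something along these lines works (drop into sh first since the default root shell is csh, and adjust 39 to however many da devices you actually have):

Code:
# start a long self-test on every da device; errors for device numbers
# that don't exist are harmless
sh -c 'for i in $(seq 0 39); do smartctl -t long /dev/da$i; done'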
 