3 HDDs failed and my pool is now UNAVAIL

popot

Cadet
Joined
Jan 11, 2014
Messages
5
One Seagate ST4000VM000 (4TB, 5900 RPM, 64MB cache, SATA 6.0Gb/s, 3.5") HDD in the tank-share pool failed last week, and I took it offline.

I was away, so I was not able to replace the disk.

-----

At 12:38 last night FreeNAS emailed me:

New alerts:
* The volume tank-share state is UNKNOWN:


Gone alerts:
* Device: /dev/da1 [SAT], Read SMART Error Log Failed
* Device: /dev/da3 [SAT], not capable of SMART self-check
* Device: /dev/da3 [SAT], Read SMART Error Log Failed
* Device: /dev/da1 [SAT], not capable of SMART self-check
* Device: /dev/da3 [SAT], Read SMART Self-Test Log Failed
* Device: /dev/da1 [SAT], failed to read SMART Attribute Data
* Device: /dev/da3 [SAT], failed to read SMART Attribute Data
* Device: /dev/da1 [SAT], Read SMART Self-Test Log Failed
* The volume tank-share state is DEGRADED: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.


Alerts:
* The volume tank-share state is UNKNOWN:

-----

This morning I tried to run zpool import:

Code:
   pool: tank-share
     id: 7959606875702175080
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://illumos.org/msg/ZFS-8000-3C
 config:

        tank-share                                      UNAVAIL  insufficient replicas
          raidz2-0                                      UNAVAIL  insufficient replicas
            11926784897592992771                        UNAVAIL  cannot open
            5538303827549403759                         OFFLINE
            2713945048641951632                         UNAVAIL  cannot open
            gptid/6a8832b0-e67b-11e8-b908-0050568cc29e  ONLINE
          raidz2-1                                      ONLINE
            gptid/8c08852f-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/8d2f6d69-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/8e5288e1-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/8f740385-e67b-11e8-b908-0050568cc29e  ONLINE
          raidz2-2                                      ONLINE
            gptid/a4eff44b-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/a61ac224-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/a74d645d-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/a88628d6-e67b-11e8-b908-0050568cc29e  ONLINE
          raidz2-3                                      ONLINE
            gptid/bb652fee-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/bcce2855-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/be04648a-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/bf62b7fe-e67b-11e8-b908-0050568cc29e  ONLINE
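
For anyone who wants to read a listing like the one above at a glance, here is a minimal Python sketch that tallies member states per vdev. It assumes the config section has been saved as plain text (only the first two vdevs are copied in here to keep it short). The names `member_states` and `missing_members` are made up for illustration, and the pattern only recognises raidz and mirror vdev names.

```python
import re

# Trimmed copy of the config section above (raidz2-2 and raidz2-3 omitted).
SAMPLE = """\
        tank-share                              UNAVAIL  insufficient replicas
          raidz2-0                                      UNAVAIL  insufficient replicas
            11926784897592992771                        UNAVAIL  cannot open
            5538303827549403759                         OFFLINE
            2713945048641951632                         UNAVAIL  cannot open
            gptid/6a8832b0-e67b-11e8-b908-0050568cc29e  ONLINE
          raidz2-1                                      ONLINE
            gptid/8c08852f-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/8d2f6d69-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/8e5288e1-e67b-11e8-b908-0050568cc29e  ONLINE
            gptid/8f740385-e67b-11e8-b908-0050568cc29e  ONLINE
"""

# Names such as raidz2-0 or mirror-1 mark the start of a vdev.
VDEV_NAME = re.compile(r"^(raidz[123]?|mirror)-\d+$")


def member_states(config_text):
    """Map each raidz/mirror vdev name to the list of its members' states."""
    vdevs = {}
    current = None
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        name, state = fields[0], fields[1]
        if VDEV_NAME.match(name):
            current = name
            vdevs[current] = []
        elif current is not None:
            vdevs[current].append(state)
    return vdevs


def missing_members(config_text):
    """Count the members of each vdev that are in any state other than ONLINE."""
    return {
        vdev: sum(state != "ONLINE" for state in states)
        for vdev, states in member_states(config_text).items()
    }


print(missing_members(SAMPLE))  # {'raidz2-0': 3, 'raidz2-1': 0}
```

The disk that was taken OFFLINE counts as missing here, just like the two UNAVAIL ones, which is why raidz2-0 shows three members gone.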


First, the urgent, desperate question: has tank-share gone to the great beyond? Can it be recovered?

Second, a bunch of inane questions: what config would have been better? Some spares? I thought raidz2 could survive 3 HDD failures?
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
It would seem very bad luck to lose 3 of 4 disks in a vdev. Are they all on the same controller or power line, or is there some other factor they have in common?
 

popot

Cadet
Joined
Jan 11, 2014
Messages
5
It would seem very bad luck to lose 3 of 4 disks in a vdev. Are they all on the same controller or power line, or is there some other factor they have in common?

blueether, thanks for your help.

Controller 1
raidz2-0
raidz2-1

Controller 2
raidz2-3
raidz2-4

All the vdevs in tank-share share a common power line.
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
I think I would look at the cable (I'm guessing miniSAS SFF-8087?) to make sure that it is well seated, and maybe even swap it with another one. Then maybe look at the backplane, if you are using one.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,974
I thought raidz2 can survive 3 hdd failures?
RAIDZ2 can survive 2 hard drive failures per vdev, not 3. If you can't find something wrong with your cabling, as @blueether has suggested, then your pool is gone. I hope you have a good backup.
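
The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the per-vdev counts are taken from the zpool import output earlier in the thread. It relies on two facts: a raidz vdev tolerates as many missing members as it has parity disks, and the pool is lost if any single top-level vdev is lost, because data is striped across all of them.

```python
# Number of members each raidz level can lose before the vdev fails.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_survives(kind, missing):
    """A raidz vdev survives while missing members do not exceed its parity."""
    return missing <= PARITY[kind]


def pool_survives(vdevs):
    """vdevs: list of (kind, missing) pairs, one per top-level vdev.

    Every top-level vdev must survive for the pool to survive.
    """
    return all(vdev_survives(kind, missing) for kind, missing in vdevs)


# tank-share as reported in this thread: raidz2-0 is missing three members
# (two UNAVAIL plus the one taken OFFLINE); the other three vdevs are intact.
tank_share = [("raidz2", 3), ("raidz2", 0), ("raidz2", 0), ("raidz2", 0)]
print(pool_survives(tank_share))  # False

# Had only two members of raidz2-0 been missing, the pool would import.
one_fewer = [("raidz2", 2), ("raidz2", 0), ("raidz2", 0), ("raidz2", 0)]
print(pool_survives(one_fewer))  # True
```

So getting even one of the three missing raidz2-0 disks visible again would put the vdev back within its tolerance.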
 

popot

Cadet
Joined
Jan 11, 2014
Messages
5
Joined
Oct 18, 2018
Messages
969
Were you getting email warnings about bad sectors or other problems with these drives prior to the failure? Do you have regular scrubs and SMART tests scheduled? It seems that you do.

What controllers are you using?

What happens if you online that single offline disk?

Are you using encryption?

If you get the pool back up, do you have a replacement drive burned in and ready to go?
 