SOLVED Lost a disk, degraded then crash

Status
Not open for further replies.

blairjj

Cadet
Joined
Oct 14, 2013
Messages
8
Howdy,

I have the following system:

ASROCK C2750D4I / Crucial 16GB Kit (8GBx2) DDR3L 1600MT/s (PC3-12800) DR x8 ECC
Bitfenix Prodigy / Corsair 600 W PS
(5) Toshiba 2TB 7200 RPM SATA 6Gb/s 3.5" Internal Hard Drive DT01ACA200 - RAIDZ
Dual PNY 16GB USB 2.0 Flash Drive P-FDU16G/APPMT-EF
FreeNAS 9.3 (Plex, Transmission, CIFS)

Last night I noticed the dreaded blinking red idiot light. The messages showed that a disk had failed and the pool was in degraded mode. I ordered a new disk (arrives tomorrow), but when I got home today I couldn't ssh into the box. I went downstairs and found it unresponsive: it was sitting at the 1-14 console menu and wouldn't respond to any of the choices. After a reboot, the zpool is no longer listed in the GUI and only 2 of the original 5 drives seem to be online.

Looking at some of the relevant posts in this section I ran the following commands:
Code:
[root@FreeNAS] ~# camcontrol devlist
<TOSHIBA DT01ACA200 MX4OABB0>      at scbus0 target 0 lun 0 (ada0,pass0)
<TOSHIBA DT01ACA200 MX4OABB0>      at scbus1 target 0 lun 0 (ada1,pass1)
<HL-DT-ST DVDRAM GH24NS95 RN01>    at scbus14 target 0 lun 0 (pass2,cd0)
<PNY USB 2.0 FD 1100>              at scbus17 target 0 lun 0 (pass3,da0)
<PNY USB 2.0 FD 1100>              at scbus18 target 0 lun 0 (pass4,da1)

[root@FreeNAS] ~# zpool status -v
  pool: freenas-boot
state: ONLINE
  scan: resilvered 4.50K in 0h0m with 0 errors on Wed Dec 31 19:03:35 1969
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       gptid/09fe3642-922b-11e4-8c20-d050992ecef8  ONLINE       0     0     0
       gptid/0ae1cf63-922b-11e4-8c20-d050992ecef8  ONLINE       0     0     0

errors: No known data errors

[root@FreeNAS] ~# zdb
Media:
    version: 5000
    name: 'Media'
    state: 0
    txg: 7400472
    pool_guid: 3969545755668622442
    hostid: 2400999328
    hostname: 'FreeNAS.local'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 3969545755668622442
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            guid: 5551506481268430069
            nparity: 1
            metaslab_array: 35
            metaslab_shift: 36
            ashift: 12
            asize: 9991233208320
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 8452839542136952068
                path: '/dev/gptid/f9ea1f47-9230-11e4-8d33-d050992ecef8'
                whole_disk: 1
                DTL: 278
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 12755607072554625527
                path: '/dev/gptid/fa78a65e-9230-11e4-8d33-d050992ecef8'
                whole_disk: 1
                DTL: 277
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 10229266234416752081
                path: '/dev/gptid/fb06e3ce-9230-11e4-8d33-d050992ecef8'
                whole_disk: 1
                DTL: 276
                create_txg: 4
                degraded: 1
                aux_state: 'err_exceeded'
            children[3]:
                type: 'disk'
                id: 3
                guid: 18357837903980992176
                path: '/dev/gptid/fb8f2a82-9230-11e4-8d33-d050992ecef8'
                whole_disk: 1
                DTL: 275
                create_txg: 4
            children[4]:
                type: 'disk'
                id: 4
                guid: 7124986016943102226
                path: '/dev/gptid/fc145a46-9230-11e4-8d33-d050992ecef8'
                whole_disk: 1
                DTL: 274
                create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data


Reviewing dmesg shows something else ominous; only ada0 and ada1 are detected:

Code:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <TOSHIBA DT01ACA200 MX4OABB0> ATA-8 SATA 3.x device
ada0: Serial Number 43Q2BHPGS
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <TOSHIBA DT01ACA200 MX4OABB0> ATA-8 SATA 3.x device
ada1: Serial Number 73RTNMXKS
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6


I had hoped that just replacing the drive would let me get back up and running, but I am rather alarmed that I now seem to be short 3 disks OR have a larger problem (SATA controller).
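
The "short 3 disks" count can be checked mechanically from the `camcontrol devlist` output. The following is only a sketch: it assumes the line format shown earlier in this post, treats anything named `adaN` as a SATA data disk, and hard-codes the expected count of 5 from the system description. The names `sata_disks` and `EXPECTED_DISKS` are hypothetical.

```python
import re

# camcontrol devlist output copied from earlier in this post.
CAMCONTROL_SAMPLE = """\
<TOSHIBA DT01ACA200 MX4OABB0>      at scbus0 target 0 lun 0 (ada0,pass0)
<TOSHIBA DT01ACA200 MX4OABB0>      at scbus1 target 0 lun 0 (ada1,pass1)
<HL-DT-ST DVDRAM GH24NS95 RN01>    at scbus14 target 0 lun 0 (pass2,cd0)
<PNY USB 2.0 FD 1100>              at scbus17 target 0 lun 0 (pass3,da0)
<PNY USB 2.0 FD 1100>              at scbus18 target 0 lun 0 (pass4,da1)
"""

EXPECTED_DISKS = 5  # assumption: the five Toshiba drives in the RAIDZ


def sata_disks(devlist_text):
    """Return the adaN device names found in camcontrol devlist output."""
    disks = []
    for line in devlist_text.splitlines():
        match = re.search(r"\(([^)]*)\)\s*$", line)  # trailing "(ada0,pass0)"
        if not match:
            continue
        for name in match.group(1).split(","):
            if re.fullmatch(r"ada\d+", name):
                disks.append(name)
    return disks


found = sata_disks(CAMCONTROL_SAMPLE)
missing = EXPECTED_DISKS - len(found)
print("visible:", found, "missing:", missing)
```

With the output above, only `ada0` and `ada1` are found, so three drives are not even being enumerated. That points at something in front of the disks (controller, cabling, power) at least as much as at the disks themselves.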

Any advice / counsel or shoulders to cry on are welcome.

Thanks
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, putting the best possible outcome on the worst possible thing that could happen, there's a possibility that your other drives are fine and something else (cabling?) is wrong, so this is a very good time to carefully open the machine and take a look at the disks. Multiple devices totally failing simultaneously is unusual.

However, if you had some other issue, like you only had one fan, it stalled, and you created an EZ-Bake ZFS Oven by cooking the disks to death, you could very well be hosed.

You need to assess what has actually happened and if those drives are okay.
 

blairjj

Cadet
Joined
Oct 14, 2013
Messages
8
So I thought about it and remembered that I had a Dell PowerEdge T110 lying around. I moved all of the drives over and yanked the RAID card (the board had just enough onboard SATA ports). I then moved my USB thumb sticks over and booted up. All of the drives but one were seen. I replaced that drive and resilvering commenced (YES!).

Then came happiness:

Code:
[root@FreeNAS] /mnt# zpool status -v
  pool: Media
state: ONLINE
  scan: resilvered 231G in 0h42m with 0 errors on Sun Mar 13 21:59:26 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    Media                                           ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/f9ea1f47-9230-11e4-8d33-d050992ecef8  ONLINE       0     0     0
        gptid/fa78a65e-9230-11e4-8d33-d050992ecef8  ONLINE       0     0     0
        gptid/5e6d8c08-e982-11e5-8682-d4ae52cefe85  ONLINE       0     0     0
        gptid/fb8f2a82-9230-11e4-8d33-d050992ecef8  ONLINE       0     0     0
        gptid/fc145a46-9230-11e4-8d33-d050992ecef8  ONLINE       0     0     0

errors: No known data errors
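
A quick way to confirm that every device in a `zpool status` config table is back to ONLINE is sketched below in Python. It assumes the column layout shown above (NAME, then STATE) and that the table ends at the first blank line; `unhealthy_devices` is a made-up helper name, and the sample is trimmed to two of the five disks.

```python
# Config section copied from the zpool status output above (trimmed).
STATUS_SAMPLE = """\
    NAME                                            STATE     READ WRITE CKSUM
    Media                                           ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/f9ea1f47-9230-11e4-8d33-d050992ecef8  ONLINE       0     0     0
        gptid/5e6d8c08-e982-11e5-8682-d4ae52cefe85  ONLINE       0     0     0

errors: No known data errors
"""


def unhealthy_devices(status_text):
    """Return (name, state) for every config row whose state is not ONLINE."""
    bad = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # rows of the config table follow the header
            continue
        if in_config:
            if len(fields) < 2:
                in_config = False  # a blank line ends the table
            elif fields[1] != "ONLINE":
                bad.append((fields[0], fields[1]))
    return bad


print(unhealthy_devices(STATUS_SAMPLE))
```

An empty list means nothing in the table is reporting a state other than ONLINE. It says nothing about the READ/WRITE/CKSUM counters, so those are still worth a look by eye.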


I am making a complete backup (Double YES!!).

I will use another drive and experiment on the old mobo to see what the heck was going on. The Dell only has 8GB of memory and it is not ECC, so that is a very temporary solution!

Thanks
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
So I moved all of the drives over and yanked the RAID card (the board had just enough onboard SATA ports). I then moved my USB thumb sticks over and booted up. All of the drives but one were seen. I replaced that drive and resilvering commenced (YES!).

Then came happiness:
Ahh... The beauty of FreeNAS... :D
 