Replaced drive

dgraybill · Nov 8, 2012

I had a drive in a ZFS pool that was giving me a crap load of write errors and AHCI errors, enough that it was causing FreeNAS to lock up. So I stuck a new drive in and hit Replace. It took forever and crashed several times. This is the output I get now:

Code:

[root@freenas] ~# zpool status
  pool: Storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: resilvered 186G in 241h35m with 348669 errors on Wed Nov  7 14:39:23 2012
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage                                         ONLINE       0     0     0
          replacing-0                                   ONLINE       0     0     0
            gptid/6d3fa90d-1b23-11e2-b307-00241d29191e  ONLINE       0     0     4
            gptid/9254b0ca-2141-11e2-b3d2-00241d29191e  ONLINE       0     0     0
          gptid/71b35dc9-1b23-11e2-b307-00241d29191e    ONLINE       0     0     0
          gptid/59fa73fd-1bb5-11e2-a43a-00241d29191e    ONLINE       0     0     0

errors: 348669 data errors, use '-v' for a list
[root@freenas] ~#

But it won't let me delete the old drive from the pool in the GUI, like I did the last time I replaced a drive. Thoughts?

cyberjock · Nov 13, 2012

Have you run 'zpool status -v' like it says to get the list of data errors? It appears that you may have up to 348669 corrupted files on your server...

Also, I assume this is the same error as in your previous thread. You are running a RAID-0 (no redundancy), so a failure of any disk has catastrophic consequences for your data. It certainly appears that you are seeing the "rewards" of using RAID0 across multiple drives.

I could be mistaken, but I think your best option is to back up the data that is still on the server, delete and recreate the zpool with redundancy, then copy the data back to the server. There's a reason why there is a warning in bold in the manual regarding RAID0.
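For reference, a minimal sketch of pulling the file list out of the '-v' output so it can be saved and counted before deciding what to back up. It assumes the usual "errors: Permanent errors have been detected in the following files:" header that `zpool status -v` prints ahead of the list. The function name `list_error_paths` and the output path in the comments are made up for illustration.

```shell
# Read `zpool status -v` text on stdin and print one affected entry per line.
# Assumption: the list follows the "errors: Permanent errors ..." header line.
list_error_paths() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }   # start of the list
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }      # trim the indent
    '
}

# Possible use on the live system (pool name taken from the thread,
# output file name is hypothetical):
#   zpool status -v Storage | list_error_paths > /tmp/storage-errors.txt
#   wc -l < /tmp/storage-errors.txt
```

With several hundred thousand entries the listing can take a while to print, so redirecting it to a file is probably more practical than reading it on the console.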

dgraybill · Nov 13, 2012

These are 2 different situations. In my other thread my OS drive failed and I couldn't recreate the pool. That problem has since been rectified. The issue I speak of here is one drive that is slowly failing. I have had this issue before with an identical drive of the same age, and I replaced it just as I have here, but that time I was able to delete the old drive after the fact and it was fine until this drive began acting up. I don't much care about the files that are corrupted, but I do care about the other files that aren't. My issue now is that I can access all the files on the NAS, but very slowly, because the dying drive is still in there and keeps giving AHCI errors. Then after a few thousand faults the kernel panics and locks up. My data is fine if I can get that bad drive out of there.

cyberjock · Nov 13, 2012

Did the resilvering ever complete without errors? I'm not too read-up on how resilvering works with unrecoverable errors. In this case I think it's possible that the old drive is still there because it couldn't do a 100% resilver. You also said that the system kept crashing. It's possible that the crashing (and, I assume, rebooting) has kept the resilvering from ever completing.

In any case, I'd say that you are in a pickle. If you can't successfully complete a resilvering because of the crashing I'm not sure you'll ever be able to remove the old drive.

That's why I said the best course of action is likely to delete and recreate the zpool. I'd add redundancy to save your data next time. Redundancy doesn't work as a backup, but it sure does save you some serious time and effort having to use those backups.

You mentioned AHCI errors. I've only seen those with a bad SATA cable. Any chance you could replace your SATA cable?

cyberjock · Nov 13, 2012

If you are sure the resilvering completed (it looks like it may have completed as much as it ever will on Nov 7th), then I'd just do a shutdown, physically remove the failing drive, then boot up. If you have access to your zpool, then you can detach the removed disk from the GUI. If you can't access the data after you physically remove the failing drive, then the resilvering didn't complete and it looks like you're stuck with the bad drive since you have no redundancy.
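If the GUI still refuses, the same detach can be attempted from the shell with `zpool detach`. A rough sketch follows, assuming the old disk is the first device listed under the replacing-0 vdev, as it appears to be in the status output earlier in the thread. The helper name `replacing_old_member` is made up, and there is no guarantee the pool will accept the detach given the resilver errors.

```shell
# Read `zpool status` text on stdin and print the first device listed
# under a "replacing-N" vdev.
# Assumption: the first child is the old disk and the second is the new one.
replacing_old_member() {
    awk '$1 ~ /^replacing-[0-9]+$/ { getline; print $1; exit }'
}

# Possible use on the live system (pool name taken from the thread):
#   old=$(zpool status Storage | replacing_old_member)
#   echo zpool detach Storage "$old"   # check the printed command first,
#                                      # then run it without the echo
```

Printing the command with echo first is deliberate: detaching the wrong member of the replacing vdev would drop the new disk instead of the failing one.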

dgraybill · Nov 13, 2012

They look like this:

https://picasaweb.google.com/m/view...63155/5792260744016510945/5810326318634222306

I spent a few hours on Google and all I came up with was bad drives. I can try a cable. It did say resilvering completed with errors. What would I do to retry deleting the new drive?

dgraybill · Nov 13, 2012

And FYI, I don't have redundancy because the info is not important and my drives are different sizes. It's just a huge pain to get all the data back, so I'll try all avenues before starting over.

cyberjock · Nov 13, 2012

The picasaweb link only goes to an empty album.

Redundancy in RAID isn't a backup. It's only meant to avoid the hassle of recovering from backups in the event of a failure. I'm glad you understand that. I really hate having to tell people things like "your setup cost you all of your data" and having them reply with "but.. but.. but I have no backups".

I'd try a cable only because it's cheap, and it would suck to use that cable in a year or two and have another drive "fail" because you reused the same cable.

dgraybill · Nov 13, 2012

Hmm, the picture loads for me. It's something like "AHCHI0 timeout on slot x port 0". The slot changes all the time; the port is always 0. There are some other errors too. I'll try a new cable tonight. Thanks.
