Disk replacement fails

Status
Not open for further replies.

CyberPete

Dabbler
Joined
May 12, 2015
Messages
19
I have just replaced one of the hard disks in a 3-disk stripe, using the FreeNAS GUI, and my system has been left in a confused state. Storage → Volume status shows the new disk, but also the old disk, labelled as UNAVAIL, and a disk called 'replacing-2', labelled as DEGRADED. Storage → View Disks shows only the correct 3 disks.

I recently had a warning that one of my volumes was low on free space. The stripe consists of 3TB, 2TB and 1TB disks. I already had a 3TB disk as a spare, so removed it using the GUI and used it to replace the 1TB, again using the GUI. The system seems to be stuck mid-replacement. zpool status -v is included below.
I physically disconnected the old drive and the volume still functions OK. All the files are accessible, so the new disk seems to be in place, giving 8TB of storage in total, but FreeNAS reports the old configuration and still claims that I'm low on disk space.

I did a scrub on the volume, but it made no difference.
Code:
 zpool status -v Backup
  pool: Backup
state: DEGRADED
  scan: scrub repaired 0 in 6h28m with 0 errors on Sun Jul  3 00:59:28 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        Backup                                          DEGRADED     0     0     0
          gptid/11beb5f7-bc3f-11e5-aa96-00241dcf3a0f    ONLINE       0     0     0
          gptid/1282feff-bc3f-11e5-aa96-00241dcf3a0f    ONLINE       0     0     0
          replacing-2                                   DEGRADED     0     0     0
            4761865740716069053                         UNAVAIL      0     0     0  was /dev/gptid/1475dad5-bc3f-11e5-aa96-00241dcf3a0f
            gptid/a15e0d0e-3eb8-11e6-8539-00241dcf3a0f  ONLINE       0     0     0

errors: No known data errors


System Info:
Code:
System Information
Hostname FourNAS.local
Build FreeNAS-9.10-STABLE-201606270534 (dd17351)
Platform AMD Phenom(tm) II X4 955 Processor
Memory 8160MB
System Time Sun Jul 03 11:38:44 BST 2016
Uptime 11:38AM up 58 mins, 1 user
Load Average 0.12, 0.16, 0.16


Attached:
  • Screen shots of the 'Volumes', 'Volume status' and 'View disks' screens on the GUI.
  • Debug output from System → Advanced → Save Debug

 


m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
A stripe has no redundancy. You lost a disk, odds are you've also lost the pool. Time to recreate the pool and restore from backup.
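If you do end up rebuilding it, re-seeding the new pool from the system that replicates to it boils down to a snapshot send/receive; a rough sketch with hypothetical dataset and snapshot names (a GUI replication task automates the same thing):
Code:
# on the source server: snapshot the dataset and push it to the rebuilt pool
zfs snapshot -r tank/data@seed
zfs send -R tank/data@seed | ssh root@FourNAS.local zfs receive -F Backup/data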
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
ada0 and ada2 seem to be unhappy.
You don't seem to be running a proper SMART testing schedule.
removed it using the GUI and used it to replace the 1TB, again using the GUI
Please explain exactly what you did, step by step.
FreeNAS reports the old configuration and still claims that I'm low on disk space
Have you tried rebooting?
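The GUI (Tasks → S.M.A.R.T. Tests) is the usual place to schedule SMART tests; for a quick manual check from the shell, something like this works (a sketch, assuming ada0 is one of the suspect drives):
Code:
# kick off a long self-test on a suspect drive
smartctl -t long /dev/ada0
# later, review the health summary, error log and self-test results
smartctl -a /dev/ada0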
 

CyberPete

Dabbler
Joined
May 12, 2015
Messages
19
First of all, the pool is functioning as before. I can read it, I can write to it and I can open files on it. My other FreeNAS server backs up onto it, using replication, and reports no errors. Its snapshots are all there.

I know that stripes have no redundancy. The disk I replaced had no errors; I was just running out of space, so I attempted to replace the smallest disk in the volume with a bigger one, using the Replace command from the GUI.

The steps I took were:
  1. I identified the smallest disk (1TB) in the striped volume 'Backup'.
  2. I already had a 3TB disk as a spare, so I removed that, using the GUI.
  3. I used Volume Status → Replace on the 1TB to replace it with the 3TB (the CLI equivalent is sketched after this list).
  4. FreeNAS reported resilvering, which went on for some hours (as expected).
  5. I left it overnight, closing the GUI. The following day I could not connect to the server via the GUI or PuTTY.
  6. I powered the server off by cutting the power and restarted it.
  7. On logging in, the server was in the state you see in my previous post, with the UNAVAIL disk and the DEGRADED disk, and with the 1TB disk additionally visible under Storage → View Disks.
  8. I thought that this anomaly might be due to the old disk still being present, so I powered down normally from the console, opened the server case and disconnected the 1TB drive (i.e. pulled its plug out).
  9. I restarted and logged in.
  10. The system state was as shown in my post.
  11. So now, the old disk has been taken out and the new disk is in place. I can read from the volume, access its files through a CIFS share and write to it, both directly and via replication from my other FreeNAS server. In other words, it functions normally.
  12. The two disks, labelled UNAVAIL and DEGRADED, are still reported via Volume Status in the GUI and by zpool from the command line. Storage → Volumes reports the old storage capacities. Storage → View Disks shows the correct 3 disks.
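For reference, what step 3 does behind the scenes is roughly a zpool replace; a sketch using the old and new gptids from the status output in my first post (the GUI also partitions the new disk first, which this skips):
Code:
# replace the old 1TB member with the new 3TB disk
zpool replace Backup gptid/1475dad5-bc3f-11e5-aa96-00241dcf3a0f gptid/a15e0d0e-3eb8-11e6-8539-00241dcf3a0f
# watch the resilver progress
zpool status -v Backup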
--
Pete
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I don't see anything wrong with your process, other than step 6, which seems like it was forced on you.

Unfortunately, I have no experience to draw on for a solution.

Since the problem pool is only a backup destination for another system, can you live with destroying and recreating it?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What I see is three drives attached and ONLINE (ada0p2, ada1p2, and ada2p2). I also see some weirdness: it looks like you need to detach your old drive (the one marked UNAVAIL) in the GUI, and that "replacing-2" entry likely needs to be detached as well.

It appears that you are running a stripe, so you are in a pickle. If you detach the wrong drive, your data may go away.
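To double-check which gptid maps to which physical disk before detaching anything, something like this from the shell should tell you:
Code:
# show gptid label -> device mapping (ada0p2, ada1p2, ada2p2, ...)
glabel status
# show the partition table and identity of one disk
gpart list ada0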
 

CyberPete

Dabbler
Joined
May 12, 2015
Messages
19
Thanks for the replies.

It is indeed a backup, so it's redundant data - hence the striped volume. I could destroy the volume and rebuild it, but is there a way to reproduce the setup - in other words, the volume, datasets and shares - without having to re-enter the information by hand?

--
Pete
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
but is there a way to reproduce the setup - in other words, the volume, datasets and shares - without having to re-enter the information by hand?
Not that I am aware of. I think it's all by hand, but then again, it's not that bad.

Since it seems you must destroy the pool anyway, based on the previous advice, I still recommend that you go into the Volume Status GUI and "offline" the two drives listed as "UNAVAIL" and "replacing-2". Start with the "UNAVAIL" one and see what happens. Reboot. Does that fix it? If not, then do the same for the second drive.
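If the GUI doesn't offer an Offline button, the raw command it wraps looks roughly like this; a sketch using the pool name and the GUID that zpool status shows for the UNAVAIL member:
Code:
# take the missing old member offline by its GUID
zpool offline Backup 4761865740716069053
# confirm the pool state afterwards
zpool status -v Backup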
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's showing degraded because of the device unavailability. Detach the "UNAVAIL" drive. If you were actually replacing the drive, this is a necessary step, and your array will return to normal if the drive was successfully replaced.
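From the shell, that would be something like the following; a sketch using the GUID zpool status shows for the missing member:
Code:
# detach the vanished old drive to complete the replacement
zpool detach Backup 4761865740716069053
# the replacing-2 placeholder should collapse once the detach succeeds
zpool status -v Backup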
 

CyberPete

Dabbler
Joined
May 12, 2015
Messages
19
I offlined the disk labelled "47618......" and rebooted. It just shows OFFLINE. BTW, I had to do this from the command line - the GUI only offered me the Replace option.

The disk labelled "replacing-2" is odd. The GUI offers a Replace button, but zpool, from the command line, says it doesn't exist, even though "zpool status Backup" shows that it does. The physical drive is not connected either.

Whichever way, it's a mess. I'm going to detach the whole volume and start again. It'll probably be quicker in the end.

Thanks for the help.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I'm going to detach the whole volume and start again.
Since you offlined a disk via the CLI, the GUI wasn't able to track the change. If you export the pool and reimport it, the GUI should be back in sync.
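The underlying commands are just the two below (a minimal sketch; on FreeNAS the GUI's detach-volume/import-volume workflow is generally preferable so the middleware stays in sync):
Code:
# export the pool, then import it again so its current layout is re-read
zpool export Backup
zpool import Backup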
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Since you offlined a disk via the CLI, the GUI wasn't able to track the change. If you export the pool and reimport it, the GUI should be back in sync.
I was thinking the same thing and hoping it would make the difference and save you the hassle of rebuilding everything.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Under the hood, I believe the zfs replace functionality works by mirroring the drive you're replacing, and then detaching the original drive from the mirror.

But when you attempt to detach the second-to-last drive in a mirror, you get an error (thus being forced to use split to get back to a stripe).

Does replacing a stripe disk even work normally?

If detach doesn't work, then replacing the unavail disk with another disk, and then splitting the resulting mirror should work.
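For reference, the attach-then-detach mechanics described above look roughly like this from the shell, with gptid/OLD and gptid/NEW as placeholder names:
Code:
# attach a new disk to an existing single-disk vdev, turning it into a mirror
zpool attach Backup gptid/OLD gptid/NEW
# after the resilver completes, drop the old side to go back to a single disk
zpool detach Backup gptid/OLD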
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Under the hood, I believe the zfs replace functionality works by mirroring the drive you're replacing, and then detaching the original drive from the mirror

Yes, or so my recollection goes.
 