Grub Error - After attempted Replace of Boot USB Drive

Status
Not open for further replies.

Gilley7997

Dabbler
Joined
Feb 23, 2015
Messages
42
I am still working on the new system build as outlined here. I thought I was getting to the point of putting this into production, so I set out to add my new SanDisk CZ33 drives and remove the old generic USB stick that I had used for testing when the 32GB Ultra Fits would not work in this board. I inserted the first new USB device and successfully added it to the boot mirror. It resilvered just fine, and after I shut the system down and restarted it, everything appeared to be working.

I then attempted to add the second CZ33 by doing a replace, from the web UI, of the old generic USB with the second CZ33. This was running last night when I went to bed. When I woke up this morning, it appeared that the resilver had failed after about 28 minutes with too many fatal errors (checked with zpool status -v). This struck me as odd, because I had scanned both of the new CZ33 drives with Rufus for bad blocks, 4 passes each, before attempting to put them in production and saw no errors. At this point the system was still running, but I could not view the boot status in the web UI. It was throwing an error, but I cannot remember what it said and I cannot get back to that point now. The UI was still reporting the pools as healthy, and only the replacement had failed, so I restarted the system. This is where I ran into my issue. On reboot, GRUB throws this error at me:

Code:
error: no such device: a25a98d77c9b3111
Entering rescue mode...
grub rescue>


This is not critical for me at this point, but I want to use this as a learning experience. Is there a way to recover from this? When I do an ls at this prompt, it lists all the drives, but when I do an ls (hd0,1) or anything like that on any entry in the list, I just receive '(hd2): Filesystem is unknown.'

I put my FreeNAS installation USB drive back into the machine and pointed the installer at the old generic USB drive as the installation destination. The installer detected that there was already a FreeNAS system present and asked me whether to upgrade or do a fresh install, so the system appears to still be there, but GRUB is borked, and I will be the first to admit that I am not overly familiar with GRUB.

I took the media that supposedly had the errors and am scanning it again with Rufus as we speak, but in the meantime I would like to understand the best process to recover from this error. I have access to this machine all day and will gladly try whatever is asked of me. I don't have physical access, though, so I have to work with what is installed in the machine: a USB drive with the 9.3 installation, and the two USB drives that were in the successful boot mirror.
 

Gilley7997

Dabbler
Joined
Feb 23, 2015
Messages
42
I have continued to try and determine what the best way to resolve this is. I'm currently looking at a shell started from the FreeNAS 9.3 installation media.

From the shell I ran 'zpool import'

Both my boot and the Data pools are listed. I'm assuming my Data Pool is fine as no odd activities were occurring there.

This is the output that I see for the boot pool:
Code:
  pool: freenas-boot
    id: 11698831033324220689
 state: DEGRADED
status: The pool is formatted using a legacy on-disk version.
action: The pool can be imported despite missing or damaged devices.  The fault tolerance of the pool may be compromised if imported.
config:
freenas-boot                                     DEGRADED
  mirror-0                                       DEGRADED
    replacing-0                                  DEGRADED
      gptid/dcd616b9-c876-11e4-a54c-0cc47a31486  ONLINE
      13680724755574897696                       UNAVAIL     cannot open
    da6ps                                        ONLINE


These are the values I would expect, considering that I did pull the one drive to test and that the replace process had failed. Based on these values, though, I would expect the system to be bootable, but apparently I have a problem with GRUB. Is there a process to recover?
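
For anyone who wants to spot this situation without reading the tree by eye, here is a minimal Python sketch that scans the config section pasted above and lists the UNAVAIL members sitting under a replacing-N vdev. The function names (`parse_config`, `leftover_replacing_members`) are made up for illustration, and it assumes the text is indented the way it appears in this post. It only reads text and does not touch the pool.

```python
import re

# Config section copied from the 'zpool import' output above.
SAMPLE_CONFIG = """\
freenas-boot                                     DEGRADED
  mirror-0                                       DEGRADED
    replacing-0                                  DEGRADED
      gptid/dcd616b9-c876-11e4-a54c-0cc47a31486  ONLINE
      13680724755574897696                       UNAVAIL     cannot open
    da6ps                                        ONLINE
"""


def parse_config(text):
    """Return (indent, name, state) for every line that has a name and a state."""
    rows = []
    for raw in text.splitlines():
        line = raw.expandtabs()  # real zpool output may start lines with a tab
        fields = line.split()
        if len(fields) < 2:
            continue
        indent = len(line) - len(line.lstrip(" "))
        rows.append((indent, fields[0], fields[1]))
    return rows


def leftover_replacing_members(text):
    """Names of UNAVAIL devices nested under a replacing-N vdev."""
    found = []
    replacing_indent = None
    for indent, name, state in parse_config(text):
        if re.fullmatch(r"replacing-\d+", name):
            replacing_indent = indent
            continue
        if replacing_indent is not None and indent <= replacing_indent:
            replacing_indent = None  # we have left the replacing-N subtree
        if replacing_indent is not None and state == "UNAVAIL":
            found.append(name)
    return found


print(leftover_replacing_members(SAMPLE_CONFIG))
```

Relying on indentation is fragile, so treat this as a reading aid rather than something to script a repair around.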

I have finished rescanning the USB drive that had write errors, and it seems to have no issues, so I am looking for a process to clean up this boot pool and get this system moved over to the SanDisks.

Thanks for all the help you all provide.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'd start by removing the apparently faulty device to see if GRUB still chokes.

In any case, this might be interesting for the devs to know (FreeNAS and/or upstream at GRUB).
 

Gilley7997

Dabbler
Joined
Feb 23, 2015
Messages
42
Ericloewe said:
"I'd start by removing the apparently faulty device to see if GRUB still chokes."

If you are talking about the supposedly faulty USB (which isn't testing as faulty, but that's beside the point), it is not physically installed in the system at this time, which is why it is listed in the pool status as UNAVAIL.

If you are talking about removing it from the pool config, if you could help me out here with the right process, I will do so. I was thinking this would be the next step, but I didn't want to cause more damage than already existed.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Gilley7997 said:
"If you are talking about the supposedly faulty USB (which isn't testing as faulty, but that's beside the point), it is not physically installed in the system at this time, which is why it is listed in the pool status as UNAVAIL.

If you are talking about removing it from the pool config, if you could help me out here with the right process, I will do so. I was thinking this would be the next step, but I didn't want to cause more damage than already existed."

I mean the one that failed to resilver. Honestly, I'm not sure how to proceed to get rid of the faulted devices from the pool, since not all options are available in the GUI. I'd recommend you take this one step at a time, since it looks like an edge case that might need better handling in the future.
 

Gilley7997

Dabbler
Joined
Feb 23, 2015
Messages
42
Ericloewe said:
"I mean the one that failed to resilver."

So the only drive currently installed is the one SanDisk drive that was successfully added to the mirror and was not part of the attempted replace. I am still receiving the same error from GRUB when attempting to boot.

After booting from the install media again and running 'zpool import' one more time, this is what it shows for the boot pool now:

Code:
  pool: freenas-boot
    id: 11698831033324220689
 state: DEGRADED
status: The pool is formatted using a legacy on-disk version.
action: The pool can be imported despite missing or damaged devices.  The fault tolerance of the pool may be compromised if imported.
config:
freenas-boot                                     DEGRADED
  mirror-0                                       DEGRADED
    replacing-0                                  UNAVAIL   insufficient replicas
      9221627796630526634                        UNAVAIL   cannot open
      13680724755574897696                       UNAVAIL   cannot open
    da5p2                                        ONLINE


So the pool still appears to be importable, but GRUB on the USB stick doesn't know what to do with it. The error remains the same when GRUB starts to load. Any other suggestions?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
At this point, a bug report might be in order. Probably requires an upstream fix, but the devs will know.

As for a solution, the easiest would be to install FreeNAS to new media and import the config file, keeping the old media in case someone wants to have a look for bug-squashing purposes.
 

Gilley7997

Dabbler
Joined
Feb 23, 2015
Messages
42
HOLY CRAP! I fixed it!

I will tell you the steps and show you the resulting status. I have a small inkling of how this resolved the issue, but someone who has had more than 4 hours of experience with GRUB and ZFS can tell me why. I wasn't sure how the GRUB loader references things, but I thought the extra mirror in the ZFS pool might be causing some of the issue: because the failed replace left the pool in this state, maybe the GRUB loader never expects this extra layer and gets lost. (This is all very hypothetical, like I said... I only have about 4 hours of experience here) :)

You can see in the above posts what the status was I was dealing with.

Here is the process I did:
  1. I put all the USB drives back into the machine and then booted from the FreeNAS installation media. (Remember that I had formatted the 3rd USB drive and tested it again for errors just to verify; I am guessing this is why it is still not found.)
  2. Opened the shell from the installation software.
  3. Ran 'zpool import', which showed me my 2 pools. At this time we are only concerned with freenas-boot.
  4. Ran 'zpool import freenas-boot'. This successfully imported the pool.
  5. Checked 'zpool status -v', which returned basically the same data as above, showing one drive as unavailable and a mirror inside a mirror.
  6. I manually ran a scrub at this time with 'zpool scrub freenas-boot'. I did this just to verify that at least the 2 drives that it was aware of were in sync.
  7. I then detached the UNAVAIL drive from the pool with 'zpool detach freenas-boot 13680724755574897696', using the device ID shown by the 'zpool status -v' command.
  8. When I checked 'zpool status -v' again this is the output I received:
    Code:
      pool: freenas-boot
     state: ONLINE
      scan: scrub repaired 0 in 0h18m with 0 errors on Sat Apr 11 00:31:37 2015
    config:
    
            NAME                                            STATE     READ WRITE CKSUM
            freenas-boot                                    ONLINE       0     0     0
              mirror-0                                      ONLINE       0     0     0
                gptid/dcd616b9-c876-11e4-a54c-0cc47a314686  ONLINE       0     0     0
                da6ps                                       ONLINE       0     0     0
    
    errors: No known data errors
    


  9. I then ran another scrub with 'zpool scrub freenas-boot' just to make sure. This again returned with no errors.
  10. I then rebooted the system and changed the boot device back to the FreeNAS boot devices. I wasn't expecting this to actually work, so I wasn't really watching. When I looked back at the screen, the FreeNAS system was up and running. I verified that the plexmediaserver came back up. Everything seems to be in place and functioning correctly. This is now the output of 'zpool status -v':
    Code:
      pool: freenas-boot
     state: ONLINE
      scan: scrub repaired 0 in 0h18m with 0 errors on Sat Apr 11 00:31:37 2015
    config:
    
            NAME                                            STATE     READ WRITE CKSUM
            freenas-boot                                    ONLINE       0     0     0
              mirror-0                                      ONLINE       0     0     0
                gptid/dcd616b9-c876-11e4-a54c-0cc47a314686  ONLINE       0     0     0
                gptid/d3e52070-df1f-11e4-8e7f-0cc47a314686  ONLINE       0     0     0
    
    errors: No known data errors
    


  11. Just for good measure, I then verified that setting the BIOS to boot from the other USB device also allows the system to boot.
So it appears that I got my system back. At least for now.
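
For reference, the shell part of the steps above can be written down as a short Python sketch that only builds and prints the command lines. It does not run anything, on purpose. The helper names (`recovery_commands`, `render`) are made up for this post, and the pool name and device ID are the ones from my system, so yours will differ.

```python
import shlex


def recovery_commands(pool, stale_guid):
    """Return the zpool command lines from the numbered steps, in order."""
    return [
        ["zpool", "import"],                    # step 3: list importable pools
        ["zpool", "import", pool],              # step 4: import the boot pool
        ["zpool", "status", "-v"],              # step 5: inspect the vdev tree
        ["zpool", "scrub", pool],               # step 6: scrub runs in the background
        ["zpool", "detach", pool, stale_guid],  # step 7: drop the UNAVAIL member
        ["zpool", "status", "-v"],              # step 8: confirm the plain mirror
        ["zpool", "scrub", pool],               # step 9: scrub once more
    ]


def render(commands):
    """Join each argv list into one shell-quoted line."""
    return "\n".join(shlex.join(argv) for argv in commands)


print(render(recovery_commands("freenas-boot", "13680724755574897696")))
```

Since 'zpool scrub' returns right away, I checked 'zpool status -v' until the scrub had finished before moving on to the detach.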

So my one question here is: why does the device name in 'zpool status -v' keep changing to different values? Just curious!

Tonight I am going to try again to replace the older 8GB generic USB with the new SanDisk, which I retested during the day today and which still tests fine.

I will let you know how that goes!
 

Gilley7997

Dabbler
Joined
Feb 23, 2015
Messages
42
The replace process worked perfectly this time around. I am now looking at a system with mirrored 16GB SanDisk Cruzer USB drives. I'm not sure what caused the hiccup the first time around, but it looks like I got it all straightened out.

One thing: while the resilver is taking place on the boot drive, there is still a hiccup, and the interface throws an error where the zpool status information would normally be under Boot->Status. I once again didn't capture the error, but it seems pretty consistent. I'm wondering if it also has something to do with the code not being able to display the data while the extra replacing-0 mirror exists in the pool. Once the replace process finished, the table came back and displayed the zpool data correctly.
 