How to recover from installing the FreeNAS OS to the wrong drive


Paul.G

Dabbler
Joined
May 18, 2018
Messages
10
Hello,

(I hope this is the correct sub-section to post this to)

I inherited the job of managing a FreeNAS server just a couple of weeks ago; it's my first time ever working with the product. First thing on the agenda was to update the system from 9.3 to the current 11.1. On the general support forum I was told to upgrade to 9.10 first and then to 11.1.
The server is an iXsystems 2U, with 12 drives and a small 8 GB drive for the OS (tough drive, I think it was called).
I did the 9.3 -> 9.10 upgrade one day, and the 9.10 -> 11.1 upgrade the following day.
After the 11.1 upgrade I noticed the volume was in a "degraded" state. After digging into the problem a little more, I found that there were 11 drives in the volume, and it looked like one of the 12 drives had been converted to a boot drive. I suspect I picked the wrong drive to install the OS to during one of the upgrades.
There doesn't appear to be any data loss. All the data that was there before is still there. Logins and the web interface still work just fine.
Is it possible to convert the boot drive back to a data drive and add it back to the volume?
Paul.G
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Ouch.

Yes. It's tricky though; the boot drives stick to each other.

I'd be very careful; depending on your pool config, you could have very little redundancy left.

One of the first things you will need to do is set up a new boot drive and restore a config.

Then you will need to erase the old one and replace the missing disk in the array with it.
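From a shell it would look roughly like this once you're booted from the new boot device (daX and the GUID are placeholders - use the actual device name and the GUID that 'zpool status' shows for the missing disk, and note the GUI's replace function will normally handle the partitioning for you):

Code:
# wipe whatever the accidental OS install left on the old disk
# (daX is a placeholder - confirm the device name before wiping anything!)
gpart destroy -F daX
zpool labelclear -f /dev/daX

# then put the disk back into the pool in place of the missing member,
# referenced by the GUID that 'zpool status' prints for it
zpool replace store <missing-disk-guid> daX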

Or perhaps you've misdiagnosed the problem, and what's simply happened is that a drive has failed?
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
What does your
Code:
zpool status
say?
Does it list which vdev is degraded? You can probably find out the serial number of the drive from that to be absolutely certain about which drive might have been overwritten.

If there are no degraded vdevs, then the drive that you installed to might just be a hot spare or a cold spare.
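If it helps, something along these lines from a shell maps the gptid labels shown in 'zpool status' to device nodes and serial numbers (da3 below is only a placeholder):

Code:
# show which daXpY partition each gptid label lives on
glabel status | grep gptid

# pull the serial number of a given device
smartctl -i /dev/da3 | grep -i serial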
 

Paul.G

Dabbler
Joined
May 18, 2018
Messages
10
The boot pool has this in it: (I had to hand jam this in - this server is on a private network)

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE       0     0     0
          da11p2        ONLINE       0     0     0

errors: No known data errors

** the 'freenas-boot' is where the OS should be installed; I believe the da11p2 is the drive that mistakenly had the OS written to it **

*** The data pool is named "store" - this is what is in there: ***

  pool: store
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0 days 01:38:21 with 0 errors on Sun Apr 15 01:38:24 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        store                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/7696af83-df6f-11e7-a3ec-0025909515e8  ONLINE       0     0     0

(that last gptid/... line repeats 6 times, with slightly different initial numbers - then this line comes up)

            2435056653744321299                          UNAVAIL      0     0     0  was /dev/gptid/8226e99f-df6f-11e7-a3ec-0025909515e8

(and then 5 more lines like the gptid/... one - 12 member lines in total; the UNAVAIL line is the 7th)

errors: No known data errors
 

Paul.G

Dabbler
Joined
May 18, 2018
Messages
10
BTW - just to add to the above comments, I do have a backup config file that I took before doing the 9.3 -> 9.10 upgrade.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
So you do have a degraded pool. Go to the FreeNAS web UI, open the Storage section, and click on "View Disks" to have a look at the disks. That should give you at least part of the serial number of each drive. Match them against the drives still showing up in the pool; the missing one is the disk that you need to "fix".

If you are lucky, the person before you might have put stickers or labels on the drives, and you can identify it by that. If you are unlucky, you will have to check the serial number of each drive and match it up with the ones that are still available in the pool. You might have to shut down the NAS for this, which may or may not be feasible depending on your work environment.
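If you end up doing it from the shell, a quick loop like this prints the model and serial of every da device without pulling any drives (assuming da0 through da12, as in your setup; run it in sh, since the default csh won't accept the loop syntax):

Code:
# list the devices the controller sees
camcontrol devlist

# print model and serial number for each da device
for n in $(seq 0 12); do
    echo "== da$n =="
    smartctl -i /dev/da$n | egrep -i 'model|serial'
done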
 

Paul.G

Dabbler
Joined
May 18, 2018
Messages
10
Ok, so does that mean I might have assumed wrong, that I didn't necessarily install to the wrong drive? That it really is a failed drive? Or, I did install to the wrong drive and this is just a way to figure out which drive?
It's not really a heavily used server - some scheduled backups to it might fail, but once it's back up again, things should go back to normal.
If I did mess up and install to two different drives, is it possible to confirm which install went to which drive? i.e. the 9.10 upgrade went to X and the 11.1 upgrade went to Y?
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Ok, so does that mean I might have assumed wrong, that I didn't necessarily install to the wrong drive? That it really is a failed drive? Or, I did install to the wrong drive and this is just a way to figure out which drive?
It could be either. Did you match up the serial numbers? Make sure you also check the drive in the boot pool against them -- that would confirm whether you installed FreeNAS on a data drive by mistake.

It's not really a heavily used server - some scheduled backups to it might fail, but once it's back up again, things should go back to normal.
That's good then. You should be able to bring the server down and check the drives. Make sure you print/write down the serial numbers of the drives that are currently available in FreeNAS before you bring down the server.
If I did mess up and install to two different drives, is it possible to confirm which install went to which drive? i.e. the 9.10 upgrade went to X and the 11.1 upgrade went to Y?
Once you confirm which drives are where -- i.e. which drives constitute the boot pool vs. which constitute the storage pool -- you might be able to look at the boot pool drives to see which version of FreeNAS is installed on them. No guarantees though.
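One thing you could check, assuming the boot pool in question is the one that's actually imported: boot environments are usually named after the FreeNAS version that created them, and /etc/version shows the running version. A second, orphaned install on another disk won't show up this way, though.

Code:
# list the boot environments on the imported boot pool
beadm list

# version of the currently running install
cat /etc/version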
 

Paul.G

Dabbler
Joined
May 18, 2018
Messages
10
Ok, I took down the server, copied all the serial numbers off each drive, and booted it back up. When I got back to my desk, I re-ran "zpool status"; one of the drives was resilvering, and the status of both the boot pool and the store pool was ONLINE. The boot pool wasn't running on da11p2 anymore, it was running on da12p2. This made more sense, because da12 is a 4 gig disk (not 8 gig, like I thought before), and disks da0-da11 are 3 gig drives that are supposed to be the data drives.
There was a status on the "store" pool of "one or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected".
I didn't really change anything in the configuration though; just powered down the server and reseated the drives. It's good that both pools are back online - however, I'd like to understand better what happened in the first place.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
disks da0-da11 are 3 gig drives and supposed to be the data drives.
You surely mean 3TB not 3 gig !!! right?
There was a status on the "store" pool of "one or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected".
Check the logs to get some more details as to what might have caused this.
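For example, the usual places to look (adjust the patterns to taste):

Code:
# disk-related messages around the time the pool went DEGRADED
egrep -i 'da[0-9]+' /var/log/messages | less

# kernel messages about errors, timeouts or retries
dmesg | egrep -i 'error|timeout|retry'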
I didn't really change anything in the configuration though; just powered down the server and reseated the drives. It's good that both pools are back online - however, I'd like to understand better what happened in the first place.
Maybe that re-seating helped. One of the connections to a drive might have been loose the first time around. That was probably why the re-silvering happened -- because FreeNAS thought you added a "new" drive, so it tried & fixed the degraded pool back to online. Once the re-silvering process is completed, double check the zpool status again, just to be sure.

I would still keep a good eye on the pool for the next week or so (maybe more if the data is super important). Make sure you get regular SMART emails and zpool health information emails -- in case sh!t hits the fan in the middle of the night. If the errors are not increasing and everything seems stable, then it might just be a simple case of a loose cable in one of the drives.
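A couple of quick manual checks in the meantime (daX is just a placeholder for whichever drive you end up suspecting):

Code:
# prints 'all pools are healthy' if nothing is wrong
zpool status -x

# overall SMART verdict for a single drive
smartctl -H /dev/daX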

Maybe others can suggest a few other things to keep track of further.
 

Paul.G

Dabbler
Joined
May 18, 2018
Messages
10
Uh, yes, you are correct on the drive sizes - 12 x 3TB drives, 1 x 4GB drive.
There was one drive that had a handwritten note on it, something about bad sectors. That drive was the da6 drive, which would seem to correspond to the 7th drive listed in the original "zpool status" output as being "UNAVAIL".
Looks like things are somewhat 'normal' now, but at least I know to keep an eye on the drive that appears somewhat sketchy for a while.
Thanks for all your help!
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
There was one drive that had a handwritten note on it, something about bad sectors. That drive was the da6 drive, which would seem to correspond to the 7th drive listed in the original "zpool status" output as being "UNAVAIL".
Looks like things are somewhat 'normal' now, but at least I know to keep an eye on the drive that appears somewhat sketchy for a while.
Thanks for all your help!
I would just switch that drive right away instead of waiting for it to blow up. I am sure your workplace would pony up for a single 3TB drive.
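If you do swap it, the GUI's replace wizard handles the partitioning for you, but from the shell it would look roughly like this (the gptid is the one that showed UNAVAIL in your original output; daY is a placeholder for the new disk):

Code:
# take the suspect member out of service
zpool offline store gptid/8226e99f-df6f-11e7-a3ec-0025909515e8

# after physically installing the new 3TB drive (shown here as daY)
zpool replace store gptid/8226e99f-df6f-11e7-a3ec-0025909515e8 daY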
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
You can't rely on drive ID labels, i.e. ada6 etc.; they move about depending on esoteric boot-time factors.

I suspect the issue was that a drive went away, and then it came back.
 