Trouble with auto-import


Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
Hi. I'm having some difficulty getting FreeNAS to auto-import my RAIDZ2. First, some background info.

About a year and a half ago, I built (what I thought was) a solid FreeNAS server with 6 SATA drives, a Pentium 4 CPU and 4GB of RAM in a normal desktop PC case. It ran fine for a couple of months, then one day I logged on and saw the pool was degraded. I was able to isolate the problem to one drive, so I shut things down, removed the drive, and tested it with SeaTools (it was a Seagate drive). After some time, SeaTools said the drive was fine, but when I went to put it back in, it turned out that SeaTools had destroyed the contents of the drive. I didn't know how to deal with this situation, so I came here looking for help. In the process of troubleshooting, it came to light that my server had tremendous heat problems due to my inexperience with server construction, and I was advised to shut down the server immediately until I could resolve them. So I did; the server got shut off and went into a corner for the better part of a year.

During that time, I was able to save some money and buy a proper server case (with eight fans!), as well as a new motherboard and processor. About a month ago, when I finally had all the pieces together, I assembled the new motherboard, processor and RAM (it's now a C2D E6300 with 8GB of non-ECC DDR2) and ran memtest on it for several days (no errors). Somehow, while the server was down, the USB drive that FreeNAS was installed on got destroyed (I think it got kicked, as I found it in pieces on the floor beside the server). So I downloaded 8.3.1-p2 (the current version when all this started; I'm leery of a newer version trying to upgrade the zpool before I get it running properly), wrote it to a flash drive, and booted off of it.
It booted up OK, but when I went to auto-import my existing pool, it ran for about 10 minutes, then stopped, popping up the incredibly helpful "An error occurred" at the top. At this point, I necroed my old thread from a year ago, but nobody has responded after two weeks, so I thought I'd start a new one here. During this time, I decided (for no reason whatsoever) that the drive that got wiped by SeaTools was cursed, so I bought a new 3TB WD Red drive to take its place.

So here's where I am now. 5 of the 6 drives in my RAIDZ2 are listed as ONLINE, and one is UNAVAIL. I have a fresh drive ready to be resilvered in, but I can't get the GUI to auto-import. I'm somewhat outside of my comfort zone here, so I am scared to do much of anything without some advice. Any thoughts would be greatly appreciated.

Here's some info that might be useful:

Code:
[root@freenas] ~# zpool import
  pool: MAIN
    id: 2316280524828094423
  state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
  see: http://www.sun.com/msg/ZFS-8000-2Q
config:
 
    MAIN                                            DEGRADED
      raidz2-0                                      DEGRADED
        gptid/3e54ea70-1e5d-11e2-b155-00261889a36e  ONLINE
        gptid/3f04f9e9-1e5d-11e2-b155-00261889a36e  ONLINE
        13589355565508211589                        UNAVAIL  cannot open
        gptid/feb27aa0-3f54-11e2-8fb8-00261889a36e  ONLINE
        gptid/40ca2b7b-1e5d-11e2-b155-00261889a36e  ONLINE
        gptid/0c4307e2-21f7-11e2-95fd-00261889a36e  ONLINE


Code:
[root@freenas] ~# camcontrol devlist
<ST2000DM001-9YN164 CC4B>          at scbus0 target 0 lun 0 (pass0,ada0)
<ST3000DM001-1CH166 CC24>          at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus3 target 0 lun 0 (pass2,ada2)
<WDC WD30EFRX-68EUZN0 80.00A80>    at scbus4 target 0 lun 0 (pass3,ada3)
<WDC WD15EADS-00P8B0 01.00A01>    at scbus5 target 0 lun 0 (pass4,ada4)
<ST3000DM001-9YN166 CC4H>          at scbus6 target 0 lun 0 (pass5,ada5)
<Memorex Micro PMAP>              at scbus10 target 0 lun 0 (pass7,da1)


Code:
[root@freenas] ~# glabel status
                                      Name  Status  Components
gptid/3f04f9e9-1e5d-11e2-b155-00261889a36e    N/A  ada0p2
gptid/feb27aa0-3f54-11e2-8fb8-00261889a36e    N/A  ada1p2
gptid/0c4307e2-21f7-11e2-95fd-00261889a36e    N/A  ada2p2
          gpt/Microsoft reserved partition    N/A  ada3p1
gptid/360e0dbb-a4f9-40d2-8593-7a87ec9bba5b    N/A  ada3p1
                  gpt/Basic data partition    N/A  ada3p2
gptid/9a121d4f-f39b-4322-ad10-fea1fda016e5    N/A  ada3p2
gptid/40ca2b7b-1e5d-11e2-b155-00261889a36e    N/A  ada4p2
gptid/3e3e2c3d-1e5d-11e2-b155-00261889a36e    N/A  ada5p1
gptid/3e54ea70-1e5d-11e2-b155-00261889a36e    N/A  ada5p2
                            ufs/FreeNASs3    N/A  da1s3
                            ufs/FreeNASs4    N/A  da1s4
                            ufs/FreeNASs1a    N/A  da1s1a

ada3 is the new drive. I threw it into a Windows system when it arrived so I could test functionality and capacity before I put it into service, which is probably why it is still listed as Microsoft in here.

Code:
[root@freenas] ~# gpart show
=>        34  3907029101  ada0  GPT  (1.8T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  3902834696    2  freebsd-zfs  (1.8T)
  3907029128          7        - free -  (3.5k)
 
=>        34  5860533101  ada1  GPT  (2.7T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128          7        - free -  (3.5k)
 
=>        34  5860533101  ada2  GPT  (2.7T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128          7        - free -  (3.5k)
 
=>        34  5860533101  ada3  GPT  (2.7T)
          34      262144    1  ms-reserved  (128M)
      262178        2014        - free -  (1M)
      264192  5860268032    2  linux-data  (2.7T)
  5860532224        911        - free -  (455k)
 
=>        34  2930277101  ada4  GPT  (1.4T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  2926082703    2  freebsd-zfs  (1.4T)
 
=>        34  5860533101  ada5  GPT  (2.7T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  5856338696    2  freebsd-zfs  (2.7T)
  5860533128          7        - free -  (3.5k)
 
=>    63  7570689  da1  MBR  (3.6G)
      63  1930257    1  freebsd  [active]  (942M)
  1930320      63      - free -  (31k)
  1930383  1930257    2  freebsd  (942M)
  3860640    3024    3  freebsd  (1.5M)
  3863664    41328    4  freebsd  (20M)
  3904992  3665760      - free -  (1.8G)
 
=>      0  1930257  da1s1  BSD  (942M)
        0      16        - free -  (8.0k)
      16  1930241      1  !0  (942M)


Many thanks for all your help!
 

SweetAndLow | Sweet'NASty | Joined: Nov 6, 2013 | Messages: 6,421
You probably need to clear out the Windows partitions that got created. After you do this, it will auto-import correctly and you will be able to resilver the new drive. I would wait for someone to second my suggestion, though.
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
I really don't mean to sound ungrateful, but could you explain to me why that is? I don't see how the presence of a certain partition on a brand new disk would affect the importing of a zpool on a different set of disks. Regardless, I wiped the disk in the FreeNAS GUI and it still won't auto-import the existing zpool. Anything else I can try? I'm grasping at straws at this point, I had no idea this kind of software was so fragile.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Sorry, I can't explain your situation as I don't fully understand it. But... on bootup, FreeNAS checks your disks for existing partition tables. If a partition table exists, the OS will lock that drive out from being usable by you as the user. Even as root you can't do things to the disk. The expectation is that you will put unpartitioned disks in the FreeNAS appliance and let FreeNAS always do the partitioning and such.

So far it looks like you've done everything right except for the Windows partition. FreeNAS expects new disks to be blank and unpartitioned so you may need to wipe the drive before you use it.
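Before handing a disk to FreeNAS, it can help to confirm it really is blank. The sketch below is a hypothetical helper (`foreign_partitions` is a name made up for this example, not a FreeNAS tool) that reads `gpart show` output and prints any partition on an adaN disk whose type is not a FreeBSD one, such as the `ms-reserved` and `linux-data` partitions the Windows test left on ada3. It only parses text; it never modifies a disk.

```shell
#!/bin/sh
# Hypothetical helper: read `gpart show` output on stdin and report any
# partition on an adaN disk whose type is not freebsd-* (for example,
# leftovers from testing the disk in Windows). Read-only: touches no disk.
foreign_partitions() {
    awk '
        # Header lines look like: "=>  34  5860533101  ada3  GPT  (2.7T)"
        $1 == "=>" { disk = $4; next }
        # Partition lines are: start, size, index, type, (size).
        # "- free -" lines have "-" in the index column and are skipped.
        disk ~ /^ada/ && $3 ~ /^[0-9]+$/ && $4 !~ /^freebsd/ {
            print disk ": partition " $3 " is " $4
        }
    '
}
```

Run it as `gpart show | foreign_partitions`; with the `gpart show` output posted earlier, it would report partitions 1 and 2 on ada3 and nothing for the other pool disks.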
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
I did wipe the new drive. It still fails to import the existing Raidz2.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Can you post the output of "zpool import" from the CLI? Please paste it in pastebin as posting it in the forums will make it useless to me.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
You did do it right.

So you tried to auto-import the pool per the FreeNAS documentation and it didn't work? What exactly happens? You should be getting an error or something...
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
The pool shows up in auto-import, but when I select it and tell it to auto-import my pool, about 10 minutes in, it says "An error occurred" (or something similar) in grey at the top of the Web interface.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
I assume you've tried a reboot without success? If so try this...

Make sure all of your file sharing services are off.

zpool import MAIN -R /mnt

If this works your pool should mount and you should be able to do a "zpool status" and see your pool listed along with the status.

Also, assuming all is working fine you *should* be able to cd /mnt/MAIN and see your files if you do an ls.

Now, if that works you should then stop and not use the WebGUI further as you've done things behind the WebGUI's back. But proving that your pool will mount is a first step. Whatever you do don't try to start up file sharing services as things will go very badly for you. Just do the commands I listed above and provide the output to pastebin again. Gonna have to go one step at a time. This is going to prove if the pool is actually healthy or if there is some kind of corruption.

I will warn you again that if you start trying to do things from the WebGUI you will be sorry. ;)
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
I definitely do not want to be sorry. Thank you so much for taking the time to help me...please know that I really appreciate it! :)

I had shut it down for the night last night, so I started it up again today, and immediately disabled everything in Control Services except S.M.A.R.T. and SSH. Then I went to SSH, logged in, and ran "zpool import MAIN -R /mnt", and got an error, but "zpool import -R /mnt MAIN" seemed to work. "cd /mnt/MAIN" and "ls" did indeed show familiar looking directories. Here are my outputs: pastebin.com/J0g4NfpQ

The part about seeing my files looked pretty good, but the "Permanent errors" part of zpool status sounds not-so-good. What do you think?
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
So things don't look good...

1. One disk seems to be bad in a big way. As you probably saw ZFS kicked it out of the pool for sucking balls.
2. You have checksum errors; this pretty much means you have at least one more disk failing.
3. The fact that you have metadata errors tells me your pool is not really usable. So you need to look at where you are going to store all this data. In essence you are going to need to copy your data elsewhere, destroy and recreate the pool, then move the data back to the pool.
4. zpool status says you haven't done a scrub since February. Considering you shouldn't be doing that less frequently than monthly, you are in a very bad way with the ZFS gods.

I'm quite confused, because while you say that importing the pool from the WebGUI doesn't work and gives you the very informative "an error has occurred", the actual process of importing from the CLI doesn't throw any errors and "just works".

So you need to provide the output for the following commands:

smartctl -a /dev/XXX

Replace XXX with each of your disks; the command camcontrol devlist will list them. We need to see just how borked your disks are. Post the output on pastebin and let's see how good/bad things are. ;)
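Rather than typing the command once per disk, the sketch below runs `smartctl -a` against every adaN device listed by `camcontrol devlist` and saves each report to its own file, ready to paste. The function names and the default output directory are assumptions made for this example. The parsing assumes the `(passN,adaN)` device column seen in the devlist output earlier in the thread, and deliberately skips `da1`, the FreeNAS USB stick.

```shell
#!/bin/sh
# Print the adaN device names found in `camcontrol devlist` output (stdin).
# Matches "(pass0,ada0)" or "(ada0,pass0)"; da1 (the USB stick) is skipped.
list_ada_disks() {
    sed -n 's/.*[(,]\(ada[0-9][0-9]*\)[,)].*/\1/p'
}

# Hypothetical helper: save `smartctl -a` for every ada disk, one file each.
# $1 is the output directory (assumption: defaults to /tmp/smart).
save_smart_reports() {
    outdir=${1:-/tmp/smart}
    mkdir -p "$outdir" || return 1
    camcontrol devlist | list_ada_disks | while read -r disk; do
        smartctl -a "/dev/$disk" > "$outdir/$disk.txt" 2>&1
        echo "wrote $outdir/$disk.txt"
    done
}
```

Usage: `save_smart_reports /tmp/smart`, then paste each `adaN.txt` file to pastebin.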

In the meantime you need to start looking at where and how you are going to move all this data from your pool to other places as your pool is basically shot.

I'd recommend you reboot your server so your pool is again unmounted until we have a better idea of what is going on.
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
I guess you didn't see the original post. The reason there hasn't been a scrub since February (of 2013) is because this server was taken offline to address heat issues in my former build (which I threw a lot of money at and think I have resolved). The reason one disk is bad in status is because it isn't there; its contents were erased by a hard drive diagnostic, and it has since been removed. Finding a place for the data is easy...I have a 3TB drive, empty and ready to go. It can take almost all the data, and I can find some USB drives for the rest (I think the pool had 3.8TB). The 3TB drive is actually connected to the server now, as it was intended to replace the removed disk in the pool. I'm rebooting the server now. When it comes back up I'll run the commands you asked for and pastebin the outputs. But I am pretty much ready to go on removing the data, and if that is something you think we might be able to pull off, then I'd like to give that a shot.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Well, regardless of the reason for not doing a scrub in almost 18 months, the errors you are seeing are a direct result of not performing scrubs at regular intervals. So whatever schedule you had/think you had is inconsequential. For your situation your scrub frequency was not appropriate to ensure you didn't lose data.

Here's what I see from your disks:

ada0: too hot and has 8 bad sectors (essentially failing early)
ada1: too hot
ada2: fine
ada3: fine
ada4: too hot and almost 10000 UDMA errors (most likely a bad SATA cable)
ada5: too hot
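The verdicts above come from a handful of SMART attributes. As a rough sketch, the filter below scans `smartctl -A` output for the ones behind those calls: temperature, reallocated and pending sectors, and UDMA CRC errors (which usually point at cabling rather than the disk itself). The function name and the 40C default cutoff are assumptions for illustration, not vendor limits, and attribute names can vary slightly between drive models.

```shell
#!/bin/sh
# Hypothetical filter: read `smartctl -A /dev/adaN` output on stdin and flag
# the attributes discussed above. Column 10 is RAW_VALUE in the attribute
# table. $1 is the temperature cutoff in Celsius (assumed default: 40).
smart_flags() {
    awk -v maxtemp="${1:-40}" '
        $2 == "Temperature_Celsius" && $10 + 0 > maxtemp + 0 {
            print "hot: " $10 "C"
        }
        $2 == "Reallocated_Sector_Ct" && $10 + 0 > 0 {
            print "reallocated sectors: " $10
        }
        $2 == "Current_Pending_Sector" && $10 + 0 > 0 {
            print "pending sectors: " $10
        }
        $2 == "UDMA_CRC_Error_Count" && $10 + 0 > 0 {
            print "UDMA CRC errors: " $10 " (check the SATA cable)"
        }
    '
}
```

Usage: `smartctl -A /dev/ada4 | smart_flags 40`. An empty result means none of these attributes crossed the chosen thresholds.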

None of your disks have been getting regularly scheduled SMART tests. More than likely, if you were doing proper SMART tests and scrubs, you wouldn't be in the situation you are in.

So here's what you need to do: you need to be able to mount the pool from the WebGUI. If you can, then you can copy the files off of the pool with your network shares. If you can't, then you're going to have to mount the pool from the CLI, and you are also going to have to put your other disks in the FreeNAS server so you can copy the files off the pool from FreeNAS' CLI, as you can't use a pool for network shares if the pool wasn't mounted from the WebGUI. You will probably have corrupted files, and this is basically unavoidable now. :/
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
When I say "offline", I mean shut-off, not running at all, powered down. Can't really scrub a filesystem that is powered off, and this one was powered off for 18 months. Can it accumulate errors in a completely offline, not-powered-up-just-sitting-there state?

ada3 is ready to accept the data. I wiped it in the FreeNAS gui, so it has no filesystem or contents. What do I need to do to get it mounted? And once I finish that, and run the zpool import command you gave me earlier, I should be able to copy files from the pool to wherever we get ada3 mounted to, using the command line? Am I understanding correctly?
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
I don't understand what you are saying about schedules and reasons for not scrubbing. On Feb 17, 2013, a member of this forum called paleoN told me to 'shutdown -p now' and leave the system off till I had airflow across the drives. I did as I was told. Should I not have taken that advice?
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Well, anything sitting around is aging, and does have wear and tear due to just being on the shelf. Materials break down over time and eventually they just won't work anymore. I'd never have let the box sit unpowered for 18 months with no diagnostics and expect the box to still have my data safe and sound.

paleoN's advice was completely correct. But the expectation was probably that you'd order new fans or whatever and have it fixed in a few weeks at the most... not that it would sit powered off in a corner for more than a year. ;)

This may sound harsh but leaving it unpowered for 18 months was basically abandoning your data. Does it suck? Yes. But that's the reality of it.

So you need to make a single-disk pool in the WebGUI and then copy what data you want (or can) using the command line. It's going to take a while, and since your pool is damaged there's the possibility you'll start copying data and then the box will panic and reboot. So you need to break down the data as much as you can and copy small folders instead of half the pool in one command. And you should absolutely give priority to files that are super important and copy those first. There's a very real chance your box may crash at some point and you'll never be able to mount the pool again.
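The folder-by-folder approach can be sketched as a small loop. This is only an illustration: `copy_folders` is a hypothetical name, and `/mnt/MAIN` and `/mnt/rescue` stand in for the damaged pool and a single-disk rescue pool created in the WebGUI (the rescue pool's name is an assumption). Folders are copied in the order given, so the important ones go first; a failure in one folder is reported and the loop moves on to the next.

```shell
#!/bin/sh
# Hypothetical helper: copy top-level folders one at a time, in the order
# given, from the damaged pool ($1) to the rescue pool ($2).
# Usage: copy_folders SRC DST folder1 [folder2 ...]
copy_folders() {
    src=$1
    dst=$2
    shift 2
    failed=0
    for dir in "$@"; do
        echo "copying $dir"
        # cp -Rp copies the tree and preserves modes and timestamps. It keeps
        # going past unreadable files but exits non-zero, which is reported.
        if cp -Rp "$src/$dir" "$dst/"; then
            echo "done: $dir"
        else
            echo "FAILED (check for unreadable files): $dir" >&2
            failed=$((failed + 1))
        fi
    done
    # Return status is the number of folders that had errors (0 = all fine).
    return "$failed"
}
```

Usage: `copy_folders /mnt/MAIN /mnt/rescue Documents Photos`, listing the irreplaceable folders first so they are copied before anything else.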

One thing you should probably do is add -o readonly=on to your zpool import command so your pool is read-only.
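A read-only import might look like the sketch below. It assumes the pool name MAIN and the /mnt altroot used earlier in the thread; the function name is hypothetical. The options go before the pool name, as in the zpool(8) synopsis `zpool import [options] pool`.

```shell
#!/bin/sh
# Hypothetical helper: import the pool read-only under an alternate root so
# nothing is written to the damaged disks, then show its health, including
# any files with permanent errors (zpool status -v), and its top level.
# Assumptions: pool name defaults to MAIN, altroot is /mnt.
import_readonly() {
    pool=${1:-MAIN}
    zpool import -o readonly=on -R /mnt "$pool" || return 1
    zpool status -v "$pool"
    ls "/mnt/$pool"
}
```

Usage: `import_readonly MAIN`. As noted earlier in the thread, a pool imported from the CLI like this should not then be managed from the WebGUI or shared over the network.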
 

Tired_ | Dabbler | Joined: Feb 17, 2013 | Messages: 29
That does sound harsh. If I had known this, I would have done things very differently. I assumed ZFS was like other filesystems...if I write files to an NTFS drive, or burn them to a DVD, I can come back to them in a year or two and they are usually fine...I figured a filesystem as advanced as ZFS would have that basic feature down. How can this be used in the enterprise if the on-disk format is so volatile?

The lucky thing for me is, there's only about 1GB of data that is really important on the pool, I'll see today about mounting it ro and fetching it to the new drive.
 