voyager529
Dabbler
Joined: Jul 27, 2011
Messages: 36
Hey guys,
I wasn't sure whether this should go in the hardware forum or someplace else, so apologies to the mods if they need to move it...
At present, my FreeNAS box is a Gigabyte mobo with 6GB of RAM and an AMD Sempron 140 (I think) processor. The motherboard has 8 onboard SATA ports, and they're all occupied with 500GB Western Digital Caviar Blue drives. The whole setup is roughly six months old, except the mobo, which I got in the last month (it replaced a mobo with 4 SATA ports and a PCI SATA adapter). All eight drives are in the same storage pool, which has six datasets in it. I'm running the RTM release of FreeNAS 8.0 and anxiously awaiting a non-beta release of 8.0.1...well, that and buckling down to purchase a 2GB USB drive for the task.
I noticed today that my Windows XP machine at home wasn't communicating with the FreeNAS. At first I thought it simply didn't like the network drive mapping, so I tried remapping the drive. That didn't work, so I tried getting to the NAS via the UNC path. Still nothing. I had equal success (or lack thereof) when attempting to access the web UI. My first thought was "hey, maybe the machine got powered off somehow." That theory was debunked when I was able to ping it successfully, so I fired up Xshell and SSH'd into the box without a problem. My next thought was the same as any predominantly-Windows user's - when in doubt, reboot. If nothing else, I figured it was the simplest way to restart both the CIFS and lighttpd servers. The box came back up, I SSH'd in again, but still no web UI or CIFS, though I could successfully view and transfer files via SFTP (the one where you FTP over SSH; I can never keep SFTP and FTPS straight).
I finally figured out that a drive was bad when I tried running a zpool scrub and it said the pool was in a degraded state because one of the drives wasn't available. The scrub itself did come back clean, though. Admittedly I haven't been home yet to verify that it isn't simply a loose cable; that seems rather unlikely, and even if it is, I'd like to treat this as a training exercise so I know what I'm doing when a drive *does* fail. So yes, if I get home and it's a loose cable, I'll take all the ZoMg N00b RtFm!!111 flak you'd like to give me. Until then, let's assume that a disk has legitimately failed...
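For anyone following along, this is roughly what I ran over SSH (I'm using "tank" as the pool name here since I'm going from memory):

zpool status -v tank   # reported the pool DEGRADED with one device unavailable
zpool scrub tank       # kicked off the scrub
zpool status tank      # scrub completed without errors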
1) Is there a known correlation between a failed disk and losing all access except SSH? A search through the forums for 'failed disk' and 'failed lighttpd' didn't yield anything obviously useful, and restarting the lighttpd daemon didn't seem to help. Is there a separate set of steps needed to get back into the web UI, or am I stuck at SSH-only until the drive is replaced?
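For what it's worth, this is how I restarted lighttpd; I'm assuming the rc script lives in the standard FreeBSD spot, so if FreeNAS keeps it elsewhere, that may be part of my problem. I figure the Samba script would work the same way, though I haven't tried it yet:

/usr/local/etc/rc.d/lighttpd restart   # web UI still unreachable afterward
/usr/local/etc/rc.d/samba restart      # haven't tried this one; script name is a guess on my part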
2) I've heard rumors that 8.0 RTM doesn't swap out failed disks very gracefully and that I'll need 8.0.1 to do it right. Given that I'm REALLY not a fan of running with one failed disk in the array until 8.0.1 officially releases, is there any word from the beta users out there on whether 8.0.1 is "stable enough" to be used, given the situation?
2a) Everything I've been reading indicates that I'd need a 2GB flash drive to do the update; I'm presently using a 1GB drive. The upgrade procedures indicate that my best bet is to export my config, do a fresh install on the new stick, then re-import the config. If that's the case, how do I export it from SSH, and how happy will the new install be importing a degraded ZFS volume?
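In case it helps frame the question, here's what I'm picturing for the export step. I'm assuming the config lives in the SQLite database at /data/freenas-v1.db - that path is a guess on my part, so please correct me if it's wrong:

scp root@<nas-ip>:/data/freenas-v1.db ./freenas-config-backup.db   # pull the config DB off the NAS over SSH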
3) How do I display the serial numbers of the drives from the command line? I can show the device names using the zpool status command, but not the serials - since they're all identical drives, I'll need the serials to tell them apart.
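From some searching it looks like smartctl might do the trick; is something like this loop roughly right? (I'm assuming the drives show up as ada0 through ada7, which may not match my system.)

for disk in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7; do
  echo "== $disk =="
  smartctl -i /dev/$disk | grep -i 'serial number'   # -i prints the drive's identity info, including the serial
done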
4) Another post about a failed drive seemed to indicate that the most desirable course of action is to plug in a new disk alongside the failed one, let the pool rebuild, and then remove the bad disk - but I don't have a spare SATA port to do that at present. My two options are to either pull the bad disk, put the new disk in its place, and let the array rebuild (as is the case with most enterprise-grade storage devices), or to pull my PCI SATA card out of retirement, install the new drive on it, let it rebuild, then pull out both the SATA card and the bad drive together. Which is more desirable?
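If it matters for the answer, here's the ZFS side of option one as I understand it; "tank" and "ada3" are stand-ins, since I don't yet know which device actually died:

zpool offline tank ada3   # take the dead disk out of service (if ZFS hasn't done so already)
# power down and physically swap the bad drive for the new one on the same port
zpool replace tank ada3   # resilver onto the new disk sitting in the same slot
zpool status tank         # keep an eye on the resilver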
I have the disk on order from Newegg; I'll likely have it either tomorrow or the following day. Obviously I intend to do a warranty swap on the old disk and leave the replacement on the shelf as a spare in the event this happens again. Thanks in advance for your help; I'll be sure to document what I do and whether it works or not.
Joey