'zpool clear' or 'zpool replace'.

Status
Not open for further replies.

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
Is there any way to know which drives are failing? SMART reporting is turned on and has been. I created a new debug file and here is the link. I have little confidence that the SMART report is in there, since I did not change anything. It is on, though. Could I have changed a setting that controls what is stored or where?

I do need to remedy the situation quickly because I really don't want to lose any data. Should I shut this thing down?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Are you deleting your posts after you make them? Because I've gotten 2-3 emails today that you've posted to this thread (with the contents of your post), but when I come to the thread, the message isn't here.

From what I can see in your debug file, ada3 is giving you lots of errors. That makes it a logical candidate for the failing drive. To identify it, do this:
  • From the shell, 'smartctl -a /dev/ada3 | more'
  • Note the serial number of the drive
  • Shut down the server and unplug the SATA cable from that drive, matching the serial number from the smartctl output with the one on the drive label
  • Boot up the server, and make sure the pool is still available (this step confirms you pulled the right drive--with your pool configuration as it is, it will not import if you disconnect any other disk)
  • If the pool did import, you know you disconnected the right disk. Shut down the server, install a replacement disk, reboot, and replace the disk using the instructions in the manual for replacing a failed disk.
And once that's done, do whatever you need to to back up your data and rebuild your pool in a safe way, because right now it's in bad shape.
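If it helps to avoid misreading the long smartctl listing, the serial-number step above can be sketched in Python. This is only a rough sketch: the function name and the sample text are made up for illustration, and it assumes the output contains a "Serial Number:" line as smartctl normally prints in its information section.

```python
import re
from typing import Optional


def serial_from_smartctl(output: str) -> Optional[str]:
    """Return the value on the 'Serial Number:' line, or None if there is none."""
    match = re.search(r"^Serial Number:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


# Illustrative excerpt only; the model and serial number are invented.
SAMPLE = """\
=== START OF INFORMATION SECTION ===
Device Model:     EXAMPLE DISK 2000
Serial Number:    EXAMPLE-SN-0001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
"""

print(serial_from_smartctl(SAMPLE))
```

In practice you would feed it the saved output of 'smartctl -a /dev/ada3' and compare the result with the label on the physical drive.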
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I am not deleting anything. What I have noticed is that when I try to respond and hit Post Reply without having typed into the correct box, I get an error saying that there is nothing in the box. I'm surprised it sends you a notice if it gave me an error, if that is indeed what is going on.

I have not mentioned how much I appreciate your help, so thanks.

When you use the word "pool" are you referring to all of the data on the system? If I pull the wrong drive I should not be able to access any of the pool, correct?

Can you please advise me on how I might better configure my NAS so that there is more redundancy when something like this, or worse, happens? I know that making a backup of a backup of a backup is the recommendation, but in terms of how FreeNAS is configured, how should I proceed?
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
Do you think the drive is failing?

The pool was not available after I rebooted with the drive identified from 'smartctl -a /dev/ada3 | more' disconnected. I don't know whether it should take 10x longer, but after 5 minutes the pool was still not available.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Yes, what FreeNAS calls a Volume, ZFS calls a pool. It's possible to have multiple pools, but you (and most users) only have one. So, yes, if you pull the wrong drive, you shouldn't be able to access any of your data.

Your configuration right now has two disks (call them 1 and 2) mirrored, striped with two more (call them 3 and 4). Right now, we'll say that #2 has failed. Since it's mirrored with #1, your data remains intact. Once you replace #2, it can fail again, or #1 can fail, without harming your data. However, if #3 or #4 fails, your data is gone forever. I'd guess you initially built this server with two disks, and later added another disk, and later yet another, as you needed more space.
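To make the failure cases concrete, here is a minimal Python sketch of that layout. The disk names and the function are hypothetical, and the model is deliberately simple: the pool is treated as a stripe across vdevs, and it survives only while every vdev still has at least one working disk.

```python
def pool_survives(vdevs, failed):
    """Return True if every vdev still has at least one working disk.

    vdevs:  list of lists of disk names; a two-disk list is a mirror,
            a one-disk list is a single unprotected disk.
    failed: iterable of disk names that have failed.
    """
    failed = set(failed)
    return all(any(disk not in failed for disk in vdev) for vdev in vdevs)


# Hypothetical names for the layout described above:
# disks 1 and 2 mirrored, striped with disks 3 and 4 on their own.
LAYOUT = [["disk1", "disk2"], ["disk3"], ["disk4"]]

print(pool_survives(LAYOUT, ["disk2"]))           # True: disk1 still holds the mirror
print(pool_survives(LAYOUT, ["disk1", "disk2"]))  # False: both halves of the mirror gone
print(pool_survives(LAYOUT, ["disk3"]))           # False: disk3 has no redundancy
```

The last case is the dangerous one: a single failure of an unmirrored disk takes the whole pool with it.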

The best answer on pool layout (and a lot of other things you should know) is @cyberjock's PowerPoint at https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/. In brief, though, your same four disks in RAIDZ1 would give you the same capacity you have now, but tolerate the loss of any one of those disks. That is, any single disk could fail without harm to your data. A better solution would be to use RAIDZ2, probably with more or larger disks. RAIDZ2 will tolerate the loss of any two disks. It will also protect against a read error when resilvering after the loss of one disk. But, of course, it takes more space for parity.
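The capacity comparison can be checked with some rough arithmetic. The Python sketch below assumes all four disks are the same size (the 2 TB figure is made up) and ignores ZFS metadata and padding overhead, so the numbers are approximate.

```python
def raidz_usable(disk_count, disk_size_tb, parity):
    """Approximate usable space of one RAIDZ vdev: data disks times disk size."""
    return (disk_count - parity) * disk_size_tb


def current_layout_usable(disk_size_tb):
    """One two-disk mirror (one disk's worth) plus two single-disk vdevs."""
    return disk_size_tb + 2 * disk_size_tb


DISK_TB = 2  # hypothetical disk size, for illustration only

print(current_layout_usable(DISK_TB))      # 6: mirror + single + single
print(raidz_usable(4, DISK_TB, parity=1))  # 6: RAIDZ1, survives any one disk
print(raidz_usable(4, DISK_TB, parity=2))  # 4: RAIDZ2, survives any two disks
```

Under those assumptions RAIDZ1 matches the current usable space, while RAIDZ2 gives up one more disk's worth of space for the extra protection.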

I do think one of your disks is failing, or has already failed. If your server has booted with ada3 disconnected, but the pool isn't available, then ada3 was the wrong one. There's a better way to definitely identify which one it is, which I'd forgotten when I last posted. It takes a couple of steps at the shell.

First, run 'glabel status'. That will return a list of gptid numbers (long numbers), along with the partitions they refer to. Find the disk that corresponds to "gptid/7ace49e7-733e-11e3-ae8b-d850e6db3a84". This is the disk in your pool that's failed. It will say something like "ada2p2", but you aren't interested in the "p2" part.

Second, do 'smartctl -a /dev/adaN | more', where 'adaN' is the disk number that's returned from glabel status. As before, note the serial number, power down the server, plug in the disk you previously unplugged, unplug the disk that matches the serial number, and reboot. If your pool is now available, you'll know you unplugged the right one, and can proceed with the replacement.
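The gptid lookup in the first step can also be sketched in Python, in case reading the table by eye is error-prone. The function name and the sample rows are illustrative; only the failed gptid comes from your debug file, the ada2p2 mapping is just the "something like" example from above, and the sketch assumes the usual three-column Name / Status / Components layout of 'glabel status'.

```python
import re
from typing import Optional


def device_for_gptid(glabel_output: str, gptid: str) -> Optional[str]:
    """Return the disk (e.g. 'ada2') whose partition carries the given gptid label."""
    for line in glabel_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == gptid:
            partition = fields[-1]                  # e.g. 'ada2p2'
            return re.sub(r"p\d+$", "", partition)  # drop the 'p2' part
    return None


# Illustrative output; the second gptid and both device mappings are invented.
SAMPLE = """\
                                      Name  Status  Components
gptid/7ace49e7-733e-11e3-ae8b-d850e6db3a84     N/A  ada2p2
gptid/00000000-0000-0000-0000-000000000000     N/A  ada0p2
"""

FAILED = "gptid/7ace49e7-733e-11e3-ae8b-d850e6db3a84"
print(device_for_gptid(SAMPLE, FAILED))
```

Whatever device name comes back is the one to pass to smartctl in the second step.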
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I pulled another drive, going in the order in which they are stacked, and was able to access the pool.
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I guess all I really need right now is a little instruction or a resource on how to replace the failed drive.

Thanks again
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The manual, linked in my previous post, as well as at the top of each forum page, has click-by-click instructions for replacing failed disks.
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I ask for a little more help because it is crucial that I don't mess it up.

6.3.12 Replacing a Failed Drive (FreeNAS® 9.2.0 Users Guide, page 128 of 274)

If you are using any form of redundant RAID, you should replace a failed drive as soon as possible to repair the degraded state of the RAID. Depending upon the capability of your hardware, you may or may not need to reboot in order to replace the failed drive. AHCI capable hardware does not require a reboot.

NOTE: a stripe (RAID0) does not provide redundancy. If you lose a disk in a stripe, you will need to recreate the volume and restore the data from backup.

Before physically removing the failed device, go to Storage → Volumes → View Volumes → Volume Status and locate the failed disk. Once you have located the failed device in the GUI, perform the following steps:

1. If the disk is formatted with ZFS, click the disk's entry then its “Offline” button in order to change that disk's status to OFFLINE. This step is needed to properly remove the device from the ZFS pool and to prevent swap issues. If your hardware supports hot-pluggable disks, click the disk's “Offline” button, pull the disk, then skip to step 3. If there is no “Offline” button but only a “Replace” button, then the disk is already offlined and you can safely skip this step.

I have located the failed drive. I would assume that, since it is a ZFS volume, all the drives are formatted with ZFS. How does one "click the disk's entry then its “Offline” button in order to change that disk's status to OFFLINE"? That is a very confusing instruction when those options do not exist in the GUI.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
It exists; do what the directions tell you. Go to Storage → Volumes → View Volumes → Volume Status, click the failed disk, and then click the Offline button.
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I think I read screen shots are not allowed.

Let me say that either something is missing in the instructions, the nomenclature is WRONG, or it is not there.
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
The bad disk does not even show up in Volume Status, but it does show up in "View Disks".
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Screen shots are allowed, and even encouraged in appropriate circumstances. Yours may be one of them. What does the volume status screen look like?
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I had to shut it down and now I can't get back in. Even if there is a bad drive, what would prevent FreeNAS from booting or keep me from getting into the GUI?
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
There has to be something wrong with the disk that is running the system. Is there a way to install FreeNAS on a USB drive and remove the old drive?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Sure--installing onto a USB stick is the recommended way of installing anyway. The manual will walk you through doing that as well.
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
Before I go forward with that, is it possible to put 9.3 on a USB stick and still access the GUI of the existing volume?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Yes, the USB stick has nothing to do with your volume. Make sure to back up your configuration and auto-import your pool after the reinstall.
 

swamphox1

Dabbler
Joined
Aug 17, 2015
Messages
33
I must be missing something. Everything shows up in the GUI, but I can't access the volume. Any thoughts? I am not sure if I imported correctly, or if I should have.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
How are you accessing the pool? Can you see your data using the CLI? Did the pool even mount?
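One way to answer the last question from the shell is to look at the output of 'zpool list -H -o name,health', which prints one pool per line with the name and health state. A small Python sketch for reading that output follows; the pool name "tank" and the sample line are hypothetical, not taken from your system.

```python
def pool_health(zpool_list_output: str) -> dict:
    """Map pool name to health state from 'zpool list -H -o name,health' output."""
    pools = {}
    for line in zpool_list_output.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank lines and anything that is not name + health
            pools[fields[0]] = fields[1]
    return pools


# Illustrative output for a hypothetical pool named 'tank'.
SAMPLE = "tank\tDEGRADED\n"

print(pool_health(SAMPLE))
```

If your pool's name is missing from the result, the pool is probably not imported at all, which would explain why the volume cannot be accessed.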
 