Cannot shut down after failed syslog/jail pool

Status
Not open for further replies.

indy

Patron
Joined
Dec 28, 2013
Messages
287
I got an unfriendly email today:

Code:
This message was generated by the smartd daemon running on:

   host name:  freenas
   DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada0, unable to open device

Device info:
STT_FTM64GX25H, S/N:P612102-MIBY-208A016, FW:1916, 64.0 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.


This is a single SSD hosting the pool that holds the syslog and the jails.
The SSD had been generating errors during scrubs for ages, but they were always repairable since copies=2 was set.
I always wanted to migrate the pool to a different SSD, but unfortunately I was too lazy to do it in time.
The drive seems to have failed completely now, but the loss of that pool is not a problem in itself:
I don't care about the contents of this pool, as my data is on another one.
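
For reference, copies=2 is an ordinary ZFS dataset property; a minimal sketch of how it would have been set (pool name "tank" taken from the status output below; note it only protects data written after the property is set):

Code:
# copies=2 stores two copies of every block; on a single disk this survives
# bad sectors but not whole-device failure (exactly what happened here)
zfs set copies=2 tank
zfs get copies tank    # verify the property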

However there are other problems:
Logging in via SSH fails at the login prompt; the web interface, however, still somewhat works.

What I have done so far:

1) Checked the zpool status via the web interface shell:
Code:
[root@freenas ~]# zpool status tank                                                                                                 
  pool: tank                                                                                                                        
 state: UNAVAIL                                                                                                                     
status: One or more devices are faulted in response to IO failures.                                                                 
action: Make sure the affected devices are connected, then run 'zpool clear'.                                                       
   see: http://illumos.org/msg/ZFS-8000-JQ                                                                                          
  scan: scrub repaired 12K in 0h0m with 0 errors on Fri Nov 28 01:00:54 2014                                                        
config:                                                                                                                             
                                                                                                                                    
        NAME               STATE     READ WRITE CKSUM                                                                               
        tank               UNAVAIL     15   102     0                                                                               
          102399081654048  REMOVED      0     0     0  was /dev/gptid/3fd35755-a16f-11e3-bbce-002590f062ca
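
For completeness: the suggested 'zpool clear' is unlikely to help with a completely dead device, but the command from the action line above would simply be:

Code:
# only meaningful if the device ever becomes reachable again
zpool clear tank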


2) Tried to move the syslog/system dataset to the functional pool.
Not sure whether that went through, since the web interface got stuck.
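
A way to check from a shell whether the move actually took effect (assuming the FreeNAS convention of a .system dataset at the root of whichever pool currently holds the system dataset):

Code:
# the pool name in front of "/.system" is the active system dataset pool
zfs list -o name | grep '\.system'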

3) Tried to reboot / shut down.
Nothing seems to happen after the shutdown message; I can still open up the web interface.
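
For reference, the console-level equivalents I would expect to behave the same way (standard FreeBSD commands; with processes stuck in disk wait they may hang at exactly the point the GUI does):

Code:
shutdown -p now    # graceful power-off via the normal shutdown sequence
# last resort: reboot quickly and ungracefully, skipping the shutdown
# scripts; this risks losing unsynced writes on still-imported pools
reboot -q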

4) Tried to lock the (encrypted) pool with my data on it, to prevent any faulty actions against that file system.
Again nothing seems to happen; the web interface gets stuck on the popup message.

5) Tried to shut down via the IPMI KVM console.
Again the system does not shut down as commanded.
Additionally, the console keeps throwing this error, which means the kernel failed to page part of a running process (tail) back in from disk:

Code:
vm_fault: pager read error, pid ##### (tail)



I would really appreciate help with shutting down the system gracefully.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Are you using RAID? Why does your pool show only one device?
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
The failed pool consisted of only one device, the failed SSD.
This pool contained the jails and the system dataset.
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
I would not be surprised if something hangs while trying to parse the corrupt .system data for the GUI, or just waits for it forever.

Either way, it *should* have been handled by exceptions and whatnot, to make sure the system as a whole never becomes unreliable/unstable. In my opinion, of course.

Hope it works out.
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
Another thing I tried was shutting down the processes that were trying to access the failed pool.

[Screenshot: freenas.png, process list from the web interface]


For example I entered
Code:
/etc/rc.d/syslogd stop    # stop syslogd through its rc script
kill -KILL 7178           # then SIGKILL the stuck syslogd PID

into the web shell, but neither command does anything and the process refuses to die.
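
My guess as to why the kill has no effect: the process is stuck in uninterruptible disk wait on the dead pool, and a process in that state cannot take signals until its I/O returns. A quick check with standard FreeBSD ps (the PID is the one from the screenshot above):

Code:
# a "D" in the STAT column means disk wait; WCHAN shows what it sleeps on
ps -o pid,stat,wchan,command -p 7178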

Another idea of mine is to export the (encrypted) pool with my data on it via the console and redo the whole FreeNAS installation, but I would really appreciate some input on this.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I think redoing your configuration would solve the issue. It might be a fair amount of work, depending on how complicated your setup is. And since you have an encrypted pool, make sure you follow all the proper steps in the manual that relate to encryption.

Maybe just removing the failed pool and then rebooting would also fix the issue.
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
Do you have any advice on how I would safely unmount the healthy pool?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421

indy

Patron
Joined
Dec 28, 2013
Messages
287
The problem is that the system does not shut down and the web interface does not react.
The only thing still working is the console.

Can I safely use
Code:
zpool export vol1

from the console to export the healthy (encrypted) pool?
I have both the key and the recovery key saved, as per the manual.
Afterwards I would redo the whole installation and re-import the pool.
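
A sketch of the full sequence I have in mind, assuming the pool sits on GELI providers (the provider name below is a placeholder; the real ones show up in "geli status"):

Code:
zpool export vol1       # flushes and unmounts the pool
# optionally detach the GELI provider(s) afterwards; "da0p2.eli" is a
# placeholder name, not necessarily the actual device here
geli detach da0p2.eli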
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I think you can, but I have never bothered to export a pool.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can export it that way, but you're kind of in a "weird place" because your .system dataset is fubared. You probably shouldn't have let the .system dataset get to that kind of "broke".

If worse comes to worst, just turn off the box, unplug the bad SSD, then boot the system back up. All should be straightened out.
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
So, I ran "zpool export -f vol1" from the KVM console, which went through without error.
After that I tried to do the same for the failed pool (in the hope of unfreezing the system), and that locked up the last functioning console.
Since there were no options left anyway, I just switched the system off.
I redid the whole installation and setup on a new stick, since I did not really trust what was left from that debacle.
Anyhow, my data pool seems to have imported just fine!

The new pool for the system dataset, with fewer errors, more redundancy, and Intel SSDs this time :)
Code:
[root@freenas] ~# zpool status vol0
  pool: vol0
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Dec  3 19:30:53 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol0                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/69ed7a0f-7b16-11e4-b34b-002590f062ca  ONLINE       0     0     0
            gptid/6a11a547-7b16-11e4-b34b-002590f062ca  ONLINE       0     0     0

errors: No known data errors


Thank you guys for helping me out!
 