NFS share went down - Jail is frozen - System unstable

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
Hello,

I have an NFS share mounted on my FreeNAS, which is in turn mounted into my backup jail running rsnapshot.
Because of a power cut, that NFS share went down.
Now the jail is not accessible from the console and I can't even kill the rsync process.
FreeNAS is becoming less and less responsive and I can't resolve the situation.

I have read that a hard reboot could be my only option, but since I am stuck at home and my colleagues are accessing the server remotely, I am afraid the server will go down and possibly not come back up.
I am unable to access my office for an extended period because of a lockdown where I live.

Is there anything I can do to kill that jail?

Thanks for your help.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What kind of jail is it? Warden or iocage?

What do you see from iocage list?
 

overshoot

This is an iocage jail.
The "iocage list" command just hangs now, whereas it showed me the 2 jails a day or two ago.

Thanks for your answer.
 

sretalla

What about jls ?
 

overshoot

It works:

JID  IP Address  Hostname  Path
3                freebsd   /mnt/Main/iocage/jails/Rsnapshot/root
4                Unifi     /mnt/Main/iocage/jails/Unifi/root

The jail ID 3 is the problematic one.
 

sretalla

OK, so how about jexec 3 ps -A ?
 

overshoot

Works!

PID TT STAT TIME COMMAND
3558 - IJ 0:00.00 cron: running job (cron)
3560 - DsJ 0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot gamma
16842 - IJ 0:00.00 cron: running job (cron)
16848 - ZJ 14:30.71 <defunct>
27340 - DJ 2:10.48 /usr/local/bin/rsync -a --delete --numeric-ids /mnt/BAC
27341 - DJ 2:45.56 /usr/local/bin/rsync -a --delete --numeric-ids /mnt/BAC
27342 - ZJ 0:03.50 <defunct>
48074 - IJ 0:00.00 cron: running job (cron)
48077 - IsJ 0:00.00 /bin/sh - /usr/sbin/periodic daily
48115 - IJ 0:00.00 lockf -t 0 /var/run/periodic.daily.lock /bin/sh /usr/sb
48116 - IJ 0:00.01 /bin/sh /usr/sbin/periodic LOCKED daily
48125 - IJ 0:00.01 /bin/sh /usr/sbin/periodic LOCKED daily
48126 - IJ 0:00.00 mail -E -s freebsd daily run output root
48349 - IJ 0:00.00 /bin/sh /etc/periodic/daily/450.status-security
48350 - IJ 0:00.00 /bin/sh - /usr/sbin/periodic security
48352 - IJ 0:00.00 lockf -t 0 /var/run/periodic.security.lock /bin/sh /usr
48353 - IJ 0:00.00 /bin/sh /usr/sbin/periodic LOCKED security
48361 - IJ 0:00.00 /bin/sh /usr/sbin/periodic LOCKED security
48362 - IJ 0:00.00 mail -E -s freebsd daily security run output root
48363 - IJ 0:00.00 /bin/sh - /etc/periodic/security/100.chksetuid
48367 - DJ 0:00.01 find -sx / /dev/null ( ! -fstype local ) -prune -o -typ
48368 - IJ 0:00.00 /bin/sh - /etc/periodic/security/100.chksetuid
48370 - IJ 0:00.00 cat
58248 - IJ 0:00.00 cron: running job (cron)
58250 - IsJ 0:00.00 /bin/sh - /usr/sbin/periodic weekly
58573 - IJ 0:00.00 lockf -t 0 /var/run/periodic.weekly.lock /bin/sh /usr/s
58574 - IJ 0:00.01 /bin/sh /usr/sbin/periodic LOCKED weekly
58583 - IJ 0:00.00 /bin/sh /usr/sbin/periodic LOCKED weekly
58584 - IJ 0:00.00 mail -E -s freebsd weekly run output root
58585 - IJ 0:00.00 /bin/sh - /etc/periodic/weekly/310.locate
58590 - INJ 0:00.00 su -fm nobody
58591 - INJ 0:00.00 _su -m -f (csh)
58593 - INJ 0:00.00 /bin/sh /usr/libexec/locate.updatedb
58601 - DNJ 0:00.01 find -s / ! ( -fstype tmpfs -or -fstype zfs ) -prune -o
58602 - INJ 0:00.00 /bin/sh /usr/libexec/locate.mklocatedb -presort
58604 - INJ 0:00.00 locate.code /tmp/locateC1VchSzylD/mklocateuTu61FdwTc/_m
78242 - IJ 0:00.00 cron: running job (cron)
78246 - DsJ 0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot beta
79412 - IJ 0:00.00 cron: running job (cron)
79416 - DsJ 0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot beta
83368 - SsJ 0:06.80 /usr/sbin/syslogd -c -ss
83437 - IsJ 0:00.00 /usr/sbin/sshd
83441 - IsJ 0:11.28 /usr/sbin/cron -J 15 -s
88029 - IJ 0:00.00 cron: running job (cron)
88042 - DsJ 0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot beta
94196 - IJ 0:00.00 cron: running job (cron)
94213 - DsJ 0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot beta
58689 6 R+J 0:00.00 ps -A
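
For reference, the STAT column is the interesting part of this listing: D means uninterruptible disk wait, Z means zombie, and J marks a jailed process. A process in D state does not act on signals until its blocked I/O returns, which is consistent with rsync and rsnapshot hanging on an unreachable NFS share. Below is a minimal sketch for pulling those entries out of the listing, assuming the column layout shown above (PID TT STAT TIME COMMAND). The function name `stuck_procs` is made up for illustration.

```shell
#!/bin/sh
# stuck_procs: hypothetical helper, not part of FreeBSD or iocage.
# Reads `ps -A` style output on stdin and prints "PID STAT COMMAND" for
# every process whose state starts with D (uninterruptible wait) or
# Z (zombie). Assumes the columns are: PID TT STAT TIME COMMAND.
stuck_procs() {
    # $1 must be numeric so the header line is skipped automatically.
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^[DZ]/ { print $1, $3, $5 }'
}

# Usage (jail ID 3 is the one reported by jls above):
#   jexec 3 ps -A | stuck_procs
```

Only the first word of the command is printed, which is enough to tell the rsync, rsnapshot (perl) and find processes apart.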
 

overshoot

I've tried jexec 3 /etc/rc.shutdown, but it fails with:
jexec: execvp: /etc/rc.shutdown: Permission denied
 

sretalla

How about figuring out if you can find the process that's killing things...

jexec 3 top
 

overshoot

I've tried killing the rsnapshot and rsync processes, which got stuck when the NFS share they were writing to disconnected.
I've tried "kill -9 processID", but even though I get no error, the processes are still there, and it has been days now.
 

sretalla

Have you tried the kill with the jexec?

The processes you can see from the host include the jail processes, but the host can't kill those processes normally.
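
Running the kill inside the jail could look like the sketch below. It only prints the commands, so they can be checked before anything is executed. `jail_kill_cmds` is a hypothetical helper name, and the jail ID and PIDs in the usage line are the ones from the jls and ps listings earlier in the thread. Note that processes in D state (uninterruptible wait) may still not react until their blocked I/O returns.

```shell
#!/bin/sh
# jail_kill_cmds: hypothetical dry-run helper, not part of FreeBSD.
# Prints one "jexec <jid> kill -9 <pid>" line per PID so the commands
# can be reviewed and then run by hand.
jail_kill_cmds() {
    jid=$1
    shift
    for pid in "$@"; do
        echo "jexec $jid kill -9 $pid"
    done
}

# Usage (jail 3, the two stuck rsync PIDs):
#   jail_kill_cmds 3 27340 27341
```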
 

overshoot

Yes, I did, but the result was the same: nothing happened.
I ended up restarting the server completely and it worked; everything went back to normal.
Thanks for your help.
 