Can't reboot or shut down while I/O is suspended

Status: Not open for further replies.

yangjie

Cadet
Joined
Sep 17, 2014
Messages
8
Hello, everyone.
Let me walk through what happened:
Code:
[root@supool] ~# uname -a
FreeBSD supool.local 8.3-RELEASE-p8 FreeBSD 8.3-RELEASE-p8 #1 r252195M: Mon Aug 18 19:30:12 CST 2014     root@karas.karas:/root/trunk/os-base/amd64/root/trunk/FreeBSD/src/sys/FREENAS.amd64  amd64
[root@supool] /# camcontrol devlist
<SEAGATE ST4000NM0023 MS00>        at scbus2 target 9 lun 0 (da16,pass0)
<SEAGATE ST3450857SS XREC>         at scbus2 target 10 lun 0 (da17,pass1)
<SEAGATE ST31000424SS XRMA>        at scbus2 target 13 lun 0 (da19,pass3)
<SEAGATE ST31000424SS XRMA>        at scbus2 target 14 lun 0 (da20,pass4)
<SEAGATE ST31000424SS XRMA>        at scbus2 target 15 lun 0 (da21,pass5)
<GOOXI Bobcat 0d00>                at scbus2 target 16 lun 0 (ses0,pass6)
<Innostor Innostor 1.00>           at scbus8 target 0 lun 0 (da22,pass7)
[root@supool] /# zpool create tank raidz /dev/da19 /dev/da20 /dev/da21
[root@supool] /# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  2.72T   180K  2.72T     0%  1.00x  ONLINE  -
[root@supool] /# zfs create tank/fs 
[root@supool] /# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
tank      171K  1.78T  41.3K  /tank
tank/fs  40.0K  1.78T  40.0K  /tank/fs
[root@supool] /# cd /tank/fs
[root@supool] /tank/fs# touch  test_file
[root@supool] /tank/fs# zpool status
  pool: tank
state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        da19    ONLINE       0     0     0
        da20    ONLINE       0     0     0
        da21    ONLINE       0     0     0

errors: No known data errors
[root@supool] /tank/fs#
/* Now I 'dd' some data into test_file; it will take a long time. */
[root@supool] /tank/fs# dd if=/dev/zero of=/tank/fs/test_file bs=1M count=100000
/* Then I unplug two of the three disks. 'dd' stops making progress and appears to be suspended.
I try to kill the dd process, but it cannot be killed.
I then try to shut the system down, and that does not work either. */
Connecting via a second SSH session:
[root@supool] ~# camcontrol devlist
<SEAGATE ST4000NM0023 MS00>        at scbus2 target 9 lun 0 (da16,pass0)
<SEAGATE ST3450857SS XREC>         at scbus2 target 10 lun 0 (da17,pass1)
<SEAGATE ST31000424SS XRMA>        at scbus2 target 13 lun 0 (da19,pass3)
<GOOXI Bobcat 0d00>                at scbus2 target 16 lun 0 (ses0,pass6)
<Innostor Innostor 1.00>           at scbus8 target 0 lun 0 (da22,pass7)
[root@supool] ~# ps -ax
...
56057   1  D+     0:00.56 dd if=/dev/zero of=/tank/fs/test_file bs=1M count=100000
56890   2  Ss     0:00.01 -csh (csh)
57694   2  R+     0:00.00 ps -ax
41569   0  Is+    0:00.03 -csh (csh)
[root@supool] ~# kill -9 56057
[root@supool] ~# ps -ax
56057   1  D+     0:00.56 dd if=/dev/zero of=/tank/fs/test_file bs=1M count=100000
56890   2  Ss     0:00.01 -csh (csh)
58520   2  R+     0:00.00 ps -ax
41569   0  Is+    0:00.03 -csh (csh)
/* ZFS does not react; the kill has no effect. */
[root@supool] ~# zpool status
  pool: tank
state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      DEGRADED     0 2.75K     0
      raidz1-0                DEGRADED     0 2.79K     0
        da19                  ONLINE       0     0     0
        10105509833566695808  REMOVED      0     0     0  was /dev/da20
        da21                  ONLINE       3 3.49K     0

errors: 2817 data errors, use '-v' for a list
[root@supool] ~#
/* I execute 'zpool clear tank', then check 'zpool status'. */
[root@supool] ~# zpool clear tank
cannot clear errors for tank: I/O error
[root@supool] ~# zpool status
  pool: tank
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    tank                      UNAVAIL      0     0     0
      raidz1-0                UNAVAIL      0     0     0
        da19                  ONLINE       0     0     0
        10105509833566695808  REMOVED      0     0     0  was /dev/da20
        9442496166343931282   UNAVAIL      0     0     0  was /dev/da21

errors: 4224 data errors, use '-v' for a list
/* I try to destroy 'tank', but the command hangs. */
[root@supool] ~# zpool destroy -f tank        (hangs)
/* 'dd' is also still hung. */
[root@supool] /tank/fs# dd if=/dev/zero of=/tank/fs/test_file bs=1M count=100000        (still hung)

/* Now ZFS cannot do anything, so I try to reboot. */

[root@supool] /# reboot
......
Sep 27 02:11:54 supool reboot: rebooted by root
Sep 27 02:11:54 supool syslogd: exiting on signal 15
Waiting .........................'vnlru'.....done
Waiting .........................'bufdaemon' ....done
Waiting ..........................'syncer' to stop ....time out
Syncing disks, buffers remaining... 4 2 1
Final sync complete
/* Nothing further happens; the system stays like this and never reboots. */
I have found no workaround.
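For reference, here is a sketch of what can still be checked from the second SSH session while everything is hung (56057 is the dd PID from the ps output above; these are standard FreeBSD/ZFS commands, nothing FreeNAS-specific):
Code:
# Show the wait channel the hung dd is sleeping on (state D = uninterruptible disk wait)
[root@supool] ~# ps -o pid,state,wchan,command -p 56057
# Per-file error detail for the suspended pool
[root@supool] ~# zpool status -v tank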


Does anyone know what is going on?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
A wild guess: is one of the devices connected via USB?

Anyway... Your activities are outside the FreeNAS environment. If you want to do this kind of thing, you do not need (and do not want!) FreeNAS; just install FreeBSD...

Go to http://doc.freenas.org/ to learn how to use FreeNAS.
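If you want to check, here is a sketch of how to tell which da devices are USB-attached on FreeBSD (the Innostor device in your camcontrol output is typically a USB flash stick):
Code:
# List USB devices attached to the host
usbconfig
# Verbose device list; USB disks show up behind the umass (USB mass storage) driver
camcontrol devlist -v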
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, you created a pool from the command line. That's already breaking FreeNAS' design (the manual tells you how to create pools... the only way to create pools). At that point I didn't read any farther because it's not a "real world" scenario.
 

yangjie

Cadet
Joined
Sep 17, 2014
Messages
8
Well, you created a pool from the command line. That's already breaking FreeNAS' design (the manual tells you how to create pools... the only way to create pools). At that point I didn't read any farther because it's not a "real world" scenario.
Creating the pool through the FreeNAS GUI is the same as creating it from the command line; even when I create the pool with FreeNAS, the result is identical. In fact, I first hit this problem when using FreeNAS. The key issue is that the system will not shut down or reboot. Is there a solution?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Yes, there is a solution. An easy one. Actually, it is not even a solution. It is one of IT fundamentals. Have two separate sets of disks:
  1. data
  2. .system and swap
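To see how the current system is actually laid out, something like this should work (a sketch; the system dataset name varies between FreeNAS versions, and tank is the pool name from the original post):
Code:
# Look for a system dataset living on the data pool
zfs list -r -o name,used,mountpoint tank
# Show which devices are backing swap
swapinfo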
By the way, you are forgetting that since you did not follow FreeNAS procedures (the manual), some of your actions go against the built-in defaults, and consequently the FreeNAS framework cannot operate properly...

P.S. I do have two separate sets of disks.
 

yangjie

Cadet
Joined
Sep 17, 2014
Messages
8
Yes, there is a solution. An easy one. Actually, it is not even a solution. It is one of IT fundamentals. Have two separate sets of disks:
  1. data
  2. .system and swap
By the way, you are forgetting that since you did not follow FreeNAS procedures (the manual), some of your actions are going against built-in defaults, consequently the FreeNAS framework cannot operate properly...

P.S. I do have two separate sets of disks.
But I still can't reboot or shut down when I am in such a situation.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
And why are you surprised by this at all? In your first post you somehow expected a different result after the pool went offline because you removed 2 of its 3 disks.

This thread sounds exactly like you don't know how this technology works and are expecting something else (though what you are expecting isn't obvious).

But yes, you remove 2 of 3 disks from a pool that is a single RAIDZ1 and *bad things will happen*.

So... what's the problem???
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
I think he just wants a clean reboot without punching the reset button. Failure of a hard disk, or two, or all on a data set should not lock up the system. We should be able to exit gracefully in all cases of failure.

Showing pool creation via the CLI was the most efficient method for describing each step to reproduce the issue, and it reduces language issues. We could easily do this from the GUI.

Seems to me there is a communication issue more than a lack of understanding or intelligence. You'd know better than I, cyber, but is this possibly syslog and samba puking because we are writing by default to our "data" pools instead of dedicated space? I haven't played with this... but with the current default scheme, if we lose the "main" pool (even momentarily due to a disconnect) we hang the entire system. Doesn't seem awesome.

@yangjie Try disabling syslog and samba momentarily (sketched at the end of this post), not to mention killing the dd process if it survived... I don't know what else would lock up shutdown. Hopefully we can get better info to the bsd guys. Gut says solarisguy is dead on. Unfortunately, by default the FreeNAS appliance on a single pool is mixing .swap/system with data, and that leads to this. Following all the rules still leads to this spot.

Good luck, it seems to me this is a compromise in design choice. The nature of NanoBSD being unwritable, combined with Samba4 needing persistent storage, leaves no better options. You can use separate pools, or wait for 9.3 with a ZFS root. Maybe cyberjock has a more clever idea.
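A sketch of the commands I mean (the rc script names are an assumption; on FreeNAS the middleware normally manages these services, and the Samba script name varies between releases):
Code:
# Stop syslogd so it stops trying to write to the dead pool
service syslogd stop
# Stop Samba (the rc script may be named 'samba' or 'smbd' depending on the release)
service samba stop
# Note: a dd stuck in uninterruptible sleep (state D) on a suspended pool
# usually cannot be killed, even with kill -9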
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Assuming this is correct, I'm not seeing a difference between hitting the reset/power button and a graceful shutdown (if there is such a thing for this scenario.. haha).

You have no shares to access, nothing. The USB stick is read-only, so there's not much risk of things going wrong there. The pool is obviously not in a "clean" state, but there's no chance you're going to be able to put it in a clean state, as the file system is incomplete the second the second drive is removed (aka fails).

The only thing a graceful shutdown will give you is a semi-"clean" shutdown of the services that were running (assuming they didn't need the pool). But who cares if those services weren't shut down cleanly, as they'll be right back up on a reboot?

Again, not seeing where this is actually a problem. We all know what happens when your pool goes away; not sure why this is:

1. Shocking.
2. Not expected.
3. A big deal.

Not sure which applies...

This seems 100% normal, the "best case scenario", and "all you can expect" for the design. If this kind of thing happened to me I'd be much more freaked out about whether the pool would mount on reboot or not than actually pressing the reset button.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
@yangjie, you are not providing enough information...

If you tested my recommended setup, you did not share what exactly it looked like, so I cannot comment on it.

If you're just testing ZFS's interaction with the operating system, then FreeNAS is not the right place for such tests. You should perform them in FreeBSD, and FreeNAS will benefit from the improvements, as it is always based on FreeBSD code. That way, if a bug turns up only under FreeNAS, it is known to belong to the FreeNAS layer.

I am not sure what exactly you are trying to accomplish...
  • ZFS (regardless of the operating system) will not complete writes when not enough of the devices are present in the zpool. That is by design! Remember, ZFS tries not to lose data...
  • What are you trying to accomplish by rebooting a system with missing devices in a zpool while writing to it? Losing data? If you are trying to lose data, then ZFS was a very bad choice in the first place! ZFS design principles assume that one keeps replacing the failed devices while the pool still has enough of the good ones!
  • As cyberjock alluded to, with enough failed disks in your zpool your data is gone. So you might as well just unplug the power cord...
  • ZFS, like any technology, has certain limitations, as well as rules and expectations. One of ZFS's expectations is that a failed disk gets replaced, not that a disk failure is followed by a reboot! Did you try to replace the disk(s) (see the sketch below this list) and check whether a reboot is possible afterwards?
  • If to all of the above you say that you just wanted to test..., then you have overlooked that ZFS tries not to lose data, so the lack of disks in the zpool does not mean that those devices are truly gone. Maybe a cable was temporarily disconnected, or a disk was taken out by mistake (and that does happen, both at home and in the data center...), etc.
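To make the replacement point concrete, a sketch (tank, da20 and da21 come from the earlier output; da25 is a hypothetical new disk; failmode is the standard ZFS pool property, and its default of wait is what makes I/O block when the pool loses its devices):
Code:
# How the pool reacts to I/O failure: wait (default, block), continue, or panic
zpool get failmode tank
# After physically reconnecting a pulled disk, bring it back into the pool
zpool online tank da20
# Or replace a genuinely failed disk with a new one
zpool replace tank da21 da25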
By the way, my advice on separating data from .system and swap prevents some race conditions and lockups that might occur in borderline cases in the FreeNAS environment. For other operating systems the rules stay the same; just the implementation details are different.
 

yangjie

Cadet
Joined
Sep 17, 2014
Messages
8
Thank you for your response.
You know, this does not happen on CentOS with ZFS: in the same situation, CentOS can still destroy the pool or reboot. So I can't understand why FreeBSD can't do it. Why? I will provide the relevant information about CentOS.
 

yangjie

Cadet
Joined
Sep 17, 2014
Messages
8
I think he just wants a clean reboot without punching the reset button. Failure of a hard disk, or two, or all on a data set should not lock up the system. We should be able to exit gracefully in all cases of failure.

Showing pool creation via CLI was the most efficient method for describing each step to reproduce the issue, reduces language issues. We could easily do this from the GUI.

Seems to me there is a communication issue more than a lack of understanding or intelligence. You'd know better than I, cyber, but is this possibly syslog and samba puking as we are writing by default to our "data" pools instead of dedicated space. I haven't played with this... but with the current default scheme if we lose the "main" pool (even momentarily due to a disconnect) we hang the entire system. Doesn't seem awesome.

@yangjie Try disabling syslog, and samba momentarily, not too mention kill the dd process if it survived... I don't know what else would lock shutdown. Hopefully we can get better info to the bsd guys. Gut says solarisguy is dead on. Unfortunately by default the FreeNAS appliance on a single pool is mixing .swap/system with data and leading to this. Following all the rules still leads to this spot.

Good luck, it seems to me this is a compromise in design choice. The nature of NanoBSD and unwritable mixed with Samba4 needing persistent storage leaves no better options. You can use separate pools, or wait for 9.3 with a zfs root. Maybe cyberjock has a more clever idea.
Yes, you are right. "I think he just wants a clean reboot without punching the reset button. Failure of a hard disk, or two, or all on a data set should not lock up the system." That is exactly what I mean.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, you didn't simulate a failure. You unplugged a drive while it was online. That's not "exactly" like an actual failure. And if you read the FreeNAS manual, it says never to unplug a drive without informing the OS first. You broke that warning in the manual. So why would you expect anything other than "not a good turnout"?

And at the point that you've lost 2 disks with RAIDZ1, whether it be the result of them failing or you unplugging them, you are NEVER going to like the turnout. More than 1 disk missing from the pool = no pool.
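For completeness, the command-line equivalent of "informing the OS first" looks roughly like this (a sketch; the FreeNAS way is to detach the disk in the WebGUI, and tank/da20 are names from the original post):
Code:
# Tell ZFS the disk is going away before physically pulling it
zpool offline tank da20
# Optionally stop the drive before removal
camcontrol stop da20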
 

yangjie

Cadet
Joined
Sep 17, 2014
Messages
8
Well, you didn't simulate a failure. You unplugged a drive while it was online. That's not "exactly" like an actual failure. And if you read the FreeNAS manual it says to never unplug a drive without informing the OS first. You broke that warning in the manual. So why would you expect anything less than "not a good turnout".

And at the point that you've lost 2 disks with RAIDZ1, whether it be the result of them failing or you unplugging them, you are NEVER going to like the turnout. More than 1 disk missing from the pool = no pool.
Yes, you are right. But even in this case, it shouldn't be impossible to reboot. The problem could be in the FreeBSD OS itself or in ZFS; FreeNAS is fine.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You just don't understand. No, it's not that it should still reboot. You're in no-man's-land.

The manual says to not just unplug disks without going through the WebGUI and detaching them. The WebGUI will not let you detach them all.

You're basically doing things that are impossible in the real world because of procedure and process, and then are upset because the outcome isn't "appealing". This doesn't prove a problem with FreeNAS or FreeBSD. It proves that our warnings that you shouldn't unplug disks are still warranted and shouldn't be removed.
 