So I've tested this again, this time with constant writing to the pool (12x-disk RAIDZ1) from:
dd if=/dev/urandom of=blah.dd bs=8M count=100k
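While dd was running I watched per-disk activity and pool health from a second shell. A minimal sketch; "tank" here is a placeholder, not my real pool name:
Code:
zpool iostat -v tank 5   # per-vdev / per-disk I/O, refreshed every 5 seconds
zpool status -x          # prints only pools that are not healthy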
1 - I pulled one of the disks; FreeNAS showed DEGRADED (and the writes, as seen by the disk LEDs, resumed).
About 2 minutes later I pushed the disk back in and the pool went back to HEALTHY (and the writes, as seen by the disk LEDs, resumed).
About 15 minutes later I did the exact same thing, with the exact same disk, same scenario.
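For reference, if the pool had not gone back to HEALTHY on its own after reinserting the disk, the manual recovery path would be roughly the following. This is just a sketch: "tank" is a placeholder for my pool name, and the device is whatever zpool status shows for the removed vdev (da12 in my logs below):
Code:
zpool status tank        # see which vdev is REMOVED/UNAVAIL
zpool online tank da12   # bring the reinserted disk back into the vdev
zpool status tank        # the resilver shows up on the scan: line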
Here are the log messages:
Code:
Jan 3 23:00:47 freenas (da12:mps0:0:5:0): WRITE(10). CDB: 2a 00 f0 88 4f f8 00 00 18 00 length 12288 SMID 577 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Jan 3 23:00:47 freenas (da12:mps0:0:5:0): WRITE(10). CDB: 2a 00 f0 88 4f f8 00 00 18 00
Jan 3 23:00:47 freenas (da12:mps0:0:5:0): CAM status: CCB request completed with an error
Jan 3 23:00:47 freenas (da12:mps0:0:5:0): Retrying command
Jan 3 23:00:48 freenas mps0: mpssas_prepare_remove: Sending reset for target ID 5
Jan 3 23:00:48 freenas da12 at mps0 bus 0 scbus2 target 5 lun 0
Jan 3 23:00:48 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:00:48 freenas mps0: da12: <ATA Hitachi HUS72404 A5F0> s/n PCHVHM1B detached
Jan 3 23:00:48 freenas (da12:mps0:0:5:0): WRITE(10). CDB: 2a 00 f0 88 4f f8 00 00 18 00
Jan 3 23:00:48 freenas Unfreezing devq for target ID 5
Jan 3 23:00:48 freenas (da12:mps0:0:5:0): CAM status: CCB request aborted by the host
Jan 3 23:00:48 freenas (da12:mps0:0:5:0): Error 5, Periph was invalidated
Jan 3 23:00:48 freenas GEOM_MIRROR: Device swap1: provider da12p1 disconnected.
Jan 3 23:00:48 freenas (da12:mps0:0:5:0): Periph destroyed
Jan 3 23:00:50 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:02:51 freenas mps0: SAS Address for SATA device = 737e5c1790c3dcb4
Jan 3 23:02:51 freenas mps0: SAS Address from SATA device = 737e5c1790c3dcb4
Jan 3 23:02:51 freenas da12 at mps0 bus 0 scbus2 target 5 lun 0
Jan 3 23:02:51 freenas da12: <ATA Hitachi HUS72404 A5F0> Fixed Direct Access SPC-4 SCSI device
Jan 3 23:02:51 freenas da12: Serial Number PCHVHM1B
Jan 3 23:02:51 freenas da12: 600.000MB/s transfers
Jan 3 23:02:51 freenas da12: Command Queueing enabled
Jan 3 23:02:51 freenas da12: 3815447MB (7814037168 512 byte sectors)
Jan 3 23:02:51 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=14288572928810984592
Jan 3 23:02:51 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=7101654273334322128
Jan 3 23:02:51 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=13733655008980643034
Jan 3 23:02:51 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=14185762264907364071
Jan 3 23:02:51 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=18243972532056628131
Jan 3 23:02:51 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=5831967058495583272
Jan 3 23:02:52 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=12522633057879393081
Jan 3 23:02:52 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:02:52 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=18150942564094537261
Jan 3 23:02:52 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=4184126336002118391
Jan 3 23:02:52 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=4262714046301300370
Jan 3 23:02:52 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=14754857558554933470
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): WRITE(10). CDB: 2a 00 f1 4e fb 70 00 00 18 00 length 12288 SMID 485 terminated ioc 804b loginfo 31110d00 scsi 0 state c xfer 0
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): WRITE(10). CDB: 2a 00 f1 4e fb 70 00 00 18 00
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): CAM status: CCB request completed with an error
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): Retrying command
Jan 3 23:13:43 freenas mps0: mpssas_prepare_remove: Sending reset for target ID 5
Jan 3 23:13:43 freenas da12 at mps0 bus 0 scbus2 target 5 lun 0
Jan 3 23:13:43 freenas mps0: da12: <ATA Hitachi HUS72404 A5F0> s/n PCHVHM1B detached
Jan 3 23:13:43 freenas Unfreezing devq for target ID 5
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): WRITE(10). CDB: 2a 00 f1 4e fb 70 00 00 18 00
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): CAM status: CCB request aborted by the host
Jan 3 23:13:43 freenas (da12:mps0:0:5:0): Error 5, Periph was invalidated
Jan 3 23:13:43 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:13:43 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:15:44 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:15:44 freenas GEOM_ELI: Device mirror/swap1.eli destroyed.
Jan 3 23:15:45 freenas (da12:mps0:0:5:0): Periph destroyed
Jan 3 23:15:45 freenas GEOM_MIRROR: Device swap1: provider destroyed.
Jan 3 23:15:45 freenas GEOM_MIRROR: Device swap1 destroyed.
Jan 3 23:15:46 freenas GEOM_MIRROR: Device mirror/swap1 launched (2/2).
Jan 3 23:15:47 freenas GEOM_ELI: Device mirror/swap1.eli created.
Jan 3 23:15:47 freenas GEOM_ELI: Encryption: AES-XTS 128
Jan 3 23:15:47 freenas GEOM_ELI: Crypto: hardware
Jan 3 23:19:58 freenas mps0: SAS Address for SATA device = 737e5c1790c3dcb4
Jan 3 23:19:58 freenas mps0: SAS Address from SATA device = 737e5c1790c3dcb4
Jan 3 23:19:58 freenas da12 at mps0 bus 0 scbus2 target 5 lun 0
Jan 3 23:19:58 freenas da12: <ATA Hitachi HUS72404 A5F0> Fixed Direct Access SPC-4 SCSI device
Jan 3 23:19:58 freenas da12: Serial Number PCHVHM1B
Jan 3 23:19:58 freenas da12: 600.000MB/s transfers
Jan 3 23:19:58 freenas da12: Command Queueing enabled
Jan 3 23:19:58 freenas da12: 3815447MB (7814037168 512 byte sectors)
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=14288572928810984592
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=7101654273334322128
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=13733655008980643034
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=14185762264907364071
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=18243972532056628131
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=5831967058495583272
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=12522633057879393081
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=15242145880798891173
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=18150942564094537261
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=4184126336002118391
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=4262714046301300370
Jan 3 23:19:59 freenas ZFS: vdev state changed, pool_guid=16709539263909507437 vdev_guid=14754857558554933470
root@freenas:~ #
So to make things more interesting, I made a new 3-disk RAIDZ1 pool (different/separate from the pool above), using 3x SSDs, as that's all I had free in the system.
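(I built it through the FreeNAS GUI, but the command-line equivalent would be roughly the following; the ada device names are illustrative, not my exact ones:)
Code:
zpool create testSSD raidz1 ada1 ada2 ada3
zpool status testSSD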
THIS TIME I wanted to test removing DISK A and reinserting it. The pool went back to HEALTHY, but IT DID AN AUTOMATIC RESILVER this time, completing in 10 seconds and covering only 680MB, which makes sense given the amount of urandom data dd had written while DISK A was gone.
I then removed DISK B and reinserted it (I think this would break the pool if no resilvers were done?) -- it again did an automatic resilver, this time for 2.96GB, as I had left DISK B pulled for a bit longer, but not much longer.
DISK A pull/reconnect:
pool: testSSD
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: resilvered 680M in 0 days 00:00:10 with 0 errors on Fri Jan 4 01:55:48 2019
config:
DISK B pull/reconnect:
pool: testSSD
state: ONLINE
scan: resilvered 2.96G in 0 days 00:00:38 with 0 errors on Fri Jan 4 01:59:01 2019
config:
(At this point I figure the scenario is pretty much normal, as ZFS is automatically starting and completing its resilvers.)
Now, with the pool back to HEALTHY (after the DISK A and DISK B pull/reinsert and their auto resilvers), I wanted to try this with DISK C. I pulled that one, and the disk I/O on the other two stopped for good, for once. When I reinserted it a minute or two later, there was still no disk I/O. The pool shows as healthy too, but in the alerts it says "unable to open /dev/ada3", so it's kind of hung. (I think this is an issue with either the SSD or, more likely, the Intel SAS SCU.)
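Before I write DISK C off, I'll try to rescan the bus and re-online it by hand. A rough sketch of what I have in mind; ada3 comes from the alert above, testSSD is the pool:
Code:
camcontrol devlist          # did ada3 come back at all?
camcontrol rescan all       # ask CAM to rescan the buses
zpool online testSSD ada3   # try to bring the device back into the vdev
zpool clear testSSD         # clear any lingering error counters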
Tomorrow I'm going to test my 12x HDD pool with some more pull-and-reinsert to see whether it does the auto-resilver there too (it certainly was not showing that in zpool status, the way it did on this second, 3x SSD pool).
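To catch that resilver while I'm pulling disks, I'll just poll the scan line; a quick sketch ("tank" again being a placeholder for the HDD pool's name):
Code:
while true; do zpool status tank | grep -A 2 'scan:'; sleep 5; done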
(So far I've been very impressed with ZFS's performance and handling of all this BS I'm throwing at it to learn it better.)