Slow spa_sync, can't boot/mount volume after hard reset

prodigyman

Cadet
Joined
Mar 24, 2020
Messages
3
Hello everyone,

I have an 11.2U6 system with the following specs:
HPE Apollo 4500 Blade Gen 8, 192GB DDR3 ECC RAM, 2x E5-2450v2
HPE P420i R60 presenting a single 87.3 TiB volume to FreeNAS, used only for iSCSI archival sharing with very low IOPS/load.

After 8 months of runtime, the FreeNAS OS froze. I could still ping the management IP, but the GUI would not come up and iSCSI sharing stopped working, which sent all ESXi hosts connected to it into a PDL state.

I had to hard reboot the host because the console was unresponsive, and since then exactly what is described in this post has been happening.


After the hard reboot, it tries to import the original ZFS volume on boot, displays the messages below, and then after 5-8 hours the system freezes. Please note that the drives are being read the whole time (they flash constantly; more proof of that below).

Beginning ZFS volume imports....
slow spa_sync: started 1660 seconds ago, calls 108


I tried installing 11.3U1. It booted and started importing the volume, but it got stuck adding it in the GUI, and after 12+ hours it became unresponsive again.

After reading about the issue in KB27409, I installed a fresh 11.0U4 and tried to import the pool from the console:

root@freenas:~ # zpool import
pool: HP-P420i-R60
id: 10355478833423154548
state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
see: http://illumos.org/msg/ZFS-8000-EY
config:

HP-P420i-R60 ONLINE
gptid/403741fa-e949-11e9-95e1-00155d005401 ONLINE


That looked good, so I then ran:

zpool import HP-P420i-R60 -f
cannot import 'HP-P420i-R60': pool may be in use from other system, it was last accessed by node4.151fs.cloud.stage2data.com (hostid: 0xc6a1485e) on Mon Mar 23 23:08:30 2020 <== This was the 11.3U1 system I tried to import it on, not the original
use '-f' to import anyway
root@freenas:~ # zpool import -f HP-P420i-R60

And then it just hangs here. The GUI still works, so I looked at the stats, and they show a constant 100% load on the DA1 drive (this volume), READING data at 120-200 MB/sec (see attached screenshot).

The original 11.2U6 install loads fine with all its settings if I detach the volume/controller from it. I even tried restoring the settings in a fresh 11.2U6 install, and after a reboot it does the same thing: slow spa_sync messages and constant reading from the drives.

I am trying to get this volume working and am not sure what else I can do. Waiting doesn't help, as all systems and versions just hang. Is there anything else I can try? The volume itself looks healthy, with no data corruption, and all scrub operations on it have succeeded with no issues.

Any help would be greatly appreciated.

Thank you!
 

Attachments

  • Screen Shot 2020-03-24 at 2.52.03 PM.png

prodigyman

Also, an update: I was able to mount it read-only on the same system, but any time I mount it read/write it just hangs there, "reading" the volume in some sort of loop.
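
A read-only import like the one described above can be sketched as follows. The pool name is taken from the `zpool import` output earlier in the thread; the `/mnt` altroot and the `RUN` switch are assumptions added for illustration, and the exact command used for the read-only mount is not shown in the post. By default the script only prints the command, so nothing touches the pool unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) a forced, read-only pool import.
POOL="HP-P420i-R60"   # pool name from the zpool import listing above
ALTROOT="/mnt"        # assumption: mount datasets under /mnt

# -f              force the import (pool was last accessed by another host)
# -o readonly=on  import the pool read-only, so no writes are issued to it
# -R              use an alternate root for the mount points
CMD="zpool import -f -o readonly=on -R $ALTROOT $POOL"

# Always show what would be executed.
echo "$CMD"

# Execute only when explicitly requested, e.g. RUN=1 sh ro-import.sh
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Note that the options are placed before the pool name, as in the second `zpool import -f HP-P420i-R60` attempt above.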
 