FreeNAS unresponsive during resilver

Knogle

Dabbler
Joined
Jan 25, 2014
Messages
28
Ahoy friends.
I have been having some issues with my FreeNAS system. The pool consists of 8 hard disks in a RAID-Z2.
The FreeNAS system was running as a virtual machine on a Fedora host, with the RAID controller passed through.
Unfortunately, 1 or 2 disks would somehow get disconnected and immediately reconnected. This happened roughly once per week, and each time the pool ended up broken.
Now I have a dedicated storage system with the same RAID controller, and I wanted to run a resilver. Unfortunately the system becomes unresponsive and the dashboard no longer loads. "zpool status" and the other zpool commands simply hang without any response.
What might be the issue here? One disk has failed, so I have to complete the resilver, but I cannot finish it because the system becomes unresponsive.
Could a hardware issue cause this problem? How can I troubleshoot it?
On my old system there were "PCIE #SERR" errors, but I am not sure which device caused them. On my new system the unresponsiveness is the same, so maybe I should check for "PCIE #SERR" there as well.
In addition: I have an LSI 9211 8i RAID controller together with 1x LSI Lenovo SAS expander.
Initially I thought the FreeNAS system itself must be causing the trouble. But after moving to a freshly installed system, the problem persists. So there must be something wrong with the LSI HBA, the expander, or the pool itself!
Thanks in advance!

Not responding: [Screenshot from 2021-06-20 21-26-53.png]

Page not loading: [Screenshot from 2021-06-20 21-26-17.png] [Screenshot from 2021-06-20 21-26-40.png]
 
Joined
Jan 7, 2015
Messages
1,155
I vote expander.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
If you are still using the RAID controller, this should be changed to an HBA running in IT mode. The forum has loads of information if you are interested in more details.

Are any of your drives SMR? If so that should also be changed, although it may not contribute directly to the issue at hand. Also, a more complete description of your hardware would be great (and I think is even requested by forum rules).

Good luck!
 

Knogle

Dabbler
Joined
Jan 25, 2014
Messages
28
I vote expander.
Thanks a lot. I was able to make some progress: I tried another HBA, this time a 9207 8i instead of the 9211 8i, with the same issue. I then replaced the expander with a second HBA, using the LSI 9211 as well as the 9207 to connect all the devices. Still the same issue, so the only option left is that something must be wrong with the pool. Does anyone have an idea how to recover the pool?
 

Knogle

Dabbler
Joined
Jan 25, 2014
Messages
28
I have now upgraded to TrueNAS Core, and I am able to get something working, at least a little bit.
I think there is some bug in FreeNAS 11.3, because TrueNAS does not render the system unresponsive.
Even so, I am curious how much of this pool I will get back in a working state.


Code:
Warning: settings changed through the CLI are not written to
the configuration database and will be reset on reboot.

root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun 20 10:54:34 2021
        11.7T scanned at 38.2G/s, 73.2G issued at 238M/s, 16.6T total
        5.37M resilvered, 0.43% done, 20:11:27 to go
config:

        NAME                                                  STATE     READ WRITE CKSUM
        data                                                  DEGRADED     0 0     0
          raidz2-0                                            DEGRADED     0 0     0
            gptid/5168a444-cacf-11eb-891f-133dff369a55.eli    ONLINE       0 0   797  (resilvering)
            gptid/9155711b-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   187  (resilvering)
            gptid/86b9fa2f-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   120  (resilvering)
            gptid/9285ff04-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   163  (resilvering)
            gptid/9411f31f-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   797  (resilvering)
            spare-5                                           DEGRADED     0 0   279
              gptid/9b4a1344-6596-11eb-a75c-8f89b0c061c4.eli  ONLINE       0 0     0  (resilvering)
              336223762456561297                              UNAVAIL      0 0     0  was /dev/gptid/0cea51b3-6caf-11eb-aa19-19444082b40a.eli
            gptid/aab972e4-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   169  (resilvering)
            gptid/ae626c03-6596-11eb-a75c-8f89b0c0
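
The output above is cut off, but the lines that are complete already show checksum errors on almost every disk. Errors spread across many disks at once tend to point at a shared component (HBA, cabling, expander, memory) rather than at several disks failing together. As a rough sketch, the CKSUM column can be tallied with a few lines of Python. The names `parse_cksum` and `leaves_with_errors` are made up for this example, the `gptid/` prefix filter is an assumption that fits this pool, and the regex only handles plain integer counters (real `zpool status` output may abbreviate large values, e.g. with a K suffix).

```python
import re

# Matches "<name> <state> <read> <write> <cksum>" device rows.
# Assumption: counters are plain integers (no "1.2K"-style abbreviations).
DEVICE_LINE = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|UNAVAIL|FAULTED|OFFLINE|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_cksum(status_text):
    """Return {device_name: cksum_count} for every device row found."""
    counts = {}
    for line in status_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            counts[m.group(1)] = int(m.group(5))
    return counts


def leaves_with_errors(counts, prefix="gptid/"):
    """Keep only disks (identified by name prefix) with a non-zero count."""
    return {n: c for n, c in counts.items() if n.startswith(prefix) and c > 0}


# The complete device rows from the post; the truncated last row is left out.
STATUS = """
        NAME                                                  STATE     READ WRITE CKSUM
        data                                                  DEGRADED     0 0     0
          raidz2-0                                            DEGRADED     0 0     0
            gptid/5168a444-cacf-11eb-891f-133dff369a55.eli    ONLINE       0 0   797  (resilvering)
            gptid/9155711b-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   187  (resilvering)
            gptid/86b9fa2f-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   120  (resilvering)
            gptid/9285ff04-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   163  (resilvering)
            gptid/9411f31f-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   797  (resilvering)
            spare-5                                           DEGRADED     0 0   279
              gptid/9b4a1344-6596-11eb-a75c-8f89b0c061c4.eli  ONLINE       0 0     0  (resilvering)
              336223762456561297                              UNAVAIL      0 0     0  was /dev/gptid/0cea51b3-6caf-11eb-aa19-19444082b40a.eli
            gptid/aab972e4-6596-11eb-a75c-8f89b0c061c4.eli    ONLINE       0 0   169  (resilvering)
"""

counts = parse_cksum(STATUS)
bad = leaves_with_errors(counts)
leaves = [n for n in counts if n.startswith("gptid/")]

print(f"{len(bad)} of {len(leaves)} listed disks report checksum errors")
# Worst offenders first.
for name, c in sorted(bad.items(), key=lambda kv: -kv[1]):
    print(f"{c:6d}  {name}")
```

On a live system the text could come from the saved output of `zpool status data` instead of the pasted string. Comparing the tally before and after swapping a cable or controller is one way to see whether the errors follow a particular component.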
 