SilverNetworks
Greetings iXsystems Community,
This is an update to my original post over at: https://www.truenas.com/community/t...-seconds-shutting-down-controller-done.86761/
First, I would like to thank bsdimp for investing time and effort into attempting to resolve this issue properly. My apologies for not following up sooner and testing bsdimp’s patch.
For those interested, I’ve been running my original patched version since 2020 on a number of different systems and have not had any issues. One such system is running a 48x8TB HDD RAID-Z3 array, and is under very heavy load for days or weeks at a time.
Since https://github.com/openzfs/zfs/issues/15526, I’ve been keeping an eye out for a patched version of TrueNAS Core. Now that 13.0-U6.1 has been released to fix the aforementioned bug, I thought it would be timely to compile some new drivers with the “idle reboot bug” fix.
Attached are three compiled versions accompanied by their source code and diffs:
3.2-10-131001: My patch, as per the previous post, where we simply don’t reset the controller and pretend like everything is fine
3.2-10-131002: bsdimp’s version of the patch, with driver version updated to “131002”
3.2-10-131003: bsdimp’s version, but I left the timeout check at the default 20 seconds and updated the driver version to “131003”
Installing the driver is straightforward:
- check current driver version: arcconf getconfig 1 ad|grep Driver
- copy desired aacraid.ko to /boot/kernel/
- add the following to /boot/loader.conf:
- aacraid_load="YES"
- reboot
- verify the new driver is loaded: arcconf getconfig 1 ad|grep Driver
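For anyone scripting the loader.conf step above, here is a minimal sketch. The function name enable_aacraid_load is my own invention for illustration and is not part of the attached files. It only appends the line if it is not already there, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached driver archives):
# append aacraid_load="YES" to a loader.conf file unless it is already present.
# Usage: enable_aacraid_load [path]   (path defaults to /boot/loader.conf)
enable_aacraid_load() {
    conf="${1:-/boot/loader.conf}"
    line='aacraid_load="YES"'

    # Create the file if it does not exist yet.
    touch "$conf" || return 1

    # If the file is non-empty and does not end in a newline, add one first
    # so the new entry lands on its own line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        echo >> "$conf"
    fi

    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$conf"; then
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

Called with no argument it edits /boot/loader.conf; passing a path lets you try it on a scratch copy first. Copying aacraid.ko to /boot/kernel/ and rebooting are still done by hand as listed above.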
The old driver should report something like:
Driver : 3.2-10 (1)
The new driver should report something like:
Driver : 3.2-10 (131003)
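If you would rather check the build number in a script than by eye, something like the following might do. It assumes the Driver line looks like the samples above (a version followed by the build number in parentheses). The function names driver_build and check_driver_build are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: read arcconf output on stdin and pull out the
# build number in parentheses, e.g. "Driver : 3.2-10 (131003)" -> 131003.
driver_build() {
    sed -n 's/.*Driver[[:space:]]*:.*(\([0-9][0-9]*\)).*/\1/p'
}

# Succeeds (exit status 0) only if the first Driver line on stdin
# reports the expected build number given as $1.
check_driver_build() {
    expected="$1"
    actual="$(driver_build | head -n 1)"
    [ "$actual" = "$expected" ]
}
```

For example: arcconf getconfig 1 ad | check_driver_build 131003 && echo "patched driver loaded"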
Note:
- All of the standard "use at your own risk" and "ymmv" disclaimers apply if you choose to use the attached drivers.
- Driver source came from FreeBSD 13.1-RELEASE
- All drivers were compiled on FreeBSD 13.1-RELEASE
- I've not confirmed whether the idle timeout bug occurs in TrueNAS Core 13.x with the in-box driver, since I don't have any in production yet. I've created these drivers as 'safety net' fixes, but perhaps someone could confirm whether they have seen this issue in modern versions of TrueNAS Core.
Diffs for quick reference:
3.2-10-131001:
aacraid.c.diff
2188c2188,2193
< aac_reset_adapter(sc);
---
> /*
> * Resetting the adapter causes FreeNAS 11.x/12.x to panic with a "command not in queue" error.
> * Let's not reset the adapter and carry on as if nothing happened.
> * This is the command we are leaving out:
> * aac_reset_adapter(sc);
> */
aacraid_var.h.diff
54c54
< # define AAC_DRIVER_BUILD 1
---
> # define AAC_DRIVER_BUILD 131001
3.2-10-131002:
aacraid.c.diff
3799a3800
> #if 0 /* Controller seems to return these at least some of the time after we reset */
3817c3818
<
---
> #endif
aacraid_var.h.diff
54c54
< # define AAC_DRIVER_BUILD 1
---
> # define AAC_DRIVER_BUILD 131002
115c115
< #define AAC_PERIODIC_INTERVAL 20 /* seconds */
---
> #define AAC_PERIODIC_INTERVAL 1 /* seconds */
3.2-10-131003:
aacraid.c.diff
3799a3800
> #if 0 /* Controller seems to return these at least some of the time after we reset */
3817c3818
<
---
> #endif
aacraid_var.h.diff
54c54
< # define AAC_DRIVER_BUILD 1
---
> # define AAC_DRIVER_BUILD 131003
Cheers,
Greg
PS: I just noticed that my comment in my patch references "FreeNAS 11.x/12.x." Perhaps I'll remember to update this on the next run.