Pool degraded, no SMART failure

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
I will this weekend; sadly I don't have enough time to do that right now. It would be weird if I had flashed it incorrectly, since it worked for 2 years without problems. But I'm getting kind of desperate, so I'll try anyway.

I increased the voltage from 1.2 V to 1.25 V (which is still quite low, considering my main rig runs a lot higher) and manually set the RAM speed to 2400 Mbps (I know most people say MT/s :P). I haven't had that specific error since.
 

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
I found the following thread, which I think describes my problem exactly:
We have success with 11.1 and 12.1-RELEASE standard installation. No compiling and mixing driver versions. Problem was with 12.0 and 11.3-RELEASE.

Since I upgraded from 11.1 to 11.3, there is a good chance that this is the cause. It also seems to happen more often when there is more I/O load on the server, which matches my case too, since I added a VM that uses 2 GB of memory and runs Pi-hole.
The thread I found: FreeBSD Bugzilla

I will still check whether I flashed the card correctly (and post the results before I upgrade to TrueNAS), but it now seems pretty clear that this is just a driver bug in those FreeBSD versions. If so, it would be very nice if this were documented somewhere for other people who use larger drives.
 

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
Provide the output of "sas2flash.efi -list" (I think that is the command) so we can verify that the M1015 was flashed correctly. Not that we don't believe you, but it's easy to mess this up. Been there myself a long time ago.

Also, the other CPU error you got looks like an ECC issue or an unsupported CPU. I'm not sure, and I didn't look into it very hard, but there are postings about that error message. I would also recommend that you burn in your CPU and RAM again to verify that you have no issues. You have made a number of changes, and you may have introduced an error.

I ran sas2flash.efi -list and it gave me the following:
Code:
Controller                SAS2008 (B2)
Firmware Product ID        0x2213 (IT)
Firmware Version         15.00.00.00
NVDATA Vendor            LSI
NVDATA Product ID        SAS9211-8i

Seems fine to me.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
The firmware version you want to use on that card is 20.00.07.00.

Broadcom doesn't make it easy to find, but try here:

Browse down to the firmware section, and look for this package:
[Screenshot: Broadcom firmware package 20.00.07.00]


The firmware file you want is available in the ZIP archive.
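To check a card against that recommendation after flashing, here is a minimal Python sketch that pulls the "Firmware Version" field out of the sas2flash -list text and compares it numerically with 20.00.07.00. It assumes the layout pasted earlier in this thread (the pattern also tolerates a colon separator); parse_firmware_version and needs_update are hypothetical helpers for illustration, not part of any Broadcom tool.

```python
import re

# Firmware recommended for this SAS2008 card in the thread (IT mode).
RECOMMENDED = (20, 0, 7, 0)


def parse_firmware_version(listing: str) -> tuple[int, ...]:
    """Return the 'Firmware Version' field of sas2flash -list text as a tuple of ints."""
    # Accept "Firmware Version   15.00.00.00" and "Firmware Version : 15.00.00.00"
    match = re.search(r"Firmware Version\s*:?\s*(\d+(?:\.\d+)*)", listing)
    if match is None:
        raise ValueError("no 'Firmware Version' line found")
    # "15.00.00.00" -> (15, 0, 0, 0), so versions compare field by field
    return tuple(int(part) for part in match.group(1).split("."))


def needs_update(listing: str, recommended: tuple[int, ...] = RECOMMENDED) -> bool:
    """True if the flashed firmware is older than the recommended version."""
    return parse_firmware_version(listing) < recommended


sample = """\
Controller                SAS2008 (B2)
Firmware Product ID        0x2213 (IT)
Firmware Version         15.00.00.00
NVDATA Vendor            LSI
NVDATA Product ID        SAS9211-8i
"""

print(parse_firmware_version(sample))  # (15, 0, 0, 0)
print(needs_update(sample))            # True
```

Comparing tuples of integers rather than the raw strings avoids surprises with zero-padded fields.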
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
Thanks to both of you. I'll flash the card and let you know whether that solves the problems.

Flash done; now the waiting :).
 

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
Sadly, it didn't solve the problem.
I also upgraded to TrueNAS 12.0-U7, but that didn't help either.

Which FreeBSD version is TrueNAS 12.0 based on? If it is 12.0, then the solution for me might be in 12.1, if I can believe the bug tracker I linked earlier.
 

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
Before I do anything else: in the bug tracker I posted earlier, someone listed some options:
a: also disable ZFS cache flush
sysctl vfs.zfs.cache_flush_disable=1

b: experiment with larger timeout values.
Also observe the "gstat" output and make sure the first column, L(q), keeps returning to zero and does not get stuck for any of the drives.

c: try reducing the SCIC speed to 3.0 in the controller settings, just to rule out a disk firmware speed-compatibility issue.
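The queue check in option b can be sketched like this, assuming you have already recorded the L(q) column from gstat for each drive at regular intervals. Reading gstat's output directly is not shown; the drive names, sample values and the 5-sample window are made up for illustration. A drive whose queue stays above zero for the whole window is flagged as possibly stuck.

```python
def stuck_drives(samples: dict[str, list[int]], window: int = 5) -> list[str]:
    """Return drives whose L(q) stayed above zero for each of the last `window` samples."""
    stuck = []
    for drive, queue in samples.items():
        recent = queue[-window:]
        # Only judge drives with a full window of readings
        if len(recent) == window and all(q > 0 for q in recent):
            stuck.append(drive)
    return sorted(stuck)


# Hypothetical L(q) readings, one value per sampling interval
readings = {
    "da0": [0, 2, 0, 1, 0, 0, 3],
    "da1": [1, 4, 6, 9, 12, 15, 18],
    "da2": [0, 0, 0, 0, 0, 0, 0],
}
print(stuck_drives(readings))  # ['da1']
```

Here da0 is busy but keeps draining to zero, while da1's queue only grows, which is the pattern option b warns about.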

Another suggestion was adding an SSD as an L2ARC cache device. What steps would you suggest I take?
For reference, here is the bug tracker again.
Could I also create a bottleneck by simply limiting the PCIe bus speed on the motherboard?

If this does not work, I don't know how to fix it... It would probably mean running two systems and downgrading to FreeNAS 11.1 again, since I can't downgrade my pool, which forces me to create a new one.
 

Nighteyes

Dabbler
Joined
Nov 2, 2021
Messages
18
OK, I found the following blog post, and I might have fixed the problem: after 16 hours I don't see any errors yet.
https://blog.quindorian.org/2019/09/ironwolf10tbfirmwarefix.html/

It seems it was a firmware bug in the disks I use, which is fixed by a firmware update. I will let you know the status in a week.

Edit:
Spearfoot and joeschmuck, this seems to have been the fix. No errors for almost 3 days (normally I would have gotten several by now).
So other people with similar symptoms might need to update the firmware on their drives. Quite happy that it is working; I was getting worried :D.

Edit 2:
A few weeks without problems now!
 