KDB: enter: panic around 1 in 5 reboots

Remoter

Cadet
Joined
Mar 11, 2021
Messages
6
I wonder if anyone can help with a strange fault I've been having with FreeNAS 11.3-U5. I've attached a screenshot of what happens on roughly 1 in every 5 reboots. I've been trying to figure this out for about 2 weeks now and reckon I've spent about 50 hours looking for hardware faults, including multiple passes of Memtest with ECC both enabled and disabled. So far I haven't found anything.

Here's my hardware:

1 x Supermicro H8SML-7F motherboard
1 x AMD FX-6300 CPU
4 x Kingston 8GB ECC RAM (KVR16E11/8), running at the correct 1333 MHz speed
10 x Seagate 3TB Barracuda hard disks (ST3000DM001) running in RAIDZ2 with GELI encryption
1 x 500GB 2.5" SSD (I tried 3 different USB memory sticks too)
1 x 500 W PSU (the motherboard has a staggered spin-up feature for HDDs)

Latest BIOS and latest BMC firmware. The BMC shows no errors in the event log.

The system has otherwise been rock solid for 6+ years as far as I know, although admittedly I wasn't paying much attention until recently, when I upgraded from FreeNAS 9 to FreeNAS 11.

If it helps, I can also upload an iKVM video of the bootup.
 

Attachments

  • vlcsnap-2021-03-18-03h33m15s171.jpg
    197.7 KB · Views: 206

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
The screenshot seems to indicate it's in the middle of trying to move some memory around.

Possibly a hairline heat crack or something like that, which only manifests some of the time.

I would suspect bad hardware somewhere in the chain.

You could try a reinstall of the OS (restore saved config) to see if that changes anything, but I wouldn't put much stock in the likelihood of OS corruption that you haven't seen a message about somewhere.
 

Remoter

Cadet
Joined
Mar 11, 2021
Messages
6
Excellent, thank you very much. I've probably reinstalled FreeNAS 6 or 7 times now using various different boot disks, but I get the same problem every time.

Another strange thing is that after I installed 11.3-U5 I checked 'dmesg | grep memory' and my 'real memory' was 32GB but my 'available memory' was only 22GB. So I reseated all the memory, and now my 'real memory' is 32GB but my 'available memory' is only 31GB. Furthermore, if I use 1 stick it's down by around 256MB, and if I use 2 sticks it's down by around 384MB. The placement/order of the sticks makes no difference (I tried all combinations). I would expect it to be down by about 8MB for the onboard video regardless of how many sticks are installed, so I think something is wrong here.
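
For anyone wanting to repeat this comparison, here is a minimal Python sketch that pulls the two figures out of saved 'dmesg' output and reports the gap between them. FreeBSD labels the second line 'avail memory'. The sample text and its byte counts are hypothetical, chosen only to illustrate the format, and the function name is my own.

```python
import re

# Hypothetical sample in the format FreeBSD prints at boot. The byte counts
# are illustrative only and were not captured from the machine in this thread.
SAMPLE_DMESG = """\
real memory  = 34359738368 (32768 MB)
avail memory = 33210499072 (31672 MB)
"""

# Matches lines such as "real memory  = 34359738368 (32768 MB)".
MEMORY_LINE = re.compile(r"^(real|avail) memory\s*=\s*(\d+)", re.MULTILINE)


def memory_gap_mb(dmesg_text):
    """Return (real_mb, avail_mb, gap_mb) parsed from dmesg text."""
    found = {kind: int(value) for kind, value in MEMORY_LINE.findall(dmesg_text)}
    if "real" not in found or "avail" not in found:
        raise ValueError("real/avail memory lines not found")
    mib = 1024 * 1024
    real_mb = found["real"] // mib
    avail_mb = found["avail"] // mib
    return real_mb, avail_mb, real_mb - avail_mb


if __name__ == "__main__":
    real_mb, avail_mb, gap_mb = memory_gap_mb(SAMPLE_DMESG)
    print(f"real={real_mb} MB avail={avail_mb} MB gap={gap_mb} MB")
```

Running it against output saved with each stick configuration makes it easy to see whether the gap really does change with the number of sticks.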

I'll likely buy all new Mainboard, RAM, + CPU later this week and report back, unless anyone has any better ideas?

Thanks again sretalla
 

Remoter

Cadet
Joined
Mar 11, 2021
Messages
6
Update: As Supermicro boards are almost impossible to source in the UK at the moment without huge delays, I persisted in trying to find the fault, and after 20+ hours eventually traced it to the CPU. In summary, the system now works fine with either of the 2 other CPUs I dug out of some other machines. Unfortunately neither of them supports AES-NI, so I will likely have to reconstruct my GELI array, otherwise performance will be appalling.
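
As a rough way to check for AES-NI before committing to a CPU, the sketch below scans boot messages for the feature lists FreeBSD prints in angle brackets and looks for 'AESNI'. It assumes the usual 'Features2=0x...<...>' layout. The two sample lines, their hex masks, and the function names are hypothetical, not captured from any of the CPUs mentioned here.

```python
import re

# Hypothetical boot-message fragments; the hex masks and flag lists are
# made up for illustration only.
WITH_AESNI = "  Features2=0x3e98320b<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,AVX>"
WITHOUT_AESNI = "  Features2=0x802009<SSE3,MON,CX16,POPCNT>"

# Captures the comma-separated flag list between '<' and '>' on any
# "...Features=0x..." or "...Features2=0x..." line.
FEATURE_LIST = re.compile(r"Features\d*=0x[0-9a-fA-F]+<([^>]*)>")


def cpu_flags(boot_text):
    """Collect every CPU feature flag named in the boot messages."""
    flags = set()
    for match in FEATURE_LIST.finditer(boot_text):
        flags.update(flag for flag in match.group(1).split(",") if flag)
    return flags


def has_aesni(boot_text):
    """True if the boot messages list the AESNI flag."""
    return "AESNI" in cpu_flags(boot_text)


if __name__ == "__main__":
    for label, text in (("sample A", WITH_AESNI), ("sample B", WITHOUT_AESNI)):
        print(label, "AES-NI" if has_aesni(text) else "no AES-NI")
```

On a real system the text would come from the saved boot messages rather than these sample strings.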

One thing is strange, though. Under Windows 7 the suspected faulty CPU can run the Prime95 torture test (CPU + memory) all day with no problem, and I didn't notice any reboot problems after 30+ attempts with Windows 7. With FreeNAS 11.3-U5, however, I can get kernel panics at boot 3 times in 10 minutes. So I'm wondering whether we're 100% sure there's no compatibility problem between FreeNAS/FreeBSD and this range of CPUs? I'm going to assume the CPU is faulty somehow, but if anyone knows anything more I'd be very interested to hear.
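
To put a number on how unlikely 30 clean Windows reboots would be if the same fault were in play, here is a small sketch. It assumes each reboot fails independently at the 1-in-5 rate seen under FreeNAS, which is a simplification, and the function name is my own.

```python
def chance_of_clean_run(panic_rate, reboots):
    """Probability that `reboots` consecutive boots all succeed,
    assuming each boot panics independently with probability `panic_rate`."""
    if not 0.0 <= panic_rate <= 1.0:
        raise ValueError("panic_rate must be between 0 and 1")
    return (1.0 - panic_rate) ** reboots


if __name__ == "__main__":
    # 1 in 5 reboots panicking, 30 reboots in a row without a panic.
    p = chance_of_clean_run(0.2, 30)
    print(f"chance of 30 clean reboots at a 20% panic rate: {p:.4%}")
```

Under that assumption the chance comes out at roughly 0.1%, so the clean Windows run is very unlikely to be luck; whatever triggers the panic appears to depend on the operating system.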

Thanks again,
Remoter
 

Remoter

Cadet
Joined
Mar 11, 2021
Messages
6
Update again: I borrowed my friend's PC, which has the same CPU (FX-6300) but everything else different, and guess what, guys: it has exactly the same fault as described in the OP with FreeNAS 11.3-U5, but can run Windows 7 for days on end.

Not sure what to think now, but I'm leaning more towards this being a bug in FreeNAS/FreeBSD...

Either that or 2 faulty CPUs (which I very much doubt...)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
It could simply be an incompatibility with the CPU. A large amount of work goes into making operating systems compatible with the various flaws or caveats of certain CPUs. The manufacturers generally do not fix their silicon and send out new CPUs, but rather expect operating systems to work around the flaws. The Pentium F00F/FDIV bugs and Spectre/Meltdown are just a few major examples.

Back in 2012, I was pushing VERY hard for forum members to use Intel CPUs on server-grade boards, because there were known issues with the AMD APUs, and I don't recall the FX CPUs being much better, while virtually everyone who followed the hardware guide had great success. There is no particular reason to think that your hardware has gotten any better in the last nine years, but we can agree that it is disappointing that FreeBSD hasn't become more compatible with it in the meantime.

Unfortunately, because there are so many different mainboards and CPUs out there, it is very difficult to support them all. You'd really need a lab with test hosts available on an ongoing basis, both to solve issues and to make sure there are no regressions. The FreeBSD Project does not have this.
 

Remoter

Cadet
Joined
Mar 11, 2021
Messages
6
Yes, agreed. Whilst it's unfortunate that so much time was spent on this, it's unfair to hold grudges as the software was provided for free. Nevertheless, I've decided to rebuild the original system and use Ubuntu Server + mdadm. If anyone's interested, feel free to post here and I'll update accordingly; otherwise assume it's working.
 