After reboot, HBA keeps getting reset and pool becomes unavailable

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That P13 firmware is just screaming to be updated to P16, which is actually known to be stable (with 16.00.10 or whatever the specific version is). Things were bad enough that they decided to release three major versions in not that long a period, which goes to show the limits of "it worked before".
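To make the P13 versus P16 comparison concrete, here is a minimal sketch of comparing dotted firmware version strings. The version strings used below are placeholders for illustration (the exact stable P16 build is not pinned down above), and the function names are hypothetical, not part of any LSI tool.

```python
from typing import Tuple


def parse_fw_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted firmware string such as '16.00.10.00' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_update(installed: str, target: str) -> bool:
    """Return True when the installed version sorts strictly below the target."""
    a = parse_fw_version(installed)
    b = parse_fw_version(target)
    # Pad the shorter tuple with zeros so '16.00.10' equals '16.00.10.00'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    # Placeholder versions: a P13-phase build against a P16-phase build.
    print(needs_update("13.00.00.00", "16.00.10.00"))
```

Comparing integer tuples rather than raw strings avoids surprises with zero-padded fields.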
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
That P13 firmware is just screaming to be updated to P16, which is actually known to be stable (with 16.00.10 or whatever the specific version is). Things were bad enough that they decided to release three major versions in not that long a period, which goes to show the limits of "it worked before".

Correct, but this is an OEM card and there is no update for it.
Using official firmware may create problems with an OEM card.
Also, as far as I know, sas3flash does not allow flashing OEM cards with non-OEM firmware. At least that was the case with Dell OEM cards before.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
There is lsirec and lsiutil on github with more flexibility if you want to go the upgrade path, but I wouldn't be in a hurry. They helped me cross flash a Fujitsu 2008 HBA to generic LSI, but still messy.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Correct, but this is an OEM card and there is no update for it.
Using official firmware may create problems with an OEM card.
Also, as far as I know, sas3flash does not allow flashing OEM cards with non-OEM firmware. At least that was the case with Dell OEM cards before.
99% of OEM cards have nothing special about them. There's a thread over at STH dedicated to crossflashing LSI SAS3 controllers.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
There is lsirec and lsiutil on github with more flexibility if you want to go the upgrade path, but I wouldn't be in a hurry. They helped me cross flash a Fujitsu 2008 HBA to generic LSI, but still messy.

What do you mean by messy?
Did you hit any problems after the upgrade?

If I could find someone with a similar card who has updated it without any problems, I could easily choose this path.
I'm afraid that it won't work and I'll break the card. The same card is very expensive, and I cannot use OEM alternatives because their shape does not fit this server's port.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
99% of OEM cards have nothing special about them. There's a thread over at STH dedicated to crossflashing LSI SAS3 controllers.

This is also correct. They only change the vendor, product name, etc.
I don't want to break the card; playing with unofficial firmware is always a dangerous game.

Can you send the thread link? I will check.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
What do you mean by messy?
Did you hit any problems after the upgrade?

If I could find someone with a similar card who has updated it without any problems, I could easily choose this path.
I'm afraid that it won't work and I'll break the card. The same card is very expensive, and I cannot use OEM alternatives because their shape does not fit this server's port.
Messy depends on the card. I have a 2008 card. I had bricked the HBA years ago trying to use sas2flash across different motherboards, where I could do some things on one motherboard and some things on the other, and it was in my spares box. I found those tools and tried reflashing it. I had to find an SBR that would work, and ended up using the SBR from a different Fujitsu card that I think I found on STH. So it was experimental and messy until I got something that worked. Since the card was bricked, I didn't have anything to lose. The nice thing about lsirec is that it allows you to shut down the card, load the firmware after boot, and restart the card, or at least it worked for me. That might be a nice test for you, loading software without flashing, but it might also fail because of the data on the flash, or corrupt it too, I guess. A backup, as you well know, would be a good idea, and lsiutil should be able to do that. I didn't do this running TN, as I needed to compile the software, so I'm not sure if you can do that on TN. I compiled on Ubuntu Server and also Proxmox, and it worked on those. The card seems to work, but it is my backup. I did find that it would only work in an x16 slot, not an x8 slot. I haven't chased that down much, for example through the BIOS, but it did have the same behavior on all the x8 and x16 slots on the H11SSL. When I get the H11SSL in a case with some disks connected I'll look at it more closely.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
Messy depends on the card. I have a 2008 card. I had bricked the HBA years ago trying to use sas2flash across different motherboards, where I could do some things on one motherboard and some things on the other, and it was in my spares box. I found those tools and tried reflashing it. I had to find an SBR that would work, and ended up using the SBR from a different Fujitsu card that I think I found on STH. So it was experimental and messy until I got something that worked. Since the card was bricked, I didn't have anything to lose. The nice thing about lsirec is that it allows you to shut down the card, load the firmware after boot, and restart the card, or at least it worked for me. That might be a nice test for you, loading software without flashing, but it might also fail because of the data on the flash, or corrupt it too, I guess. A backup, as you well know, would be a good idea, and lsiutil should be able to do that. I didn't do this running TN, as I needed to compile the software, so I'm not sure if you can do that on TN. I compiled on Ubuntu Server and also Proxmox, and it worked on those. The card seems to work, but it is my backup. I did find that it would only work in an x16 slot, not an x8 slot. I haven't chased that down much, for example through the BIOS, but it did have the same behavior on all the x8 and x16 slots on the H11SSL. When I get the H11SSL in a case with some disks connected I'll look at it more closely.

Thanks for sharing your knowledge.
Can you help me find the right firmware for my card?
The known name is "SAS3316"; I suppose the equivalent is the "9361-16i".
I searched for both but couldn't find the right firmware.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I cannot use CORE because the network driver does not exist.

All of the recommended network cards work equally well on both CORE and SCALE. Going on an off-road adventure with Randy's Discount Ethernet Adapter and Dry Cleaning Loyalty Punch Card leads to a situation where it becomes difficult to assist you because you have robbed yourself of the ability to use a known stable platform. Every time I hear about people wanting to use Linux "because of the better hardware support", I groan because this is the semi-predictable outcome of this. Fortunately your server has the ethernet I/O on a module and it could be replaced. I'm guessing you have, what, the Qlogic on there right now?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The known name is "SAS3316"; I suppose the equivalent is the "9361-16i".

The 9361 is a RAID-on-chip (SAS3108) based card and there is no official firmware that will bludgeon it down to function as an HBA. It is likely that an experienced firmware hacker could get the SAS3008 firmware to run on it, but that's not a beginner level project.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
All of the recommended network cards work equally well on both CORE and SCALE. Going on an off-road adventure with Randy's Discount Ethernet Adapter and Dry Cleaning Loyalty Punch Card leads to a situation where it becomes difficult to assist you because you have robbed yourself of the ability to use a known stable platform. Every time I hear about people wanting to use Linux "because of the better hardware support", I groan because this is the semi-predictable outcome of this. Fortunately your server has the ethernet I/O on a module and it could be replaced. I'm guessing you have, what, the Qlogic on there right now?

I did not choose this path. I decided to use CORE, but the network card was not supported. My network card is OEM, and my server is not a normal server where you can easily change things. It is a 2-bay blade server and the network is shared. I did not want to deal with manually implemented network hardware, so I chose the SCALE path instead. After everything was ready with SCALE, I tested the pool and did not see any problems. I started the stability test, rebooted the server, and found the HBA reset problem during the boot sequence. I have now solved this problem and everything seems to be working, at least. I don't want to turn around and deal with a network driver that has no official support, because I have already lost too much time, which I did not expect.

If anyone asked me, I would not choose this hardware, but it is what it is.

The network card is Cisco UCS VIC 1455.

1689162258361.png
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
The 9361 is a RAID-on-chip (SAS3108) based card and there is no official firmware that will bludgeon it down to function as an HBA. It is likely that an experienced firmware hacker could get the SAS3008 firmware to run on it, but that's not a beginner level project.

Any solution that takes a lot of time is not a solution for me at the moment.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
Let's rewind the clock here a bit, shall we?
You've identified several symptoms of the problem, most of which surround middleware.

ix-zfs.service?

All it does is import the pool on boot. It literally exists for that function. So disabling it on your system of course will result in the pool not importing on boot up...


You are mistaken.
1- When I import the pool via the GUI, why does this problem not occur?
As I understand it, the difference is the pool cache file. I think the problem is that my drives' addresses change at every reboot, so the cache file does not match the current state, and importing the pool via the cache file triggers this issue. This is just a guess; I still have not had time to debug import_pool.py.

2- I just realised I may have a problem with the TrueNAS reboot sequence too:

1689163061435.png
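To illustrate the cache-file guess in point 1, here is a minimal sketch that compares device paths recorded at an earlier boot against the paths seen now. Everything in it is an assumption for illustration: the disk identifiers, the paths, and the function name are made up, and it does not reflect how import_pool.py or the ZFS cache file actually store their data.

```python
from typing import Dict, Optional, Tuple


def stale_cache_entries(
    cached: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return disks whose cached device path no longer matches the live one.

    Keys are stable disk identifiers (hypothetical); values are device paths.
    """
    stale = {}
    for disk_id, old_path in cached.items():
        new_path = current.get(disk_id)  # None if the disk is missing now
        if new_path != old_path:
            stale[disk_id] = (old_path, new_path)
    return stale


if __name__ == "__main__":
    # Made-up example: two disks swapped device names across a reboot.
    cached = {"disk-a": "/dev/sda", "disk-b": "/dev/sdb"}
    current = {"disk-a": "/dev/sdb", "disk-b": "/dev/sda"}
    for disk_id, (old, new) in stale_cache_entries(cached, current).items():
        print(disk_id, old, "->", new)
```

If a comparison like this came back non-empty after every reboot, that would be consistent with the guess, though it would not prove that the cache file is what triggers the resets.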
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Any solution that takes a lot of time is not a solution for me at the moment.

Then it may be that there is no solution for you. We do not guarantee that TrueNAS will run on any server that you happen to show up here with, and in fact TrueNAS is known to be rather picky about the exact hardware it will run on; the closer you can get to the Supermicro based platforms originally used by TrueNAS Enterprise, the better your luck may be. This includes using actual LSI HBA's rather than trying to rely on crossflashing, and Chelsio cards for faster-than-gig applications (though most of Intel's product line is known to work well too).

My own take on this, after having glanced through this thread, is that this is another one of those weird cases where the Linux MPT3SAS driver seems to be trying to take over a card that it isn't really designed for. FreeBSD handles LSI 93xx HBA under the MPR driver and it would be really interesting to see how this showed up under TrueNAS CORE. I get the distinct impression that your 9361 would show up as a RAID controller, but I haven't actually tried the experiment. It could be that the 9361 with Linux needs to be tweaked or avoided or something like that. We had another user within the past few months with "weird stuff".

Anyways, while it may be inconvenient for you, forum recommendations are based on pragmatic factors such as "LSI HBA's work correctly, most other things including LSI RAID do not" or "Chelsio and Intel have properly implemented support for all the things while most other cards do not". Therefore when you show up with an LSI 9361 RAID based card and it is not working correctly, or you "cannot" try CORE because you've brought a crummy network card to the game, there's nothing much we can do for you other than to admire your brazenness and/or misfortune (depending on how you look at it) at your having brought a knife to a gunfight, as the kids these days say. You are of course free to eBay your existing server and buy something more appropriate, or to press on until the specific nature of your current server's deficiencies are more fully understood.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
Then it may be that there is no solution for you. We do not guarantee that TrueNAS will run on any server that you happen to show up here with, and in fact TrueNAS is known to be rather picky about the exact hardware it will run on; the closer you can get to the Supermicro based platforms originally used by TrueNAS Enterprise, the better your luck may be. This includes using actual LSI HBA's rather than trying to rely on crossflashing, and Chelsio cards for faster-than-gig applications (though most of Intel's product line is known to work well too).

My own take on this, after having glanced through this thread, is that this is another one of those weird cases where the Linux MPT3SAS driver seems to be trying to take over a card that it isn't really designed for. FreeBSD handles LSI 93xx HBA under the MPR driver and it would be really interesting to see how this showed up under TrueNAS CORE. I get the distinct impression that your 9361 would show up as a RAID controller, but I haven't actually tried the experiment. It could be that the 9361 with Linux needs to be tweaked or avoided or something like that. We had another user within the past few months with "weird stuff".

Anyways, while it may be inconvenient for you, forum recommendations are based on pragmatic factors such as "LSI HBA's work correctly, most other things including LSI RAID do not" or "Chelsio and Intel have properly implemented support for all the things while most other cards do not". Therefore when you show up with an LSI 9361 RAID based card and it is not working correctly, or you "cannot" try CORE because you've brought a crummy network card to the game, there's nothing much we can do for you other than to admire your brazenness and/or misfortune (depending on how you look at it) at your having brought a knife to a gunfight, as the kids these days say. You are of course free to eBay your existing server and buy something more appropriate, or to press on until the specific nature of your current server's deficiencies are more fully understood.
I have built many custom systems and never had a problem like this.
When I saw this server, the first thing I said was "sell this, we can get a better system for that price", but that was not an option.
I have no choice but to work with this hardware. I hit a different bug at every step and spent more time than I expected. That's why I don't want to deal with time-consuming solutions anymore; I have spent too much time already.

I have considered changing the hardware, but the weird shape of the cards makes that impossible. What can I do?

So instead of blaming anyone, I tried to solve this problem, starting from "I have a weird case and a weird server".
I know this system may hit a different issue in the future that I couldn't see on my test bench. I'm trying everything to prevent that right now, and ZFS is very safe and I trust it.

As you can see from my previous screenshot, yes, the LSI chip is a 3316 ROC, but they used this chip to build an HBA card, and their intent was to be able to change the card from IT to IR with their firmware.

But... as you can see, they have almost abandoned the project, or do not care that they still use P13.

Yes, I think it is a misfortune for me. But at least I tried and didn't give up until I found a solution. It is working now, and I know its ups and downs.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
We had another user within the past few months with "weird stuff".
I was writing a long post comparing this to the Dell Gen 12 PERCs running SAS2308 IT firmware... but then I noticed that the mpr driver explicitly supports the 3316, even though it is primarily sold as an "entry-to-mid level ROC":
HARDWARE
These controllers are supported by the mpr driver:

o Broadcom Ltd./Avago Tech (LSI) SAS 3004 (4 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3008 (8 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3108 (8 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3216 (16 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3224 (24 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3316 (16 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3324 (24 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3408 (8 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3416 (16 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3508 (8 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3516 (16 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3616 (16 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3708 (8 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3716 (16 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3816 (16 Port SAS/PCIe)
o Broadcom Ltd./Avago Tech (LSI) SAS 3916 (16 Port SAS/PCIe)
So, this really looks like it only needs to be flashed with stock firmware. The major question is what sort of lock-in the server itself has implemented.
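As a small sketch of the check described above, the following pulls the chip numbers out of a HARDWARE listing in the format quoted from the mpr man page and tests whether a given chip appears. The excerpt is copied from the list above; the function and variable names are hypothetical.

```python
import re
from typing import Set

# Excerpt of the mpr HARDWARE list as quoted in this post.
MPR_HARDWARE_EXCERPT = """\
o Broadcom Ltd./Avago Tech (LSI) SAS 3008 (8 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3108 (8 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3216 (16 Port SAS)
o Broadcom Ltd./Avago Tech (LSI) SAS 3316 (16 Port SAS)
"""


def listed_chips(hardware_text: str) -> Set[str]:
    """Collect the four-digit chip numbers that follow 'SAS ' in the listing."""
    return set(re.findall(r"\bSAS (\d{4})\b", hardware_text))


if __name__ == "__main__":
    chips = listed_chips(MPR_HARDWARE_EXCERPT)
    print("3316 listed:", "3316" in chips)
```

Being listed only means the driver claims the chip; it says nothing about what the OEM firmware or the server itself will allow.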
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
I was writing a long post comparing this to the Dell Gen 12 PERCs running SAS2308 IT firmware... but then I noticed that the mpr driver explicitly supports the 3316, even though it is primarily sold as an "entry-to-mid level ROC":

So, this really looks like it only needs to be flashed with stock firmware. The major question is what sort of lock-in the server itself has implemented.

Finally, some good news. :grin: Thank you for checking.
What kind of server lock-in are we talking about?

I have noticed only two things:
1- The mpt3sas resets happen only with "import_pool.py". If I import the pool from the TrueNAS dashboard, I don't have any problem.
2- I'm starting to run benchmarks and I didn't like the test results; maybe official firmware will help with this.
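One way to compare the two import paths in point 1 would be to count reset-related driver messages in a saved kernel log for each case. The sketch below is only an assumption about how that could be done: it does a plain substring match, the function name is hypothetical, and the sample lines are placeholders rather than real mpt3sas output.

```python
from typing import Iterable


def count_reset_lines(lines: Iterable[str], driver: str = "mpt3sas") -> int:
    """Count lines that mention both the driver name and the word 'reset'."""
    needle = driver.lower()
    count = 0
    for line in lines:
        lowered = line.lower()
        if needle in lowered and "reset" in lowered:
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder lines for illustration only, not real driver messages.
    sample = [
        "placeholder: mpt3sas controller reset",
        "placeholder: unrelated message",
        "placeholder: MPT3SAS RESET again",
    ]
    print(count_reset_lines(sample))
```

Running it once on a log captured after a boot-time import and once after a dashboard import would show whether the resets really appear in only one of the two cases.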
 