Bizarre upgrade from 12.1 to 13.0

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
OK, this is a weird one. I did an upgrade from 12.0 U8 to 13.0 U3.1 last week, and all went well.

I logged on today and found the following errors:
CRITICAL
Pool vdisk01 state is ONLINE: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:

Disk HITACHI HUSSL402 CLAR200 XQVAR36A is REMOVED

2023-02-24 16:31:06 (America/New_York)

CRITICAL
Pool vdisk02 state is UNAVAIL: One or more devices are faulted in response to IO failures.
The following devices are not healthy:

Disk ATA TOSHIBA HDWQ140 Y7J1K1ZIFPBE is REMOVED
Disk ATA TOSHIBA HDWQ140 Y7LDK0D4FPBE is REMOVED

2023-02-24 16:31:06 (America/New_York)

XQVAR36A is on da3
Y7J1K1ZIFPBE is on da1
Y7LDK0D4FPBE is on da2
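
The serial-to-device mapping above can also be derived mechanically rather than by hand. The following is a minimal sketch, assuming text shaped like the output of FreeBSD's `geom disk list` (a `Geom name:` line followed later by an `ident:` line for each disk). The sample text and the function name `serial_to_device` are illustrative and were not captured from this system.

```python
# Hypothetical sample shaped like `geom disk list` output; not captured from this system.
SAMPLE = """\
Geom name: da1
Providers:
1. Name: da1
   descr: ATA TOSHIBA HDWQ140
   ident: Y7J1K1ZIFPBE
Geom name: da3
Providers:
1. Name: da3
   descr: HITACHI HUSSL402 CLAR200
   ident: XQVAR36A
"""

def serial_to_device(text):
    """Return a {serial: device} dict built from geom-style text."""
    mapping = {}
    device = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Geom name:"):
            # Remember which disk the following lines belong to.
            device = line.split(":", 1)[1].strip()
        elif line.startswith("ident:") and device is not None:
            # `ident` carries the serial number of the current disk.
            mapping[line.split(":", 1)[1].strip()] = device
    return mapping

print(serial_to_device(SAMPLE))
```

On a live system the text would come from the command itself instead of the `SAMPLE` string.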

So three disks are offline, and one pool is gone because it lost 2 of its 4 disks. But I noticed that they all went offline at exactly the same time (2023-02-24 16:31:06).
I thought it might be a power issue, like a bad cord that took out three disks. So I powered off the system, moved Y7LDK0D4FPBE from da2 to da5, and powered it back up. Voila: all the disks are online, and both pools (vdisk01 and vdisk02) are perfectly fine.
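
The "everything dropped at once" observation can be checked by grouping the alert entries by timestamp. This is a rough sketch: the tuple layout and the name `group_by_timestamp` are assumptions, and the values are copied from the alerts above. Note that this is the timestamp on the alert, which may not be the exact moment the disks dropped.

```python
from collections import defaultdict

# (pool, disk serial, alert timestamp), copied from the alerts in the post.
ALERTS = [
    ("vdisk01", "XQVAR36A", "2023-02-24 16:31:06"),
    ("vdisk02", "Y7J1K1ZIFPBE", "2023-02-24 16:31:06"),
    ("vdisk02", "Y7LDK0D4FPBE", "2023-02-24 16:31:06"),
]

def group_by_timestamp(alerts):
    """Collect (pool, serial) pairs under the timestamp they were reported at."""
    groups = defaultdict(list)
    for pool, serial, timestamp in alerts:
        groups[timestamp].append((pool, serial))
    return dict(groups)

groups = group_by_timestamp(ALERTS)
for timestamp, disks in groups.items():
    print(timestamp, len(disks), "disk(s):", disks)
```

A single group spanning two pools points toward something the disks share (power, cabling, or a controller) rather than three independent disk failures.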

The system had been behaving perfectly until the update, and so far, so good since the reboot. My only thought is that, since the problem was with da1 through da3, it could be a controller chip incompatibility.

The system is one of those Cheslo 12000 systems with a Tyan S5512 motherboard that has 12 SATA/SAS connections on it. I haven't traced the SATA/SAS cables to the ports on the board, but considering that the problem was on da1, da2, and da3, my guess is that they are on the same controller chip.
 
smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
8 days later, and it looks like the Tyan S5512WGM2NR is incompatible with TrueNAS 13.0. Is anyone else out there using this hardware? This system has the SAS2008 controller built into the motherboard, and it has 12 SAS/SATA ports.

Last night I needed to roll it back to the 12.0 U8.1 kernel in order for it to stabilize.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Couple of things:
  • The da disks are all SAS (well, strictly speaking SCSI of some variety). SATA (or PATA) disks have ada identifiers.
  • The SAS2008 is very widely used. Unless it's damaged or something along those lines it should be working...
  • ...provided it's actually using IT firmware. What's the output of sas2flash -listall?
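
The naming rule in the first bullet can be expressed as a small lookup. This is only a sketch: the function name `classify` and the category labels are made up for illustration, and the result reflects which device-name prefix a disk carries, not necessarily the physical interface of the drive behind it.

```python
import re

# Prefix -> description, following the rule stated in the post above.
PREFIXES = {
    "da": "SCSI/SAS direct-access device",
    "ada": "ATA/SATA direct-access device",
}

def classify(device):
    """Classify a device name such as 'da1' or 'ada0' by its prefix."""
    match = re.fullmatch(r"([a-z]+)(\d+)", device)
    if not match:
        return "unknown"
    return PREFIXES.get(match.group(1), "unknown")

for dev in ("da1", "da2", "da3", "ada0"):
    print(dev, "->", classify(dev))
```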
 

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
Couple of things:
  • The da disks are all SAS (well, strictly speaking SCSI of some variety). SATA (or PATA) disks have ada identifiers.
  • The SAS2008 is very widely used. Unless it's damaged or something along those lines it should be working...
  • ...provided it's actually using IT firmware. What's the output of sas2flash -listall?
I installed the IT firmware before the initial install of 12.0 U2.1 on April 3, 2021, and until the 13.0 U3.1 upgrade went in, I never had any issues, nor did I use the console. My code version is 20.00.04.00. I do know there is a 20.00.07.00 version out there, but I wanted to stabilize my system before changing anything else.

It has been purring like a kitten again since I went back to 12.0 U8.1.
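
For anyone scripting a check of the two firmware versions mentioned above, dotted version strings compare more reliably as integer tuples than as raw text. A minimal sketch; `parse_fw` is a hypothetical helper name, not part of any LSI/Broadcom tool.

```python
def parse_fw(version):
    """Split a dotted firmware version such as '20.00.04.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

installed = parse_fw("20.00.04.00")
available = parse_fw("20.00.07.00")

if installed < available:
    print("A newer firmware version is available")
else:
    print("Installed firmware is current")
```

Tuples compare field by field, so 20.00.04.00 sorts below 20.00.07.00 and the old 13.00.01.00 sorts below both.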
 

Big Tuna

Cadet
Joined
Nov 19, 2023
Messages
4
8 days later, and it looks like the Tyan S5512WGM2NR is incompatible with TrueNAS 13.0. Is there anyone else out there using this hardware. This system has the 2008 SAS controller built into the motherboard, and it has 12 SAS/SATA ports.

Last night I needed to roll it back to 12.0 U8.1 kernel in order for it to stabilize.
Curious if there's been any resolution to this. I'm running the same Tyan motherboard and can confirm I'm experiencing the same issues with TrueNAS 13. I get controller resets and IO errors causing the pool to stop responding. After a restart, all is well for a bit, but the errors always start up again eventually.

I have been running 12.0 U8.1 for quite some time and it is rock solid, so it is definitely an issue with 13.

I had firmware version 13.00.01.00 on the embedded controller and thought I'd try upgrading it to 20.00.04.00 in the hopes that would take care of it. Sad to say the answer is no after all that work. Still on 12.0 U8.1 awaiting some sort of explanation or solution. I see several posts in the forum describing similar issues when upgrading to 13. I'd love to be able to get on the latest supported version.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
May I ask you to take a little more care with the version numbers? In the subject of the initial post it's "12.1 to 13.0". Since there never was a TrueNAS 12.1 release version, only experimental "nightlies", my initial reaction before opening the thread was "expected, running experimental software ...".

Then I opened the thread and in the first paragraph it's "12.0 to 13.1" - again, there is no TN 13.1 release. So what to make of this?

Following the thread it became clear that we are facing a real problem with a particular mainboard and an upgrade from 12.0 to 13.0 here. I don't have anything to add for now to the technical discussion, it's going in a good direction, I hope. But please people, be precise with software versions. This is important!
 

Big Tuna

Cadet
Joined
Nov 19, 2023
Messages
4
The correct version is 20.00.07.00, give that a try.

Thanks for the response. I was finally able to get the firmware updated to 20.00.07.00, but unfortunately I'm still having issues. The cycle seems to be timeouts and aborts followed by reinitialization of the controller. It then starts over with a similar pattern until the controller becomes unresponsive and requires a reboot.

Code:
Nov 22 03:02:55 truenas (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 7f b6 3d 88 00 00 00 08 00 00 length 4096 SMID 1695 Command timeout on target  (0x000a) 60000 set, 60.120401072 elapsed
Nov 22 03:02:55 truenas mps0: Sending abort to target 1 for SMID 1695
Nov 22 03:02:55 truenas (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 7f b6 3d 88 00 00 00 08 00 00 length 4096 SMID 1695 Aborting command 0xfffffe00c511e5a8
Nov 22 03:02:55 truenas (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 10 f6 d2 48 00 00 a8 00 length 86016 SMID 1293 Command timeout on target 0(0x0009) 60000 set, 60.78054881 elapsed
Nov 22 03:02:55 truenas mps0: Sending abort to target 0 for SMID 1293
Nov 22 03:02:55 truenas (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 10 f6 d2 48 00 00 a8 00 length 86016 SMID 1293 Aborting command 0xfffffe00c50fc978
Nov 22 03:02:55 truenas (da1:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 01 7b 97 ee b8 00 00 00 48 00 00 length 36864 SMID 1730 Command timeout on target 1(0x000a) 60000 set, 60.78438291 elapsed
Nov 22 03:02:55 truenas (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 10 f6 d2 48 00 00 98 00 length 77824 SMID 1782 Command timeout on target 2(0x000b) 60000 set, 60.78713692 elapsed
Nov 22 03:02:55 truenas mps0: Sending abort to target 2 for SMID 1782
Nov 22 03:02:55 truenas (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 10 f6 d2 48 00 00 98 00 length 77824 SMID 1782 Aborting command 0xfffffe00c5125a90
Nov 22 03:03:02 truenas (xpt0:mps0:0:2:0): SMID 3 task mgmt 0xfffffe00c5090408 timed out
Nov 22 03:03:02 truenas mps0: Reinitializing controller
Nov 22 03:03:02 truenas mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Nov 22 03:03:02 truenas mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Nov 22 03:06:11 truenas (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 83 e0 23 68 00 00 00 08 00 00 length 4096 SMID 281 Command timeout on target 2(0x000b) 60000 set, 60.278754672 elapsed
Nov 22 03:06:11 truenas mps0: Sending abort to target 2 for SMID 281
Nov 22 03:06:11 truenas (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 83 e0 23 68 00 00 00 08 00 00 length 4096 SMID 281 Aborting command 0xfffffe00c50a7998 


I should mention that I have confirmed the same activity in both 13.0-U4 and 13.0-U5.3. But again, rolling back to 12.0-U8.1 takes care of all problems.
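
The timeout/abort/reinitialize cycle in the log above can be tallied per disk to see whether one device or the whole controller is affected. A minimal sketch, assuming the message layout shown in the log; the regular expression and the name `summarize` are my own, and the sample lines are copied from the log in this post.

```python
import re
from collections import Counter

# Sample lines copied verbatim from the log above.
LOG = """\
Nov 22 03:02:55 truenas (da1:mps0:0:1:0): READ(16). CDB: 88 00 00 00 00 01 7f b6 3d 88 00 00 00 08 00 00 length 4096 SMID 1695 Command timeout on target  (0x000a) 60000 set, 60.120401072 elapsed
Nov 22 03:02:55 truenas mps0: Sending abort to target 1 for SMID 1695
Nov 22 03:02:55 truenas (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 10 f6 d2 48 00 00 a8 00 length 86016 SMID 1293 Command timeout on target 0(0x0009) 60000 set, 60.78054881 elapsed
Nov 22 03:02:55 truenas (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 10 f6 d2 48 00 00 98 00 length 77824 SMID 1782 Command timeout on target 2(0x000b) 60000 set, 60.78713692 elapsed
Nov 22 03:03:02 truenas mps0: Reinitializing controller
"""

# Matches e.g. "(da1:mps0:0:1:0): READ(16). ... Command timeout"
TIMEOUT = re.compile(
    r"\((da\d+):mps\d+:[\d:]+\): (READ|WRITE)\(\d+\)\..*Command timeout"
)

def summarize(log):
    """Count command timeouts per disk and controller reinitializations."""
    timeouts = Counter()
    reinits = 0
    for line in log.splitlines():
        match = TIMEOUT.search(line)
        if match:
            timeouts[match.group(1)] += 1
        elif "Reinitializing controller" in line:
            reinits += 1
    return timeouts, reinits

timeouts, reinits = summarize(LOG)
print("timeouts per disk:", dict(timeouts))
print("controller reinitializations:", reinits)
```

Timeouts spread across da0, da1, and da2 within the same second, followed by a controller reinitialization, are consistent with a controller-level problem rather than a single failing disk.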

Hopefully this is helpful to someone. I would love to know if anyone out there has gotten the S5512WGM2NR working successfully on 13.0.

May I ask you to take a little more care with the version numbers?

Thanks for the reminder. I admit I overlooked this when browsing the thread initially. Apologies for any confusion caused.
 

Big Tuna

Cadet
Joined
Nov 19, 2023
Messages
4
Posting back here for anyone else facing a similar issue with a built-in SAS2008 controller.

I had some time to do some experimenting and I believe I've discovered a potential solution/workaround. On a whim, I decided to check the settings in the embedded controller BIOS for the SAS2008 using Ctrl+C during bootup.

From the main adapter list screen, I was able to access the Global Properties menu using Alt+N.

Code:
Pause When Boot Alert Displayed  [No]
Boot Information Display Mode    [Display adapters & installed devices]
Support Interrupt                [Hook interrupt, the Default]        


My Support Interrupt option was set to "Hook interrupt" (the default), as shown above. I changed it to "Bypass interrupt hooks", saved, and rebooted.

Prior to this, I would get errors within the first 20 minutes of operation and I'm now going on 48 hours rock solid running 13.0 U6. I'm fairly confident this was the issue but time will tell.

What I don't know is whether there are any downsides to bypassing interrupt hooks. I'm not knowledgeable enough to know what this setting actually does, but maybe someone else can enlighten me. I plan to do some performance tests on 12.0 U8.1 with this setting enabled and on 13.0 U6 with it bypassed, just to see if there is any difference.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
embedded controller BIOS for the sas2008 using Ctrl+C during bootup
You really should not be using the legacy BIOS option ROM in 2023. If you need to boot from the card, use the UEFI ROM - set the option ROM mode in system firmware to UEFI only and/or remove the legacy ROM from the card's flash.

More generally: If you see any prompts to press random key combinations to enter this or that setup utility, besides those from the system firmware itself, that means you still have legacy BIOS opROMs being loaded. That's bad for a variety of reasons that boil down to "UEFI is a mess, but it's better than what came before". Modern UEFI option ROMs hook into the system firmware setup menu and provide their configuration options as additional menus in that same application, which is far more convenient.
My support interrupt was set to the "Hook interrupt" (default) as shown above. I changed this to: "Bypass interrupt hooks," saved and rebooted.

Prior to this, I would get errors within the first 20 minutes of operation and I'm now going on 48 hours rock solid running 13.0 U6. I'm fairly confident this was the issue but time will tell.
It's certainly referring to int13h, the IBM PC BIOS interrupt to request disk services under DOS. It shouldn't be causing trouble, but I guess that's a bug in Tyan's system firmware.
 

Big Tuna

Cadet
Joined
Nov 19, 2023
Messages
4
It's certainly referring to int13h, the IBM PC BIOS interrupt to request disk services under DOS. It shouldn't be causing trouble, but I guess that's a bug in Tyan's system firmware.
Well, it appears I spoke too soon so I have to reluctantly eat some crow. Errors are coming back and seem to be triggered by heavy file transfer operations in SMB.

I'm back to holding tight at 12.0-U8.1 and hoping it resolves itself in a future release. If not, I may have to bite the bullet on some new hardware.
 

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
May I ask you to take a little more care with the version numbers? In the subject of the initial post it's "12.1 to 13.0". Since there never was a TrueNAS 12.1 release version, only experimental "nightlies", my initial reaction before opening the thread was "expected, running experimental software ...".

Then I opened the thread and in the first paragraph it's "12.0 to 13.1" - again, there is no TN 13.1 release. So what to make of this?

Following the thread it became clear that we are facing a real problem with a particular mainboard and an upgrade from 12.0 to 13.0 here. I don't have anything to add for now to the technical discussion, it's going in a good direction, I hope. But please people, be precise with software versions. This is important!
Fixed
 

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
Well, it appears I spoke too soon so I have to reluctantly eat some crow. Errors are coming back and seem to be triggered by heavy file transfer operations in SMB.

I'm back to holding tight at 12.0-U8.1 and hoping it resolves itself in a future release. If not, I may have to bite the bullet on some new hardware.
I froze this system at 12.0 U8.1.

My upcoming plan is to build a new 13.0 U6.1 system, migrate the data, and then try a new build on the Tyan hardware.

There might be a solution for this system with 13.0 U6.1, or even SCALE, but I am not going to put my data at risk.
 