D0peX
Cadet
- Joined
- May 23, 2023
- Messages
- 4
Hi Forum
I have been struggling with this one for so long now, without any answers.
Cannot find too much information about it too.
When running Truenas 13.0-U4, the `mps` re-inits after lots of command timeouts.
I can reproduce this issue 100% with starting a bunch of random writes with fio. The issue starts after ram has been filled by zfs cache.
When rolling back to 12.0-U8, no issues at all. This completely singles out the issue to be software.
I've tried some tunables in loader like
Without any success.
I also cannot find any mention of this controller starting to be buggy in freebsd 13, which i'd find strange if that were the case. I mean... It's freebsd, used by many for storage. With this in mind, i'm suspecting it to be a config/setting that Truenas ships with 13. But i do not know what/where to look for.
Small snippet of what happens:
I would appreciate any input, and could do some testing too if needed.
Thanks!
I have been struggling with this one for so long now, without any answers.
Cannot find too much information about it too.
When running Truenas 13.0-U4, the `mps` re-inits after lots of command timeouts.
I can reproduce this issue 100% with starting a bunch of random writes with fio. The issue starts after ram has been filled by zfs cache.
When rolling back to 12.0-U8, no issues at all. This completely singles out the issue to be software.
I've tried some tunables in loader like
Code:
hw.mps.disable_msix=1 hw.mps.disable_msi=1
Without any success.
I also cannot find any mention of this controller starting to be buggy in freebsd 13, which i'd find strange if that were the case. I mean... It's freebsd, used by many for storage. With this in mind, i'm suspecting it to be a config/setting that Truenas ships with 13. But i do not know what/where to look for.
Small snippet of what happens:
Code:
mps0: Reinitializing controller
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 46 74 2e b8 00 08 00 00 length 1048576 SMID 597 Command timeout on target 0(0x0009) 60000 set, 60.780533875 elapsedI would appreciate any input, and could do some testing too if needed.
Thanks!
Code:
root@sirius[~]# mpsutil show all
Adapter:
mps0 Adapter:
Board Name: SAS9211-8i
Board Assembly:
Chip Name: LSISAS2008
Chip Revision: ALL
BIOS Revision: 7.39.02.00
Firmware Revision: 20.00.07.00
Integrated RAID: no
SATA NCQ: ENABLED
PCIe Width/Speed: x8 (5.0 GB/sec)
IOC Speed: Full
Temperature: Unknown/Unsupported
PhyNum CtlrHandle DevHandle Disabled Speed Min Max Device
0 0001 0009 N 6.0 1.5 6.0 SAS Initiator
1 0002 000a N 6.0 1.5 6.0 SAS Initiator
2 N 1.5 6.0 SAS Initiator
3 N 1.5 6.0 SAS Initiator
4 0003 000b N 6.0 1.5 6.0 SAS Initiator
5 0004 000c N 6.0 1.5 6.0 SAS Initiator
6 N 1.5 6.0 SAS Initiator
7 N 1.5 6.0 SAS Initiator
Devices:
B____T SAS Address Handle Parent Device Speed Enc Slot Wdt
00 00 4433221100000000 0009 0001 SATA Target 6.0 0001 03 1
00 01 4433221101000000 000a 0002 SATA Target 6.0 0001 02 1
00 02 4433221104000000 000b 0003 SATA Target 6.0 0001 07 1
00 03 4433221105000000 000c 0004 SATA Target 6.0 0001 06 1
Enclosures:
Slots Logical ID SEPHandle EncHandle Type
08 56c92bf00003bcd7 0001 Direct Attached SGPIO
Expanders:
NumPhys SAS Address DevHandle Parent EncHandle SAS Level
Code:
mps0: Reinitializing controller
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 10 58 00 08 00 00 length 1048576 SMID 440 Command timeout on target 1(0x000a) 60000 set, 60.590619855 elapsed
mps0: Sending abort to target 1 for SMID 440
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 10 58 00 08 00 00 length 1048576 SMID 440 Aborting command 0xfffffe00c51edf40
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af a1 d8 00 08 00 00 length 1048576 SMID 432 Command timeout on target 0(0x0009) 60000 set, 60.561992076 elapsed
mps0: Sending abort to target 0 for SMID 432
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af a1 d8 00 08 00 00 length 1048576 SMID 432 Aborting command 0xfffffe00c51ed480
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af a9 d8 00 08 00 00 length 1048576 SMID 392 Command timeout on target 0(0x0009) 60000 set, 60.530714629 elapsed
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af b1 d8 00 08 00 00 length 1048576 SMID 306 Command timeout on target 0(0x0009) 60000 set, 60.501117892 elapsed
(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 f0 41 46 20 00 01 00 00 length 131072 SMID 211 Command timeout on target 2(0x000b) 60000 set, 60.475944824 elapsed
mps0: Sending abort to target 2 for SMID 211
(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 f0 41 46 20 00 01 00 00 length 131072 SMID 211 Aborting command 0xfffffe00c51dab88
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af b9 d8 00 08 00 00 length 1048576 SMID 375 Command timeout on target 0(0x0009) 60000 set, 60.483962856 elapsed
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 20 58 00 08 00 00 length 1048576 SMID 160 Command timeout on target 1(0x000a) 60000 set, 60.441826032 elapsed
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 18 58 00 08 00 00 length 1048576 SMID 158 Command timeout on target 1(0x000a) 60000 set, 60.451334556 elapsed
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 30 58 00 08 00 00 length 1048576 SMID 121 Command timeout on target 1(0x000a) 60000 set, 60.423550204 elapsed
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 28 58 00 08 00 00 length 1048576 SMID 269 Command timeout on target 1(0x000a) 60000 set, 60.432836445 elapsed
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 be 41 46 e8 00 00 78 00 length 61440 SMID 417 Command timeout on target 1(0x000a) 60000 set, 60.404959286 elapsed
(da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 38 58 00 05 00 00 length 655360 SMID 404 Command timeout on target 1(0x000a) 60000 set, 60.414351577 elapsed
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af c1 d8 00 08 00 00 length 1048576 SMID 433 Command timeout on target 0(0x0009) 60000 set, 60.241136485 elapsed
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 f0 41 46 20 00 01 00 00 length 131072 SMID 364 Command timeout on target 0(0x0009) 60000 set, 60.222849270 elapsed
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af c9 d8 00 05 00 00 length 655360 SMID 322 Command timeout on target 0(0x0009) 60000 set, 60.232077823 elapsed
(xpt0:mps0:0:2:0): SMID 9 task mgmt 0xfffffe00c51c9c18 timed out
mps0: Reinitializing controller
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db b8 18 d8 00 08 00 00 length 1048576 SMID 216 Command timeout on target 0(0x0009) 60000 set, 60.203261674 elapsed
mps0: Sending abort to target 0 for SMID 216
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db b8 18 d8 00 08 00 00 length 1048576 SMID 216 Aborting command 0xfffffe00c51db240
(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 db b8 28 d8 00 08 00 00 length 1048576 SMID 398 Command timeout on target 2(0x000b) 60000 set, 60.204955892 elapsed
mps0: Sending abort to target 2 for SMID 398
(da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 db b8 28 d8 00 08 00 00 length 1048576 SMID 398 Aborting command 0xfffffe00c51ea6d0
(da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db b8 28 d8 00 08 00 00 length 1048576 SMID 535 Command timeout on target 0(0x0009) 60000 set, 60.186371608 elapsed
Code:
Extended brand string: " Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 1
Hyper threading siblings: 16
root@sirius[~]# zpool status sirius-b
pool: sirius-b
state: ONLINE
scan: scrub repaired 768K in 1 days 04:32:21 with 0 errors on Mon May 22 04:32:21 2023
config:
NAME STATE READ WRITE CKSUM
sirius-b ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
gptid/78f45af7-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0
gptid/7d4362ef-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
gptid/877b5643-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0
gptid/89fe6a49-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0
errors: No known data errors