D0peX
Cadet
- Joined
- May 23, 2023
- Messages
- 4
Hi Forum
I have been struggling with this one for so long now, without any answers.
Cannot find too much information about it too.
When running Truenas 13.0-U4, the `mps` re-inits after lots of command timeouts.
I can reproduce this issue 100% with starting a bunch of random writes with fio. The issue starts after ram has been filled by zfs cache.
When rolling back to 12.0-U8, no issues at all. This completely singles out the issue to be software.
I've tried some tunables in loader like
Without any success.
I also cannot find any mention of this controller starting to be buggy in freebsd 13, which i'd find strange if that were the case. I mean... It's freebsd, used by many for storage. With this in mind, i'm suspecting it to be a config/setting that Truenas ships with 13. But i do not know what/where to look for.
Small snippet of what happens:
I would appreciate any input, and could do some testing too if needed.
Thanks!
I have been struggling with this one for so long now, without any answers.
Cannot find too much information about it too.
When running Truenas 13.0-U4, the `mps` re-inits after lots of command timeouts.
I can reproduce this issue 100% with starting a bunch of random writes with fio. The issue starts after ram has been filled by zfs cache.
When rolling back to 12.0-U8, no issues at all. This completely singles out the issue to be software.
I've tried some tunables in loader like
Code:
hw.mps.disable_msix=1 hw.mps.disable_msi=1
Without any success.
I also cannot find any mention of this controller starting to be buggy in freebsd 13, which i'd find strange if that were the case. I mean... It's freebsd, used by many for storage. With this in mind, i'm suspecting it to be a config/setting that Truenas ships with 13. But i do not know what/where to look for.
Small snippet of what happens:
Code:
mps0: Reinitializing controller mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc> (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 46 74 2e b8 00 08 00 00 length 1048576 SMID 597 Command timeout on target 0(0x0009) 60000 set, 60.780533875 elapsed
I would appreciate any input, and could do some testing too if needed.
Thanks!
Code:
root@sirius[~]# mpsutil show all Adapter: mps0 Adapter: Board Name: SAS9211-8i Board Assembly: Chip Name: LSISAS2008 Chip Revision: ALL BIOS Revision: 7.39.02.00 Firmware Revision: 20.00.07.00 Integrated RAID: no SATA NCQ: ENABLED PCIe Width/Speed: x8 (5.0 GB/sec) IOC Speed: Full Temperature: Unknown/Unsupported PhyNum CtlrHandle DevHandle Disabled Speed Min Max Device 0 0001 0009 N 6.0 1.5 6.0 SAS Initiator 1 0002 000a N 6.0 1.5 6.0 SAS Initiator 2 N 1.5 6.0 SAS Initiator 3 N 1.5 6.0 SAS Initiator 4 0003 000b N 6.0 1.5 6.0 SAS Initiator 5 0004 000c N 6.0 1.5 6.0 SAS Initiator 6 N 1.5 6.0 SAS Initiator 7 N 1.5 6.0 SAS Initiator Devices: B____T SAS Address Handle Parent Device Speed Enc Slot Wdt 00 00 4433221100000000 0009 0001 SATA Target 6.0 0001 03 1 00 01 4433221101000000 000a 0002 SATA Target 6.0 0001 02 1 00 02 4433221104000000 000b 0003 SATA Target 6.0 0001 07 1 00 03 4433221105000000 000c 0004 SATA Target 6.0 0001 06 1 Enclosures: Slots Logical ID SEPHandle EncHandle Type 08 56c92bf00003bcd7 0001 Direct Attached SGPIO Expanders: NumPhys SAS Address DevHandle Parent EncHandle SAS Level
Code:
mps0: Reinitializing controller mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc> (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 10 58 00 08 00 00 length 1048576 SMID 440 Command timeout on target 1(0x000a) 60000 set, 60.590619855 elapsed mps0: Sending abort to target 1 for SMID 440 (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 10 58 00 08 00 00 length 1048576 SMID 440 Aborting command 0xfffffe00c51edf40 (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af a1 d8 00 08 00 00 length 1048576 SMID 432 Command timeout on target 0(0x0009) 60000 set, 60.561992076 elapsed mps0: Sending abort to target 0 for SMID 432 (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af a1 d8 00 08 00 00 length 1048576 SMID 432 Aborting command 0xfffffe00c51ed480 (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af a9 d8 00 08 00 00 length 1048576 SMID 392 Command timeout on target 0(0x0009) 60000 set, 60.530714629 elapsed (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af b1 d8 00 08 00 00 length 1048576 SMID 306 Command timeout on target 0(0x0009) 60000 set, 60.501117892 elapsed (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 f0 41 46 20 00 01 00 00 length 131072 SMID 211 Command timeout on target 2(0x000b) 60000 set, 60.475944824 elapsed mps0: Sending abort to target 2 for SMID 211 (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 f0 41 46 20 00 01 00 00 length 131072 SMID 211 Aborting command 0xfffffe00c51dab88 (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af b9 d8 00 08 00 00 length 1048576 SMID 375 Command timeout on target 0(0x0009) 60000 set, 60.483962856 elapsed (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 20 58 00 08 00 00 length 1048576 SMID 160 Command timeout on target 1(0x000a) 60000 set, 60.441826032 elapsed (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 18 58 00 08 00 00 length 1048576 SMID 158 Command timeout on target 1(0x000a) 60000 set, 60.451334556 elapsed (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 30 58 00 08 00 00 length 1048576 SMID 121 Command timeout on target 1(0x000a) 60000 set, 60.423550204 elapsed (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 28 58 00 08 00 00 length 1048576 SMID 269 Command timeout on target 1(0x000a) 60000 set, 60.432836445 elapsed (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 be 41 46 e8 00 00 78 00 length 61440 SMID 417 Command timeout on target 1(0x000a) 60000 set, 60.404959286 elapsed (da1:mps0:0:1:0): WRITE(10). CDB: 2a 00 bd b3 38 58 00 05 00 00 length 655360 SMID 404 Command timeout on target 1(0x000a) 60000 set, 60.414351577 elapsed (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af c1 d8 00 08 00 00 length 1048576 SMID 433 Command timeout on target 0(0x0009) 60000 set, 60.241136485 elapsed (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 f0 41 46 20 00 01 00 00 length 131072 SMID 364 Command timeout on target 0(0x0009) 60000 set, 60.222849270 elapsed (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db af c9 d8 00 05 00 00 length 655360 SMID 322 Command timeout on target 0(0x0009) 60000 set, 60.232077823 elapsed (xpt0:mps0:0:2:0): SMID 9 task mgmt 0xfffffe00c51c9c18 timed out mps0: Reinitializing controller mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc> (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db b8 18 d8 00 08 00 00 length 1048576 SMID 216 Command timeout on target 0(0x0009) 60000 set, 60.203261674 elapsed mps0: Sending abort to target 0 for SMID 216 (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db b8 18 d8 00 08 00 00 length 1048576 SMID 216 Aborting command 0xfffffe00c51db240 (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 db b8 28 d8 00 08 00 00 length 1048576 SMID 398 Command timeout on target 2(0x000b) 60000 set, 60.204955892 elapsed mps0: Sending abort to target 2 for SMID 398 (da2:mps0:0:2:0): WRITE(10). CDB: 2a 00 db b8 28 d8 00 08 00 00 length 1048576 SMID 398 Aborting command 0xfffffe00c51ea6d0 (da0:mps0:0:0:0): WRITE(10). CDB: 2a 00 db b8 28 d8 00 08 00 00 length 1048576 SMID 535 Command timeout on target 0(0x0009) 60000 set, 60.186371608 elapsed
Code:
Extended brand string: " Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz" CLFLUSH instruction cache line size: 8 Initial APIC ID: 1 Hyper threading siblings: 16 root@sirius[~]# zpool status sirius-b pool: sirius-b state: ONLINE scan: scrub repaired 768K in 1 days 04:32:21 with 0 errors on Mon May 22 04:32:21 2023 config: NAME STATE READ WRITE CKSUM sirius-b ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gptid/78f45af7-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0 gptid/7d4362ef-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 gptid/877b5643-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0 gptid/89fe6a49-083b-11eb-8d8d-00259077a4ba ONLINE 0 0 0 errors: No known data errors