Again or Still siisch0: Timeout on slot X and IRQ Storm on SiI3132 SATA/Multiplier

Status
Not open for further replies.

mrsieb

Dabbler
Joined
Jan 30, 2012
Messages
20
Hi there, after more than 6 months of playing around with the NAS system and reading every thread I could find about this topic,
I am still not able to solve this annoying problem.

To begin with, I only get this on my external eSATA cube with 8 bays (2 port multipliers connected to a SiI3132 SATA controller).

First of all, I tried nearly everything suggested in other threads:
*) changed AHCI to IDE
*) hint.siisch.0.sata_rev=1
*) hint.MSI.X.message=1
*) replaced all Samsung disks (HD204UI, in case of the firmware bug)
*) replaced all SATA cables
*) tons of other loader.conf and sysctl settings
*) replaced the whole motherboard

Conclusion: when I boot the NAS everything seems to be OK. After a couple of hours
it starts to trigger "interrupt storm detected on "irq16:"; throttling interrupt source".
Interrupts go up to 150k/s for a couple of seconds and then drop back to 4k/s, according to systat -v 1.

Maybe this helps too: if the system is quiet, the IRQ storm starts when I trigger a shutdown
with the power button on the machine. The shutdown then takes 10 times longer than normal
because of the IRQ storm in between. Sometimes the storm prevents me from connecting via ssh or the WebGUI
until I reboot.

According to this post -> http://forums.freebsd.org/showthread.php?t=17432
and this one -> http://lists.freebsd.org/pipermail/freebsd-fs/2010-September/009438.html I believe this is still some kind of bug in the siis(4) driver when you are using
port multipliers.
The interesting thing is that it always happens on the siisch0: channel only, no
matter which disks are connected to this multiplier block.

Still, it is very annoying to scrub or rebuild the pool every week because of siis timeouts and disk freezes.
Also, if you pull out a disk and check it standalone in another system with SMART or another
diagnostic tool, the hard disk passes everything. What I want to say is: I am sure there is no defective hardware
in the system (not even cables)!


greetings....
 

mrsieb

...

Still investigating:

I have also been playing around with the interrupt storm threshold
sysctl (hw.intr_storm_threshold). I set it to an
extremely high value -> 2 million instead of 2000.

When I watch systat -v 1, IRQ 16 has an average rate of 1000/sec
and looks pretty good.

But an IRQ storm is still detected after a couple of hours of operation
(even when the storage is 100% idle).

The exact timing of this error is interesting:

Code:
Sep 26 11:00:01 NASI kernel: interrupt storm detected on "irq16:"; throttling interrupt source
Sep 26 11:00:32 NASI last message repeated 31 times
Sep 26 11:02:33 NASI last message repeated 121 times
Sep 26 11:12:34 NASI last message repeated 601 times
Sep 26 11:22:35 NASI last message repeated 601 times
Sep 26 11:32:36 NASI last message repeated 601 times
Sep 26 11:42:37 NASI last message repeated 573 times
Sep 26 11:52:38 NASI last message repeated 601 times
Sep 26 12:02:39 NASI last message repeated 601 times
Sep 26 12:12:40 NASI last message repeated 601 times
Sep 26 12:22:41 NASI last message repeated 601 times
Sep 26 12:32:42 NASI last message repeated 601 times
Sep 26 12:42:42 NASI last message repeated 600 times
Sep 26 12:52:42 NASI last message repeated 600 times
Sep 26 13:02:42 NASI last message repeated 600 times
Sep 26 13:12:42 NASI last message repeated 600 times
Sep 26 13:22:42 NASI last message repeated 600 times
Sep 26 13:32:42 NASI last message repeated 600 times
Sep 26 13:42:42 NASI last message repeated 600 times


So, exactly every 10 minutes. What happens on FreeNAS every 10 minutes?
Maybe this helps to get closer to the source of the problem.
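
On the 10-minute question: the gaps between the "last message repeated" lines (31 s, then 121 s, then 601 s) resemble the growing intervals at which syslogd reports suppressed duplicate messages, so the 10-minute rhythm may come from the logger rather than from a periodic FreeNAS job. That reading is an assumption, not something confirmed in this thread. A minimal Python sketch that computes the gap and the repeat count for each line of an excerpt of the log above (the function name `repeat_rates` and the `year` argument are made up for illustration, since the log lines carry no year):

```python
import re
from datetime import datetime

# Excerpt of the syslog lines quoted in the post.
LOG = """\
Sep 26 11:00:01 NASI kernel: interrupt storm detected on "irq16:"; throttling interrupt source
Sep 26 11:00:32 NASI last message repeated 31 times
Sep 26 11:02:33 NASI last message repeated 121 times
Sep 26 11:12:34 NASI last message repeated 601 times
Sep 26 11:22:35 NASI last message repeated 601 times
Sep 26 11:32:36 NASI last message repeated 601 times
Sep 26 11:42:37 NASI last message repeated 573 times
"""


def repeat_rates(log_text, year=2012):
    """Return (gap_seconds, repeat_count) for every 'last message repeated' line.

    The year is an assumption; it is only needed to build a full timestamp.
    """
    results = []
    previous = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # The first 15 characters hold the syslog timestamp, e.g. "Sep 26 11:00:01".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        match = re.search(r"last message repeated (\d+) times", line)
        if match and previous is not None:
            gap = int((stamp - previous).total_seconds())
            results.append((gap, int(match.group(1))))
        previous = stamp
    return results


for gap, count in repeat_rates(LOG):
    print(f"{gap:4d} s since previous line, {count:4d} repeats, {count / gap:.2f} msg/s")
```

For this excerpt the rate stays close to one message per second on every line, which would fit a storm that never really stops while the warning itself is printed about once a second.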

greetings
 

mrsieb

What is using irq16?
(vmstat -i will tell you)

Already checked:

Code:

interrupt                          total       rate
irq16: ehci0 siis0+           3833466001      42477
irq19: re0                     147076998       1629
irq23: ehci1                      181109          2
cpu0: timer                    180497938       2000
irq256: ahci0                   89392759        990
cpu3: timer                    180437900       1999
cpu1: timer                    180437901       1999
cpu2: timer                    180437901       1999
Total                         4791928507      53097
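
To see at a glance how dominant irq16 is in the listing above, here is a minimal Python sketch that parses the `vmstat -i` text and reports the busiest source and its share of the summed rates. The function name `parse_vmstat_i` is hypothetical, and the column layout is assumed to be exactly the one shown above.

```python
# Output of `vmstat -i` as quoted in the post.
VMSTAT = """\
interrupt                          total       rate
irq16: ehci0 siis0+           3833466001      42477
irq19: re0                     147076998       1629
irq23: ehci1                      181109          2
cpu0: timer                    180497938       2000
irq256: ahci0                   89392759        990
cpu3: timer                    180437900       1999
cpu1: timer                    180437901       1999
cpu2: timer                    180437901       1999
Total                         4791928507      53097
"""


def parse_vmstat_i(text):
    """Return a list of (source, total, rate) tuples, skipping header and Total."""
    rows = []
    for line in text.splitlines()[1:]:
        # The source name may contain spaces, so split the two numbers off the right.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or parts[0] == "Total":
            continue
        name, total, rate = parts
        rows.append((name, int(total), int(rate)))
    return rows


rows = parse_vmstat_i(VMSTAT)
busiest = max(rows, key=lambda row: row[2])
share = busiest[2] / sum(row[2] for row in rows)
print(f"{busiest[0]}: {busiest[2]} interrupts/s, {share:.0%} of all sources")
```

Note that the irq16 line names two handlers followed by a "+", which usually means more handlers share the line than fit in the name, so the count alone does not say which device raises the interrupts.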



Code:
cat /var/log/dmesg.today | grep "irq 16"
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem 0xfb400000-0xfb7fffff,0xc0000000-0xcfffffff irq 16 at device 2.0 on pci0
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfbc03000-0xfbc033ff irq 16 at device 26.0 on pci0
siis0: <SiI3132 SATA controller> port 0xe000-0xe07f mem 0xfbb84000-0xfbb8407f,0xfbb80000-0xfbb83fff irq 16 at device 0.0 on pci2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.1 on pci0
siis1: <SiI3132 SATA controller> port 0xc000-0xc07f mem 0xfb884000-0xfb88407f,0xfb880000-0xfb883fff irq 16 at device 0.0 on pci6
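
The grep output above already shows that irq 16 is shared. As a small sketch, assuming dmesg probe lines always have the shape `name: <description> ... irq N at device X on bus`, the devices on one IRQ can be listed like this in Python (`devices_on_irq` is a made-up helper name):

```python
import re

# The "irq 16" lines from dmesg.today as quoted in the post.
DMESG = """\
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem 0xfb400000-0xfb7fffff,0xc0000000-0xcfffffff irq 16 at device 2.0 on pci0
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfbc03000-0xfbc033ff irq 16 at device 26.0 on pci0
siis0: <SiI3132 SATA controller> port 0xe000-0xe07f mem 0xfbb84000-0xfbb8407f,0xfbb80000-0xfbb83fff irq 16 at device 0.0 on pci2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.1 on pci0
siis1: <SiI3132 SATA controller> port 0xc000-0xc07f mem 0xfb884000-0xfb88407f,0xfb880000-0xfb883fff irq 16 at device 0.0 on pci6
"""


def devices_on_irq(text, irq):
    """Return (device, description, parent_bus) for every probe line on the given IRQ."""
    pattern = re.compile(
        rf"^(\w+): <([^>]*)> .*\birq {irq} at device [\d.]+ on (\w+)$"
    )
    found = []
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.groups())
    return found


for name, description, bus in devices_on_irq(DMESG, 16):
    print(f"{name:8s} on {bus:5s} {description}")
```

Six devices report irq 16 here, including both SiI3132 controllers, the first USB controller and the graphics device.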


It is coming from the SATA controller, which still sometimes produces 183k interrupts/sec.

I found another post: http://tinyurl.com/987x5v9
According to it, I believe more and more that there is some issue with IRQ handling
in FreeBSD 8.2 - 9.0.
 

tingo

Contributor
Joined
Nov 5, 2011
Messages
137
Hmm, by any chance, is the usb stick you are using to boot from connected to the system via ehci0?
If so, can you try if connecting it via ehci1 makes a difference? (Probably not, but worth a try anyway)
 

mrsieb

Hmm, by any chance, is the usb stick you are using to boot from connected to the system via ehci0?
If so, can you try if connecting it via ehci1 makes a difference? (Probably not, but worth a try anyway)

Yeah, I had already suspected this, so I did a dd to a flash drive which is connected to a normal SATA port.
So no USB device is connected, not even a keyboard.
I also tried disabling every extra USB function in the BIOS of the motherboard.
But still IRQ storm :-(
 

tingo

Do you get an interrupt storm if you are not using the port multipliers?
(the only machine I have which is using the siis driver doesn't have a port multiplier)
 

mrsieb

The problem is that there are 12 disks in my zpool and I have no machine with enough internal SATA ports,
so without the multipliers I am not able to bring the pool online.

If the pool is offline, meaning the eSATA box is not connected, there is no IRQ storm.

Anyhow, I know my config (eSATA drives with multipliers) is not the best solution.
The IRQ storm becomes less aggressive when I throttle the speed
on PMP0 and PMP1 with hint.siisch.0.sata_rev=1 and hint.siisch.1.sata_rev=1
in loader.conf, so the speed is 150MB/s instead of 300MB/s for each multiplier channel.
Still, there are 8 hard disks on a dual eSATA controller in a PCIe x1 slot; maybe this
is the bottleneck?!?
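
A rough back-of-the-envelope check of the bottleneck idea, as a Python sketch. It assumes the SiI3132 card sits on a first-generation PCIe x1 link, which carries a nominal 250 MB/s per direction; that link generation is an assumption, and real throughput is lower because of protocol overhead. The 150 and 300 MB/s figures are the per-channel numbers from the post, and `per_disk_ceiling` is a made-up helper.

```python
# Nominal figures only; real-world throughput is lower.
PCIE1_X1_MB_S = 250              # assumed: PCIe 1.x, one lane, per direction
SATA_MB_S = {1: 150, 2: 300}     # sata_rev -> MB/s per channel, as quoted in the post


def per_disk_ceiling(disks, channels, sata_rev, host_mb_s=PCIE1_X1_MB_S):
    """Upper bound in MB/s per disk when all disks transfer at the same time."""
    link_total = channels * SATA_MB_S[sata_rev]
    usable = min(link_total, host_mb_s)   # the slower side limits the total
    return usable / disks


for rev in (2, 1):
    ceiling = per_disk_ceiling(disks=8, channels=2, sata_rev=rev)
    print(f"sata_rev={rev}: at most {ceiling:.2f} MB/s per disk with 8 disks busy")
```

Under these assumptions the x1 slot, not the SATA link speed, sets the ceiling in both cases, so throttling with sata_rev=1 would not change the aggregate limit. This only concerns throughput; it does not by itself explain an interrupt storm.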
 

tingo

What kind (make and model) of eSATA boxes do you have?

Do you have spare hard drives so you can run a test pool without the eSATA boxes?
Or a spare motherboard (preferably with a different chipset) so you can test with that?

The problem you have hit is the hard kind; if you are going to solve it, you will need to spend quite a bit of time on it, trying different parts, and/or providing logs and error messages to the FreeBSD developers (but that would require that you run FreeBSD on this setup and are able to reproduce the problem under FreeBSD).
 

mrsieb

What kind (make and model) of eSATA boxes do you have?

DAS801t from Edge10
-> http://kauppa.cityplus.fi/Support/Edge10/EdgeDAS8.pdf

Inside are 2 five-port multipliers, each using only 4 ports. Unfortunately I don't
know the brand of the backplane; maybe it is very cheap hardware.
I have 8 disks in this enclosure and 4 disks internal.
It is one zpool built up of 3 groups, each a 4-disk RAID-Z1.



Do you have spare hard drives so you can run a test pool without the eSATA boxes?
Or a spare motherboard (preferably with a different chipset) so you can test with that?

Indeed, I have a couple of spare disks. I will try that: build a pool with 4 disks connected only to
the internal SATA ports. I will also look again for a different motherboard.


The problem you have hit is the hard kind; if you are going to solve it, you will need to spend quite a bit of time on it, trying different parts, and/or providing logs and error messages to the FreeBSD developers (but that would require that you run FreeBSD on this setup and are able to reproduce the problem under FreeBSD).

However, I have been fighting with this issue for 6 months, so an additional 6 months should not be a problem :smile: Running native FreeBSD is also not a problem, because importing a
zpool is easy. I already have a second flash drive with FreeBSD 9.0 on it. I was able to boot, import the pool and work with it, but after a while the IRQ storm comes anyway, or at least a siis slot timeout
on some disks, and I have to reboot to get the disks back in the pool. The problem is finding a FreeBSD developer who is willing to go deeper into this.

Anyway, thank you for your response and greetings to Norway -> offtopic -> have you ever been at The Gathering in Hamar? ;-)
 

mrsieb

So after this kernel trap, I set the multi-core CPU to a single core in the BIOS, only as an educated guess.
Anyhow, after a couple of hours the IRQ storms seem to be quiet now... but...

Code:
siisch0: Timeout on slot 29
siisch0: siis_timeout is 07040000 ss 3c000000 rs 3c000000 es 00000000 sts 801b2000 serr 00680000
siisch0:  ... waiting for slots 1c000000
siisch0: Timeout on slot 28
siisch0: siis_timeout is 07040000 ss 3c000000 rs 3c000000 es 00000000 sts 801b2000 serr 00680000
siisch0:  ... waiting for slots 0c000000
siisch0: Timeout on slot 27
siisch0: siis_timeout is 07040000 ss 3c000000 rs 3c000000 es 00000000 sts 801b2000 serr 00680000
siisch0:  ... waiting for slots 04000000
siisch0: Timeout on slot 26
siisch0: siis_timeout is 07040000 ss 3c000000 rs 3c000000 es 00000000 sts 801b2000 serr 00680000
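
The hex fields in these timeout lines can be unpacked with a few lines of Python. This sketch assumes that the slot fields are plain bitmasks with one bit per command slot and that `serr` follows the standard SATA SError register layout (diagnostic flags in bits 16 to 26); both are assumptions about the driver output, and the function names are made up.

```python
# Diagnostic half of the SATA SError register (bit number -> flag and meaning).
SERR_DIAG_BITS = {
    16: "N: PhyRdy change",
    17: "I: Phy internal error",
    18: "W: Comm wake",
    19: "B: 10B to 8B decode error",
    20: "D: Disparity error",
    21: "C: CRC error",
    22: "H: Handshake error",
    23: "S: Link sequence error",
    24: "T: Transport state transition error",
    25: "F: Unknown FIS type",
    26: "X: Exchanged",
}


def slots_in_mask(mask_hex):
    """Return the slot numbers whose bit is set in a 32-bit hex mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_serr_diag(serr_hex):
    """Return the names of the diagnostic flags set in an SError value."""
    value = int(serr_hex, 16)
    return [name for bit, name in sorted(SERR_DIAG_BITS.items()) if value & (1 << bit)]


print("ss 3c000000      ->", slots_in_mask("3c000000"))
print("waiting 1c000000 ->", slots_in_mask("1c000000"))
print("serr 00680000    ->", decode_serr_diag("00680000"))
```

Read this way, four commands (slots 26 to 29) were outstanding and timed out one after another, and the serr value carries decode, CRC and handshake flags. Those are link-level error flags, which would fit a problem somewhere on the signal path between controller and disks, but that is an interpretation and not a diagnosis.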


When I think about multipliers and/or eSATA boxes now... I have talked to a lot of people, and everybody has problems with
multipliers, no matter which operating system: Windows, Linux, BSD... everybody is fighting with this stuff.
The attempt to save money with port multipliers backfires.
 

tingo

Anyway, thank you for your response and greetings to Norway -> offtopic -> have you ever been at The Gathering in Hamar? ;-)

No, I haven't been at TG in person, I guess I'm too old now, and when I was younger I lived on the other (northern) end of this country. But I follow the news about TG every year. Interesting stuff.
 