As I'm experiencing this problem too, I've put together a compilation of the tests I've done, in case they help anyone. These are my own experiences with this error, and I'm still investigating how to fix, or at least minimize, these AHCI timeouts while I try to shake the rust off my written English (sorry, pals!).
What I've found so far...
Specs: Gigabyte GA990FXA-UD5 w/ AMD Phenom II 955 & 16 GB RipjawsX @ 1600 MHz, plus an Intel NC380T PCIe x4 dual-port gigabit LAN adapter.
I'm using the onboard SATA controller plus two additional Dell SAS 6/iR cards (both flashed with LSI 1068 IT firmware). The internal controller runs 4x Samsung HD154UI drives in a RAID-Z pool. The other two boards carry 8x Samsung HD204UI and 8x Samsung HD753LJ drives, all of them in RAID-Z2 configs. Everything sits in a Norco RCP-4020 chassis powered by an OCZ 700 W PSU.
The SAS boards are working like a charm, but the integrated one (AMD SB950/IXF700) is another story (more like a nightmare, I should say...). When copying single files, everything's OK with this controller, but when it's under heavy duty... Ta-daaa! The AHCI timeouts appear.
So far, all I've found about this problem is that the affected mobos use an AMD SB-series integrated SATA controller, and that processor speed or RAM amount doesn't seem to make things better or worse...
Now, what I've tried so far...
-Setting SATA to IDE mode instead of AHCI: Timeouts
-Testing with other drives (HD753LJ instead of the installed HD154UI): Timeouts
-Adding the line 'hint.ahci.0.msi=0' to loader.conf to see if there was a problem with IRQ handling: Timeouts
-Disabling all on-board hardware in the BIOS except the essentials: Timeouts
-Swapping the graphics card for a PCI one: Timeouts
-Removing all cards (network, additional SAS controllers, etc.): Timeouts
-Updating the BIOS from F5 to the latest one (F7): Timeouts
-Plugging the disks directly into the onboard controller (bypassing the Norco backplanes, I mean): Timeouts
-Disabling all services in FreeNAS (CIFS, SMART, etc.): .....Well, guess what!
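For anyone who wants to try the MSI test from the list above: it's a loader tunable, so it goes into /boot/loader.conf and takes effect on the next boot. This is exactly the hint I used, just with the usual quoting style for loader hints:

```
# /boot/loader.conf
# Tell the ahci(4) driver to use legacy interrupts instead of MSI
# on the first AHCI controller (ahci0). Didn't help in my case,
# but it's a cheap thing to rule out.
hint.ahci.0.msi="0"
```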
After that, I tested FreeNAS 8.0.4-p1 amd64 on another mobo (Asus Crosshair + AMD Phenom 9950 + 8 GB 800 MHz DDR2 RAM), and the drives worked perfectly, without any errors or timeouts...
Then I tried plugging 6 drives into the onboard controller instead of 4, filling all the onboard ports, and the timeouts noticeably decreased. Even with the timeouts, the system kept copying/moving data without hanging completely like before...
BUT!!! (pero! - mais! - aber! - porém!), since I can't feel satisfied until I've tortured this config enough to see where the problem lies.............
I tested the new RAID-Z2 of 6 disks by generating some data with 'dd'. Sadly, I found the same situation as before: when the dataset is under heavy load... (dd if=/dev/zero of=/mnt/TEST/TEST/testfile01 bs=100m count=200) + (dd if=/mnt/TEST/TEST/testfile02 of=/dev/null bs=100m count=200) = TIMEOUT!!
After that, I destroyed the RAID-Z2 pool and created a striped pool with the 6 drives. Next, I ran the following test (all lines executing simultaneously):
dd if=/dev/zero of=/mnt/TEST/TEST/test01 bs=100m count=300 &
dd if=/dev/zero of=/mnt/TEST/TEST/test02 bs=100m count=300 &
dd if=/dev/zero of=/mnt/TEST/TEST/test03 bs=100m count=300 &
dd if=/dev/zero of=/mnt/TEST/TEST/test04 bs=100m count=300 &
dd if=/dev/zero of=/mnt/TEST/TEST/test05 bs=100m count=300 &
dd if=/mnt/TEST/TEST/test06 of=/dev/null bs=100m count=300 &
dd if=/mnt/TEST/TEST/test07 of=/dev/null bs=100m count=300 &
dd if=/mnt/TEST/TEST/test08 of=/dev/null bs=100m count=300 &
dd if=/mnt/TEST/TEST/test09 of=/dev/null bs=100m count=300 &
dd if=/mnt/TEST/TEST/test10 of=/dev/null bs=100m count=300 &
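In case anyone wants to reproduce this, here's the same ten-stream pattern wrapped in a small script, scaled way down (4 MiB per stream instead of ~30 GB, numeric block sizes so it runs on any dd, and a hypothetical /tmp directory standing in for my /mnt/TEST/TEST dataset), so it's safe to dry-run anywhere before pointing it at a real pool:

```shell
#!/bin/sh
# Scaled-down sketch of the parallel dd stress test above.
# The real run used bs=100m count=300 per stream; sizes here are tiny
# on purpose. TESTDIR is a hypothetical path, not my actual dataset.
TESTDIR="${TESTDIR:-/tmp/ahci-stress}"
mkdir -p "$TESTDIR"

# Pre-create the files the read streams will pull from (test06..test10).
for i in 06 07 08 09 10; do
    dd if=/dev/zero of="$TESTDIR/test$i" bs=1048576 count=4 2>/dev/null
done

# Five writers and five readers, all in the background like the
# original test; 'wait' blocks until every stream has finished.
for i in 01 02 03 04 05; do
    dd if=/dev/zero of="$TESTDIR/test$i" bs=1048576 count=4 2>/dev/null &
done
for i in 06 07 08 09 10; do
    dd if="$TESTDIR/test$i" of=/dev/null bs=1048576 count=4 2>/dev/null &
done
wait
echo "all 10 streams finished"
```

To actually stress a pool the way I did, point TESTDIR at a dataset on the suspect controller, bump bs/count back up to the original values, and kick off a scrub first.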
Did I mention that I ran these tests during a scrub? So please, don't ask how long I stood staring at the screen like an idiot, watching for errors. It's too depressing... :D
...
...
...
...
...
...
...And after a looong wait, the errors finally appeared again, but I don't know whether pushing the filesystem this hard could produce the timeouts even on a "working" system...
As a final test, I ran the whole system from a FreeNAS 0.7.2.8191 amd64 LiveCD, and after many hours of repeating the 'dd' + scrub combination, the problems never appeared. So I think this rules out a problem with the disks, or even with the mobo's SATA controller itself (which points to a 'problem' between the AMD chipset and the new AHCI driver, perhaps?).
Any ideas or suggestions about this, or should I test the physical law that everything can fly... with an adequate dose of bad mood and brute force? :D