LSI (Avago) 9207-8i with Seagate 10TB Enterprise (ST10000NM0016)

fdh5555

Cadet
Joined
Dec 29, 2017
Messages
4
On one of the two servers that I built with Seagate IronWolf 10TB and Enterprise 10TB drives, I've been getting these errors. Interestingly, on one server with an LSI 9300 I didn't get any errors; the other server has been getting them. Initially, I had 6 drives on the motherboard (X10SDV) SATA ports using the AHCI driver. I'd get about 10 FLUSHCACHE48 errors per day spread across all 6 drives. They appeared like:
Code:
Jan 6 09:00:50 ahcich34: Timeout on slot 31 port 0
Jan 6 09:00:50 ahcich34: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd c0 serr 00000000 cmd 0004df17
Jan 6 09:00:50 (ada4:ahcich34:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 6 09:00:50 (ada4:ahcich34:0:0:0): CAM status: Command timeout
Jan 6 09:00:50 (ada4:ahcich34:0:0:0): Retrying command
Hi, have you managed to resolve this FLUSHCACHE48 issue yet? I've got 8 Seagate ST8000NM0016 enterprise drives connected to the SATA ports of an X11SSH-F motherboard, and I think I've been experiencing exactly the same problem. The machine has worked fine except for annoying random 10-second freezes. However, I'm worried that one day I'll end up with multiple drive failures if I don't do something about it.
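In case it helps with comparing notes, this is roughly how I've been counting the timeouts per drive (assuming they land in /var/log/messages like in the output quoted above; the pattern only matches adaX devices on motherboard SATA ports):
Code:
# Count FLUSHCACHE48 timeout messages per ada device (adjust the log path if yours differs)
grep 'FLUSHCACHE48' /var/log/messages | grep -oE 'ada[0-9]+' | sort | uniq -c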
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Like you, I was able to import my pool with Linux (Ubuntu 17.10). I continued getting I/O errors on the drives, but none that affected the pool.


While I haven't considered drives to be permanently affected by drivers, there is one behavior of my drives that makes me wonder if something similar could be the case. Specifically, I have one server with 8 of these drives that had no issues for 3 months. Then, within 30 minutes of running the sas3flash utility to query the firmware version of the LSI 3008 (9300-8i) card, I started getting SYNCHRONIZE CACHE timeout errors on all the drives in the pool, and those intermittent timeout errors have continued ever since -- bizarre behavior that I can't explain.
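For reference, the query was just the standard listing commands, roughly like this (the exact flags are from memory, so treat them as approximate; -c picks the controller index):
Code:
# List every LSI SAS3 controller the utility can see
sas3flash -listall
# Show firmware/BIOS versions for controller 0
sas3flash -list -c 0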


Yeah, I'll reiterate from my previous post, for whatever it's worth as another data point for you guys.

I have 12 of these drives hooked up to two LSI 9211-8i's in IT mode (Phase 20 firmware) under Proxmox, which is Debian-based.

I phased them in (replaced one drive at a time and resilvered to increase the pool size) over three months, with the last drive going in on December 20th.

I have yet to have any problems under heavy use, and I've gone through several scrubs at this point.
 

wreedps

Patron
Joined
Jul 22, 2015
Messages
225
I am having the same issues on multiple systems with LSI 9207 controllers and FreeNAS 11.
 

skyyxy

Contributor
Joined
Jul 16, 2016
Messages
136
I am having the same issues on multiple systems with LSI 9207 controllers and FreeNAS 11.
I replaced some faulty disks and everything works fine now. My understanding is that it's down to Seagate's firmware, though of course it's not only Seagate; WD and others too, just bad luck, at least in my case. I've used FreeNAS to build 6 servers for my friends, and everything has been perfect except for the Seagate HDDs. The Exos enterprise disks sometimes have big problems in RAID. Maybe you could switch to WD Gold or something else, or even Seagate desktop drives.
 

wreedps

Patron
Joined
Jul 22, 2015
Messages
225
I have 2 servers running FreeNAS 11 U5 with LSI 9207-8i controllers, and they are failing WDs, Ultrastars, Seagates, and Intel SSDs left and right. I guess we are going to go back to FreeNAS 9. When will this problem be fixed?
 

skyyxy

Contributor
Joined
Jul 16, 2016
Messages
136
I have 2 servers running FreeNAS 11 U5 with LSI 9207-8i controllers, and they are failing WDs, Ultrastars, Seagates, and Intel SSDs left and right. I guess we are going to go back to FreeNAS 9. When will this problem be fixed?
In my experience it's not about the FreeNAS version; 11 is much better than 9, especially the networking, the HBA driver, and performance. I'm sure your issue is with the HDDs, because it has happened several times on my servers and my friends' servers. So every time I build a new one, the first thing I do is not copy the important data over; instead I copy some huge files to the server just to test the HDDs for a few days (usually 3-5 days). Only if everything works fine and no CRC errors are reported do I copy over the work files.

The funny thing is: all the HDDs will report errors like crazy even when only 1 or 2 of them really have a problem, so I think you need to check the FreeNAS reports for which one or two drives have a lot of read, write, and CRC errors, and just replace that one or those.
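A quick way to narrow that down (assuming the drives show up as daX behind the HBA; use adaX for motherboard SATA ports) is to compare the per-drive ZFS error counters with the SMART counters:
Code:
# Per-vdev read/write/checksum error columns
zpool status -v
# SMART reallocated-sector and UDMA CRC error counts for one drive
smartctl -A /dev/da0 | egrep -i 'Reallocated_Sector|CRC'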
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
Sadly, I have these drives and 2x FreeNAS 11.2 systems (one with LSI 2008 and the other with LSI 3008), and they both have this issue. Nothing suggested in this thread has worked to fix it. Some number of disks will always fail, especially during a scrub. It's a huge problem for me.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Sadly, I have these drives and 2x FreeNAS 11.2 systems (one with LSI 2008 and the other with LSI 3008), and they both have this issue. Nothing suggested in this thread has worked to fix it. Some number of disks will always fail, especially during a scrub. It's a huge problem for me.
I have a server at work that is populated with sixty (60) of those drives running on an LSI 3008 controller, and it has been working perfectly under heavy load, including when the data (around 310 TB) was initially loaded in and when the system runs scrubs. If there is a problem, I would say it is not the drives. Something may need to change, but I don't see the drives being at fault.

Firmware version on the SAS controller?

What else is in the system?
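If you're not sure how to pull that from within FreeNAS, something along these lines should show it (mps is the FreeBSD driver for SAS2008 cards and mpr for SAS3008; the exact output varies):
Code:
# Firmware version as reported by the FreeBSD driver at boot
dmesg | grep -iE 'mps|mpr'
# Or ask the flash utility directly (sas2flash for 2008, sas3flash for 3008)
sas3flash -list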
 

Evi Vanoost

Explorer
Joined
Aug 4, 2016
Messages
91
One of the 3TB Seagate models in the past had an issue where, if a write was issued at the same time as a SMART command, the drive would drop out. Try disabling the smartd process and anything else the controllers may be doing at the same time.

Check with Seagate whether they have newer firmware for the drives, and do the same for the controllers. Also check the cabling and power supply, especially if you're dealing with SAS-3 (12G); el-cheapo wiring is very sensitive.
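If it helps, the drive firmware revision is easy to read from the OS (device names here are just examples, and the SMART daemon's service name differs between FreeNAS versions, so check yours before stopping it):
Code:
# Show model, serial, and firmware revision of one drive
smartctl -i /dev/da0
# Temporarily stop the SMART daemon while testing (service name may differ on your version)
service smartd onestop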
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
I have a server at work that is populated with sixty (60) of those drives running on an LSI 3008 controller, and it has been working perfectly under heavy load, including when the data (around 310 TB) was initially loaded in and when the system runs scrubs. If there is a problem, I would say it is not the drives. Something may need to change, but I don't see the drives being at fault.

Firmware version on the SAS controller?

What else is in the system?

I have 20 of these drives in total. The problem exists on two totally separate servers, one LSI 2008 based and one LSI 3008 based. The LSI 3008 based server is a Supermicro SuperStorage 6048R-E1CR36L with 2x 8-core Xeons, 256 GB of ECC RAM that is on Supermicro's compatibility list (I even tried swapping to totally different RAM to see if I had bad modules), 2x Supermicro DOMs for boot volumes, and 3x Intel DC PCIe NVMe drives used for cache and ZIL. I am using a Mellanox 40GbE adapter in the server in question and an Intel X520-DA2 in the other server.

Code:
SAS3008IT Controller Firmware Release Note
---------------------------------------------
May, 2018
Revision 101


Firmware Name
---------------
3008IT16.ROM


Firmware Version
------------------
16.00.01.00


NVDATA Version
------------------
0E.01.30.28


OPROM Version
------------------
8.37.00.00
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
One of the 3TB Seagate models in the past had an issue where, if a write was issued at the same time as a SMART command, the drive would drop out. Try disabling the smartd process and anything else the controllers may be doing at the same time.

Check with Seagate whether they have newer firmware for the drives, and do the same for the controllers. Also check the cabling and power supply, especially if you're dealing with SAS-3 (12G); el-cheapo wiring is very sensitive.

Thanks for the reply. Unfortunately, I have checked and there are no firmware updates. Also, the problems persist on two totally separate servers with completely different hardware (the power supplies, backplanes, and cables are all different).
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
I have 20 of these drives in total. The problem exists on two totally separate servers, one LSI 2008 based and one LSI 3008 based. The LSI 3008 based server is a Supermicro SuperStorage 6048R-E1CR36L with 2x 8-core Xeons, 256 GB of ECC RAM that is on Supermicro's compatibility list (I even tried swapping to totally different RAM to see if I had bad modules), 2x Supermicro DOMs for boot volumes, and 3x Intel DC PCIe NVMe drives used for cache and ZIL. I am using a Mellanox 40GbE adapter in the server in question and an Intel X520-DA2 in the other server.

Code:
SAS3008IT Controller Firmware Release Note
---------------------------------------------
May, 2018
Revision 101


Firmware Name
---------------
3008IT16.ROM


Firmware Version
------------------
16.00.01.00


NVDATA Version
------------------
0E.01.30.28


OPROM Version
------------------
8.37.00.00


I'm even more convinced now that there is something in BSD, BSD's LSI drivers, the BSD port of ZFS, or FreeNAS itself at play here.

I've been running 12 of these Seagate ST10000NM0016 drives in a Debian Linux box with ZoL, using two IBM M1015s reflashed to LSI 9211-8i IT (JBOD) mode, for a year and a half now without a single issue.
 

mloiterman

Dabbler
Joined
Jan 30, 2013
Messages
45
I'm even more convinced now that there is something in BSD, BSD's LSI drivers, the BSD port of ZFS, or FreeNAS itself at play here.

I've been running 12 of these Seagate ST10000NM0016 drives in a Debian Linux box with ZoL, using two IBM M1015s reflashed to LSI 9211-8i IT (JBOD) mode, for a year and a half now without a single issue.

Having very similar issues with LSI 3008, Seagate, and Supermicro. My drives are not 3.5" 10TB, but 2.5" 4TB.

https://www.ixsystems.com/community...cache-command-timeout-error.55067/post-530528

I'm so frustrated, as I have invested a lot of money in this server. Is there nothing that can be done apart from tearing it down and starting over with a Linux base?
 

curtii

Dabbler
Joined
Jul 29, 2016
Messages
32
I've had pretty much the exact same issue since buying 4x 10TB Seagate IronWolf drives. My original controller was a 9207-8i, and while troubleshooting this issue I purchased a 9300-8i, which did not change the behavior in any way.

In my experience, a very reliable way to trigger the disks to throw errors and get failed out of the pool is to hammer the pool with write activity. On two or three occasions, I had no issues for weeks, and then, on initiating a heavy write process on the pool, one or more disks would get failed out within a couple of hours.

I did have the pool go into a fully "UNAVAIL" state, with 3 of the drives having been kicked out of the pool for errors. Even in that state, a reboot seems to bring it back online, and a subsequent scrub turned up no data errors. Very strange behavior all around.

I know this is an impractical workaround for larger arrays, but for my 4-disk setup I tested another option: plugging the four drives into the motherboard SATA ports (my motherboard is a Supermicro X10SRi-F). I wouldn't yet say I'm 100% confident this has sidestepped the issue, but I am close: after moving to those SATA ports, I initiated a bigger write load than on the two or three previous tries that had triggered at least one drive to "fail". At the same time, I ran a scrub, so there was constant, heavy disk activity for at least 3 hours, with 0 errors on any of the drives.

I'll be continuing to monitor this in the coming weeks, and will post an update here one way or the other.
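For anyone who wants to reproduce the same kind of stress test, this is roughly what I mean (the pool name tank and the dataset path are just placeholders; adjust the size to suit your pool):
Code:
# Generate sustained sequential write load on the pool (~200 GB here)
dd if=/dev/zero of=/mnt/tank/loadtest.bin bs=1M count=200000
# Kick off a scrub at the same time and watch for errors afterwards
zpool scrub tank
zpool status -v tank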
 

soulburn

Contributor
Joined
Jul 6, 2014
Messages
100
I did have the pool go into a fully "UNAVAIL" state, with 3 of the drives having been kicked out of the pool for errors. Even in that state, a reboot seems to bring it back online, and a subsequent scrub turned up no data errors. Very strange behavior all around.

I can also confirm this exact behavior and solution.
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
Can I ask a dumb question? Why is everybody trying to put SATA drives on SAS cards and expecting performance and stability?
I also had a ton of problems using SATA drives, especially in shelves. After switching to SAS, the problems went away. But again, why SATA? No offence, but it's intended for home use.
SMB and enterprise are for SAS. Ever wonder why SATA never got the 12-gigabit bandwidth?

WD Red and Gold are firmware marketing. At those prices you could buy SAS.

And SAS drives are intended for 24/7 operation.

I have 2 LSI 2308 cards connected to 2 Dell shelves. Each shelf has a mix of SAS drive brands and capacities. 0 errors.
One shelf has 24 2.5" drives, the other has 12 3.5" drives. I have everything from 3TB Seagates to 8TB HGSTs, and 0 errors or warning messages. Oh yeah, and multipath, which surprisingly works.
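For what it's worth, dual-path SAS on FreeNAS goes through gmultipath, and you can sanity-check that both paths are alive with this (assuming FreeNAS created the multipath providers automatically for the dual-ported drives):
Code:
# Show multipath providers and the state of each path
gmultipath status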
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
Ah, yeah. How is it going so far for some of you?
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
Regarding the Seagate drives, did you try to do a firmware upgrade? Is one available?
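In case anyone wants to check what they're currently running before hunting for an update, the installed drive firmware revision shows up in the identify data (device names are examples; Seagate's own SeaChest utilities are what would actually apply an update):
Code:
# Firmware revision via SMART identify data
smartctl -i /dev/da0
# Same information through CAM for a SATA drive on motherboard ports
camcontrol identify ada0 | head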
 