LSI (Avago) 9207-8i with Seagate 10TB Enterprise (ST10000NM0016)

fdh5555 · Mar 11, 2018

kmr99 said:
On one of the two servers that I built with Seagate IronWolf 10GB and Enterprise 10GB drives, I've been getting these errors. Interestingly, on one server with an LSI 9300, I didn't get any errors. The other server has been getting errors. Initially, I had 6 drives using the motherboard (X10SDV) SATA ports and ACHI driver. I'd get about 10 FLUSHCACHE48 errors per day spread across all 6 drives. They appeared like:

Code:
Jan 6 09:00:50 ahcich34: Timeout on slot 31 port 0 Jan 6 09:00:50 ahcich34: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd c0 serr 00000000 cmd 0004df17 Jan 6 09:00:50 (ada4:ahcich34:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 Jan 6 09:00:50 (ada4:ahcich34:0:0:0): CAM status: Command timeout Jan 6 09:00:50 (ada4:ahcich34:0:0:0): Retrying command

Hi, have you managed to resolve this FLUSHCACHE48 issue yet? I've got 8 Seagate ST8000NM0016 enterprise drives connected to SATA ports of X11SSH-F motherboard and I think I've been experiencing exactly the same problem. The machine worked fine except for annoying random 10 sec freezes. However I'm worrying that one day I will eventually get multiple drive failures if I don't do something about it.

mattlach · Mar 15, 2018

kmr99 said:
Like you, I was able to import my pool with Linux (Ubuntu 17.10). I continued getting errors (I/O errors) on the drives, but none that affected the pool.

While I haven't considered drives to be permanently affected by drivers, there is one behavior of my drives that makes me wonder if something similar could be the case. Specifically, I have one server with 8 of these drives that had no issues for 3 months. Then, within 30 minutes of running the sas3flash utility to inquire the firmware version of the LSI 3008 (9300-8i) card, I started getting the SYNCHRONIZE CACHE timeout errors on all the drives in the pool and those intermittent timeout errors have continued since -- bizarre behavior that I can't explain

Yeah, I'll reiterate from my previous post, for whatever its worth to help give you guys a data point.

I have 12 of these drives hooked up to two LSI 9211-8's in IT mode Phase 20 firmware under Proxmox which is Debian based.

I phased them in (did the replace one drive at a time and resilver to oncrease pool size) over three months, with the last drive going in on December 20th.

I have yet to have any problems over heavy use, and having gone through several scrubs at this point.

wreedps · Jun 16, 2018

I am having same issues on multiple systems with LSI 9207 and FreeNAS 11

skyyxy · Jun 18, 2018

wreedps said:
I am having same issues on multiple systems with LSI 9207 and FreeNAS 11

I replace some error disks, and everything works fine. my understand is about seagate's firmware, but ofcourse not only seagate but also WD or else. just unlucky. atleast for me it is. I used Freenas to build 6 servers for my friends, everything is perfect but except Seagate hdds, The EXOS enterprise disks has big problem with RAID sometimes. May be you can change to WD gold or esle even the Seagate Desktop.

wreedps · Jun 18, 2018

I have 2 servers running FreeNAS11 U5 with LSI 9207-8i controllers and it is failing WDs, Ultrastars, Seagate, and Intel SSDs left and right. I guess we are going to go back to FreeNAS 9. When will this problem be fixed?

skyyxy · Jun 18, 2018

wreedps said:
I have 2 servers running FreeNAS11 U5 with LSI 9207-8i controllers and it is failing WDs, Ultrastars, Seagate, and Intel SSDs left and right. I guess we are going to go back to FreeNAS 9. When will this problem be fixed?

In my experience is not about the Freenas version. the 11 is much more better than 9, espically the network and HBA driver and performance. I'm sure your issues is about the HDD because its happend times in my server and my friend's server. So everytime I build the new one the first thing is not copy the important data to new server, is copy some huge files to server just for test the hdds for few days(usually 3-5 days). If everything works fine and not CRC report that I just try to copy workfiles.

Every funny thing is : full hdds will creazy report error even just only 1 or 2 hdds really has problem, so I think you need check the freenas report that witch one or two has a lot of write and read and crc error and just replace it or them.

soulburn · May 1, 2019

Sadly I have these drives and 2x FreeNAS 11.2 systems (one with LSI 2008 and the other with LSI 3008) and they have this issue. Nothing in this post worked to fix it. Some amount of disks will always fail, especially during a scrub. It's a huge problem for me.

Chris Moore · May 1, 2019

soulburn said:
Sadly I have these drives and 2x FreeNAS 11.2 systems (one with LSI 2008 and the other with LSI 3008) and they have this issue. Nothing in this post worked to fix it. Some amount of disks will always fail, especially during a scrub. It's a huge problem for me.

I have a server at work that is populated with sixty (60) of those drives running on a LSI 3008 controller and it has been working perfectly under heavy load including when the data (around 310 TB) was loaded in initially and when the system runs scrubs. If there is a problem, I would say it is not the drives. There may need to be some change, but I don't see the drives being at fault.

Firmware version on the SAS controller?

What else is in the system?

Evi Vanoost · May 1, 2019

One of the 3TB Seagate models in the past had an issue when writes were issued at the same time a SMART command was issued (the drive would drop out). Try disabling the smartd process and anything else the controllers may be doing at the same time.

Check with Seagate whether or not they have a newer firmware as well as the controllers. Also check the cabling and power supply, especially if you're dealing with SAS-3 (12G), el-cheapo wiring is very sensitive.

soulburn · May 1, 2019

Chris Moore said:
I have a server at work that is populated with sixty (60) of those drives running on a LSI 3008 controller and it has been working perfectly under heavy load including when the data (around 310 TB) was loaded in initially and when the system runs scrubs. If there is a problem, I would say it is not the drives. There may need to be some change, but I don't see the drives being at fault.

Firmware version on the SAS controller?

What else is in the system?

I have 20 of these drives in total. The problem exists on two totally separate servers, one LSI 2008 based and one LSI 3008 based. For the LSI 3008 based server, it's on a Supermicro SuperStorage 6048R-E1CR36L server with 2x 8 core Xeons, 256 GB ECC RAM that is on Supermicro's compatibility list (I even tried swapping to totally different RAM to see if I had bad modules), 2x Supermicro DOM's for boot volumes, and also 3x Intel DC PCIe NVMe drives that are used for cache and ZIL. I am using a Mellanox 40GbE adapter in the aforementioned server in question and and an Intel X520-DA2 in the other server.

Code:

SAS3008IT Controller Firmware Release Note
---------------------------------------------
May, 2018
Revision 101


Firmware Name
---------------
3008IT16.ROM


Firmware Version
------------------
16.00.01.00


NVDATA Version
------------------
0E.01.30.28


OPROM Version
------------------
8.37.00.00

soulburn · May 1, 2019

Evi Vanoost said:
One of the 3TB Seagate models in the past had an issue when writes were issued at the same time a SMART command was issued (the drive would drop out). Try disabling the smartd process and anything else the controllers may be doing at the same time.

Check with Seagate whether or not they have a newer firmware as well as the controllers. Also check the cabling and power supply, especially if you're dealing with SAS-3 (12G), el-cheapo wiring is very sensitive.

Thanks for the reply. Unfortunately I have checked and there are no firmware updates. Also the problems persist on two totally separate servers with completely different hardware (power supplies, backplanes, and cables are completely different).

mattlach · May 1, 2019

soulburn said:
I have 20 of these drives in total. The problem exists on two totally separate servers, one LSI 2008 based and one LSI 3008 based. For the LSI 3008 based server, it's on a Supermicro SuperStorage 6048R-E1CR36L server with 2x 8 core Xeons, 256 GB ECC RAM that is on Supermicro's compatibility list (I even tried swapping to totally different RAM to see if I had bad modules), 2x Supermicro DOM's for boot volumes, and also 3x Intel DC PCIe NVMe drives that are used for cache and ZIL. I am using a Mellanox 40GbE adapter in the aforementioned server in question and and an Intel X520-DA2 in the other server.

Code:
SAS3008IT Controller Firmware Release Note --------------------------------------------- May, 2018 Revision 101 Firmware Name --------------- 3008IT16.ROM Firmware Version ------------------ 16.00.01.00 NVDATA Version ------------------ 0E.01.30.28 OPROM Version ------------------ 8.37.00.00

I'm even more convinced now that there is something either in BSD, BSD's LSI drivers, the BSD port of ZFS or FreeNAS itself at play here.

I've been running 12 of these Seagate ST10000NM0016 in a Debian Linux box with ZoL using two IBM M1015 reflashed to LSI 92011-8i JBOD mode for a hear and a half now without a single issue.

mloiterman · May 12, 2019

mattlach said:
I'm even more convinced now that there is something either in BSD, BSD's LSI drivers, the BSD port of ZFS or FreeNAS itself at play here.

I've been running 12 of these Seagate ST10000NM0016 in a Debian Linux box with ZoL using two IBM M1015 reflashed to LSI 92011-8i JBOD mode for a hear and a half now without a single issue.

Having very similar issues with LSI 3008, Seagate and SuperMicro. My drives are not 3.5mm 10Tb, but 4Tb 2.5mm.

https://www.ixsystems.com/community...cache-command-timeout-error.55067/post-530528

So frustrated as I have invested a lot in of money into this server. Is there nothing that can be done apart from tearing it down and starting with a Linux base?

curtii · Jun 27, 2019

I've had pretty much the exact same issue since buying 4x 10TB Seagate Ironwolf drives. My original controller was a 9207-8i, and while troubleshooting this issue I purchased a 9300-8i, which did not change the behavior in any way.

In my experience, a very reliable way to trigger the disks to throw errors and get failed out of the pool would be to hammer the pool with write activity. on two or three instances, I would have no issues for weeks, and then on initiating a heavy write process on the pool, one or more disks would get failed out of the pool within a couple hours.

I did have the pool go into a fully "UNAVAIL" state, with 3 of the drives having been kicked out of the pool for errors. Even in that state, a reboot seems to bring it back online, and a subsequent scrub turned up no data errors. Very strange behavior all around.

I know this is an impractical workaround for larger arrays, but for my 4x disk setup, I tested another option which was to plug the four drives into the Motherboard SATA ports (My motherboard is a Supermicro X10SRi-F). I wouldn't yet say I'm 100% confident this has sidestepped the issue, but I am close: After moving to those SATA ports, I initiated a bigger write load than I had on the two or three previous tries which triggered at least one drive to "Fail". At the same time, I ran a scrub, so there was constant, heavy disk activity for at least 3 hours - 0 errors on any of the drives.

I'll be continuing to monitor this in the coming weeks, and will post an update here one way or the other.

soulburn · Jun 27, 2019

curtii said:
I did have the pool go into a fully "UNAVAIL" state, with 3 of the drives having been kicked out of the pool for errors. Even in that state, a reboot seems to bring it back online, and a subsequent scrub turned up no data errors. Very strange behavior all around.

I can also confirm this exact behavior and solution.

Holt Andrei Tiberiu · Jul 29, 2019

Can I have a dumb question. Why is everybody trying to put SATA drives on SAS card's and expect performance and stability?
I also had a ton of problems using sata drives, especially in shelf's. After switching to SAS, problems went away. But again, why SATA ? No offence, but is intended for Home Use.
SMB or Enterprise is for SAS. Wonder why SATA never got the 12 gigabit bandwidth?

WD Red, Gold, these are Firmware Marketing. At those prices you could buy SAS.

And SAS drives are intended for 24/7 operation.

I have 2 lsi 2308 card's connected to 2 Dell shelf's. Each shelf has a mix brand and capacity SAS drives. 0 errors.
1 shelf has 24 2.5 inch drives, the other has 12 3.5 inch drives. I have from Seagate 3tb to hgst 8 tb drives and 0 errors or warning messages. Ah yeah, and multipath, Wich surprisingly works.

Ericloewe · Jul 29, 2019

Holt Andrei Tiberiu said:
Why is everybody trying to put SATA drives on SAS card's and expect performance and stability?

Because SAS controllers are designed for that.

Holt Andrei Tiberiu · Jul 29, 2019

Aaa. Yeah. How is it going so far for some ?

Holt Andrei Tiberiu · Aug 4, 2019

regarding the Seagate Drves, did you try to do a FW upgrade, is any available ?

Quindor · Sep 8, 2019

New information and a potential fix for these drives has appeared, please check it out in this topic.

Important Announcement for the TrueNAS Community.

LSI (Avago) 9207-8i with Seagate 10TB Enterprise (ST10000NM0016)

Cadet

Patron

Patron

Contributor

Patron

Contributor

Contributor

Hall of Famer

Explorer

Contributor

Contributor

Patron

Dabbler

Dabbler

Contributor

Contributor

Server Wrangler

Contributor

Contributor

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "LSI (Avago) 9207-8i with Seagate 10TB Enterprise (ST10000NM0016)"

Similar threads