New Seagate Exos 8TB kicked from Pool

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
UPDATED: I returned the drive. Either the drive is bad or Exos drives behave differently in certain setups. Even after I flashed the latest firmware to the HBA, the drive still threw CAM errors. I don't believe the problem is my system, since no other drives cause problems and putting the old drive back in the same slot doesn't throw any errors either. I will just keep going with shucked WD drives.


I have recently been upgrading my pool with 8TB drives. I have normally been shucking 8TB WD Elements/Easystore drives, but I saw a new 8TB Exos on sale and figured I would give it a try. I tested the drive first by running SMART short, long, and conveyance tests plus a badblocks test. Everything came back clean with no errors. I replaced a disk in my pool with it, resilvered, and everything was good.

About 4-5 days later, the drive dropped out of the pool, with zpool status showing too many errors. I didn't see anything immediately wrong, so I resilvered and put it back in the pool. Then I started looking at the logs and found the following errors from earlier:
Code:
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): Retrying command (per sense data)
Mar  1 05:26:05 infinity mps0: Controller reported scsi ioc terminated tgt 11 SMID 1136 loginfo 31080000
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): READ(10). CDB: 28 00 2f 4c 30 50 00 01 00 00
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): CAM status: CCB request completed with an error
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): Retrying command, 3 more tries remain
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): READ(10). CDB: 28 00 2f 4c 2f 50 00 01 00 00
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): CAM status: SCSI Status Error
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): SCSI status: Check Condition
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): Retrying command (per sense data)
Mar  1 05:26:16 infinity mps0: Controller reported scsi ioc terminated tgt 11 SMID 892 loginfo 31080000
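
For anyone reading a log like this, here is a rough Python sketch (not something from my system, just an illustration) that pulls the READ(10) CDBs and sense strings out of lines in the format above. It assumes the standard 10-byte READ(10) layout: opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, and a 2-byte transfer length in bytes 7-8. The names `decode_read10` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt above.
LOG = """\
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): READ(10). CDB: 28 00 2f 4c 30 50 00 01 00 00
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): CAM status: CCB request completed with an error
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): READ(10). CDB: 28 00 2f 4c 2f 50 00 01 00 00
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): CAM status: SCSI Status Error
Mar  1 05:26:05 infinity (da3:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
"""

# Assumed line shapes, based only on the excerpt above.
CDB_RE = re.compile(
    r"\((\w+):mps\d+:[\d:]+\): READ\(10\)\. CDB: ((?:[0-9a-f]{2} ?){10})"
)
SENSE_RE = re.compile(r"\((\w+):mps\d+:[\d:]+\): SCSI sense: (.+)$")


def decode_read10(cdb_hex):
    """Return (lba, block_count) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


def summarize(log_text):
    """Collect decoded reads and count sense strings per device."""
    reads = []
    senses = Counter()
    for line in log_text.splitlines():
        m = CDB_RE.search(line)
        if m:
            lba, blocks = decode_read10(m.group(2).strip())
            reads.append((m.group(1), lba, blocks))
            continue
        m = SENSE_RE.search(line)
        if m:
            senses[(m.group(1), m.group(2))] += 1
    return reads, senses


if __name__ == "__main__":
    reads, senses = summarize(LOG)
    for dev, lba, blocks in reads:
        print(f"{dev}: READ(10) lba={lba} blocks={blocks}")
    for (dev, sense), count in senses.items():
        print(f"{dev}: {count}x {sense}")
```

Decoded this way, the two failing reads are both 256-block transfers at adjacent LBAs on da3, and the sense data reports a CRC error on an information unit, which describes a problem with data in transit rather than a bad sector.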


I started looking into it and I am not sure what is going on. I was thinking it might be an adapter problem, but this only started happening when I put in the new drive. I have 5 other 8TB drives from WD and they run fine with no errors or problems. I read that this could be a firmware issue with the card, but I am not sure what card I have. The eBay listing says it's an IBM 9200-8i, but I can't find any reference to that model anywhere. I ran mpsutil show all:
Code:
Adapter:
mps0 Adapter:
       Board Name: IBM 6Gb SSD HBA
   Board Assembly: H3-25113-01D
        Chip Name: LSISAS2008
    Chip Revision: ALL:
    BIOS Revision: 7.05.06.00
Firmware Revision: 5.30.02.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (5.0 GB/sec)
        IOC Speed: Full
      Temperature: Unknown/Unsupported

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     1.5    6.0    SAS Initiator
1       0001        0009       N         6.0     1.5    6.0    SAS Initiator
2       0001        0009       N         6.0     1.5    6.0    SAS Initiator
3       0001        0009       N         6.0     1.5    6.0    SAS Initiator
4       0001        0009       N         6.0     1.5    6.0    SAS Initiator
5       0001        0009       N         6.0     1.5    6.0    SAS Initiator
6       0001        0009       N         6.0     1.5    6.0    SAS Initiator
7       0001        0009       N         6.0     1.5    6.0    SAS Initiator

Devices:
B____T    SAS Address      Handle  Parent    Device        Speed Enc  Slot  Wdt
          5005076028ccb790 0009    0001      SMP Target    6.0   0002 00    8
          5005076028ccb799 000a    0009      SATA Target   6.0   0002 255   1
          5005076028ccb79a 000b    0009      SATA Target   6.0   0002 255   1
          5005076028ccb79b 000c    0009      SATA Target   6.0   0002 255   1
          5005076028ccb79c 000d    0009      SATA Target   6.0   0002 255   1
          5005076028ccb79d 000e    0009      SATA Target   6.0   0002 255   1
          5005076028ccb79e 000f    0009      SATA Target   6.0   0002 255   1
          5005076028ccb79f 0010    0009      SATA Target   6.0   0002 255   1
          5005076028ccb7a0 0011    0009      SATA Target   6.0   0002 255   1
          5005076028ccb7a1 0012    0009      SATA Target   6.0   0002 255   1
          5005076028ccb7a2 0013    0009      SATA Target   6.0   0002 255   1
          5005076028ccb7a3 0014    0009      SATA Target   6.0   0002 255   1
          5005076028ccb7a4 0015    0009      SATA Target   6.0   0002 255   1
          5005076028ccb7b7 0016    0009      SEP Target    6.0   0002 00    1

Enclosures:
Slots      Logical ID     SEPHandle  EncHandle    Type
  08    500605b0024b8860               0001     Direct Attached SGPIO
  39    5005076028ccb790    0016       0002     External SES-2

Expanders:
NumPhys   SAS Address     DevHandle   Parent  EncHandle  SAS Level
  39    5005076028ccb790    0009       0001     0002       1

     Phy  RemotePhy  DevHandle  Speed  Min   Max    Device
     00      03        0001      6.0   1.5   6.0   SAS Initiator
     01      02        0001      6.0   1.5   6.0   SAS Initiator
     02      01        0001      6.0   1.5   6.0   SAS Initiator
     03      00        0001      6.0   1.5   6.0   SAS Initiator
     04      07        0001      6.0   1.5   6.0   SAS Initiator
     05      06        0001      6.0   1.5   6.0   SAS Initiator
     06      05        0001      6.0   1.5   6.0   SAS Initiator
     07      04        0001      6.0   1.5   6.0   SAS Initiator
     08      00        000a      6.0   1.5   6.0   SATA Target
     09      00        000b      6.0   1.5   6.0   SATA Target
     10      00        000c      6.0   1.5   6.0   SATA Target
     11      00        000d      6.0   1.5   6.0   SATA Target
     12      00        000e      6.0   1.5   6.0   SATA Target
     13      00        000f      6.0   1.5   6.0   SATA Target
     14      00        0010      6.0   1.5   6.0   SATA Target
     15      00        0011      6.0   1.5   6.0   SATA Target
     16      00        0012      6.0   1.5   6.0   SATA Target
     17      00        0013      6.0   1.5   6.0   SATA Target
     18      00        0014      6.0   1.5   6.0   SATA Target
     19      00        0015      6.0   1.5   6.0   SATA Target
     20                                1.5   6.0   No Device
     21                                1.5   6.0   No Device
     22                                1.5   6.0   No Device
     23                                1.5   6.0   No Device
     24                                1.5   6.0   No Device
     25                                1.5   6.0   No Device
     26                                1.5   6.0   No Device
     27                                1.5   6.0   No Device
     28                                1.5   6.0   No Device
     29                                1.5   6.0   No Device
     30                                1.5   6.0   No Device
     31                                1.5   6.0   No Device
     32                                1.5   6.0   No Device
     33                                1.5   6.0   No Device
     34                                1.5   6.0   No Device
     35                                1.5   6.0   No Device
     36                                1.5   6.0   No Device
     37                                1.5   6.0   No Device
     38      00        0016      6.0   6.0   6.0   SEP Target
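
One thing the expander phy table can be checked for is any link that negotiated below its maximum speed. Below is a small Python sketch of that check. It assumes the column layout shown above (Phy, RemotePhy, DevHandle, Speed, Min, Max, Device); the mpsutil output format may differ between versions, and `parse_phys` and `degraded` are names made up for this example.

```python
# A few rows copied from the expander phy table above.
PHY_TABLE = """\
     Phy  RemotePhy  DevHandle  Speed  Min   Max    Device
     07      04        0001      6.0   1.5   6.0   SAS Initiator
     08      00        000a      6.0   1.5   6.0   SATA Target
     20                                1.5   6.0   No Device
     38      00        0016      6.0   6.0   6.0   SEP Target
"""


def parse_phys(text):
    """Parse attached phys from the table; skip the header and empty phys."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # header or blank line
        if parts[-2:] == ["No", "Device"] or len(parts) < 7:
            continue  # nothing attached to this phy
        phy, _remote, handle, speed, _min, max_speed = parts[:6]
        rows.append(
            {
                "phy": int(phy),
                "handle": handle,
                "speed": float(speed),
                "max": float(max_speed),
                "device": " ".join(parts[6:]),
            }
        )
    return rows


def degraded(rows):
    """Return phys whose negotiated speed is below the advertised maximum."""
    return [r for r in rows if r["speed"] < r["max"]]


if __name__ == "__main__":
    rows = parse_phys(PHY_TABLE)
    slow = degraded(rows)
    print(f"{len(rows)} attached phys, {len(slow)} below max speed")
    for r in slow:
        print(f"phy {r['phy']} ({r['device']}): {r['speed']} < {r['max']}")
```

In the output above every attached phy shows 6.0 against a 6.0 maximum, so this check would not flag anything here; it only rules out a link that dropped to a lower speed.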


Even if I could find the model, I am not sure I could find firmware for it. I do have a weirder setup in the sense that this HBA is connected to a SAS expander card to attach 12 drives. I am not sure if the problem is that combination and I just need to get two new HBAs that are standard LSI brand instead of OEM and go from there, or what.

As soon as I start a scrub, the errors shown above begin to appear.

Anyone have any thoughts or ideas?
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
Initiated a new scrub and got a lot of CAM errors, but no zpool errors, and the scrub completed successfully.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
Initiated another scrub and got a lot of CAM errors again, and the drive was kicked out of the pool again.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Did you "burn in" the new drive? Drives tend to die of old age, abuse, or in infancy... Run a SMART test against it. You may just need to RMA it back to Seagate.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
Yes, I burned in the drive with SMART short, conveyance, and long tests and a badblocks test. I have been trying to run SMART long tests since then, but they keep getting interrupted. I believe this is because a scrub interrupts the test. I am starting one now, and I think it will pass.

I am going to cross-flash the firmware on the HBA and update it to the latest version. If the drive is still having issues after that, I will probably return it to Microcenter and just keep going with shucked WD drives, as they are reliable. I made sure this drive was CMR, but it might be doing something weird under the hood.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
a SAS expander is in no way whatsoever "a weirder setup"
your firmware is like 12 versions out of date. an exos drive is fairly high end, anything "Weird under the hood" would only be because it's a DOA drive or something in your setup is FUBAR.
I would be rather surprised that you are getting anything but SMR drives by shucking them. I have never understood the obsession with getting crappy shucked drives, i am quite sure i have spent less on drives by just getting drives that arent cheap crap :/

ebay is flooded with fake LSI cards. can you link the item, or give some pictures of the card itself? when I first was getting LSI cards, i got some fake cards myself, and they eventually just stopped working correctly. I sent them to @jgreco who has been putting off examining them for awhile now. he has at least one article on how to try and avoid the fakes.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
I buy 8TB+ external WDs. As far as I can tell from my research, all WD drives 8TB and above use CMR, so I should be safe with shucked 8TB drives. I have never had any problems with any of the shucked drives.

As for the HBA, the model information is in the mpsutil output above. The eBay listing said it was an IBM 9200-8i. It's still running IBM firmware, but in IT mode. Today I am going to cross-flash it to LSI P20 firmware to see if that helps.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
UPDATE: I cross-flashed my HBA to the latest LSI P20 firmware and performed a scrub, but I still kept getting CAM errors and the drive kept getting kicked out.

I performed a SMART long test and the drive came back good. Either Seagate is doing something with their enterprise drives that makes them finicky in certain setups and with ZFS, or I have a bad drive that I can't really prove is bad.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
I ended up returning the drive and will go back to shucked WD drives. Either something is wrong with the drive, or with how the 8TB Exos communicates with my setup. I can't prove it's the drive, but I don't think it's my setup, because when I swap an older 2TB Seagate drive back in I get no errors.
 