H710 Mini D1, SAS drives OK, SATA drives loads of errors (DELL R720XD)

cfcaballero

Dabbler
Joined
Nov 26, 2017
Messages
45
I just got this used server and followed the amazing cross-flashing instructions and tools on @fohdeesha's website (thank you!) for this card. The vendor (local, here in Thailand) included two original SAS drives, so I used those for a redundant boot pool. So far, so good.

But now that I am starting to test the TrueNAS 12 install in earnest, with 6 WD Red SATA drives installed in other bays, I am getting all kinds of read and write errors on that pool, and it is being marked as degraded.

The most interesting hint so far: when I run a dd command to double-check which drive is in which bay, the two SAS drives read fine, but the SATA drives throw an I/O error.
Code:
# dd bs=4M if=/dev/da2 of=/dev/null
dd: /dev/da2: Input/output error


Here are the console SCSI errors:
Code:
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 495 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1643 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1077 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1072 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 ec 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 ed 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 ee 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 ef 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 eb 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 2086 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 613 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 f3 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 f4 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 f2 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 00 f7 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 09 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 154 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1361 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 16 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 17 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 3 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 15 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 170 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1806 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 16 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 2 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 17 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 2 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 15 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 291 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 776 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 16 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 1 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 17 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 1 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 15 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1000 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1486 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 16 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 0 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 17 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command, 0 more tries remain
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 15 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Retrying command (per sense data)
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1396 loginfo 31080000
Apr  3 12:03:39 tn-llk21 mps0: Controller reported scsi ioc terminated tgt 11 SMID 1576 loginfo 31080000
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 16 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Error 5, Retries exhausted
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 17 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: CCB request completed with an error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Error 5, Retries exhausted
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): READ(10). CDB: 28 00 00 01 15 00 00 01 00 00
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): CAM status: SCSI Status Error
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI status: Check Condition
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Apr  3 12:03:39 tn-llk21 (da2:mps0:0:11:0): Error 5, Retries exhausted
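
For anyone reading the log above, the READ(10) CDB bytes can be decoded by hand: byte 0 is the opcode (0x28), bytes 2-5 are the big-endian starting LBA, and bytes 7-8 are the transfer length in blocks. A minimal sketch in POSIX shell; the function name decode_read10 is made up for illustration and is not part of any tool:

```shell
# Hypothetical helper: decode a READ(10) CDB as printed in the console log.
# Layout: opcode, flags, 4-byte big-endian LBA, group, 2-byte length, control.
decode_read10() {
    set -- $1   # split the space-separated hex bytes into $1..$10
    [ "$#" -eq 10 ] && [ "$1" = "28" ] || { echo "not a READ(10) CDB" >&2; return 1; }
    lba=$(( 0x$3$4$5$6 ))      # bytes 2-5: starting logical block address
    blocks=$(( 0x$8$9 ))       # bytes 7-8: number of blocks to transfer
    echo "lba=$lba blocks=$blocks"
}

# First failing command from the log above
decode_read10 "28 00 00 00 ec 00 00 01 00 00"
```

For the first failing command this prints lba=60416 blocks=256. Assuming 512-byte logical sectors (an assumption, not something the log states), each command is a 128 KiB read, and the consecutive LBAs (ec00, ed00, ee00, ...) are consistent with dd reading the disk sequentially.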

There are different SCSI errors for the SAS drives, but they don't result in any errors in pool status:
Code:
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Retrying command (per sense data)
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 0c bb 00 00 01 00 00
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): SCSI status: Check Condition
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Info: 0xcbb3b
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Retrying command (per sense data)
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 0c cd 00 00 01 00 00
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): SCSI status: Check Condition
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Info: 0xccd71
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Retrying command (per sense data)
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): READ(10). CDB: 28 00 00 0d 01 00 00 01 00 00
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): CAM status: SCSI Status Error
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): SCSI status: Check Condition
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Info: 0xd0145
Apr  3 12:03:30 tn-llk21 (da1:mps0:0:9:0): Retrying command (per sense data)
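
To compare the two sets of messages, the sense and CAM status lines can be tallied per device from a saved copy of the console log. A rough sketch using awk; the function name count_cam_errors is a placeholder, and the pattern only assumes the line layout seen in the excerpts above:

```shell
# Hypothetical helper: count "SCSI sense" and "CAM status" lines per daN device.
# Usage (log path is an assumption): count_cam_errors /var/log/messages
count_cam_errors() {
    awk '
        # Find the "(da2:mps0:0:11:0): " prefix and split the line around it
        match($0, /\(da[0-9]+:[^)]*\): /) {
            dev = substr($0, RSTART + 1)         # "da2:mps0:0:11:0): ..."
            sub(/:.*/, "", dev)                  # keep only "da2"
            rest = substr($0, RSTART + RLENGTH)  # message after the prefix
            if (rest ~ /^(SCSI sense|CAM status): /) { count[dev " " rest]++ }
        }
        END { for (k in count) print count[k], k }
    ' "$1" | sort -k2
}
```

Each output line is a count followed by the device and the message, so the iuCRC errors on the SATA drive and the NAK errors on the SAS drive end up as separate rows that are easy to compare.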

I've done a bunch of Google searches, and haven't turned up any reason (other than a bad card) why the SAS drives would be OK but the SATA ones not.

Appreciate any advice and/or insight, with many thanks in advance.
 

Attachments

  • H710-IT-properties.png
  • H710-IT-topology.png

cfcaballero

Despite my instinct that the different behavior of the SATA and SAS drives pointed to a config or driver problem, I kept researching similar errors on similar controllers and servers. As a result, I tried re-seating all of the relevant cards and connectors, and to my very pleasant surprise, everything seems to be error-free now.

It reminds me of my teenage summer internship at New York City's public radio station, where I shadowed a venerable veteran engineer troubleshooting some noise on one of our audio circuits between studios. He simply yanked the amplifier in question out of the chassis and reheated some iffy-looking solder joints. Problem solved.
 