Help with Alert for not capable of SMART self-check

Amsoil_Jim · Contributor · Joined Feb 22, 2016 · 175 messages
My system is set up to run SMART tests automatically, and this morning I woke up to an alert stating:
Device: /dev/da7 [SAT], not capable of SMART self-check.
This was in the log:
Code:
May 26 02:08:00 TrueNAS     (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 6a 00 00 00 00 40 00 00 length 32768 SMID 646 Command timeout on target 24(0x0013) 60000 set, 60.259423622 elapsed
May 26 02:08:00 TrueNAS mps0: Sending abort to target 24 for SMID 646
May 26 02:08:00 TrueNAS     (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 6a 00 00 00 00 40 00 00 length 32768 SMID 646 Aborting command 0xfffffe013f465410
May 26 02:08:00 TrueNAS     (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 69 c0 00 00 00 40 00 00 length 32768 SMID 1560 Command timeout on target 24(0x0013) 60000 set, 60.260409941 elapsed
May 26 02:08:00 TrueNAS     (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 69 80 00 00 00 40 00 00 length 32768 SMID 321 Command timeout on target 24(0x0013) 60000 set, 60.261060888 elapsed
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 1777 loginfo 31130000
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 262 loginfo 31130000
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 204 loginfo 31130000
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 1905 loginfo 31130000
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 1560 loginfo 31140000
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 321 loginfo 31140000
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): WRITE(16). CDB: 8a 00 00 00 00 01 a5 26 ab 48 00 00 00 18 00 00
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 323 loginfo 31140000
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): WRITE(16). CDB: 8a 00 00 00 00 01 a5 1c 9f a0 00 00 00 40 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 69 c0 00 00 00 40 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 69 80 00 00 00 40 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 551 loginfo 31140000
May 26 02:08:01 TrueNAS mps0: Controller reported scsi ioc terminated tgt 24 SMID 663 loginfo 31140000
May 26 02:08:01 TrueNAS mps0: Finished abort recovery for target 24
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 6d c0 09 20 00 00 00 40 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 5e c6 6a 00 00 00 00 40 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: Command timeout
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): READ(16). CDB: 88 00 00 00 00 03 6d c0 09 60 00 00 00 40 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): WRITE(16). CDB: 8a 00 00 00 00 02 38 f9 08 20 00 00 00 08 00 00
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): CAM status: CCB request completed with an error
May 26 02:08:01 TrueNAS (da7:mps0:0:24:0): Retrying command, 3 more tries remain
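The READ(16)/WRITE(16) lines above print the raw SCSI command descriptor block (CDB). The minimal Python sketch below decodes those bytes so you can see which LBA range was timing out or being retried. It assumes the standard 16-byte SBC layout (byte 0 = opcode, bytes 2-9 = big-endian LBA, bytes 10-13 = big-endian transfer length in blocks). The function and type names are my own, not part of any TrueNAS tool.

```python
"""Decode the READ(16)/WRITE(16) CDBs printed in mps/CAM log lines.

Assumes the standard SBC 16-byte CDB layout:
  byte 0      opcode (0x88 = READ(16), 0x8A = WRITE(16))
  bytes 2-9   logical block address, big-endian
  bytes 10-13 transfer length in logical blocks, big-endian
"""
from typing import NamedTuple

OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}


class Cdb16(NamedTuple):
    op: str
    lba: int
    blocks: int


def decode_cdb16(hex_bytes: str) -> Cdb16:
    # bytes.fromhex skips the whitespace between byte pairs as printed in the log
    data = bytes.fromhex(hex_bytes)
    if len(data) != 16:
        raise ValueError(f"expected 16 bytes, got {len(data)}")
    op = OPCODES.get(data[0], f"opcode 0x{data[0]:02x}")
    lba = int.from_bytes(data[2:10], "big")
    blocks = int.from_bytes(data[10:14], "big")
    return Cdb16(op, lba, blocks)


if __name__ == "__main__":
    cdb = decode_cdb16("88 00 00 00 00 03 5e c6 6a 00 00 00 00 40 00 00")
    # Assuming 512-byte logical sectors, blocks * 512 should equal the
    # "length 32768" reported on the same log line.
    print(cdb.op, cdb.lba, cdb.blocks, cdb.blocks * 512)
```

For the first timed-out READ, this gives 64 blocks, and 64 × 512 = 32768 bytes matches the `length 32768` field. That consistency is a quick check that the drive presents 512-byte logical sectors.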

The hard drive is a WD 14TB white-label drive shucked from an Easystore.
I SSH'd into the machine and ran a short self-test, which completed without issue. Is this something I should be worried about, or is this some kind of bug?
Joined Jun 2, 2019 · 591 messages
@Amsoil_Jim

Your signature lists a RAID controller:
Raid Controller: LSI 9211-4I HBA
Did you flash it into IT mode or are you using it as a HW RAID controller?

Also, your X9DRI-LN4F+ motherboard has a HW RAID controller:
  • SATA: 2x SATA3 Ports, 8x SATA2 Ports, Supports RAID 0, 1, 5, 10
I've read posts from people who got SuperMicro to provide new firmware that puts the onboard controller into IT mode.

That may not have been the best choice of mobo or controller.

Some reading

 
Amsoil_Jim · Contributor · Joined Feb 22, 2016 · 175 messages
I believe it’s in IT mode; it’s been so long since I bought the server that I honestly can’t remember, but I know I didn’t do it myself.
Joined Jun 2, 2019 · 591 messages
@Amsoil_Jim

1. Did you ever get a root cause for the same problem (different port) from March 19th?
2. Is this the same non-enterprise 14TB drive "shucked" from an Easystore, just in a different bay?
3. Were there any heavy workloads during the time of occurrence?
4. Have you looked at the temps of the drives or the controllers?

There's one caveat: keep your HBA cool. It is an embedded computer and throws off about 10 watts. Without airflow directed over it, the HBA can overheat, and in extreme cases LSI HBAs have been found to vomit random bits all over, which isn't good for ZFS.
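For question 4, drive temperatures can be pulled from the shell. The sketch below assumes smartmontools 7.0 or later (for `-j` JSON output), that it is run as root on the NAS, and that `smartctl -a -j` reports a top-level `temperature.current` value for each disk. The helper names and the 40 °C limit are illustrative choices, not a vendor spec. The HBA itself does not report through smartctl, so this only covers the drives.

```python
"""Sketch: collect drive temperatures from smartctl's JSON output.

Assumptions: smartmontools 7.0+ (-j), run as root, device paths such as
/dev/da7 supplied by the caller. Helper names and the default limit are
illustrative only.
"""
import json
import subprocess
from typing import Dict, Iterable, Optional


def parse_temperature(smartctl_json: str) -> Optional[int]:
    """Return temperature.current (deg C) from `smartctl -j` output, or None."""
    report = json.loads(smartctl_json)
    return report.get("temperature", {}).get("current")


def read_temperature(device: str) -> Optional[int]:
    # smartctl's exit status is a bitmask that can be nonzero even when the
    # JSON is usable, so check=True would raise on perfectly readable output.
    result = subprocess.run(
        ["smartctl", "-a", "-j", device],
        capture_output=True, text=True, check=False,
    )
    try:
        return parse_temperature(result.stdout)
    except json.JSONDecodeError:
        return None  # no JSON at all, e.g. the device dropped off the bus


def collect(devices: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each device path to its current temperature (None if unreadable)."""
    return {dev: read_temperature(dev) for dev in devices}


def hot_drives(temps: Dict[str, Optional[int]], limit: int = 40) -> Dict[str, int]:
    """Drives at or above `limit` deg C; the default is an arbitrary example."""
    return {dev: t for dev, t in temps.items() if t is not None and t >= limit}
```

Usage would look like `hot_drives(collect(["/dev/da0", "/dev/da7"]))`. A `None` entry is itself informative here, because it usually means smartctl could not talk to that disk, which is the same symptom behind the "not capable of SMART self-check" alert.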
 
Amsoil_Jim · Contributor · Joined Feb 22, 2016 · 175 messages
1. No, I completely forgot about that, since I never had another issue until now.
2. I’ll have to check whether it’s the same drive in the same bay when I get home later. I removed six older 3TB drives, but I didn’t move any.
3. At the time there shouldn’t have been any heavy workloads, only the scheduled SMART tests running on the 12 drives.
4. The 10TB drives in the system hover around 34 °C, but the 14TB drive hovers around 39 to 41 °C.

EDIT: I just checked, and it’s the same drive in the same bay.
Joined Jun 2, 2019 · 591 messages
@Amsoil_Jim

I guess you'll have to narrow it down to the drive, carrier, backplane, or cables.
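One way to start isolating is to tally which disk name and which controller target the errors attach to, then swap the drive into a different bay and tally again. If the errors stay on the same target, that suggests the bay, carrier, backplane, or cable. If they move with the drive, that suggests the drive. The Python sketch below is hypothetical. Its regexes are fitted only to the two line shapes in the log posted in this thread, and it counts error *lines*, not distinct commands. How target numbers map to physical bays depends on the HBA and backplane, and `daN` names can be renumbered after a move, so confirm which physical disk is which by serial number (for example with `smartctl -i`).

```python
"""Sketch: tally mps/CAM error lines per disk and per controller target.

Regexes are assumptions fitted to the log lines in this thread:
  (da7:mps0:0:24:0): CAM status: ...        -> disk da7, target (mps0, 24)
  mps0: Controller reported scsi ioc terminated tgt 24 ...
Counts error lines, not distinct failed commands.
"""
import re
from collections import Counter
from typing import Iterable, Tuple

# Peripheral tag printed by CAM: (daN:mpsX:bus:target:lun)
PERIPH = re.compile(r"\((da\d+):(mps\d+):\d+:(\d+):\d+\)")
# Controller-level termination message from the mps driver
TERMINATED = re.compile(r"(mps\d+): Controller reported scsi ioc terminated tgt (\d+)")
# Substrings that mark a CAM line as error-related
ERROR_HINTS = ("CAM status:", "Command timeout", "Retrying command")


def tally(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    by_disk: Counter = Counter()    # keyed by disk name, e.g. "da7"
    by_target: Counter = Counter()  # keyed by (controller, target), e.g. ("mps0", 24)
    for line in lines:
        m = TERMINATED.search(line)
        if m:
            by_target[(m.group(1), int(m.group(2)))] += 1
            continue
        m = PERIPH.search(line)
        if m and any(hint in line for hint in ERROR_HINTS):
            by_disk[m.group(1)] += 1
            by_target[(m.group(2), int(m.group(3)))] += 1
    return by_disk, by_target
```

Feeding it the lines from `/var/log/messages` before and after a bay swap gives two pairs of counters to compare. Note that this only shows where the errors land; a clean result after a swap does not by itself rule out an intermittent cable or backplane fault.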
 