I put together a pool of WD Blue SSDs (WDS200T2B0A) a few months ago, and for the first month or two they worked without any errors. In the last month I have suddenly started getting CAM status errors on them about 1-2 times per week (this seems to have started after updating from 11.2-U6 to 11.2-U7). Until now the drives recovered from the errors without actually degrading the pool, but today one finally dropped out:
Code:
Jan 5 03:08:20 nas smartd[23433]: Device: /dev/da9 [SAT], failed to read SMART Attribute Data
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 316 Aborting command 0xfffffe0001568640
Jan 5 03:08:20 nas mpr0: Sending reset from mprsas_send_abort for target ID 9
Jan 5 03:08:20 nas (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 278 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 842 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Jan 5 03:08:20 nas mpr0: (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan 5 03:08:20 nas Unfreezing devq for target ID 9
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): CAM status: Command timeout
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:21 nas (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00 06 00 4f 00 c2 00 b0 00 length 512 SMID 850 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 592 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 792 term(da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan 5 03:08:21 nas inated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:21 nas (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00 01 00 4f 00 c2 00 b0 00 length 512 SMID 502 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 772 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 396 term(da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan 5 03:08:21 nas inated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:22 nas (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00 length 0 SMID 849 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 791 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 203 term(da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan 5 03:08:22 nas inated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:22 nas (da9:mpr0:0:9:0): Retrying command
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): CAM status: SCSI Status Error
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): SCSI status: Check Condition
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): Error 6, Retries exhausted
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): Invalidating pack
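In case it helps anyone compare notes, a log like this can be tallied per device to see how often each CAM status and each `loginfo` code shows up. The following is only a rough sketch in Python: the function name `summarize`, the regular expressions, and the embedded `SAMPLE` (a few lines copied from the log above) are my own illustration, not part of any FreeNAS tooling, and the patterns assume the `(daN:mprN:...)` message layout seen here.

```python
import re
from collections import Counter

# Assumed layout: "(da9:mpr0:0:9:0): <message>" as seen in the log above.
PERIPH_RE = re.compile(r"\((?P<dev>[a-z]+\d+):mpr\d+:\d+:\d+:\d+\):\s*(?P<msg>.*)")
# Controller loginfo codes appear as 8 hex digits, e.g. "loginfo 31110e00".
LOGINFO_RE = re.compile(r"loginfo (?P<code>[0-9a-f]{8})")


def summarize(lines):
    """Count CAM status messages per device and loginfo codes overall."""
    cam_status = Counter()
    loginfo = Counter()
    for line in lines:
        m = PERIPH_RE.search(line)
        if m and m.group("msg").startswith("CAM status:"):
            status = m.group("msg")[len("CAM status:"):].strip()
            cam_status[(m.group("dev"), status)] += 1
        for hit in LOGINFO_RE.finditer(line):
            loginfo[hit.group("code")] += 1
    return cam_status, loginfo


# A few lines taken verbatim from the log above, used as sample input.
SAMPLE = """\
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): CAM status: Command timeout
Jan 5 03:08:20 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 592 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan 5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan 5 03:08:23 nas (da9:mpr0:0:9:0): CAM status: SCSI Status Error
"""

cam_counts, loginfo_counts = summarize(SAMPLE.splitlines())
for (dev, status), n in sorted(cam_counts.items()):
    print(f"{dev}: {status}: {n}")
for code, n in sorted(loginfo_counts.items()):
    print(f"loginfo {code}: {n}")
```

To run it against a real log, the lines of the log file (for example `/var/log/messages`) could be passed to `summarize` in place of `SAMPLE.splitlines()`.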
This has now happened on 4 of these drives. In searching around I saw there can be issues with RZAT and DRAT support behind LSI controllers (I have a 9305 running firmware 16.00.01.00). As far as I can tell these are the relevant fields for these drives:
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks yes 8
DSM - deterministic read yes zeroed
Host Protected Area (HPA) no
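To check the same fields across several drives without reading each listing by eye, the rows above could be parsed with a short script. This is only a sketch in Python under some assumptions: the names `parse_features` and `trim_read_behavior` are hypothetical, the pattern only handles rows shaped like the four shown above (feature, yes/no, optional value), and I am assuming that a value of "zeroed" on the deterministic read row corresponds to RZAT and any other value to DRAT.

```python
import re

# Assumed row shape: "<feature name> <yes|no> [<value>]", as in the rows above.
ROW_RE = re.compile(r"^(?P<feature>.+?)\s+(?P<support>yes|no)(?:\s+(?P<value>.+))?$")


def parse_features(text):
    """Map feature name -> (supported, value or None)."""
    features = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            features[m.group("feature")] = (m.group("support") == "yes", m.group("value"))
    return features


def trim_read_behavior(features):
    """Classify read-after-TRIM behavior from the parsed rows (assumed mapping)."""
    trim_ok, _ = features.get("Data Set Management (DSM/TRIM)", (False, None))
    deterministic, value = features.get("DSM - deterministic read", (False, None))
    if not trim_ok:
        return "no TRIM"
    if not deterministic:
        return "non-deterministic"
    # Assumption: "zeroed" means reads return zeros after TRIM (RZAT).
    return "RZAT" if value == "zeroed" else "DRAT"


# The four rows quoted above, used as sample input.
SAMPLE = """\
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks yes 8
DSM - deterministic read yes zeroed
Host Protected Area (HPA) no
"""

parsed = parse_features(SAMPLE)
print(trim_read_behavior(parsed))
```

Under those assumptions, the rows for these drives would classify as RZAT, which is how I read them by hand as well.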
I'm sure I would have been better off running enterprise SSDs, and if that's the cause, I can accept my mistake, but I would like to get some verification that it's actually an issue with these SSDs.
The other area I've explored is cooling of the HBA. There are currently fans attached to the heatsink on the 9305, which keep it running relatively cool, as in cool enough to touch comfortably shortly after power on.
I have several MX500s and several ST4000VN000 spinning drives on the 9305 that have not had this issue.