Hi all,
my installation is stock FreeBSD, but since there are so many people here who know things about storage, I hope you will permit me to ask in this forum, too ;)
I just had the third SSD fail within the last couple of weeks. For the first two drives we just noticed this:
Code:
NAME        STATE     READ WRITE CKSUM
zdata       DEGRADED     0     0     0
  mirror-0  DEGRADED     0     0     0
    da2p1   FAULTED     15   248     0  too many errors
    da3p1   ONLINE       0     0     0
then replaced the supposedly failed drive, resilvered, everything OK.
Now with the third drive failing I dug a little deeper. First, SMART suggests the drive is perfectly healthy, if I'm not mistaken:
Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       3646
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       28
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       10
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   079   066   000    Old_age   Always       -       21
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       11929282916

SMART Error Log Version: 1
No Errors Logged
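For anyone who wants to check the "looks healthy" reading mechanically, here is a minimal Python sketch. It assumes the usual smartctl convention that an attribute counts as failed once its normalized VALUE drops to or below THRESH. The function names and the four sample rows (copied from the table above) are only for illustration, not part of any tool.

```python
# Sample rows copied from the smartctl output above (a subset, for illustration).
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       10
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
"""


def parse_smart_rows(text):
    """Turn smartctl attribute rows into dicts; skip anything that is not a row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),    # normalized value
            "worst": int(fields[4]),    # lowest normalized value seen so far
            "thresh": int(fields[5]),   # failure threshold
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return rows


def failing_attributes(rows):
    """Names of attributes whose normalized value is at or below the threshold."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


if __name__ == "__main__":
    rows = parse_smart_rows(SMART_ROWS)
    print("failing attributes:", failing_attributes(rows))
```

On the rows above nothing is flagged, which matches the impression that SMART itself sees no problem with the drive.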
So what's going on? This:
Code:
(da2:mpr0:0:4:0): READ(10). CDB: 28 00 07 db d0 21 00 00 07 00 length 3584 SMID 197 Aborting command 0xfffffe000107ab30
mpr0: Sending reset from mprsas_send_abort for target ID 4
(da2:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 883 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da2:mpr0:0:4:0): WRITE(10). CDB: 2a 00 13 4c 65 7d 00 00 01 00 length 512 SMID 828 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da2:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 919 terminated ioc 804b loginfo 311(da2:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
30000 scsi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 4
(da2:mpr0:0:4:0): CAM status: CCB request completed with an error
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): WRITE(10). CDB: 2a 00 13 4c 65 7d 00 00 01 00
(da2:mpr0:0:4:0): CAM status: CCB request completed with an error
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): READ(10). CDB: 28 00 07 db d0 21 00 00 07 00
(da2:mpr0:0:4:0): CAM status: Command timeout
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da2:mpr0:0:4:0): CAM status: CCB request completed with an error
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): READ(10). CDB: 28 00 07 db d0 21 00 00 07 00
(da2:mpr0:0:4:0): CAM status: SCSI Status Error
(da2:mpr0:0:4:0): SCSI status: Check Condition
(da2:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da2:mpr0:0:4:0): Retrying command (per sense data)
(da2:mpr0:0:4:0): READ(10). CDB: 28 00 07 db c1 b5 00 00 07 00 length 3584 SMID 995 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(da2:mpr0:0:4:0): WRITE(10). CDB: 2a 00 27 01 0d a0 00 00 10 00 length 8192 SMID 348 terminated ioc 804b loginfo 31110e03 scs(da2:mpr0:0:4:0): READ(10). CDB: 28 00 07 db c1 b5 00 00 07 00
i 0 state c xfer 0
(da2:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 304 terminated ioc 804b loginfo 311(da2:mpr0:0:4:0): CAM status: CCB request completed with an error
10e03 scsi 0 state c xfer 0
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): WRITE(10). CDB: 2a 00 27 01 0d a0 00 00 10 00
(da2:mpr0:0:4:0): CAM status: CCB request completed with an error
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da2:mpr0:0:4:0): CAM status: CCB request completed with an error
(da2:mpr0:0:4:0): Retrying command
(da2:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da2:mpr0:0:4:0): CAM status: SCSI Status Error
(da2:mpr0:0:4:0): SCSI status: Check Condition
(da2:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da2:mpr0:0:4:0): Error 6, Retries exhausted
(da2:mpr0:0:4:0): Invalidating pack
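In case it helps to read the CDB bytes in the log above: for READ(10) and WRITE(10), bytes 2 to 5 hold the starting LBA and bytes 7 to 8 the transfer length in blocks, both big-endian. Below is a small Python sketch of that decoding. The helper name `decode_rw10` is made up for illustration, and the 512-byte logical block size is an assumption that happens to match the `length` values the driver prints.

```python
def decode_rw10(cdb_hex, block_size=512):
    """Decode a READ(10)/WRITE(10) CDB given as a hex string like '28 00 07 ...'.

    block_size=512 is an assumption; it agrees with the 'length' fields in the log.
    Returns (operation, starting LBA, block count, byte count).
    """
    cdb = bytes.fromhex(cdb_hex)              # fromhex ignores the spaces
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    op = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return op, lba, blocks, blocks * block_size


if __name__ == "__main__":
    for cdb in ("28 00 07 db d0 21 00 00 07 00",
                "2a 00 13 4c 65 7d 00 00 01 00",
                "2a 00 27 01 0d a0 00 00 10 00"):
        op, lba, blocks, nbytes = decode_rw10(cdb)
        print(f"{op}: LBA {lba:#010x}, {blocks} blocks, {nbytes} bytes")
```

The decoded byte counts (3584, 512 and 8192) agree with the `length` values in the log, and the failing commands target different LBAs, both reads and writes.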
This is what the kernel has to tell me about the controller:
Code:
mpr0: <Avago Technologies (LSI) SAS3008> port 0xe000-0xe0ff mem 0xdf240000-0xdf24ffff,0xdf200000-0xdf23ffff irq 16 at device 0.0 on pci1
mpr0: Firmware: 10.00.03.00, Driver: 15.03.00.00-fbsd
mpr0: IOCCapabilities: 6985c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR,MSIXIndex,FastPath,RDPQArray>
mpr0: SAS Address for SATA device = a6949033fdccc5a0
mpr0: SAS Address from SAS device page0 = 4433221100000000
mpr0: SAS Address from SATA device = a6949033fdccc5a0
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0009> enclosureHandle<0x0001> slot 0
mpr0: At enclosure level 0 and connector name ( )
mpr0: SAS Address for SATA device = a08d8d33fdccc5a1
mpr0: SAS Address from SAS device page0 = 4433221101000000
mpr0: SAS Address from SATA device = a08d8d33fdccc5a1
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000a> enclosureHandle<0x0001> slot 1
uhub1: mpr0: At enclosure level 0 and connector name ( )
mpr0: SAS Address for SATA device = a4b8b10cdbcbc695
mpr0: SAS Address from SAS device page0 = 4433221102000000
mpr0: SAS Address from SATA device = a4b8b10cdbcbc695
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000b> enclosureHandle<0x0001> slot 2
mpr0: At enclosure level 0 and connector name ( )
mpr0: SAS Address for SATA device = a5b1a50cdbcbc695
mpr0: SAS Address from SAS device page0 = 4433221103000000
mpr0: SAS Address from SATA device = a5b1a50cdbcbc695
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000c> enclosureHandle<0x0001> slot 3
mpr0: At enclosure level 0 and connector name ( )
Any ideas?
Thanks,
Patrick