Hi everyone.
I got an email alert from my FreeNAS server and I'm not sure what is going on; backstory below. Could this be a bad cable or two? A bad HBA? The 12 drives this is happening on are the original 12 I built this array with two years ago. They're Seagate 6TB enterprise drives, all from the same batch (I know).
serverfqdn.com kernel log messages:
> (da9:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 01 0d c3 8e a8 00 00 00 18 00 00 length 12288 SMID 95 Aborting command 0xfffffe00011bf890
> mpr0: Sending reset from mprsas_send_abort for target ID 9
> (da9:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 01 0d c3 88 b8 00 00 00 18 00 00 length 12288 SMID 115 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
> (da9:mpr0:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 981 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
> mpr0: Unfreezing devq for target ID 9
> (da9:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 01 0d c3 88 b8 00 00 00 18 00 00
> (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
> (da9:mpr0:0:9:0): Retrying command
> (da9:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 01 0d c3 8e a8 00 00 00 18 00 00
> (da9:mpr0:0:9:0): CAM status: Command timeout
> (da9:mpr0:0:9:0): Retrying command
> (da9:mpr0:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
> (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
> (da9:mpr0:0:9:0): Retrying command
> (da9:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 01 0d c3 8e a8 00 00 00 18 00 00
> (da9:mpr0:0:9:0): CAM status: SCSI Status Error
> (da9:mpr0:0:9:0): SCSI status: Check Condition
> (da9:mpr0:0:9:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da9:mpr0:0:9:0): Retrying command (per sense data)
> (da9:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 01 0a f5 25 20 00 00 00 40 00 00
> (da9:mpr0:0:9:0): CAM status: SCSI Status Error
> (da9:mpr0:0:9:0): SCSI status: Check Condition
> (da9:mpr0:0:9:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da9:mpr0:0:9:0): Retrying command (per sense data)
Last week I upgraded one of our 12-drive FreeNAS servers (striped 2x raidz2 vdevs, I'll call them #1 and #2) with a 9305-24i, the appropriate cables (SFF-8643 to SFF-8087), and an additional 12 drives to fill up the whole array. It had a 9201-16i before (SFF-8087 to SFF-8087). The chassis is a 24-bay Norco with an 800W Athena Power dual PSU. We're on FreeNAS-11.1-U4.
The swap went great: everything worked, the new drives were recognized and set up, and the new 9305 card was updated to the latest firmware. Two days later, one of the drives in the striped array, raidz2 #2 drive #3, started getting a ton of read errors. I wasn't around for two more days, so I couldn't swap it out until Friday. We have backups, so whatever. Friday comes along and another drive starts having the same issue, this time (somewhat luckily) in raidz2 #1: drive #9, on a completely separate backplane row, cable, and port on the card. I only had two spares, so that worked out fine; the replacements resilvered in a day and we're cooking again. The 12 new drives are running flawlessly.
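In case it helps, here's roughly what I'm planning to run from the FreeNAS shell to gather more info before I start swapping cables. This is just a sketch: da9 is the device from the log above, and "tank" is a stand-in for the actual pool name.

# SMART attributes and error counters for the drive that's throwing errors (da9 from the log above)
smartctl -a /dev/da9

# List every device the HBA currently sees, to confirm da9 is still attached where I expect
camcontrol devlist

# Pool and vdev status, including any read/write/checksum errors ZFS has counted ("tank" is a placeholder)
zpool status -v tank

# Confirm which firmware is actually flashed on the 9305-24i
sas3flash -list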