CAM Timeout and Other Scary Messages in Logs

Status
Not open for further replies.

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
This is a brand new machine, built around a Supermicro X11SSM-F-O. There's an SSD in SATA port 0, and six 4TB Hitachi drives in ports 2-7. I'm still doing setup on the system, so there's no data to lose (yet). The machine was more or less idle when these messages appeared (I was off doing other things). There are no contemporaneous SATA errors appearing in the SMART logs.

After this block of messages, no further such messages appeared. I don't know if a scrub happened to be running, but I doubt it; I've scheduled those to run at 05:00.

If you said "Bad cable," I would believe you. The cables that came with this chassis are... weird. However, no such messages appeared over the 48 hours when I was doing the badblocks run on the drives. So that this would show up now seems surprising.

Code:
Apr 20 21:09:18 freenas ahcich6: Timeout on slot 9 port 0
Apr 20 21:09:18 freenas ahcich6: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd d0 serr 00000000 cmd 0004c917
Apr 20 21:09:18 freenas (ada5:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:09:18 freenas (ada5:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:09:18 freenas (ada5:ahcich6:0:0:0): Retrying command
Apr 20 21:09:51 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:10:21 freenas ahcich6: Timeout on slot 10 port 0
Apr 20 21:10:21 freenas ahcich6: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0004ca17
Apr 20 21:10:21 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:10:21 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:10:21 freenas (aprobe0:ahcich6:0:0:0): Retrying command
Apr 20 21:10:54 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:11:24 freenas ahcich6: Timeout on slot 11 port 0
Apr 20 21:11:24 freenas ahcich6: is 00000000 cs 00000000 ss 00000000 rs 00000800 tfd 150 serr 00000000 cmd 0004cb17
Apr 20 21:11:24 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:11:24 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:11:24 freenas (aprobe0:ahcich6:0:0:0): Error 5, Retries exhausted
Apr 20 21:12:07 freenas ahcich6: Timeout on slot 2 port 0
Apr 20 21:12:07 freenas ahcich6: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd d0 serr 00000000 cmd 0004c217
Apr 20 21:12:07 freenas (ada5:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:12:07 freenas (ada5:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:12:07 freenas (ada5:ahcich6:0:0:0): Retrying command
Apr 20 21:12:40 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:13:10 freenas ahcich6: Timeout on slot 3 port 0
Apr 20 21:13:10 freenas ahcich6: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0004c317
Apr 20 21:13:10 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:13:10 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:13:10 freenas (aprobe0:ahcich6:0:0:0): Retrying command
Apr 20 21:13:43 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:14:13 freenas ahcich6: Timeout on slot 4 port 0
Apr 20 21:14:13 freenas ahcich6: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 00000000 cmd 0004c417
Apr 20 21:14:13 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:14:13 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:14:13 freenas (aprobe0:ahcich6:0:0:0): Error 5, Retries exhausted
Apr 20 21:14:46 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:15:15 freenas ahcich6: Timeout on slot 5 port 0
Apr 20 21:15:15 freenas ahcich6: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000 cmd 0004c517
Apr 20 21:15:15 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:15:15 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:15:15 freenas (aprobe0:ahcich6:0:0:0): Error 5, Retry was blocked
Apr 20 21:15:15 freenas ada5 at ahcich6 bus 0 scbus6 target 0 lun 0
Apr 20 21:15:15 freenas ada5: <Hitachi HDS5C4040ALE630 MPAOA3B0> s/n PL1331LAGBEZXH detached
Apr 20 21:15:49 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:16:19 freenas ahcich6: Timeout on slot 6 port 0
Apr 20 21:16:19 freenas ahcich6: is 00000000 cs 00000000 ss 00000000 rs 00000040 tfd 150 serr 00000000 cmd 0004c617
Apr 20 21:16:19 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:16:19 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:16:19 freenas (aprobe0:ahcich6:0:0:0): Retrying command
Apr 20 21:16:19 freenas GEOM_MIRROR: Device swap0: provider ada5p1 disconnected.
Apr 20 21:16:19 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=7149559998634639704
Apr 20 21:16:19 freenas (ada5:ahcich6:0:0:0): Periph destroyed
Apr 20 21:16:19 freenas ada5 at ahcich6 bus 0 scbus6 target 0 lun 0
Apr 20 21:16:19 freenas ada5: <Hitachi HDS5C4040ALE630 MPAOA3B0> ATA8-ACS SATA 3.x device
Apr 20 21:16:19 freenas ada5: Serial Number [[XXXXX]]
Apr 20 21:16:19 freenas ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Apr 20 21:16:19 freenas ada5: Command Queueing enabled
Apr 20 21:16:19 freenas ada5: 3815447MB (7814037168 512 byte sectors)
Apr 20 21:16:19 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=7149559998634639704
Apr 20 21:16:20 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=644480861065049838
Apr 20 21:16:20 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=2312669421406435181
Apr 20 21:16:20 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=15714282532111298633
Apr 20 21:16:20 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=9401609431398183874
Apr 20 21:16:20 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=7149559998634639704
Apr 20 21:16:20 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=14006271445869983022
Apr 20 21:16:50 freenas ahcich6: Timeout on slot 4 port 0
Apr 20 21:16:50 freenas ahcich6: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd d0 serr 00000000 cmd 0004c417
Apr 20 21:16:50 freenas (ada5:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:16:50 freenas (ada5:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:16:50 freenas (ada5:ahcich6:0:0:0): Retrying command
Apr 20 21:17:23 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:17:53 freenas ahcich6: Timeout on slot 5 port 0
Apr 20 21:17:53 freenas ahcich6: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000 cmd 0004c517
Apr 20 21:17:53 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:17:53 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:17:53 freenas (aprobe0:ahcich6:0:0:0): Retrying command
Apr 20 21:18:26 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:18:56 freenas ahcich6: Timeout on slot 6 port 0
Apr 20 21:18:56 freenas ahcich6: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 80 serr 00000000 cmd 0004c617
Apr 20 21:18:56 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:18:56 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:18:56 freenas (aprobe0:ahcich6:0:0:0): Error 5, Retries exhausted
Apr 20 21:19:29 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:19:59 freenas ahcich6: Timeout on slot 7 port 0
Apr 20 21:19:59 freenas ahcich6: is 00000000 cs 00000000 ss 00000000 rs 00000080 tfd 150 serr 00000000 cmd 0004c717
Apr 20 21:19:59 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:19:59 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:19:59 freenas (aprobe0:ahcich6:0:0:0): Error 5, Retry was blocked
Apr 20 21:19:59 freenas ada5 at ahcich6 bus 0 scbus6 target 0 lun 0
Apr 20 21:19:59 freenas ada5: <Hitachi HDS5C4040ALE630 MPAOA3B0> s/n PL1331LAGBEZXH detached
Apr 20 21:20:00 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=7149559998634639704
Apr 20 21:20:00 freenas (ada5:ahcich6:0:0:0): Periph destroyed
Apr 20 21:20:00 freenas ada5 at ahcich6 bus 0 scbus6 target 0 lun 0
Apr 20 21:20:00 freenas ada5: <Hitachi HDS5C4040ALE630 MPAOA3B0> ATA8-ACS SATA 3.x device
Apr 20 21:20:00 freenas ada5: Serial Number [[XXXXX]]
Apr 20 21:20:00 freenas ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Apr 20 21:20:00 freenas ada5: Command Queueing enabled
Apr 20 21:20:00 freenas ada5: 3815447MB (7814037168 512 byte sectors)
Apr 20 21:20:01 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=644480861065049838
Apr 20 21:20:01 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=2312669421406435181
Apr 20 21:20:01 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=15714282532111298633
Apr 20 21:20:01 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=9401609431398183874
Apr 20 21:20:01 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=7149559998634639704
Apr 20 21:20:01 freenas ZFS: vdev state changed, pool_guid=16884738026142969859 vdev_guid=14006271445869983022
Apr 20 21:20:31 freenas ahcich6: Timeout on slot 20 port 0
Apr 20 21:20:31 freenas ahcich6: is 00000000 cs 00100000 ss 00000000 rs 00100000 tfd d0 serr 00000000 cmd 0004d417
Apr 20 21:20:31 freenas (ada5:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:20:31 freenas (ada5:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:20:31 freenas (ada5:ahcich6:0:0:0): Retrying command
Apr 20 21:21:04 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:21:34 freenas ahcich6: Timeout on slot 21 port 0
Apr 20 21:21:34 freenas ahcich6: is 00000000 cs 00000000 ss 00000000 rs 00200000 tfd 150 serr 00000000 cmd 0004d517
Apr 20 21:21:34 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:21:34 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:21:34 freenas (aprobe0:ahcich6:0:0:0): Retrying command
Apr 20 21:23:24 freenas ahcich6: Timeout on slot 27 port 0
Apr 20 21:23:24 freenas ahcich6: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd d0 serr 00000000 cmd 0004db17
Apr 20 21:23:24 freenas (ada5:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:23:24 freenas (ada5:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:23:24 freenas (ada5:ahcich6:0:0:0): Retrying command
Apr 20 21:23:57 freenas ahcich6: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Apr 20 21:24:27 freenas ahcich6: Timeout on slot 28 port 0
Apr 20 21:24:27 freenas ahcich6: is 00000000 cs 00000000 ss 00000000 rs 10000000 tfd 150 serr 00000000 cmd 0004dc17
Apr 20 21:24:27 freenas (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 20 21:24:27 freenas (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Apr 20 21:24:27 freenas (aprobe0:ahcich6:0:0:0): Retrying command
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Do you have another cable/set of cables to test?
 

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
I do, and in fact will be experimenting with that shortly.
All SATA cables have been replaced. Will report if there's any further strangeness.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
If the cables don't help, try a cheap HBA like LSI 9211-8i with a set of breakout cables. Should run about $60 for the card and cables if you can find a good deal and opens up things like SAS and gives you even more ports.
Im sure others will have better advice here but it a thought.
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
Not sure it will help you much, but I had the same thing happen when I started adding drives to one of my enclosures.

At the time, I had a fully loaded SuperMicro SAS2-EL1 backplane (12 drives in it, 4 SSD and 8 2TB SATA) connected to an IBM M1015 HBA. I also had two HP enclosures (MSA70 loaded with 25 SAS drives, and D2700 loaded with 12 SAS drives connected to an HP 221 HBA (which is really an LSI 9207-8e/9205-83).

Everything was perfect, the system running like that for years. I connected another SuperMicro backplane SAS1-EL1 to the other port of the M1015, everything perfect. I added four 2TB SATA drives, system recognized them, everything perfect.

I start adding more drives to that enclosure and everything went to shit. Started seeing similar messages. I rebooted the system and it wouldn't boot. It got stuck during the drive discovery during boot. So, as I calmly and with a cool head began to panic and loose my freaking mind, I decided to power the system entirely off and power off the new enclosure. I removed all the drives, powered the enclosure back up, powered the system back up, and it booted fine like normal. So this time, I added the drives one at a time, much slower to see if it was a bad drive or port on the backplane. I got through all of them, but one. I thought I had determined it was a bad drive, though I've never seen a bad drive cause those errors, but when I added that drive later to ensure it was bad and get some smart data to RMA it, it was discovered and was fine. Weird.

I was not able to determine the cause. The system has been rebooted many times, all the drives tortured before I created a pool with them, and now it's totally happy. Not sure what happened.

Long story, but maybe try adding the drives one by one and see if you get similar behavior after you are done with cabling?
 

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
Since changing out the cables, there have been no further such messages in the logs. Indeed, it has been somewhat unnervingly quiet.
 
Status
Not open for further replies.
Top