Timeout on SATA DOM

Status
Not open for further replies.

jixam

Dabbler
Joined
May 1, 2015
Messages
47
I have a Supermicro Ultra Server (SYS-6028U-TNR4T+) with mirrored SATA DOMs (SSD-DM064-PHI) for the boot filesystem.

A few months ago it started to drop a boot drive every couple of weeks. Once I reboot, the drive shows up again and works as before. It has now happened on both DOM ports and with three different disks.

The server has been running FreeNAS flawlessly for about a year. When the issue first appeared, it had been 6 months since the last FreeNAS update (to 9.10), so I suspect a hardware problem, but I am not sure where to look next. Any ideas are welcome.

To ease the pain when this happens, is there a command I can use to safely make the disk visible again and resilver it, without having to reboot?
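One thing that might be worth trying is asking CAM to reset and rescan the affected bus, then bringing the mirror member back online. The sketch below only builds the command lines, it does not run them. The bus number 9 comes from "scbus9" in the log; the pool name "freenas-boot" and the partition "ada1p2" are assumptions and should be checked against `zpool status` first. If the device stays busy after an AHCI reset, as the log suggests, a rescan may not bring it back and a power cycle could still be needed.

```python
from typing import List


def recovery_commands(bus: int, pool: str, device: str) -> List[List[str]]:
    """Build (but do not execute) commands to re-probe a dropped disk.

    bus    -- CAM bus number, e.g. 9 for scbus9
    pool   -- ZFS pool name (assumed, verify with `zpool status`)
    device -- mirror member to bring back (assumed, verify with `zpool status`)
    """
    return [
        ["camcontrol", "reset", str(bus)],    # reset the bus the DOM hangs off
        ["camcontrol", "rescan", str(bus)],   # re-probe devices on that bus
        ["zpool", "online", pool, device],    # ask ZFS to reattach the member
        ["zpool", "status", pool],            # confirm the resilver started
    ]


if __name__ == "__main__":
    # Print the commands for review rather than running them blindly.
    for cmd in recovery_commands(9, "freenas-boot", "ada1p2"):
        print(" ".join(cmd))
```

Printing instead of executing is deliberate: on a boot pool it seems safer to review each command and run it by hand as root.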


ahcich9: Timeout on slot 6 port 0
ahcich9: is 00000000 cs 00000000 ss 00000040 rs 00000040 tfd 40 serr 00000000 cmd 0004c617
(ada1:ahcich9:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 0e 97 71 cd 40 01 00 00 00 00 00
(ada1:ahcich9:0:0:0): CAM status: Command timeout
(ada1:ahcich9:0:0:0): Retrying command
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Timeout on slot 7 port 0
ahcich9: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 80 serr 00000000 cmd 0004c717
(aprobe0:ahcich9:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Retrying command
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Timeout on slot 8 port 0
ahcich9: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 80 serr 00000000 cmd 0004c817
(aprobe0:ahcich9:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retries exhausted
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Timeout on slot 9 port 0
ahcich9: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd 80 serr 00000000 cmd 0004c917
(aprobe0:ahcich9:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retry was blocked
ada1 at ahcich9 bus 0 scbus9 target 0 lun 0
ada1: <SATA SSD S9FM02.1> s/n [...] detached
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Timeout on slot 10 port 0
ahcich9: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0004ca17
(aprobe0:ahcich9:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Retrying command
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Timeout on slot 11 port 0
ahcich9: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd 80 serr 00000000 cmd 0004cb17
(aprobe0:ahcich9:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retries exhausted
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Poll timeout on slot 12 port 0
ahcich9: is 00000000 cs 00001000 ss 00000000 rs 00001000 tfd 80 serr 00000000 cmd 0004cc17
(aprobe0:ahcich9:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retries exhausted
ahcich9: Timeout on slot 13 port 0
ahcich9: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd 80 serr 00000000 cmd 0004cd17
(ada1:ahcich9:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich9:0:0:0): CAM status: Command timeout
(ada1:ahcich9:0:0:0): Error 5, Periph was invalidated
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Poll timeout on slot 14 port 0
ahcich9: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 80 serr 00000000 cmd 0004ce17
(aprobe0:ahcich9:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retries exhausted
ahcich9: Timeout on slot 15 port 0
ahcich9: is 00000000 cs 00008000 ss 00008000 rs 00008000 tfd 80 serr 00000000 cmd 0004cf17
(ada1:ahcich9:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 0e 97 71 cd 40 01 00 00 00 00 00
(ada1:ahcich9:0:0:0): CAM status: Command timeout
(ada1:ahcich9:0:0:0): Error 5, Periph was invalidated
(ada1:ahcich9:0:0:0): Periph destroyed
ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich9: Poll timeout on slot 16 port 0
ahcich9: is 00000000 cs 00010000 ss 00000000 rs 00010000 tfd 80 serr 00000000 cmd 0004d017
(aprobe0:ahcich9:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retries exhausted
ahcich9: Poll timeout on slot 17 port 0
ahcich9: is 00000000 cs 00020000 ss 00000000 rs 00020000 tfd 80 serr 00000000 cmd 0004d117
(aprobe0:ahcich9:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich9:0:0:0): CAM status: Command timeout
(aprobe0:ahcich9:0:0:0): Error 5, Retries exhausted
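For reading the log above, it can help to decode the "tfd" values. Assuming they are the AHCI port task file data register (low byte = ATA status, next byte = ATA error), 0x40 means the device was ready (DRDY) at the first timeout, while 0x80 means it was stuck busy (BSY) on every later reset attempt. The following is a minimal sketch of that decoding; the function names are illustrative and not part of any FreeBSD tool.

```python
import re

# ATA status register bits (low byte of the task file data value).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# Matches both "tfd 40" and "(tfd = 00000080)" as they appear in the log.
TFD_RE = re.compile(r"tfd\s*=?\s*([0-9a-fA-F]+)")


def decode_tfd(tfd):
    """Split a tfd value into status byte, error byte, and named status flags."""
    status = tfd & 0xFF
    error = (tfd >> 8) & 0xFF
    flags = [name for bit, name in STATUS_BITS.items() if status & bit]
    return {"status": status, "error": error, "flags": flags}


def tfd_values(log_text):
    """Extract every tfd value from a chunk of kernel log text."""
    return [int(m.group(1), 16) for m in TFD_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "ahcich9: is 00000000 cs 00000000 ss 00000040 rs 00000040 "
        "tfd 40 serr 00000000 cmd 0004c617\n"
        "ahcich9: AHCI reset: device not ready after 31000ms (tfd = 00000080)\n"
    )
    for value in tfd_values(sample):
        print(hex(value), decode_tfd(value)["flags"])
```

A device that stays in BSY through repeated resets, with serr at 00000000 (no link errors recorded), would be consistent with the DOM itself hanging rather than a cabling or link problem, though the log alone does not prove that.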
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
jixam said: "A few months ago it started to drop a boot drive every couple of weeks. Once I reboot, the drive shows up again and works as before. It has now happened on both DOM ports and with three different disks."
Sounds like a power issue. Can you confirm that your PSU is supplying the correct voltages?
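One way to check the voltages on a Supermicro board is through the BMC, for example with `ipmitool sensor`. The sketch below parses that command's pipe-separated output and lists the voltage sensors whose status is not "ok". It assumes the usual column order (name, reading, unit, status, thresholds); the sample text is made up for illustration and is not from the poster's machine. Note that BMC sensors report board rails, which may not reflect the power actually reaching the DOM connector.

```python
def parse_voltage_sensors(text):
    """Return (name, volts, status) for each voltage row in `ipmitool sensor` output."""
    readings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4 or fields[2] != "Volts":
            continue  # not a voltage sensor row
        try:
            value = float(fields[1])
        except ValueError:
            continue  # reading unavailable, e.g. "na"
        readings.append((fields[0], value, fields[3]))
    return readings


def out_of_spec(readings):
    """Keep only the sensors whose status column is not 'ok'."""
    return [r for r in readings if r[2] != "ok"]


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = """\
CPU1 Temp        | 41.000     | degrees C  | ok    | 0.000 | 0.000
12V              | 12.060     | Volts      | ok    | 10.173 | 10.299
5VCC             | 4.610      | Volts      | nc    | 4.246 | 4.298
3.3VCC           | na         | Volts      | na    | 2.789 | 2.823
"""
    for name, volts, status in out_of_spec(parse_voltage_sensors(sample)):
        print(name, volts, status)
```

Feeding it live data would look like `ipmitool sensor | python3 script.py` with a small change to read from standard input. Since the drops happen only every couple of weeks, logging the readings periodically is likely more informative than a single snapshot.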
 