I am trying to figure out what is causing this behavior: two of my disks keep going offline (either one, one at a time), and a reboot fixes it. I have reconnected and reseated the SATA cables, the power supply cables, etc. I am looking for ideas on how to determine whether this is a hardware issue or a software glitch, since a reboot fixes it every time.
Code:
kernel log messages:
> ahcich10: Timeout on slot 17 port 0
> ahcich10: is 00000002 cs 00000000 ss 00000000 rs 00020000 tfd 50 serr 00000000 cmd 00047117
> ahcich10: Timeout on slot 17 port 0
> ahcich10: is 00000002 cs 00000000 ss 00000000 rs 00020000 tfd 50 serr 00000000 cmd 00047117
> (aprobe0:ahcich10:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe0:ahcich10:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich10:0:0:0): Error 5, Retry was blocked
> ahcich10: Timeout on slot 24 port 0
> ahcich10: is 00000002 cs 00000000 ss 00000000 rs 01000000 tfd 50 serr 00000000 cmd 00047817
> ahcich10: Timeout on slot 24 port 0
> ahcich10: is 00000002 cs 00000000 ss 00000000 rs 01000000 tfd 50 serr 00000000 cmd 00047817
> (aprobe0:ahcich10:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe0:ahcich10:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich10:0:0:0): Error 5, Retry was blocked
-- End of security output --
Similarly,
Code:
kernel log messages:
> ahcich11: Timeout on slot 22 port 0
> ahcich11: is 00000002 cs 00000000 ss 00000000 rs 00400000 tfd 50 serr 00000000 cmd 00047617
> ahcich11: Timeout on slot 22 port 0
> ahcich11: is 00000002 cs 00000000 ss 00000000 rs 00400000 tfd 50 serr 00000000 cmd 00047617
> (aprobe0:ahcich11:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe0:ahcich11:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich11:0:0:0): Error 5, Retry was blocked
> ahcich11: Timeout on slot 22 port 0
> ahcich11: is 00000002 cs 00000000 ss 00000000 rs 00400000 tfd 50 serr 00000000 cmd 00047617
> (aprobe0:ahcich11:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe0:ahcich11:0:0:0): CAM status: Command timeout
> (aprobe0:ahcich11:0:0:0): Error 5, Retry was blocked
> (ada7:ahcich11:0:0:0): lost device
-- End of security output --
and the result is this
Code:
  pool: Vol1_Z2
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 52.4M in 0h0m with 0 errors on Wed Feb 5 08:28:36 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1_Z2                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/1726809e-6fd3-11e3-82e4-6805ca036bbd  ONLINE       0     0     0
            6019324666217575707                         REMOVED      0     0     0  was /dev/gptid/1791a0bf-6fd3-11e3-82e4-6805ca036bbd
            gptid/17fb544b-6fd3-11e3-82e4-6805ca036bbd  ONLINE       0     0     0
            gptid/186595a7-6fd3-11e3-82e4-6805ca036bbd  ONLINE       0     0     0
            gptid/18d4b6c5-6fd3-11e3-82e4-6805ca036bbd  ONLINE       0     0     0
            gptid/1940235e-6fd3-11e3-82e4-6805ca036bbd  ONLINE       0     0     0

errors: No known data errors
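To see whether the timeouts always land on the same channels, a quick tally of the saved log text might help. This is only a rough sketch: the regular expression is guessed from the "Timeout on slot" lines quoted above, and the function name `count_timeouts` and the sample text are made up for illustration, not part of any FreeBSD or FreeNAS tool.

```python
import re
from collections import Counter

# Pattern guessed from the kernel messages quoted above, e.g.
# "ahcich10: Timeout on slot 17 port 0". Other message formats are ignored.
TIMEOUT_RE = re.compile(r"(ahcich\d+): Timeout on slot (\d+) port (\d+)")


def count_timeouts(log_text):
    """Return a Counter mapping each AHCI channel name to its number of timeout lines."""
    return Counter(match.group(1) for match in TIMEOUT_RE.finditer(log_text))


if __name__ == "__main__":
    # Hypothetical sample; in practice, paste or read the saved log text here.
    sample = (
        "> ahcich10: Timeout on slot 17 port 0\n"
        "> ahcich10: Timeout on slot 24 port 0\n"
        "> ahcich11: Timeout on slot 22 port 0\n"
    )
    # Print channels from most to fewest timeouts.
    for channel, total in count_timeouts(sample).most_common():
        print(channel, total)
```

If the counts stay concentrated on ahcich10 and ahcich11 over several incidents, that would at least point at those two ports (or whatever is attached to them) rather than at the pool as a whole.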
Any help or guidance would be much appreciated!