Quick summary:
Hardware is good (supermicro/ECC/Xeon v3/enterprise drives) and server seems good. The pool - which I'm expanding - has been made up of 4 sets of mirrors (call them vdev1:{d1,d2}, vdev2:{d1,d2}, vdev3:{d1,d2} and vdev4:{d1,d2}). I'm upgrading them to 3-way mirrors (I like mirrors, and 2-way isn't redundant enough any more).
I added 4 disks to the array (3 x mirrors to vdev1 and 1 x mirror to vdev2). When they finish resilvering, I'll detach 2 of the old drives from vdev1 and reuse them elsewhere.
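For reference, the attach/detach steps look roughly like the sketch below. The pool name (tank) and device names (da0, da8) are placeholders rather than my real ones, and the run wrapper is just a hypothetical dry-run helper so nothing touches the pool unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder pool name; substitute the real one.
POOL=tank

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Grow a mirror: attach NEW alongside EXISTING; ZFS then resilvers NEW.
# usage: grow_mirror EXISTING NEW
grow_mirror() { run zpool attach "$POOL" "$1" "$2"; }

# Shrink a mirror: detach one member (only after the resilver has finished).
# usage: shrink_mirror DEVICE
shrink_mirror() { run zpool detach "$POOL" "$1"; }
```

With the defaults, grow_mirror da0 da8 only prints the zpool attach command it would run.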
What I'm noticing is odd behaviour with vdev4:d1.
While watching the resilver progress, I noticed that zpool status -v listed zero for all R/W/CHK for all disks, except vdev4:d1 which listed R=1, W=114, CHK=0. I wasn't too worried as this is often a symptom of a bad cable, so I powered the system down to swap the cable and change the port. But I also saw that the output showed vdev4:d1 as resilvering, which was odd - I certainly hadn't told it to do so, and there were no log reports I could find showing other faults (including SMART faults) that definitively showed a disk error.
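To avoid eyeballing the whole table each time, a small filter along these lines could pick out the rows of interest. It is only a sketch: it assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout of zpool status, and flag_devices is a name I made up.

```shell
#!/bin/sh
# flag_devices: read `zpool status` output on stdin and print every config row
# that has a non-zero READ/WRITE/CKSUM counter or is marked as resilvering.
flag_devices() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # A blank line ends the table.
        in_cfg && NF == 0 { in_cfg = 0 }
        in_cfg && NF >= 5 {
            # Compare as strings so abbreviated counters such as 1.2K also count.
            errs = ($3 != "0" || $4 != "0" || $5 != "0")
            resilver = ($0 ~ /resilvering/)
            if (errs || resilver)
                printf "%s read=%s write=%s cksum=%s%s\n", $1, $3, $4, $5, (resilver ? " RESILVERING" : "")
        }
    '
}
```

Usage would be something like zpool status -v tank | flag_devices, where tank again stands in for the real pool name.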
I shut down, swapped the cable and port, and rebooted and all was then normal. This was about 4 hours ago. After rebooting, zpool status -v showed zero R/W/CHK errors for all drives (including that one), it showed just 3 new drives resilvering (correct), and vdev4:d1 had zero errors and was no longer listed as resilvering. After a while of no issues, I got on with other things and thought no more of it until now.
I probably kept an eye on zpool status for a while after reboot, but can't be sure how long. But just now, I rechecked. R/W/CHK errors are still all zero for all drives, but now vdev4:d1 shows once again as spontaneously "resilvering" without being told to and without obvious reason.
Now I'm distinctly disturbed.
smartctl -a shows healthy, plus quite a few fast ECC-corrected errors and a "non-medium error count" of 4. I'm not sure which SMART report to run; if one would be helpful, I will run it.
What should I make of this, and what, if any, action is appropriate?