About a month ago, we started getting checksum errors on one of our drives, which caused the pool to go unhealthy and eventually to degraded status. We figured we had a bad disk, so we replaced it.
The problem recurred with the brand new disk. That seemed odd, so I started googling.
The consensus here is that it's often a faulty cable or bad connection. So, as part of this thread (VMware, iSCSI, dropped connections and lockups), I swapped out my drive controllers AND cables. The problem persists.
So, we thought "maybe the backplane is the cause" and we moved the disk to a different port on a different backplane. The problem stayed with the drive.
Now, normally, I would say "ok, so it's a bad drive" and replace it. But when the original drive and a BRAND NEW drive have exactly the same issue, and I've replaced all the connections, I am at a loss.
What else can I do to get to the bottom of this?
NOTE: The original drives are all HGST HUH728080AL4200 8TB disks. When we replaced the original disk, we could not find that model brand new, only refurbished, so we replaced it with a model that had the same base specs (performance, size, RPM, 4Kn sectors): a SEAGATE STMPSKD1CLAR8000.
I know it's not optimal to use a different drive, but since the problem is identical to the one we saw with the HGST, it's hard to say for certain that the mismatch is the cause. I have not had time to run a long test on the original HGST drive in another machine, but I suspect that the original drive is just fine and that something else is causing these checksum errors.
I'm not sure whether my other thread directly relates to this, but I will post a link to this one there as well, so people have the full story of what is going on with our system.