Less than two years ago I bought 8 Seagate drives (from Amazon, in two batches). I run them in raidz2.
I routinely run long SMART tests, and one day, about a month in, one disk suddenly had a read error and showed 8 Current_Pending_Sector and Offline_Uncorrectable. Then it was fine for a day, and after that it started to show ATA errors, I started to get CRC errors, and the Reallocated_Sector_Ct rose (into the hundreds). So I RMA'd it, and everything was fine for the next year and a half.
Suddenly, two months ago, another disk showed the same 8 Current_Pending_Sector and Offline_Uncorrectable. Before I had the chance to deal with it (one day later), my motherboard died (my C2750D4I, but that is another story).
I finally got the motherboard two weeks ago. I ran badblocks on that drive and everything looked fine, so I resilvered it and didn't RMA it (I have two disks of redundancy, so I wasn't too worried).
But... yesterday I got the 8-sector error on two more disks! So the first thing I did was run a scrub to verify that my data is fine, but guess what? I got a lot of CRC errors, not on those two disks, but on the disk that had the problem two months ago (and its reallocated sector count rose). So now I am in a situation where I have 3 bad disks in raidz2 (the nightmare). I rushed to the store to buy another disk (WD this time), and I am resilvering it into the array right now. (It is going to finish soon and there haven't been any errors.) After that I will have the minimum number of 'ok' disks in my raid for now.
So I am going to RMA that disk ASAP (it has ATA errors now too). And I am pretty sure that those two other disks are time bombs too, so I will RMA them as well, one by one (one of them has already had ATA errors, but its bad sector count hasn't started to rise yet).
Now... how is this possible? Could something be wrong with my server? 3 disks dying almost simultaneously after 11K hours?