Hi All,
I'm running the latest version of TrueNAS on a Dell R720xd: E5-2690 CPU, 96 GB RAM, twelve 6 TB SAS drives plus one 1 TB NVMe for cache, all behind a flashed H310 Mini HBA.
I keep having drives "fail" or become degraded. zpool status will show lots of read/write errors on a particular drive. If I clear the errors on just that drive, it looks fine for a while, then the errors come back on the same drive (this happens with any drive). If I replace the drive, every other drive throws errors during the resilver, and once the resilver finishes, another drive "fails". If I just leave it, the whole pool starts throwing read/write errors. Any ideas? I'm at a loss for words; there's no way the drives are actually failing at that rate (one every day)? I have an identical NAS, and the drives I ordered for both came from the same batch.
(The drives were used, by the way. I know it's not recommended, but new ones are 5x the price.)
Could it be the H310 Mini that's failing? That's the only thing I can think of that would cause this many drives to "fail".
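One way I could try to narrow down drive-vs-controller: pull SMART data for every disk. If most drives on the H310 are logging read/write errors in zpool status but their own SMART error logs are clean, the HBA, cabling, or backplane is the more likely culprit than the drives themselves. A rough sketch (the da0–da11 device names are an assumption; the real names should be confirmed with camcontrol devlist or the TrueNAS Disks page first):

```shell
# Print a smartctl health check and drive-side error log command for each
# data disk; pipe the output to sh to actually run them.
# NOTE: da0..da11 is an assumed naming -- adjust to match your system.
for i in $(seq 0 11); do
  echo "smartctl -H /dev/da$i"        # overall health verdict
  echo "smartctl -l error /dev/da$i"  # errors the drive itself logged
done
```

Clean SMART logs across many "failing" drives would point away from the disks and toward the shared path (HBA, backplane, cables, power).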
What should I do? I can't keep ordering more drives; it's getting very expensive.
Thanks!
Below is the error from this morning, after replacing a drive yesterday and waiting for the resilver to finish.
Code:
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 10.4G in 00:43:48 with 0 errors on Wed Jul 21 21:13:01 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank2                                           DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/e9380b23-e630-11eb-8b55-ecf4bbc0e684  ONLINE     337 27.9K     0
            gptid/ea1e756a-e630-11eb-8b55-ecf4bbc0e684  ONLINE      50 3.40K     0
            gptid/eac53908-e630-11eb-8b55-ecf4bbc0e684  ONLINE      12 39.2K     0
            gptid/eab4836c-e630-11eb-8b55-ecf4bbc0e684  ONLINE     460 17.8K     0
            gptid/ed614cef-e960-11eb-b636-ecf4bbc0e684  ONLINE     105 31.1K     0
            gptid/ebf41b72-e630-11eb-8b55-ecf4bbc0e684  ONLINE      25 6.14K     0
            gptid/eb760390-e630-11eb-8b55-ecf4bbc0e684  ONLINE      80 44.6K     0
            gptid/ebbab4d7-e630-11eb-8b55-ecf4bbc0e684  ONLINE     391 28.5K     0
            gptid/ec7d62a3-e630-11eb-8b55-ecf4bbc0e684  DEGRADED    38 36.8K   265  too many errors
            gptid/ddf08d31-ea29-11eb-b636-ecf4bbc0e684  ONLINE       0     0    43
            gptid/eca02a58-e630-11eb-8b55-ecf4bbc0e684  ONLINE      21 9.13K     0
            gptid/aa75b32a-e665-11eb-8b55-ecf4bbc0e684  ONLINE      98 48.4K     0
        cache
          gptid/e9369611-e630-11eb-8b55-ecf4bbc0e684    ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:01:59 with 0 errors on Fri Jul 16 03:47:00 2021
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da12p2    ONLINE       0     0     0
This is after running zpool clear on the pool:
Code:
 state: ONLINE
  scan: resilvered 10.4G in 00:43:48 with 0 errors on Wed Jul 21 21:13:01 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank2                                           ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/e9380b23-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/ea1e756a-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/eac53908-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/eab4836c-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/ed614cef-e960-11eb-b636-ecf4bbc0e684  ONLINE       0     0     0
            gptid/ebf41b72-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/eb760390-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/ebbab4d7-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/ec7d62a3-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/ddf08d31-ea29-11eb-b636-ecf4bbc0e684  ONLINE       0     0     0
            gptid/eca02a58-e630-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
            gptid/aa75b32a-e665-11eb-8b55-ecf4bbc0e684  ONLINE       0     0     0
        cache
          gptid/e9369611-e630-11eb-8b55-ecf4bbc0e684    ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:01:59 with 0 errors on Fri Jul 16 03:47:00 2021
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da12p2    ONLINE       0     0     0

errors: No known data errors
In about 20 minutes, another drive will start showing errors...