danjb
Dabbler
- Joined
- Aug 2, 2014
- Messages
- 26
I had an 8TB drive fail in a storage pool consisting of 2 RAIDZ2 vdevs, each made up of 6 8TB Western Digital WD80EFZX drives. I replaced the failed drive with a Toshiba HDWG180 because it was the soonest replacement option available to me.
The resilvering proceeded very quickly at first, reaching something like 53% complete in the first 12 hours. However, it then slowed to a crawl and took about 4 days to reach 58%. It then sped up a little and took another 3 days to reach 67%, and is currently projecting completion in a little over 3 more days from now. The pool is something like 72% full.
This is on TrueNAS-12.0-U3.1 running on a Xeon E5-2620 with 64GB of RAM. The system runs 24x7 and is not heavily loaded. I am not really concerned about the total amount of time the resilvering process takes; I had expected it to run very slowly. However, I am curious about the jumpy percentage-complete progress. It progressed at about 4-5% per hour for the first 12 hours, then slowed down by a factor of roughly 100X for the next 4 days, then sped up about 2X for the next 3 days.
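As a rough check of those rates, here is a small Python sketch that turns the milestones above into percent-per-hour figures. The milestone times and percentages are the approximate ones quoted in this post, not measured values, and the names `milestones` and `phase_rates` are just illustrative.

```python
# Approximate milestones from the post: (elapsed hours, percent complete).
milestones = [
    (0, 0.0),            # resilver started
    (12, 53.0),          # after the first 12 hours
    (12 + 4 * 24, 58.0), # about 4 days later
    (12 + 7 * 24, 67.0), # about 3 more days later
]

def phase_rates(points):
    """Return the percent-per-hour rate between consecutive milestones."""
    rates = []
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        rates.append((p1 - p0) / (t1 - t0))
    return rates

rates = phase_rates(milestones)
slowdown = rates[0] / rates[1]  # fast first phase vs. slow second phase
speedup = rates[2] / rates[1]   # third phase vs. slow second phase

for i, rate in enumerate(rates, start=1):
    print(f"phase {i}: {rate:.3f} %/hour")
print(f"slowdown from phase 1 to phase 2: {slowdown:.0f}x")
print(f"speedup from phase 2 to phase 3: {speedup:.1f}x")
```

With these rounded inputs the slowdown works out to somewhat under 100X and the later speedup to a bit over 2X, so the figures quoted above are in the right ballpark.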
What is the main factor driving resilvering speed that could cause this orders-of-magnitude variability? Some things I thought of:
- Numbers of files or sizes of files. I have some metadata directories with huge numbers of tiny files. I have other media directories with relatively small numbers of extremely large files. Does one or the other of these cause speedups or slowdowns in resilvering?
- System usage. As I said, this is not a heavily loaded system, but it is performing variable amounts of I/O all the time. Could high levels of I/O cause 100X resilvering slowdowns? Could CPU utilization cause that much variability?
- Drive mismatch. I would have liked to use an identical drive, but as I mentioned, I didn't. The old drives are slower 5400RPM drives; the new drive is a faster 7200RPM drive. Other than a mismatch being undesirable in general, could this cause performance issues? It doesn't seem like it would contribute to the variability, though.
- Is it possible this variability indicates an issue with any drives? No errors are currently being reported.
Code:
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Aug 31 17:50:30 2021
        47.4T scanned at 84.2M/s, 46.8T issued at 83.2M/s, 69.6T total
        1.22T resilvered, 67.29% done, 3 days 07:42:05 to go
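For what it's worth, the numbers in that scan line can be roughly reproduced with simple arithmetic. The sketch below is a guess at how the figures relate, not the actual ZFS code. It assumes the percentage is issued/total, that the time remaining is the un-issued data divided by the average issue rate, and that T and M are binary units (TiB and MiB/s). All variable names are illustrative.

```python
# Values copied from the "scan:" lines of the zpool status output above.
TIB = 1024 ** 4  # assumption: "T" means TiB
MIB = 1024 ** 2  # assumption: "M/s" means MiB per second

issued = 46.8 * TIB      # "46.8T issued"
total = 69.6 * TIB       # "69.6T total"
issue_rate = 83.2 * MIB  # "issued at 83.2M/s", an average since the start

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: percent done is simply issued / total.
percent_done = 100 * issued / total

# Assumption: the ETA is remaining data divided by the average issue rate.
remaining_days = (total - issued) / issue_rate / SECONDS_PER_DAY

# Elapsed time implied by the same average rate.
elapsed_days = issued / issue_rate / SECONDS_PER_DAY

print(f"percent done: {percent_done:.2f}%")
print(f"estimated time remaining: {remaining_days:.2f} days")
print(f"elapsed time implied by the average rate: {elapsed_days:.2f} days")
```

Under those assumptions the results land close to the reported 67.29% and 3 days 07:42:05. If that is how the estimate is formed, it is based on the average rate over the whole run, so it would not reflect the large swings in speed described above.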