bal0an
I've search a bit for a good way to replace a still working (albeit old) disk. Here I'd like to document my solution for discussion, and raise two issues.
Issue 1: the manual doesn't describe how to replace a working disk without degrading the pool.
Issue 2: the REPLACE disk option in the UI does not seem to work.
See also:
Replacing Disk does not work in TrueNAS Core v13.0?
Drive Swap With Spare
Forcing a hot spare to become a permanent drive in pool
My first attempt, which did not work, is item 1 below.
My recipe for safely replacing a working disk is item 2.
This is my configuration:
Code:
root@nas1:~ # zpool status -v tank
  pool: tank
 state: ONLINE
  scan: resilvered 264K in 00:00:01 with 0 errors on Sat Feb 17 12:07:35 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/cfbe1afc-9caf-11ec-81a1-18c04d8f0452  ONLINE       0     0     0
            gptid/fdff7e8b-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
            gptid/fe2055b4-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
            gptid/fe15fec1-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
            gptid/fe269fa1-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0

errors: No known data errors
An additional disk has been connected but not added to any pool.
1. Ideally, since the pool is still fully operational, replacing a disk should work without degrading it. Lacking other official information, I followed the manual at https://www.truenas.com/docs/core/coretutorials/storage/disks/diskreplace/#figure-2 . Using the UI, I added the new disk as a spare and then tried to
a) OFFLINE, and
b) REPLACE it with the spare drive.
The REPLACE step didn't work because the replace dialog in the UI does not let me select non-member disks. This applies to the newly installed disk whether it is added to the pool as a spare or left standalone. Is this still the known bug from when 13.0 was released?
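For the record, the OFFLINE step (and taking the disk back online afterwards) can also be done from the shell. This is only a sketch of what I believe the UI does under the hood, using the gptid of the legacy disk from the status output above:
Code:
# Take the legacy disk offline (equivalent of the UI OFFLINE action):
root@nas1:~ # zpool offline tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104

# Bring it back online; ZFS resilvers the small delta automatically:
root@nas1:~ # zpool online tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104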
After taking the legacy disk back ONLINE, I proceeded as follows:
2. The ZFS documentation describes the procedure in "Activating and Deactivating Hot Spares in Your Storage Pool".
This approach looks much better: it resilvers onto the spare disk while keeping all RAIDZ1 member disks online, which reduces the risk of a disk failing under the high resilver load.
Code:
root@nas1:~ # zpool replace tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104 gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452
root@nas1:~ # zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Feb 17 13:37:07 2024
        1.90T scanned at 21.8G/s, 32.1G issued at 369M/s, 8.71T total
        6.30G resilvered, 0.36% done, 06:50:35 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              ONLINE       0     0     0
          raidz1-0                                        ONLINE       0     0     0
            gptid/cfbe1afc-9caf-11ec-81a1-18c04d8f0452    ONLINE       0     0     0
            gptid/fdff7e8b-2548-11ec-b7d6-d05099921104    ONLINE       0     0     0
            gptid/fe2055b4-2548-11ec-b7d6-d05099921104    ONLINE       0     0     0
            gptid/fe15fec1-2548-11ec-b7d6-d05099921104    ONLINE       0     0     0
            spare-4                                       ONLINE       0     0     0
              gptid/fe269fa1-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
              gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452  ONLINE       0     0     0  (resilvering)
        spares
          gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452      INUSE     currently in use

errors: No known data errors
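Before the last step one has to wait for the resilver to finish. If I understand it correctly, the OpenZFS version in Core 13.0 also has a zpool wait subcommand, so something like the following should block until the resilver is done (just a sketch; simply re-checking zpool status works too):
Code:
# Block until resilver activity on the pool has completed:
root@nas1:~ # zpool wait -t resilver tank

# Then confirm the scan line reports a completed resilver with 0 errors:
root@nas1:~ # zpool status -v tank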
As a last step, I will have to detach the legacy drive so that the spare is integrated into the pool as a permanent member (to be updated).
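My understanding from the ZFS docs and the "Forcing a hot spare to become a permanent drive in pool" thread linked above is that detaching the original disk makes the spare take its place permanently and removes it from the spare list. So the final step should look roughly like this (not run yet, treat it as a sketch):
Code:
# Detach the legacy disk; the spare gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452
# should then become a regular member of raidz1-0:
root@nas1:~ # zpool detach tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104

# Verify that spare-4 is gone and the spares section is empty:
root@nas1:~ # zpool status -v tank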
The subsequent display in the UI is misleading IMHO. It may be true that "ada1 is UNAVAIL as a spare" while it is being resilvered, but that fact is not visible in the UI. The output of zpool status is easier to understand.