Disk replacement procedure for RAIDZ1?

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
I'm sure this has been asked a dozen times before, but unfortunately all the info I can find either pertains to older builds of TrueNAS/FreeNAS or involves some kind of issue that makes the situation more complicated than mine. So I'd like to run this checklist by some seasoned users before I start pulling things out of my NAS.

So first, some context:
The version I'm running is TrueNAS-12.0-U5 (I want to update, but I'm told it's best to do that after resilvering).
My current pool consists of 6 x WD Red 3 TB drives, one of which has been flaking out.
My current pool status reads as 'unhealthy', with 0 disks w/errors.
All disks are online and the NAS still seems to function as usual.
ada3p2 has a 'checksum 1', which I assume is where the unhealthy status is coming from.
I have copied said drive's serial number to make the physical disk easier to identify.
My system has a spare SATA port that I could use; apparently this helps.

So where do I go from here?
I have found this tutorial, though it leaves some questions unanswered: Do I need to do any prepwork before doing this? I have jails running on this pool, should I offline those first? Any settings I need to back up/note down beforehand? The manual states a failing disk can be left online when replacing, though only when you know exactly what the failing disk condition is. How do I know which route is best to take? The pool isn't degraded, just unhealthy and seemingly still functioning. I could just plug the replacement disk into a spare SATA port and go at it that way, or I could offline and yank the broken disk out and plop the replacement in its spot.

The data on my pool isn't super crucial, though recovering it would save me a few hours of time.

Any tips on how I should proceed?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Do I need to do any prepwork before doing this?
Any disk you're using in your system should be burned-in and tested first. There are a number of guides on doing this; Uncle Fester's (link in my sig) is one.
I have jails running on this pool, should I offline those first?
No.
Any settings I need to back up/note down beforehand?
No.
The manual states a failing disk can be left online when replacing, though only when you know exactly what the failing disk condition is.
If the failing disk is still generally OK (and with only one checksum error, I'd expect that to be the case), I prefer to leave it online during the replacement process. Taking it offline degrades your redundancy, and with RAIDZ1 means you have no redundancy at all. Leaving it online means that the resilvering will likely take longer than it would if the disk were offline, but I'd consider the preservation of redundancy worth it.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
"Checksum" could be a cable rather than the drive.

Do you have a backup?
What does SMART report? (smartctl -a /dev/ada3 in an SSH shell; avoid the GUI shell if possible)
You're welcome to post smartctl output, for all drives, as well as zpool status -v. Within CODE tags for readability, please.

If it's "just" a single checksum error, a scrub may be enough.
If you proceed to replace the drive, plug the new drive (preferably after running some burn-in tests on it) into the free port, and initiate replacement from the GUI (Storage > Pool > (gear) > Status > (old drive) > (…) > Replace). There's no need to offline or remove the old drive or stop the jails, though performance may suffer while the resilver is in progress.
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
Any disk you're using in your system should be burned-in and tested first. There are a number of guides on doing this; Uncle Fester's (link in my sig) is one.
I've taken a look at the Uncle Fester manual, but I can't seem to find any mention of disk burn-in procedures. Could you point me to where this is explained?

If the failing disk is still generally OK (and with only one checksum error, I'd expect that to be the case), I prefer to leave it online during the replacement process. Taking it offline degrades your redundancy, and with RAIDZ1 means you have no redundancy at all. Leaving it online means that the resilvering will likely take longer than it would if the disk were offline, but I'd consider the preservation of redundancy worth it.
The NAS still seems to be performing 'alright'. I think it recovered and resilvered itself once, but the disk has had a sector reallocated or something. I think it should still have enough left to give to aid its replacement, but for me that's a hard call to make since I've got no experience resilvering arrays.

"Checksum" could be a cable rather than the drive.

Do you have a backup?
What does SMART report? (smartctl -a /dev/ada3 in an SSH shell; avoid the GUI shell if possible)
You're welcome to post smartctl output, for all drives, as well as zpool status -v. Within CODE tags for readability, please.

If it's "just" a single checksum error, a scrub may be enough.
If you proceed to replace the drive, plug the new drive (preferably after running some burn-in tests on it) into the free port, and initiate replacement from the GUI (Storage > Pool > (gear) > Status > (old drive) > (…) > Replace). There's no need to offline or remove the old drive or stop the jails, though performance may suffer while the resilver is in progress.
The cable being faulty would be a surprise; I've never had a cable give out after 4 years of unimpeded service before. But I'll keep that in mind once I pull the allegedly faulty drive for testing.

I have fully backed up all important data, and most of the less important stuff as well, though if I could preserve the pool it would save me a bit of headache.

Here's what SMART spat out:

Code:
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ0908447
LU WWN Device Id: 5 0014ee 25b5f72ee
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jul 10 10:51:34 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection: (50160) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 482) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   167   144   021    Pre-fail  Always       -       8616
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1762
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   008   008   000    Old_age   Always       -       67738
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1306
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       177
193 Load_Cycle_Count        0x0032   145   145   000    Old_age   Always       -       167827
194 Temperature_Celsius     0x0022   117   106   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2071
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      2191         1044225103
# 2  Short offline       Completed without error       00%      2024         -
# 3  Extended offline    Completed without error       00%      1992         -
# 4  Short offline       Completed: read failure       10%      1856         979058200
# 5  Short offline       Completed: read failure       60%      1689         979058200
# 6  Short offline       Completed: read failure       10%      1528         978963121
# 7  Short offline       Completed: read failure       90%      1353         1001713848
# 8  Extended offline    Completed: read failure       90%      1257         1001713848
# 9  Short offline       Completed: read failure       90%      1185         1001713848
#10  Short offline       Completed: read failure       90%      1017         1001713848
#11  Short offline       Completed: read failure       80%       850         1001713848
#12  Short offline       Completed without error       00%       683         -
#13  Extended offline    Completed: read failure       80%       516         982200736
#14  Short offline       Completed without error       00%       347         -
#15  Short offline       Completed without error       00%       179         -
#16  Short offline       Completed without error       00%        11         -
#17  Short offline       Completed without error       00%     65380         -
#18  Extended offline    Completed without error       00%     65342         -
#19  Short offline       Completed without error       00%     65213         -
#20  Short offline       Completed without error       00%     65045         -
#21  Short offline       Completed without error       00%     64878         -
9 of 10 failed self-tests are outdated by newer successful extended offline self-test # 3

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

To my surprise, this disk seems to identify itself as a WD GREEN (??) while I am 100% certain the physical disk label states otherwise. I even have pictures of the drives from when I was constructing the NAS, and they definitely appear to be all Reds. I've SMART checked the others and they all identify normally. I bought them all new and sealed from a reputable vendor. I am as baffled at this as you might be. I guess this is why the drive is faulting while the rest of them are fine? Could some weird fault cause a drive to misidentify somehow? All the more reason for me to remove this bizarre drive, I guess.. :eek: For what it's worth, the failures seem to all be read failures, which might be typical for a slow-to-wake WD Green?

So all I really have to do here is run a burn-in test on the new drive, plop it into the spare SATA port and do as you instructed to replace? I presume doing that will prompt me for what disk to use as the replacement? And once it's done resilvering, I can just power down, yank the red/green bastard, and plug the new disk into its rackmount/port? The new disk would have to sit unmounted during this procedure as I've only got a SATA port to spare, not a disk mount (it's a 6-disk Fractal Design Node case).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Could you point me to where this is explained?

In the SMART attributes of your disk, nothing's too concerning other than the load cycle count, which is consistent with it being a Green rather than a Red (and just the age of the disk). But it's pretty consistently failing SMART self-tests; that's definitely a reason to replace it. You should have been getting email alerts about this--make sure you've entered your email address in the SMART service configuration.
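For anyone who would rather tally the self-test log than eyeball it, here is a minimal Python sketch. It assumes the log entries have been re-typed one per line from the smartctl dump posted above; the names `SELFTEST_LOG`, `ENTRY` and `parse_selftests` are made up for this illustration and are not part of smartmontools.

```python
import re

# The 21 self-test log entries from the smartctl output posted above,
# re-typed one entry per line (assumption: same columns as smartctl prints).
SELFTEST_LOG = """\
# 1  Short offline       Completed: read failure       90%      2191         1044225103
# 2  Short offline       Completed without error       00%      2024         -
# 3  Extended offline    Completed without error       00%      1992         -
# 4  Short offline       Completed: read failure       10%      1856         979058200
# 5  Short offline       Completed: read failure       60%      1689         979058200
# 6  Short offline       Completed: read failure       10%      1528         978963121
# 7  Short offline       Completed: read failure       90%      1353         1001713848
# 8  Extended offline    Completed: read failure       90%      1257         1001713848
# 9  Short offline       Completed: read failure       90%      1185         1001713848
#10  Short offline       Completed: read failure       90%      1017         1001713848
#11  Short offline       Completed: read failure       80%       850         1001713848
#12  Short offline       Completed without error       00%       683         -
#13  Extended offline    Completed: read failure       80%       516         982200736
#14  Short offline       Completed without error       00%       347         -
#15  Short offline       Completed without error       00%       179         -
#16  Short offline       Completed without error       00%        11         -
#17  Short offline       Completed without error       00%     65380         -
#18  Extended offline    Completed without error       00%     65342         -
#19  Short offline       Completed without error       00%     65213         -
#20  Short offline       Completed without error       00%     65045         -
#21  Short offline       Completed without error       00%     64878         -
"""

# Only the two statuses that occur in this particular log are recognised.
ENTRY = re.compile(
    r"^#\s*(\d+)\s+(Short|Extended) offline\s+"
    r"(Completed without error|Completed: read failure)\s+"
    r"(\d+)%\s+(\d+)\s+(\d+|-)$"
)


def parse_selftests(text):
    """Turn self-test log lines into dicts; unrecognised lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        num, kind, status, _remaining, hours, lba = match.groups()
        entries.append({
            "num": int(num),
            "kind": kind,
            "failed": status.endswith("read failure"),
            "lifetime_hours": int(hours),
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return entries


entries = parse_selftests(SELFTEST_LOG)
failures = [e for e in entries if e["failed"]]
distinct_lbas = sorted({e["first_error_lba"] for e in failures})
print(f"{len(failures)} of {len(entries)} logged self-tests failed")
print("distinct first-error LBAs:", distinct_lbas)
```

On the entries as posted, this counts 10 read failures out of 21 logged tests, spread over 5 distinct LBAs, which lines up with smartctl's own "9 of 10 failed self-tests are outdated" summary line.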
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
This is not too bad (yet… these counters should go up in the not-too-distant future):
Code:
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

but I'd say that this is worrying:
Code:
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2071
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2


as well as the SMART errors on various sectors. Replace the drive with a CMR drive of the same or higher capacity.
These appear to be physical errors, not just "slow to wake up".
As for the Red label on a Green, only WD could tell what might have happened.
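If you want to check those counters across several drives without reading every table by hand, a rough Python sketch follows. It assumes the attribute lines are pasted in as text, exactly as quoted above; the `WATCHED` list is simply the set of IDs discussed in this post (a choice made for the sketch, not an official threshold list), and `parse_attributes` / `nonzero_watched` are hypothetical helper names.

```python
# Attribute lines copied from the smartctl output quoted above.
ATTRIBUTES = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2071
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2
"""

# Assumption for this sketch: these are the IDs whose raw value we would
# like to see at zero. It mirrors the attributes discussed in the thread.
WATCHED = {5, 196, 197, 198, 199, 200}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) from smartctl attribute rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows[int(fields[0])] = (fields[1], int(fields[9]))
    return rows


def nonzero_watched(rows):
    """Return {name: raw} for watched attributes with a non-zero raw value."""
    return {
        name: raw
        for attr_id, (name, raw) in sorted(rows.items())
        if attr_id in WATCHED and raw != 0
    }


flagged = nonzero_watched(parse_attributes(ATTRIBUTES))
for name, raw in flagged.items():
    print(f"{name}: raw value {raw}")
```

For the lines above it flags 197, 198, 199 and 200 and leaves the two reallocation counters alone. A non-zero UDMA_CRC_Error_Count on its own would point more toward cabling than the platters, so treat the flag list as a prompt to look closer rather than a verdict.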

So all I really have to do here is run a burn-in test on the new drive, plop it into the spare SATA port and do as you instructed to replace? I presume doing that will prompt me for what disk to use as the replacement? And once it's done resilvering, I can just power down, yank the red/green bastard, and plug the new disk into its rackmount/port? The new disk would have to sit unmounted during this procedure as I've only got a SATA port to spare, not a disk mount (it's a 6-disk Fractal Design Node case).
Yes. You may also hang the new drive in the bay already and leave the old drive unmounted but plugged into a SATA port. ZFS tracks drives by gptid, so it doesn't matter if drives/ports are shuffled across reboots.
But always carefully check by serial # which drive is to be replaced! It may or may not still be 'ada3' after a reboot.
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
Oh, so burn-in and validation mean the same thing? Sorry, I didn't know. Is there an easy way to perform these tests using a USB external and Windows? I don't really have any lab computers on hand to do it with at the moment. I've looked around for a Windows/DOS SMART test tool, but most results felt kinda sketchy, so I was hoping someone here knows of a tried and true bit of software that I could use.

I have two new WD Red Plus 4 TB drives on hand that I had intended to resilver in one by one (and then 2 more next week and 2 more the week after, to spread the cost). So if I could get through the preliminary validation stuff using my desktop machine, that would make things a little easier.

In the SMART attributes of your disk, nothing's too concerning other than the load cycle count, which is consistent with it being a Green rather than a Red (and just the age of the disk). But it's pretty consistently failing SMART self-tests; that's definitely a reason to replace it. You should have been getting email alerts about this--make sure you've entered your email address in the SMART service configuration.
I did enter my email but for some reason I did not receive notice until I logged in and saw a red bell icon.. By some strange coincidence I logged on to check up on a jail, only hours after the NAS resilvered itself. So fortunately it hasn't been sitting on it for that long yet.

This is not too bad (yet… these counters should go up in the not-too-distant future):
Code:
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

but I'd say that this is worrying:
Code:
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2071
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2


as well as the SMART errors on various sectors. Replace the drive with a CMR drive of the same or higher capacity.
These appear to be physical errors, not just "slow to wake up".
As for the Red label on a Green, only WD could tell what might have happened.


Yes. You may also hang the new drive in the bay already and leave the old drive unmounted but plugged into a SATA port. ZFS tracks drives by gptid, so it doesn't matter if drives/ports are shuffled across reboots.
But always carefully check by serial # which drive is to be replaced! It may or may not still be 'ada3' after a reboot.
Some sectors are eating dirt; that's definitely a death rattle. What bothers me the most, though, is that the label says Red but SMART comes up with Green. No idea why I hadn't noticed that before, but it really bothers me. I vaguely remember flashing some firmware changes on two of my old 1 TB WD Greens (terrible purchase, never touch Green!) in another computer just for funzies, but they have never been in the same system together with my Reds, so I don't know. All I can think of is that either I or my vendor got scammed. 4 years too late to do anything about that now, though.. I'm glad I didn't buy my new drives from the same place now.

Another thing I'm pondering now is whether the firmware-given serial # will even match the darn physical sticker at all. But I guess I can find the bugger by process of elimination: any sticker # that doesn't match what TrueNAS is giving me is the culprit.

ZFS is one wonderfully resilient system, it seems. I remember screwing up a classic RAID 0 by accidentally mixing up SATA plugs. Not possible with ZFS, apparently.

So here is where I'm at:
- First I need to validate my new disk(s) one way or another.
- Then I need to power down my TrueNAS and introduce said disk to it.
- Power back on, then go to 'Storage > Pool > (gear) > Status > (old drive) > (…) > Replace'.
- It will probably prompt me for what to use as the replacement here.
- Let it carry out a resilver.
- Power down and pull the broken disk out.
- Move the new disk into its place.
- Power back on, and run a SMART test to see if all went well.
- Fester's your uncle.

Did I miss anything?

I remember reading somewhere that I should turn on automatic expansion so the pool will grow to the bigger size once all drives are replaced, but I'm not sure if and where I should do that.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
I remember reading somewhere that I should turn on automatic expansion so the pool will grow to the bigger size once all drives are replaced, but I'm not sure if and where I should do that.
The autoexpand flag should have been set by default. You can check with zpool get autoexpand.
Your replace list is complete.
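To make the expansion point concrete, here is a back-of-the-envelope Python sketch of why the extra space only shows up after the last 3 TB drive is gone. It uses the rough rule that a RAIDZ1 vdev offers about (n - 1) times its smallest member; it ignores metadata, padding and the TB/TiB difference, so the figures are illustrative only. `raidz1_usable_tb` is a made-up helper name, not a ZFS tool.

```python
def raidz1_usable_tb(disk_sizes_tb):
    """Rough usable size of one RAIDZ1 vdev: (n - 1) * smallest member.

    Simplification for illustration: real pools lose some space to
    metadata and padding, and report sizes in TiB rather than TB.
    """
    if len(disk_sizes_tb) < 2:
        raise ValueError("RAIDZ1 needs at least two disks")
    return (len(disk_sizes_tb) - 1) * min(disk_sizes_tb)


# The pool from this thread: six 3 TB drives, swapped for 4 TB ones over time.
before = raidz1_usable_tb([3, 3, 3, 3, 3, 3])
partway = raidz1_usable_tb([4, 4, 3, 3, 3, 3])  # two drives replaced so far
after = raidz1_usable_tb([4, 4, 4, 4, 4, 4])    # all six replaced

print(f"before: ~{before} TB, partway: ~{partway} TB, after: ~{after} TB")
```

The partway figure stays at the old size because the smallest disk sets the limit for the whole vdev, which is why autoexpand has nothing to do until the final replacement has resilvered.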

The burn-in can be done in any system which can run the commands you're using (there's no definitive standard on that). But it's best to plug the drive(s) through SATA rather than by USB, as the procedure may run for a long time (badblocks takes days to complete its passes) and USB connections are not that reliable.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is there an easy way to perform these tests using a USB external and Windows?
I'm sure it could be done, but I have no idea what tools you'd use. You said you had a spare SATA port in the server, right? Why not just connect the disk there and run the tests on your server?
Did I miss anything?
Doesn't look like it.
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
Why not just connect the disk there and run the tests on your server?
It's a bit of a mess when it comes to IT in my building. I was halfway through running all the cables for a nice server rack cabinet with all the space I'd want, but the contractor's wife got ill, so now I'm still stuck keeping my NAS in a cramped little Fractal Design Node box up on a high shelf in an awkward, hard-to-access corner of my workshop. And of course one of the disks decided to die now, a month or two before I have the chance to move it all over to a nice, roomy, accessible 14-bay SilverStone rack. Talk about bad timing. o_O

So I have a bit of an issue with where I'm going to keep the server while all this validation and resilvering is happening. It can't sit on my workbench with its disk hanging out for longer than a 3-day weekend, so I was kind of looking to minimise that by doing the preliminaries from my desktop or something. But that machine is a heavy watercooled rig with all the cables managed a little too well, so it's not easy to just hang a disk on it internally. Hence the question about a USB external.

But there is a plan C.. maybe..
I have a little Dell computer I was preparing to be my new router (pfSense is neat), but since the rack isn't ready yet I can hold off on deploying it and use it to run the tests instead. I believe Uncle Fester's manual lists installing TrueNAS on some temporary box, solely to run the apparently built-in disk checks, as one of the options. But I'll have to see if that tiny SFF computer even has a second SATA port to begin with. :')

Tech is fun, when it's working! :wink:
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
Welp, I've done all that I summarized, but after pulling the broken disk out, the NAS complains:

CRITICAL

Pool Tank state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
  • Disk 9911428738427028150 is UNAVAIL

2022-07-19 09:23:32 (Europe/Amsterdam)

So apparently I did miss something..
Anyone know what I should be doing?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
zpool status is the first thing we need to start ...
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
zpool status is the first thing we need to start ...
Of course, here you go:
Code:
  pool: Tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
  scan: resilvered 25.8M in 00:00:06 with 0 errors on Tue Jul 19 10:09:50 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/73ea972a-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            gptid/74b038f8-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            gptid/74baea02-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            9911428738427028150                         UNAVAIL      0     0     0  was /dev/gptid/74a6b398-d762-11eb-97f5-d05099c19171
            gptid/74cf2566-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            gptid/93915049-05ad-11ed-9080-d05099c19171  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:10 with 0 errors on Sun Jul 17 03:45:10 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
So there's one disk missing, which must be the one you pulled out. Did you replace that with the new disk before you pulled it? If not you need to do that now. Can be done in the UI ...

Is your new disk connected? camcontrol devlist shows all the disks the system knows about. Also, the UI shows the pool status in a way where we can see which GPTID is which disk ...
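As a quick way to pull the problem member out of a long status listing, here is a small Python sketch that reads the config table from the zpool status output posted above. It is only a text filter over pasted output, not a ZFS API; `parse_config` and `unhealthy_members` are hypothetical names, and treating the most deeply indented rows as the disks is an assumption that holds for a single-vdev pool like this one but may not for more complex layouts.

```python
# Config section of the `zpool status` output posted above, copied verbatim.
ZPOOL_CONFIG = """\
        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/73ea972a-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            gptid/74b038f8-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            gptid/74baea02-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            9911428738427028150                         UNAVAIL      0     0     0  was /dev/gptid/74a6b398-d762-11eb-97f5-d05099c19171
            gptid/74cf2566-d762-11eb-97f5-d05099c19171  ONLINE       0     0     0
            gptid/93915049-05ad-11ed-9080-d05099c19171  ONLINE       0     0     0
"""


def parse_config(text):
    """Return (indent, name, state, former_path) for each row below the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        indent = len(line) - len(line.lstrip())
        former = None
        # A missing device is listed by GUID, followed by "was <old path>".
        if "was" in fields and fields.index("was") + 1 < len(fields):
            former = fields[fields.index("was") + 1]
        rows.append((indent, fields[0], fields[1], former))
    return rows


def unhealthy_members(rows):
    """Leaf devices (the most deeply indented rows) whose state is not ONLINE."""
    if not rows:
        return []
    leaf_indent = max(indent for indent, _, _, _ in rows)
    return [
        (name, state, former)
        for indent, name, state, former in rows
        if indent == leaf_indent and state != "ONLINE"
    ]


rows = parse_config(ZPOOL_CONFIG)
for name, state, former in unhealthy_members(rows):
    print(f"{name}: {state} (was {former})")
```

For the listing above it reports the single UNAVAIL entry together with the gptid it used to have, which is the identifier to match against the disk list in the UI.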
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yea, I see five drives of the six you originally had. I don't think you have actually added the new drive, so you missed some instructions. I'd probably power down and reconnect the drive you pulled out, power back on and scrub the pool again, get it back to normal, then you can start all over. My assessment of your drives is that you have one that might have a SATA data cable issue (UDMA_CRC_Error_Count), but the fact that it cannot always pass a self-test (which is a completely internal drive test) is evidence that the drive is on its way to a total failure.

Be very descriptive in what you did to get to this point, do not assume we know what you have done, we are not there and you could be making a simple mistake. Treat us as if we are clueless on what you are doing, because we are. This is not to say that we think less of you, it's the fact that a lot is lost by making assumptions. We all want to help you fix your problem with the least amount of trouble and no data loss.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Are you sure you pulled the correct disk?
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
So there's one disk missing, which must be the one you pulled out. Did you replace that with the new disk before you pulled it? If not you need to do that now. Can be done in the UI ...

Is your new disk connected? camcontrol devlist shows all the disks the system knows about. Also, the UI shows the pool status in a way where we can see which GPTID is which disk ...

So here's a rundown of what I did so far:
01. Ran the disk through its paces in another computer, according to Uncle Fester's guide; it came up clean and proper.
02. Powered down and plugged the new disk into the last available SATA port, booted back up, and it recognised the disk correctly.
03. I clicked 'Storage > Pools > [cog] > Status' to determine which ada# had the checksum 1.
04. I clicked ' > Replace' on said disk and selected the new disk as the member disk.
05. Waited for it to prattle through the long resilvering process.
06. At this point the pool said it was healthy again.
07. I checked which ada# the broken disk had once more, then clicked 'Storage > Disks' to find the corresponding S/N.
08. Powered down, pulled the disk that matched it, and moved the new disk from the spare SATA port to the old disk's SATA port.
09. Powered up, only to find that the pool is degraded once again.
10. Added the old disk back in using the spare SATA port; this restored the pool back to healthy again.

So long story short: I can't remove the broken disk despite having replaced it with the new one.

EDIT:
This is what camcontrol devlist gives me with all disks plugged in and running:
Code:
<TEAM T253LE120G SBFM11.1>         at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD30EFRX-68EUZN0 82.00A82>    at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD30EFRX-68EUZN0 82.00A82>    at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus3 target 0 lun 0 (pass3,ada3)
<WDC WD30EZRX-00AZ6B0 80.00A80>    at scbus4 target 0 lun 0 (pass4,ada4)
<WDC WD30EFRX-68N32N0 82.00A82>    at scbus5 target 0 lun 0 (pass5,ada5)
<WDC WD30EFRX-68EUZN0 82.00A82>    at scbus6 target 0 lun 0 (pass6,ada6)
<WDC WD40EFZX-68AWUN0 81.00B81>    at scbus7 target 0 lun 0 (pass7,ada7)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus8 target 0 lun 0 (pass8,ses0)


The pool status also shows all but ada3 present, which is to be expected as ada3 is the 'bad' one that got replaced with the new disk currently residing on ada7.
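Since the Red/Green question came up earlier, here is a rough Python sketch for sorting that device list by model string. It works purely on the pasted camcontrol devlist text; `models_by_device` and `devices_with_model_prefix` are made-up helper names, and the only model-to-family link it relies on is the one smartctl itself reported above (WD30EZRX identified as "Western Digital Green"). The listing carries no serial numbers, so matching a device to a physical sticker still needs smartctl or 'Storage > Disks'.

```python
import re

# The `camcontrol devlist` output posted above, copied verbatim.
CAMCONTROL_DEVLIST = """\
<TEAM T253LE120G SBFM11.1>         at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD30EFRX-68EUZN0 82.00A82>    at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD30EFRX-68EUZN0 82.00A82>    at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus3 target 0 lun 0 (pass3,ada3)
<WDC WD30EZRX-00AZ6B0 80.00A80>    at scbus4 target 0 lun 0 (pass4,ada4)
<WDC WD30EFRX-68N32N0 82.00A82>    at scbus5 target 0 lun 0 (pass5,ada5)
<WDC WD30EFRX-68EUZN0 82.00A82>    at scbus6 target 0 lun 0 (pass6,ada6)
<WDC WD40EFZX-68AWUN0 81.00B81>    at scbus7 target 0 lun 0 (pass7,ada7)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus8 target 0 lun 0 (pass8,ses0)
"""

LINE = re.compile(
    r"^<(?P<ident>.+)>\s+at scbus\d+ target \d+ lun \d+ \((?P<devs>[^)]*)\)$"
)


def models_by_device(text):
    """Map adaN device name -> model string (identifier minus firmware revision)."""
    mapping = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue
        # Assumption: the last token inside <...> is the firmware revision.
        model = match.group("ident").rsplit(" ", 1)[0]
        for dev in match.group("devs").split(","):
            if dev.startswith("ada"):
                mapping[dev] = model
    return mapping


def devices_with_model_prefix(mapping, prefix):
    """Sorted device names whose model string starts with the given prefix."""
    return sorted(dev for dev, model in mapping.items() if model.startswith(prefix))


mapping = models_by_device(CAMCONTROL_DEVLIST)
green_like = devices_with_model_prefix(mapping, "WDC WD30EZRX")
print("WD30EZRX devices:", green_like)
```

On the list as posted, that filter returns ada3 and also ada4, so it may be worth comparing ada4's smartctl identity against its sticker as well.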

1658232012811.png


To verify further, here's the list of disks in the UI:
1658233181067.png


The highlighted one, on ada3, is the broken drive. Its pool is marked as N/A as well, which I assume means it's no longer part of the pool and can be safely removed from the system. But when I do, it complains, for some bizarre reason..
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Go to the drive listed as ada3 and 'offline' the drive. Once done the pool should still be healthy. Then power down and disconnect the drive, lastly power on again. That "should" work.
 

Sokonomi

Contributor
Joined
Jul 15, 2018
Messages
115
Go to the drive listed as ada3 and 'offline' the drive. Once done the pool should still be healthy. Then power down and disconnect the drive, lastly power on again. That "should" work.
Strangely enough.. it did! Thank you!
 