Need urgent help: disk not recognized after replacing a faulty drive

vtravalja

Dabbler
Joined
Apr 25, 2022
Messages
27
Hi

We have a Dell R740xd2 storage server with TrueNAS Core 12 U8 running on top. Two weeks ago TrueNAS reported that the disk in bay 7 had been removed.
Since we have NBD support from Dell, we got a new disk the next day. We first set the old one offline and then replaced the damaged disk with the brand-new one.
disk7.png


replacement.png

When I tried to replace the disk, I got an error saying that the drive cannot be replaced because it is busy.

issue.png


How can I bypass this and enable the new disk so I can bring it online?


Second problem: the disk in bay 14 is also reported as faulty, but it does not even show up in the drop-down menu when I try to replace it.
I even tried rebooting the machine and still have the same issue.

Why is da14 not shown? What am I doing wrong?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
I find sometimes you need to force a refresh of the GUI page in the browser (like SHIFT + Click Refresh/Reload)

Otherwise, you could start trying to figure some things out with dmesg, zpool status -v, glabel status and camcontrol devlist
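
If it helps to keep the output of those four commands together for posting, here is a minimal sketch of a collector. The function name collect_diag and the output file names are made up for illustration, not part of TrueNAS. It only runs the read-only commands named above and skips any that are not installed.

```shell
# Run the four read-only diagnostics and save each output to its own file.
# collect_diag and the file names are hypothetical; adjust as needed.
collect_diag() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    # Each entry below is: output file name | command to run.
    while IFS='|' read -r name cmd; do
        tool=${cmd%% *}
        if command -v "$tool" >/dev/null 2>&1; then
            # Word splitting of $cmd is intentional here.
            $cmd >"$out_dir/$name.txt" 2>&1 </dev/null \
                || echo "exit status: $?" >>"$out_dir/$name.txt"
        else
            echo "skipped: $tool not found" >"$out_dir/$name.txt"
        fi
    done <<'EOF'
dmesg|dmesg
zpool-status|zpool status -v
glabel-status|glabel status
camcontrol-devlist|camcontrol devlist
EOF
    return 0
}
```

Calling collect_diag /tmp/diag (any writable directory will do) leaves one text file per command in that directory.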
 

vtravalja

Hi again,

The problem is that the GUI does not seem to respond as it should. Now I cannot even take disk 7 offline. Very weird situation.
Almost like there is a bug in the system.

Nothing happens when I select that disk and choose the offline option. Moreover, disk 14 cannot be replaced either. I am not sure what is happening. Can someone help me troubleshoot this?

Btw these are the outputs.

zpool.png


glabel.png


camcontrol.png
 

sretalla

OK, so from the status output, da14 is already offline, so the GUI is right to refuse to take it offline when it's already in that state.

If da7 is identified as degraded, that's also equivalent to offline, so again, can't be done as it's already the case.

I would start with a wipe of da7 (you might want to consider replacing da14 and doing that too).

If possible consider a reboot of the whole system as it's better to have the GUI in a functioning state and do everything through there.

Replace the disks in the pool (one at a time is safer, but you can do both at once).
 

vtravalja

Hi again,

Yes, it occurred to me to do so, but I noticed that neither da7 nor da14 has a wipe option in TrueNAS even though they are marked as offline. So what should I do then? (P.S. I know I am bugging you with this, but I am very new to ZFS and not that familiar with it. Plus, it is production storage, so I could quite possibly damage it if I try on my own.)

I would rather ask the experts on this matter.
 

sretalla

vtravalja said: "so what to do then?"
If you can, reboot.

If for some reason that's not an option, we can look at the relatively annoying and complex process to do it from the CLI.
 

vtravalja

You'll hate me for this, but I cannot reboot the machine. It is a production system running 24/7, and I am also replicating live data (approx. 30-40 TB) to another storage system that I set up. If it is possible to do this via the CLI and you have the patience, I would be grateful.
 

sretalla

Well, one last attempt before CLI method... restart the middleware.

service middlewared restart
 

sretalla

OK, then we're going to start with da7...

First, we'll need to wipe and partition it...

gpart destroy -F /dev/da7

Then create the new partition scheme with 2GB for SWAP (standard in the GUI, so I assume the same for your other disks) and the rest for Pool data:

gpart create -s gpt /dev/da7

gpart add -s 2G -t freebsd-swap /dev/da7

gpart add -t freebsd-zfs /dev/da7

Once you have the partitions, you need the ID of the 2nd of them... the following command will list both (and only 2... there is a problem if you see more or fewer), you will use the second of them in the following steps:

gpart list da7 | grep rawuuid

So we should now be able to replace the missing drive with the gptid we just made:

zpool replace ZFS da7p2 gptid/<insert the rawuuid from the second partition here>

(The rawuuid should look something like this: 1b28c891-69c9-11ea-b524-5d55812e7307)

Let's pause there and see how we did. I anticipate it could be difficult about replacing what it thinks is da7p2 with a partition gptid that's effectively da7p2, but we'll have to see, as we have no other reference for it for now.

zpool status -v
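
To avoid copy-paste mistakes with the rawuuid, the lookup step above can be sketched as a small helper. This is only an illustration: the function names second_rawuuid and print_replace_cmd are made up, and the pool name ZFS and device da7p2 are simply the ones used in this thread. It prints the replace command for review and never runs it.

```shell
# Read `gpart list <disk>` output on stdin and print the rawuuid of the
# second partition. Fails unless exactly two rawuuid lines are present.
second_rawuuid() {
    awk '$1 == "rawuuid:" { n++; if (n == 2) id = $2 }
         END {
             if (n != 2) {
                 print "expected 2 partitions, found " (n + 0) > "/dev/stderr"
                 exit 1
             }
             print id
         }'
}

# Print (do not run) the replace command so it can be checked by eye first.
# Arguments: pool name, old device as shown in zpool status, new disk.
print_replace_cmd() {
    pool="$1"; old_dev="$2"; disk="$3"
    uuid=$(gpart list "$disk" | second_rawuuid) || return 1
    echo "zpool replace $pool $old_dev gptid/$uuid"
}
```

For the case in this thread the call would be print_replace_cmd ZFS da7p2 da7. If the disk does not have exactly two partitions, the helper stops with an error instead of guessing.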
 

vtravalja

Sorry, that command was incomplete. I meant "gpart delete -i 1 da7" and "gpart delete -i 2 da7", since according to gpart there are two partitions.

partitions.png
 

sretalla

vtravalja said: "Don't I need to delete partition first with "gpart delete /dev/da7"?"
No, the -F handles that and should force it to complete...

Perhaps we still need to set it offline...

zpool offline ZFS da7p2
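
To check afterwards whether the offline took effect without scanning the whole status output by eye, a small filter like the following could be used. It is only a sketch: vdev_state is a made-up helper name, and it assumes the device appears in the first column of zpool status exactly as typed.

```shell
# Read `zpool status` output on stdin and print the STATE column for the
# device named in $1. Returns non-zero if the device is not listed.
vdev_state() {
    awk -v dev="$1" '
        $1 == dev { print $2; found = 1; exit }
        END { if (!found) exit 1 }'
}
```

For example, zpool status ZFS | vdev_state da7p2 should print the current state of that member, such as OFFLINE or DEGRADED.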
 

vtravalja

Hi,

It is really annoying that nothing responds.
So here is the output:
annoying.png


But what is interesting: this should finally take the disk offline as well, right? Yet I still see it as degraded in the pool.

disk-running.png
 

sretalla

OK, it can be that the SWAP is also on da7 (in da7p1), which is something the GUI would handle nicely.

Let's check that:
swapctl -lh

and

geom -t
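
As an alternative view of the same information, and purely as an assumption on my side since the thread itself uses swapctl and geom -t, the output of gmirror status can be filtered to find which mirror a given partition belongs to. The helper name swap_mirror_of is made up for illustration.

```shell
# Read `gmirror status` output on stdin and print the name of the mirror
# that has partition $1 as a component. Returns non-zero if none matches.
swap_mirror_of() {
    awk -v part="$1" '
        $1 ~ /^mirror\//  { name = $1; comp = $3 }  # first line of a mirror
        $1 !~ /^mirror\// { comp = $1 }             # continuation line
        comp == part { print name; found = 1; exit }
        END { if (!found) exit 1 }'
}
```

Usage would be gmirror status | swap_mirror_of da7p1, which prints the mirror name if da7p1 is a component of one.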
 

sretalla

OK, so da7 has part of swap3.

We can stop that with:

swapoff /dev/mirror/swap3

If that works, we can go back to the process.
 