Are 2 of My Drives Failed? (See Edit: Moving Data Onto New Vdev, To Remove Old)

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What are you doing when you try to detach it?

Are you detaching the second one in the active Spare part of Mirror 6?
 
Joined
Jul 3, 2015
Messages
926
zpool detach PrimaryPool gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Just finished tearing apart my rack and room and reorganizing everything.
I still need to figure out how to properly flash the firmware on my HBA, to make sure I don't run into issues with these SATA drives. Beyond that, I should be good to start replacing these vdevs.
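As a starting point, a minimal sketch of checking the card first, assuming it is an LSI/Broadcom SAS3-generation HBA with the sas3flash utility available (SAS2-era cards use sas2flash instead; the firmware file names below are placeholders, not a verified procedure):

Code:
# List detected LSI SAS3 controllers with their current firmware and BIOS versions
sas3flash -list

# Flashing IT-mode firmware would look roughly like this (placeholder file names,
# using the package downloaded from Broadcom for the specific card model):
# sas3flash -o -f <IT_firmware>.bin -b <bios>.rom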

What are you doing when you try to detach it?

Are you detaching the second one in the active Spare part of Mirror 6?
Detaching this one: gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76 ONLINE 0 0 0
through the GUI.
Screenshot at the bottom of my post there: I am clicking the 3 dots and clicking "detach" for the one in red (da12) under the actual mirror vdev, the exact one I highlighted.

zpool detach PrimaryPool gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76
I have not tried it via shell yet; I'm running a scrub, so I will wait for that to finish and try it tomorrow.
Isn't this what the GUI is doing anyway, though? I'm failing to see why this would work when the GUI is basically running this command, unless it's not?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Isn't this what the GUI is doing anyway, though? I'm failing to see why this would work when the GUI is basically running this command, unless it's not?
It "should" be doing that... I guess we'll see if that's really the case or not when you try it.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
zpool detach PrimaryPool gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76
It "should" be doing that... I guess we'll see if that's really the case or not when you try it.
It seems to have worked. I will run a scrub once more to ensure it doesn't kick it back into the pool, but I think it should be fine.
It's interesting that the GUI couldn't do that. Is it worthy of a bug report?
Code:
# zpool status -v
  pool: PrimaryPool
 state: ONLINE
  scan: scrub repaired 0B in 13:23:22 with 0 errors on Wed Aug 30 11:45:15 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        PrimaryPool                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-3                                      ONLINE       0     0     0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-4                                      ONLINE       0     0     0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2  ONLINE       0     0     0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2  ONLINE       0     0     0
          mirror-5                                      ONLINE       0     0     0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0     0
            gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76  ONLINE       0     0     0
          mirror-6                                      ONLINE       0     0     0
            gptid/749a1891-1b5c-11ee-941f-ac1f6be66d76  ONLINE       0     0     0
            gptid/c774316e-3c2c-11ee-96af-ac1f6be66d76  ONLINE       0     0     0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76    AVAIL
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76    AVAIL

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:06 with 0 errors on Sat Aug 26 03:46:06 2023
config:
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
First you expand the pool by adding a vdev made of the new drives.
I'm still a little shaky on the exact process for adding in the new drives.
It sketches me out because it's not just a simple "add a vdev"; they are also larger-capacity drives, and I'm trying to move the older, smaller vdevs onto the new one.

For anyone reading later, again, my goal is basically:
- Add in 3x 20TB drives as a 3-way mirror to replace a few of the smaller 4TB vdevs.
- In the future, I will do this once or twice more to eventually replace all the 4TB vdevs. I just do not have the upfront funds to buy 9x 20TB drives right now.
- In the meantime, I will re-attach the 4TB drives I remove to the remaining 4TB vdevs, turning them from 2-way mirrors into 3-way mirrors.


Was this the correct process?

1. I have to disable both of the hot spares, correct? Otherwise they will kick in when I detach drives.

2. I add in the 3x 20TB drives as a NEW vdev.
3. I go and click detach on both drives in one of my 2-way 4TB mirrors.
4. Before it detaches, it will auto-resilver the data from the 4TB mirror onto the 20TB drives and then finish detaching.
5. I can now physically remove them.
6. Then I repeat steps 3-5 for another of the 2-way 4TB mirrors.

7. Now the data is on the 20TB drives (a rough shell sketch of steps 2-7 is at the end of this post).
8. I can go and attach the 4TB drives I removed to the leftover vdevs to turn them into 3-way mirrors.
9. Add back in my two 4TB hot spares (or not? see notes below).

My main concerns with that process are:
- It will auto-resilver the data before it detaches in steps 3-4, correct? This confuses me the most. If I detach one drive, it will remove it. Then with the second drive, I assume it will start resilvering. But is there any way to keep both drives in, in case one of the two drives fails during the resilver? Does that make sense?

- When I add my hot spares back in at the end, TrueNAS is smart enough to know that they can only replace another 4TB drive, correct? It won't try to replace any of the 20TB drives, correct?
Should I even put them back in as hot spares? Or simply attach them to my remaining vdevs and make them 3-way mirrors? That seems like it might be the better way to go.

- We confirmed I can remove up to 3 of the 4TB vdevs, but I will only remove 2 so I retain some unused space. Will TrueNAS stop me from detaching drives if our math was wrong and it doesn't actually have enough space to move the data onto the new drives?
For example, if I move 2 vdevs onto the 20TB pool and then decide I want to move the 3rd, but there isn't actually enough space to do so, are there safeguards in place to prevent that?
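For reference, a rough shell-level sketch of the add-then-remove flow in steps 2-7 above. The disk and vdev names are hypothetical placeholders (in practice TrueNAS addresses pool members by gptid), and the GUI is the supported way to do this; the sketch only illustrates what happens underneath.

Code:
# Add the three 20TB drives as one new 3-way mirror vdev (placeholder device names)
zpool add PrimaryPool mirror da20 da21 da22

# Remove one old 4TB mirror as a whole top-level vdev; ZFS evacuates its data
# onto the remaining vdevs (including the new mirror) before it disappears
zpool remove PrimaryPool mirror-3

# Watch progress; 'zpool status' gains a 'remove:' line while the copy runs
zpool status PrimaryPool

# Afterwards, a freed 4TB disk can be attached to a surviving 4TB mirror
# to turn it into a 3-way mirror (the existing member is identified by its gptid)
zpool attach PrimaryPool gptid/<existing-member> da9

The point of the removal path is that it evacuates the whole mirror while it is still redundant, rather than detaching its disks one at a time.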
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
This is all I'm really waiting on at this point, tbh.
Any final help on the transfer would be appreciated. I just want to make sure I don't mess anything up, lol.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
You don't just pull them out; you have to [software] detach them.

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
You don't just pull them out; you have to [software] detach them.
Unless I am missing something, you just told me I have to detach them, which I understand. But that doesn't address my concerns about what happens when you detach them. Considering there are 2 drives in each vdev, if I detach one, I don't think it will start resilvering onto the new drives until the second drive is detached. That's why I asked if it's possible to somehow keep both drives in while migrating the data to the new drives, to avoid the possibility of a drive dying while resilvering/expanding onto the new drives.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
...Am I missing a post?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
As far as I am aware (and it should have been addressed in this or the other thread), if you have enough space and you remove a VDEV, all the data in that VDEV is migrated into the others.

EDIT: found it. It was on the other thread.
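A quick way to sanity-check the free-space side before removing anything, assuming shell access (standard ZFS commands; compare the ALLOC of the mirror you want to remove against the FREE left on the rest of the pool):

Code:
# Per-vdev size, allocation and free space
zpool list -v PrimaryPool

# Usable space and free space as the datasets see it
zfs list -o name,used,avail PrimaryPool

ZFS should refuse the removal up front if there isn't enough room, but checking first costs nothing.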
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
As far as I am aware (and it should have been addressed in this or the other thread), if you have enough space and you remove a VDEV, all the data in that VDEV is migrated into the others.

EDIT: found it. It was on the other thread.
OK, I just looked at that; it wasn't entirely clear up front and was a little mixed in.

So basically I add in the 20TB drives as a new vdev... then, when you guys say "remove" a vdev, am I running detach on both of the 2 drives in a single mirror?

But again, my other question remains: is it possible to have it migrate data from both drives at once, instead of taking one out?
Because if I detach one drive from a mirror, it won't start migrating the data until the 2nd drive is detached (I assume). Is it possible to keep both drives in and utilized, to lower the risk of one of the drives failing while migrating the data?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Is it possible to keep both drives in and utilized, to lower the risk of one of the drives failing while migrating the data?
As far as I know, no.

So basically I add in the 20TB drives as a new vdev... then, when you guys say "remove" a vdev, am I running detach on both of the 2 drives in a single mirror?
You should use the WebUI whenever possible.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
You should use the WebUI whenever possible.
I think maybe you are misunderstanding my question. I am using the WebUI. My question, simply put, is about making sure I'm clicking "remove" on the whole vdev, and not "detach" on the individual drives in the vdevs.

Anyway, I just went ahead and added in the 20TB drives as a new vdev,
and then clicked "remove" on one of the 4TB vdevs. It said "please wait" and loaded for a little, then popped up with a CallError.

Code:
[EFAULT] Failed to wipe disks: 1) da15: [Errno 1] Operation not permitted: '/dev/da15' 2) da9: [Errno 1] Operation not permitted: '/dev/da9'

Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 139, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1236, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1290, in remove
    raise CallError(f'Failed to wipe disks:\n{error_str}')
middlewared.service_exception.CallError: [EFAULT] Failed to wipe disks:
1) da15: [Errno 1] Operation not permitted: '/dev/da15'
2) da9: [Errno 1] Operation not permitted: '/dev/da9'
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Not sure if this error is because it wants to be run from the shell.
But is it not concerning that it also says it failed to wipe disks?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Not sure if this error is because it wants to be run from the shell.
Naw. It's a failure.


But is it not concerning that it also says it failed to wipe disks?
Yup, looks like an error, so I'd be concerned. It is time for you to generate a bug report, I think.

I suspect (this is a guess on my part) that the disk wipe, being part of the 'remove' operation, was likely removing the partition data from the spare drive you were trying to remove and then re-establishing the partitions as it needs to. Are da9 and da15 your spare disks now? I ask because your system has changed some and I do not want to assume anything. If they are the spare drives, I would find out whether they are good to use now. What if the drives are not set up to actually become a spare again when they need to?

So, I would consider the error important until you find out otherwise.
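One hedged way to check whether the wipe actually touched those disks, assuming da9 and da15 still enumerate under the same names (standard FreeBSD tools on TrueNAS CORE):

Code:
# Show the partition tables of the two disks named in the error;
# intact freebsd-zfs partitions suggest the wipe never got that far
gpart show da9
gpart show da15

# Map gptid labels back to da device names to confirm which disks are which
glabel status | grep -E 'da9|da15'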
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Naw. It's a failure.



Yup, looks like an error, so I'd be concerned. It is time for you to generate a bug report, I think.

I suspect (this is a guess on my part) that the disk wipe, being part of the 'remove' operation, was likely removing the partition data from the spare drive you were trying to remove and then re-establishing the partitions as it needs to. Are da9 and da15 your spare disks now? I ask because your system has changed some and I do not want to assume anything. If they are the spare drives, I would find out whether they are good to use now. What if the drives are not set up to actually become a spare again when they need to?

So, I would consider the error important until you find out otherwise.
The spare drives are still there, but they are not active in any of the vdevs.
I had detached them from the vdevs they were stuck in, and the pool has been healthy.

The spares are da12 and da13.
Nothing has changed with my pool really so far, besides the addition of the 3 new drives as a new vdev.


I'm confused about why you say I should make sure they are good to use now. They show as available and should kick in without issue, unless I am missing something.

da15 and da9 are the 2 drives in one of the 4TB vdevs that I clicked "remove" on in the WebUI. As @Davvo stated, and I read elsewhere too: by adding a new vdev and then removing the old ones, ZFS will move the data to the new vdev.
But in this case, when I clicked remove in the WebUI on the old vdev (da9 & da15), it just said "please wait", buffered for a little, then that error popped up. It looked like it tried to wipe the data, instead of moving the data to the other vdevs and then wiping it like I expected.


I figured this error was TrueNAS simply failing to wipe the data because the disks are part of an active online pool, and that this is a safeguard to avoid accidentally wiping data.
That being said, I am also surprised it didn't pop up a confirmation box saying the pool would no longer be functional and all data would be wiped, or something explaining what it was going to do. I just clicked remove and it went right ahead.
Then the error popped up.



Code:
# zpool status -v
  pool: PrimaryPool
 state: ONLINE
  scan: scrub repaired 0B in 13:44:54 with 0 errors on Fri Sep  1 01:39:56 2023
remove: Removal of vdev 6 copied 1.39T in 3h55m, completed on Tue Sep  5 16:54:55 2023
        28.2M memory used for removed device mappings
config:

        NAME                                            STATE     READ WRITE CKSUM
        PrimaryPool                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-3                                      ONLINE       0     0     0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2  ONLINE       0     0     0
          mirror-4                                      ONLINE       0     0     0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2  ONLINE       0     0     0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2  ONLINE       0     0     0
          mirror-5                                      ONLINE       0     0     0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0     0
            gptid/0cd1e905-3c2e-11ee-96af-ac1f6be66d76  ONLINE       0     0     0
          mirror-7                                      ONLINE       0     0     0
            gptid/8ab56673-4c0d-11ee-8b4c-ac1f6be66d76  ONLINE       0     0     0
            gptid/8ab75bbc-4c0d-11ee-8b4c-ac1f6be66d76  ONLINE       0     0     0
            gptid/8aa4f83e-4c0d-11ee-8b4c-ac1f6be66d76  ONLINE       0     0     0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76    AVAIL
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76    AVAIL

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:11 with 0 errors on Sat Sep  2 03:46:11 2023
config:
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I guess all is good then. Good news.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
@Davvo, on which point exactly?
I have no hands-on experience with shrinking mirror stripes, and I don't think I have anything useful to add about the health of @isopropyl's pool beyond the many posts from @joeschmuck.
 