Pool disk replacement error

sboutros

Dabbler
Joined
Sep 8, 2017
Messages
14
I am trying to replace a disk in one of my pools.
I am following the instructions at https://www.truenas.com/docs/core/coretutorials/storage/disks/diskreplace/
Under "Bringing a New Disk Online", the "Replace" step fails with an error, both with and without the Force checkbox checked.

Host is a Dell R710 running TrueNAS-13.0-U5.3
New disk was to replace da4 - now shows up as da5
Old disk: Seagate Ironwolf NAS - 4TB
New disk: Seagate Exos 10TB, shows up as 0B in web UI

Any insights to help with this?


Code:
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/disk_/format.py", line 11, in format
    size = get_size_with_name(disk)
  File "bsd/disk.pyx", line 37, in bsd.disk.get_size_with_name
  File "bsd/disk.pyx", line 38, in bsd.disk.get_size_with_name
  File "bsd/disk.pyx", line 48, in bsd.disk.get_size_with_file
FileNotFoundError: [Errno 2] No such file or directory
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
One person had this issue and it was resolved by wiggling the drive data cable. It's worth a try, or just replace the data cable, try a different SATA port, etc. If that doesn't work, continue to the steps below; they may help diagnose the issue.

Is your Seagate Exos 10TB drive new/unused or was it used before?

What is the output of smartctl -a /dev/da5 for the new drive, and geom disk list da5 ?
 

sboutros

Dabbler
Joined
Sep 8, 2017
Messages
14
I'll definitely try the wiggle, thank you.

Outputs
Code:
root@freenas[~]# smartctl -a /dev/da5
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST10000NM002G
Revision:             E003
Compliance:           SPC-5
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500d8fef253
Serial number:        ZS51CANL0000C2307BD5
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Oct  7 23:06:38 2023 EDT
device is NOT READY (e.g. spun down, busy)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
root@freenas[~]# geom disk list da5
Geom name: da5
Providers:
1. Name: da5
   Mediasize: 0 (0B)
   Sectorsize: 512
   Mode: r0w0e0
   descr: SEAGATE ST10000NM002G
   lunid: 5000c500d8fef253
   ident: ZS51CANL0000C2307BD5
   rotationrate: 7200
   fwsectors: 0
   fwheads: 0

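The key symptom in the geom output above is `Mediasize: 0`. As a rough sketch only (the function name `zero_size_disks` is made up for illustration), a small filter like this lists every disk that geom reports with a zero media size, assuming the output has the same layout as shown above:

```shell
# Hypothetical helper: reads `geom disk list` output on stdin and prints
# the name of each disk whose Mediasize is reported as 0.
zero_size_disks() {
    awk '
        /^Geom name:/ { name = $3 }               # remember the current disk
        /Mediasize:/ && $2 == 0 { print name }    # zero-byte provider found
    '
}

# Usage on the NAS (not run here):
#   geom disk list | zero_size_disks
```

Any disk it prints is one the OS can identify but cannot read a capacity from, which matches the 0B shown in the web UI.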
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Your SMART results don't exactly look right. Oh, but it's SAS.
I still don't like the "device is NOT READY (e.g. spun down, busy)"
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Agreed, the SMART data is not looking good.
"I'll deff try the wiggle, thank you."
I didn't say so earlier, so I'll say it now: power off your machine before you do this. Any time you are messing with an electrical connection, power should be secured.

My first step at this point would be to remove the drive and place it into a different computer to ensure it is working fine.

I've searched the internet and found several postings about this type of problem. However, some of the solutions looked like pure luck, and no two postings had the same solution. If I had found one that looked reasonable, I'd point you to it, but I prefer not to give advice unless I have tried it myself or can reasonably expect it to cause no harm. The suggestions ranged from formatting the drive, to the 3.3 V pin issue, to special commands that force the drive out of 'STOP' mode. I don't imagine 'STOP' mode is the default for a new drive; it looks like it must be set intentionally by the end user.

So, verify that the hard drive works in another computer. If it does, and you haven't already done so, try the drive in another drive bay. You can do the last part first if you like; there is nothing to lose there.

Can you tell if the drive is spinning? That might help someone provide some better advice.
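One non-invasive way to look at that from the shell, offered as a sketch only: the helper below just scans smartctl output for the "NOT READY" line seen earlier in this thread. The function name is made up for illustration. The commented camcontrol lines are read-only SCSI queries on FreeBSD; suggesting them here is an assumption about what might be informative, not something tested on this hardware.

```shell
# Hypothetical helper: exits 0 if smartctl output read from stdin contains
# the "device is NOT READY" line, and non-zero otherwise.
smart_reports_not_ready() {
    grep -q 'device is NOT READY'
}

# Usage on the NAS (not run here):
#   smartctl -a /dev/da5 | smart_reports_not_ready && echo "da5 is not ready"
#
# Read-only queries that may give more detail about the drive state:
#   camcontrol tur da5 -v        # sends TEST UNIT READY
#   camcontrol readcap da5 -h    # sends READ CAPACITY, human-readable size
```

Neither the helper nor the two queries change anything on the drive, so they should be safe to run before pulling it for testing in another machine.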
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
We're utilizing TrueNAS application version 13, and recently encountered a situation where one of our hard disks failed. Subsequently, we removed the faulty disk and replaced it with a new one of identical capacity. Despite attempting to initiate scrubbing, the new disk failed to rebuild, leaving the disk status degraded. Our RAID-5 configuration was set up through the BIOS. We urgently seek assistance in resolving this issue without risking data loss. Kindly provide a solution so that we can promptly address this matter.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
RAID 5 through the BIOS - that's a big no-no.
The solution is to get the data off the pool, trash the pool, and start again using software RAID, i.e. make sure that TrueNAS sees each disk as an individual disk and then build the RAID using ZFS, not the system BIOS.

You have unfortunately broken one of the fundamental rules of ZFS, which is to let ZFS handle the disks. Your only choice at this point is to back up, trash, and rebuild - do NOT use the BIOS to rebuild the array.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
As a result of this activity, I would need to discard all the data on the TrueNAS, which I'm unable to do. Are there any alternative solutions that might work without losing data?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
"RAID 5 through the BIOS - that's a big no-no."
Yes, that caught my eye. Not good.

"As a result of this activity, I'll need to discard all the data on the TrueNAS, which I'm unable to do. Are there any alternative solutions that might be effective without losing data?"
I think @NugentS explained the proper path forward. Back up your data and then correct your configuration issue.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
Our operations run daily and cannot afford downtime. All iSCSI connections are established through this NAS. If I choose to take a pool offline, unplug that disk, refresh the page, and then plug the disk back in after a minute, the pool might come back online. Do you think this could resolve the issue?
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
Oh, I forgot to mention one more thing. I've set up RAID-5 on both the TrueNAS application and in the BIOS, so RAID-5 is configured on both sides.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That's all nice, but you know what's worse than planned outages? Unplanned outages. Your (not unreasonable) requirements don't make reality bend such that they are met, so you're stuck with reality.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
"Our operations run daily and cannot afford downtime."
If this is a company venture, then purchase another system. A new system is less expensive than losing work, and you have no downtime. Whoever set up your TrueNAS system really messed up in a huge way.

"I've set up RAID-5 on both the TrueNAS application and in the BIOS"
That makes no sense. TrueNAS uses ZFS, not RAID. You must be thinking of RAIDZ, which is perfectly fine. Also, are you certain you have a RAID setup in the BIOS? Or is this a software RAID that only works with Windows, or with a few OSs that have a driver?
BIOS RAID

BIOS RAID (also known as "quasi-hardware RAID") is a form of software RAID for which the RAID configuration is managed in part or in full by the storage controller's BIOS but which is not true hardware RAID. For BIOS RAID to work, a specific driver is needed at the operating system level.

BIOS RAID is often nicknamed "fake RAID" as it is easily mistaken for hardware RAID, and vendors of controllers that offer BIOS RAID often do little to educate buyers about the fact that it is not hardware RAID and that specific OS drivers are necessary for its RAID functionality to work. A somewhat more neutral way of describing it would be "BIOS-assisted software RAID".

We have absolutely no idea what your hardware is, how it is configured, or what data capacity you have. For all we know you populated your system with SMR drives (oh yes, it does happen in spite of the warnings). So post a complete list of your hardware (the power supply is not a big deal at this point) and we might be able to help you out.

For argument's sake: let's say you did set up 6 hard drives in the BIOS as a RAID5, your total storage is 10TB, and your motherboard has 10 SATA ports. You could purchase two 14TB CMR hard drives and connect those to your system; do not do anything in the BIOS. When the system comes up, create a MIRROR pool with the two new drives. Now copy the data over (I'm not sure whether Replicate or something else is best here). Once all the data is copied over, you can go into the BIOS and remove the drives from the RAID5 setup, then reboot, and honestly I would not be surprised if the system comes up normally. Why? Because the BIOS RAID setup probably did nothing to the drives, but I could be wrong, hence doing a mirrored backup first. If it comes up normally then you are good and can remove the mirror. Lastly, fix the drive replacement issue you have.
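For the "copy the data over" step, one possible approach is a recursive snapshot followed by zfs send/receive. This is a sketch, not something tested on this system: the pool names `tank` and `newpool` and the snapshot name `migrate` are placeholders, and the helper only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical dry-run helper: prints the replication commands for review.
# Arguments: source pool, destination pool, optional snapshot name.
print_replication_plan() {
    src=$1
    dst=$2
    snap=${3:-migrate}
    # Recursive snapshot of the source pool and everything under it
    printf 'zfs snapshot -r %s@%s\n' "$src" "$snap"
    # Replication stream of that snapshot, received into the new pool
    printf 'zfs send -R %s@%s | zfs receive -F %s\n' "$src" "$snap" "$dst"
}

# Example (prints two lines, runs nothing):
#   print_replication_plan tank newpool
```

On TrueNAS the replication tasks in the web UI are probably the more usual route to the same result; the printed commands are only meant to show what such a copy involves.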
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
My ex-colleague initially referred to the setup as RAID-5, but let me clarify. I'm using a Dell Z400 machine on which my ex-colleague first configured RAID-5 via the BIOS. Subsequently, TrueNAS was installed and ZFS was configured. We have a total of 8 SAS hard disks of 2TB each, giving 14TB of usable space. One of the hard disks failed, so I replaced it while the pool was running. Despite my attempts to rebuild the pool through extension or scrubbing, it remains in a degraded state, and the issue persists even after rebooting the machine. My question is whether removing and reinserting the hard disk, then bringing it back online via the offline option, would resolve the issue, or whether there is another course of action I should pursue.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Post the entire output of the command zpool status -v in code brackets. Depending on what is specifically reported, I can tell you the next step. If you get this within the next 30 minutes before I go to bed, you can have that information right away. But I will not guess, I need the data to provide you the correct answer. Keep in mind, we are not paid to provide help on this forum, we just like to help and educate others.
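For a quick first look, the state of each pool can be pulled out of that same command. This is a sketch with a made-up function name; it only reads the `pool:` and `state:` lines that zpool status prints, and it does not replace posting the full output.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# one "<pool> <state>" line per pool.
pool_states() {
    awk '
        $1 == "pool:"  { pool = $2 }        # remember the pool name
        $1 == "state:" { print pool, $2 }   # e.g. "tank DEGRADED"
    '
}

# Usage on the NAS (not run here):
#   zpool status -v | pool_states
```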
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"Our operations run daily and cannot afford downtime. All iSCSI connections are established through this NAS. If I choose to take a pool offline, unplug that disk, refresh the page, and then plug the disk back in after a minute, the pool might come back online. Do you think this could resolve the issue?"
No

You have broken a cardinal rule of ZFS, and are having issues as a result.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"I've set up RAID-5 on both the TrueNAS application and in the BIOS"

I agree - that makes no sense. As @joeschmuck says, please post complete and detailed hardware information, so we know exactly what you have.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
"You could purchase two 14TB CMR hard drives and connect those to your system, do not do anything in the BIOS. When the system comes up, create a pool MIRROR with the two new drives. Now copy the data over (I'm not sure if Replicate or what is the best here)."
Very good idea. On a side note, though this is nothing I play around with: if you are using zvols (because of iSCSI), you would need to replicate them.

I don't want to pour salt, but "if you don't schedule maintenance, your equipment will schedule it for you" is a saying at my place.
That being said, are you operating 24/7? Bite the bullet and do the maintenance during the night or on the weekend.

I'm pretty confident that what Joe suggested is the best course of action right now.

For the future, and I'm not sure if that's what Joe meant, consider purchasing a second system. From what I have read here and there, TrueNAS should offer high-availability options.

 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
I have attached a screenshot from when I ran this command. I only saw your message late.
 

Attachments

  • truenas screenshort.JPG