Pool disk replacement error

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
1) For the corrupt file, see this link:
2) Once the corrupt file has been removed, run a scrub: zpool scrub OTNAS
3) Once the scrub completes (6 hours and 20 minutes later), run zpool status -v and see if you have any files with errors. If you do have errors, post a screen capture of that and wait for assistance.
4) If there were no new file errors and you only have the CKSUM errors, run zpool clear OTNAS and then repeat the command in step 3. With any luck you will no longer have any errors.
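
As a rough sketch, steps 2 and 3 could be scripted along these lines. The pool name OTNAS comes from this thread. The function names and the 10-minute polling interval are illustrative assumptions, and the loop relies on zpool status printing "scrub in progress" while a scrub is running.

```shell
#!/bin/sh
# Sketch only: function names and the 600-second poll interval are
# illustrative assumptions, not part of any TrueNAS tooling.

# Succeeds while the "zpool status" text on stdin reports a running scrub.
scrub_in_progress() {
    grep -q "scrub in progress"
}

# Start a scrub on the given pool, wait for it to finish,
# then print the verbose status so damaged files (if any) are listed.
scrub_and_report() {
    pool="$1"
    zpool scrub "$pool" || return 1
    while zpool status "$pool" | scrub_in_progress; do
        sleep 600
    done
    zpool status -v "$pool"
}
```

Running scrub_and_report OTNAS as root would start the scrub and print the verbose status once it finishes. Step 4 (zpool clear OTNAS) is deliberately left as a manual step, since it should only be run after checking that no files are listed with errors.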

So back to the BIOS RAID5... If this pool is made from this RAID5 set of drives, I will have to say that it looks like you do have a real RAID setup, as TrueNAS only sees one drive. Fixing this is easy, but it is neither fast nor cheap, and it does require you to copy your data to another system.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Just to emphasise: your only solution here is to redo the pool from scratch. Your ex-colleague has screwed the pooch with this install. You are running on borrowed time and your data is at risk. Urgent action is required or all the data could vanish.

Do you have a backup?

As you haven't posted your hardware, we don't know if it's even possible with the existing hardware. You clearly have a RAID controller, and we need to know which one so that advice can be offered. Basically, we need to know what your server hardware is, down to what the HDDs are connected to: motherboard, memory, case, RAID controller, make and model of the HDDs, even the PSU. Do you even have any spare disk slots?

Only then can we advise in anything more than generalities.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
I previously shared some hardware details, but here is the complete hardware information. I trust this provides sufficient detail.

Dell PowerEdge T610
CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Memory: 16 GB
Hard disks: SAS 3.5" HDD (2 TB x 8)
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
Alright, I'll attempt to follow these instructions and will share an update with you soon.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
After running the "zpool status -x -v" command, the message shown in the attached screenshot appeared. As previously discussed, I just need clarification on the method you provided. Is my understanding correct that I should run these commands?

service collectd onestop

rm /var/db/system/rrd-b1e73a61b826437295342114c13683d0/localhost/cpu-19/cpu-idle.rrd

service collectd onestart
 

Attachments

  • truenas screenshort.JPG (47.2 KB)

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I previously shared some hardware details, but here is the complete hardware information. I trust this provides sufficient detail.

Dell PowerEdge T610
CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Memory: 16 GB
Hard disks: SAS 3.5" HDD (2 TB x 8)
No, it isn't.
Although the HDDs being SAS tends to imply they aren't SMR (something to watch out for).

What are the HDDs plugged into? My assumption is that it's probably a PERC controller of some kind. The question is which model. This is probably the most important piece of information, and it's missing.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
service collectd onestop

rm /var/db/system/rrd-b1e73a61b826437295342114c13683d0/localhost/cpu-19/cpu-idle.rrd

service collectd onestart
Yes, that looks correct. And once done, perform the other steps I outlined.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
Thank you! One additional matter I need clarification on is whether it's essential to back up the entire NAS. The sheer size, almost 15TB, along with all disks being connected to a VM via iSCSI, makes the backup process quite arduous. Given this, do you think it's necessary to undertake a preparatory activity before initiating the backup process? I don't anticipate any disruption to the data, but I'd appreciate your thoughts on whether it's advisable. If there's no anticipated disruption, I'm inclined to proceed immediately, perhaps with a moment of prayer for success.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
I'll provide you with a snapshot of the old hard disk shortly, and then you'll receive complete hardware information.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
Here is a snapshot of the old hard disk I previously used before replacing it with a new one.
 

Attachments

  • TrueNAS Hardisk.jpg (87.5 KB)

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
That is a CMR drive. Good!
One additional matter I need clarification on is whether it's essential to back up the entire NAS.
While you should have a backup of the important data, to do the steps outlined, it is not required at all.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
WTF has the hard disk got to do with the RAID controller?

I/we need to know what the RAID controller is (make and model), as that way we can tell you if you need to replace it (or flash it) before rebuilding the disk array.

You need to be able to back up everything that you want to keep. Do that now, as the array is at risk.

Do not, however, trash everything and start again, because WE NEED TO KNOW WHAT THE BLOODY RAID CONTROLLER IS. If it is the wrong type and you rebuild the array on it, then you will probably just have to repeat the process again.

Also, you say you have 2 * 8TB disks, along with 16TB of data. Being kind, I might assume you had three disks in a RAID5 configuration (but the array would be very full, dangerously so), yet in your hardware spec you said you have two. So please list your complete hardware spec, otherwise you are wasting your time, mine and everyone else's here.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
Here are the details of the RAID Controller: "DELL POWEREDGE T610 DELL PERC 6I SAS SATA RAID CONTROLLER." Additionally, I have a total of eight hard disks, each with a capacity of 2TB, as mentioned previously. What further information do you need? I want to be mindful of your time and appreciate your assistance.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
That is a CMR drive, Good!

While you should have a backup of the important data, to do the steps outlined, it is not required at all.
Alright, I'll back up the entire NAS before proceeding with this task. However, I'm concerned that data might be lost during this activity. It's essential to take precautionary measures to prevent any loss, leaving no room for doubt.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"Here are the details of the RAID Controller: "DELL POWEREDGE T610 DELL PERC 6I SAS SATA RAID CONTROLLER." Additionally, I have a total of eight hard disks, each with a capacity of 2TB, as mentioned previously."

Note that I am not an expert on Dell hardware; others here know more than I do.

According to Reddit (various search results), the PERC 6/i cannot be flashed to IT mode and will need to be replaced with a more suitable card (this does not need to be expensive), but you cannot do this and keep the data on the disks.

I do not know the Dell hardware, but my understanding is that a Dell H710 can be a drop-in replacement and can be flashed to IT mode, which makes the card act like an HBA. Note that if you wish to continue to use TrueNAS reliably, then this is not an optional step.

So:
1. Backup Data - do this anyway and keep the backup updated
2. Replace RAID Controller
3. Ensure RAID Controller is flashed with suitable firmware
4. Boot TrueNAS and create ZFS RAID Array using TrueNAS to do this
5. Restore Data

Note that for 16TB of data I would be "budgeting" 3-4 days of actual downtime minimum. If you cannot afford this downtime, then your only option is a "new" server specced with a suitable HBA, disks, etc. You can then replicate data to this server (hoping you don't lose the array in the meantime) and switch functionality over to the "new" server, which will then enable you to fix your ex-colleague's screwup with some (relatively minimal) downtime (I suspect a few hours). However, it will take a week or so (guesstimate) to get to that stage, as you will need to run multiple replications at ever-decreasing intervals. Afterwards you can re-spec the old server correctly and use it as a backup to the new "primary" server.

Note that copying 16TB across a 1Gb link will take approximately 1 day and 16 hours each way. You don't mention having 10Gb available, so I am assuming 1Gb only. 10Gb would take about 4 hours, assuming that whatever you are copying data to will actually write at that speed. Expect things to take longer.
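
The copy-time estimates can be reproduced with a little arithmetic. This is a minimal sketch that assumes roughly 111 MB/s of usable throughput on a 1Gb link and ten times that on 10Gb. Those are assumed figures chosen to match the times quoted, not measurements, and transfer_hours is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate how long a bulk copy takes at a sustained throughput.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and a constant rate.

# $1 = data size in TB, $2 = usable throughput in MB/s; prints hours.
transfer_hours() {
    awk -v tb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / mbps / 3600 }'
}

transfer_hours 16 111     # 1Gb link:  prints 40.0 (1 day, 16 hours)
transfer_hours 16 1110    # 10Gb link: prints 4.0
```

Real copies over iSCSI or SMB, or onto a slower destination array, will usually run below these rates, so treat the results as a lower bound on the time needed.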

@joeschmuck has told you how to deal with the corrupted file. It's part of the reporting data and thus not terribly important as these things go. You might be able to deal with it by moving the system dataset to the boot pool, but I have no idea what you are booting from. It could be a virtual disk that's part of the hardware RAID, which would not be good.

Oh, and apologies: I misread your disk configuration as 2*8TB rather than 8*2TB.

To replace the disk in the existing array, you will probably need to shut down the server and work out from the PERC 6/i BIOS which disk needs replacing, replace it, resync the array and then reboot TrueNAS. Do not trust TrueNAS to tell you which drive has failed.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I echo @NugentS's statement:
If you cannot afford this downtime, then your only option is a "new" server specced with a suitable HBA, disks, etc. You can then replicate data to this server (hoping you don't lose the array in the meantime) and switch functionality over to the "new" server, which will then enable you to fix your ex-colleague's screwup with some (relatively minimal) downtime (I suspect a few hours). However, it will take a week or so (guesstimate) to get to that stage, as you will need to run multiple replications at ever-decreasing intervals. Afterwards you can re-spec the old server correctly and use it as a backup to the new "primary" server.

If this is a company asset, then I still recommend the company purchase a new server to replace the currently misconfigured system. If zero downtime is that important, that is the only way I can see of doing it. It is a very practical solution for a business.

I know this isn't the answer you wanted to hear; however, it is the honest answer and the most correct one given your circumstances.
 

zubair.zafar

Dabbler
Joined
Mar 13, 2024
Messages
17
That is a CMR drive, Good!

While you should have a backup of the important data, to do the steps outlined, it is not required at all.
I attempted that approach, and although no errors were produced, I continue to encounter the same issue. Even before running this procedure, our NAS was unstable at specific times. During these episodes the NAS responded to ping, yet we were unable to add or remove anything on the iSCSI drives. On the NAS interface, pressing the Enter key did nothing; the page simply scrolled upwards with blank lines. As a result, I had to force a reboot to restore normal operation. This pattern has persisted, with the NAS running smoothly until the issue recurs at the same intervals. Notably, we have not configured any cron jobs. Please find the attached screenshot for reference.
 

Attachments

  • truenas error.jpg (113.6 KB)

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
This should have been fixed, done and dusted by now. And if the NAS is flaky, then replace it. It's your safest migration path anyhow.

Your pool is screwed; you can't fix it by faffing around.
 