Help needed salvaging data from pool to another drive.

Joined
Feb 2, 2024
Messages
8
Hi all.
I'm pretty new to TrueNAS, but I've managed to turn an old Dell server into a TrueNAS box, flashing the controller card and running a fan script to keep the wind tunnel at bay.

It had all been running great until yesterday, when the shares dropped out while I was moving files between folders and I got the dreaded boot loop.

This happened right in the middle of organising my files and creating backups. I wanted the NAS to keep a backup of all my software instruments.

My TrueNAS is set up as:

Eight 3 TB drives, arranged as four mirrored pairs in one pool. So I basically have half the raw capacity usable, as I wanted some redundancy.
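For anyone checking the numbers: a pool of mirror vdevs stripes data across the pairs, and each pair contributes the capacity of its smallest member. A minimal Python sketch of that arithmetic, assuming every vdev is a mirror and ignoring ZFS overhead, reserved space and the TB vs TiB difference (the function name is just for illustration):

```python
from __future__ import annotations


def mirror_pool_usable_tb(vdevs: list[list[float]]) -> float:
    """Approximate usable capacity (TB) of a pool made only of mirror vdevs.

    Every member of a mirror holds the same data, so each mirror contributes
    the size of its smallest drive, and the pool stripes across all mirrors.
    Assumption: ZFS overhead and TB/TiB conversion are deliberately ignored.
    """
    return sum(min(members) for members in vdevs)


# Layout from the post: four mirrored pairs of 3 TB drives.
layout = [[3.0, 3.0] for _ in range(4)]
raw_tb = sum(sum(members) for members in layout)
usable_tb = mirror_pool_usable_tb(layout)
print(f"raw: {raw_tb:.0f} TB, usable: about {usable_tb:.0f} TB")
```

The same layout explains why one bad pair matters so much: the pool survives one failed drive in each pair, but because data is striped across all pairs, losing both drives of the same pair takes the whole pool with it.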

Where I am at.

I have to unplug two specific drives for TrueNAS to boot, and the pool is then missing.
Once I plug the two drives back in, I can import the pool again, but only in read-only mode.

The shares appear and are active in TrueNAS, but the network folders are just empty.

It looks like both drives in the same mirrored pair have errors.

I'm struggling to work out exactly how to copy all the data off the pool onto an external drive or other device, so I can try creating a new pool and copying the data back in case the drives are actually fine. If possible, I'd like to copy the data in small chunks rather than all in one go.
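One way to copy the data off in small chunks is to copy one top-level folder per run. The Python sketch below does that. It assumes python3 is available in the shell, that the pool is mounted (read-only is fine) at /mnt/Store, the mount point mentioned later in the thread, and that the external drive is mounted at /mnt/usb-backup, which is a hypothetical path. It is an untested sketch, not an official TrueNAS tool.

```python
#!/usr/bin/env python3
"""Copy a pool's top-level folders to another disk, one folder at a time.

Assumed paths (adjust to your system): SOURCE is the pool mount point,
DEST is a hypothetical mount point for the external USB drive.
"""
from __future__ import annotations

import shutil
import sys
from pathlib import Path

SOURCE = Path("/mnt/Store")       # pool mount point mentioned in the thread
DEST = Path("/mnt/usb-backup")    # hypothetical: wherever the USB drive is mounted


def copy_in_chunks(source: Path, dest: Path, names: list[str] | None = None) -> list[str]:
    """Copy top-level entries of `source` into `dest`, one at a time.

    `names` restricts the run to selected folders so a large pool can be
    copied over several sessions. Each entry is copied under a ".partial"
    name and renamed only once complete, so re-running skips finished
    entries and retries interrupted ones. Returns the names copied this run.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source.iterdir()):
        if names and entry.name not in names:
            continue
        target = dest / entry.name
        if target.exists():
            print(f"skip   {entry.name} (already copied)")
            continue
        partial = dest / (entry.name + ".partial")
        if partial.is_dir():
            shutil.rmtree(partial)          # leftover from an interrupted run
        elif partial.exists():
            partial.unlink()
        print(f"copy   {entry.name}")
        try:
            if entry.is_dir():
                shutil.copytree(entry, partial, symlinks=True)
            else:
                shutil.copy2(entry, partial)
        except OSError as err:              # shutil.Error is a subclass of OSError
            # Keep going; the .partial copy stays until the next run retries it.
            print(f"FAILED {entry.name}: {err}", file=sys.stderr)
            continue
        partial.rename(target)
        copied.append(entry.name)
    return copied


if __name__ == "__main__":
    # Usage: sudo python3 copy_chunks.py [folder ...]   (no arguments = everything)
    if SOURCE.is_dir():
        copy_in_chunks(SOURCE, DEST, sys.argv[1:] or None)
    else:
        print(f"{SOURCE} is not available; is the pool imported?", file=sys.stderr)
```

Copying under a temporary name and renaming at the end means a folder only appears under its real name once it was copied without errors, which makes it safe to stop and resume between sessions.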

Also, I just wanted to ask: if I replace the two drives using the drive replacement function, will it just copy the pool errors over and still not work?

I'm still very new to all the terminology used with TrueNAS, so go easy on me.

Regards

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Can you list your hardware in detail please? Especially since you mentioned a controller card.

What drives do you need to remove?

What is the output of zpool status? You need to enter that in the shell.
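If the zpool status output is long, a small script can pick out only the devices whose READ, WRITE or CKSUM counters are not zero. This Python sketch runs the real `zpool status -v` command but parses its text, assuming the usual layout of the config: section (a header line NAME STATE READ WRITE CKSUM followed by one line per pool, vdev and disk). That text is meant for humans, not scripts, so treat the parser as a convenience only.

```python
from __future__ import annotations

import shutil
import subprocess


def devices_with_errors(status_text: str) -> list[tuple[str, ...]]:
    """Return (name, state, read, write, cksum) for each row of the device
    table(s) in `zpool status` output whose error counters are not all zero."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True              # header of a device table
            continue
        if not in_table:
            continue
        if not fields:
            in_table = False             # a blank line ends the table
            continue
        # Counters may be abbreviated (e.g. "1.2K"), so compare as text.
        if len(fields) >= 5 and any(count != "0" for count in fields[2:5]):
            rows.append(tuple(fields[:5]))
    return rows


if __name__ == "__main__":
    if shutil.which("zpool") is None:
        print("zpool not found; run this in the TrueNAS shell")
    else:
        result = subprocess.run(["zpool", "status", "-v"],
                                capture_output=True, text=True, check=False)
        for name, state, read, write, cksum in devices_with_errors(result.stdout):
            print(f"{name:<12} {state:<10} read={read} write={write} cksum={cksum}")
```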
 
Joined
Feb 2, 2024
Messages
8
My hardware is:

Repurposed Dell PowerEdge T320

Xeon E5-2440 v2
48 GB RAM
Eight 3 TB Dell enterprise drives
2.5 GbE network card
H310 controller card, flashed for use with TrueNAS

Two of the drives, which form one mirrored pair within the pool, showed errors, and when they are installed the server goes into a boot loop. I have to remove both for TrueNAS to boot and let me into the GUI.

I then have to push them back in and import the pool as read-only.

I obviously then can't access the shares over the network.

Attachment: IMG_20240202_161601.jpg
Joined
Feb 2, 2024
Messages
8
The errors seem to be checksum errors. The image only shows errors on one drive, but both showed errors before.

I have a feeling it may have been the controller card overheating at the time rather than the drives failing. It happened while I was moving large files between directories on the NAS.

chuck32

Even a failed pool should not put you into a boot loop. Can you elaborate on that? What is the output of the console during that boot loop?

"I obviously then can't access the shares over the network."
Actually, that's not obvious: your pool is online, so you should be able to access it.

You could run some S.M.A.R.T. tests and check the drives for errors. CRC errors like that are likely due to the controller or cables. Can you change your cables or connect the drives directly?
I assume you don't have a spare HBA card lying around. HBAs by LSI are often recommended around here.

From my personal point of view you have two choices: if swapping the cables does not help, get another HBA and wait until it arrives before getting the data off; or get the data off now and investigate further. I'm not sure how trustworthy the data pulled off the drives will be if the connection via the HBA introduces CRC errors. If possible, I'd try with new cables or a new HBA if I were you.
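For the S.M.A.R.T. check suggested above: SATA drives usually expose an interface CRC counter as attribute 199 (UDMA_CRC_Error_Count) in `smartctl -A` output, and a rising value points at cabling, backplane or controller rather than the platters. A hedged Python sketch that reads that attribute for each device given on the command line; device names such as /dev/sda are placeholders, and SAS drives report errors in a different format (an error counter log) that this does not parse.

```python
from __future__ import annotations

import subprocess
import sys

CRC_ATTRIBUTE = "UDMA_CRC_Error_Count"   # SMART attribute 199 on most SATA drives


def crc_error_count(smartctl_attributes: str) -> int | None:
    """Return the raw UDMA CRC error count from `smartctl -A` text, or None
    if the attribute is absent (e.g. SAS drives or unexpected output)."""
    for line in smartctl_attributes.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == CRC_ATTRIBUTE and fields[9].isdigit():
            return int(fields[9])
    return None


if __name__ == "__main__":
    # Usage: sudo python3 crc_check.py /dev/sda /dev/sdb ...   (device names are examples)
    for device in sys.argv[1:]:
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True, check=False).stdout
        count = crc_error_count(out)
        shown = "not reported" if count is None else str(count)
        print(f"{device}: {CRC_ATTRIBUTE} = {shown}")
```

For the tests themselves, `smartctl -t long /dev/sdX` starts an extended self-test, and `smartctl -a /dev/sdX` shows the result once it has finished.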
 
Joined
Feb 2, 2024
Messages
8
The boot loop is triggered, I guess, when it tries to import the pool.
If I boot with the two drives in question removed and then import the pool manually from the GUI in regular read-write mode, the server reboots immediately.

If I do want to just try and recover the data how would I go about that?

I thought that in read-only mode the shares would not be available, and that you could only copy content to a connected drive, say via USB, using commands.

If I could get the shares to appear on the network while the pool is read-only, that would be helpful, as I don't need to write anything at the moment.

chuck32

"The bootloop triggers, I guess, as it is trying to trigger the pool."
Get video output during booting and show us the display.

"If I import the pool from the gui manually after removing the two drives in question, in regular write mode, it immediately reboots the server."
I'd have to guess which log, but have you looked into the logs? Probably /var/log/syslog.
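To narrow down what happens just before the reboot, one could scan that log for storage-related lines. A Python sketch, assuming the log is at /var/log/syslog as suggested above; the keyword list is a guess (mpt2sas/mpt3sas are the Linux driver names used for LSI SAS2008-based HBAs such as a crossflashed H310), and a crash that reboots the machine instantly may never reach the log file at all.

```python
from __future__ import annotations

import sys
from pathlib import Path
from typing import Iterable, Iterator

LOG = Path("/var/log/syslog")   # location suggested in the thread; may differ on your install
# Hypothetical keyword list: ZFS, kernel panics, LSI HBA driver, disk I/O errors.
KEYWORDS = ("zfs", "panic", "mpt2sas", "mpt3sas", "i/o error", "checksum")


def storage_lines(lines: Iterable[str], keywords=KEYWORDS) -> Iterator[tuple[int, str]]:
    """Yield (line number, line) for lines containing any keyword (case-insensitive)."""
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(key in lowered for key in keywords):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else LOG
    try:
        with path.open(errors="replace") as handle:
            for number, line in storage_lines(handle):
                print(f"{number}: {line}")
    except OSError as err:
        print(f"cannot read {path}: {err} (try sudo)", file=sys.stderr)
```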

"If I could get the shares to appear on the network while the pool was in read only that would be helpful. As I don't need to write anything at the moment."
You need to provide information for us to work with. I'm not the most knowledgeable user around here; others may know something I don't. But your pool is online: read-only, but online. We still don't know why this prevents you from accessing it over the network. Surely there's some error message or the like related to it.

"If I do want to just try and recover the data how would I go about that?"
Again, just my personal opinion: make sure you don't have any hardware issues before attempting to rescue data. But if you want to proceed now, either look into why the shares are not working or, as you mentioned, connect another drive via USB and copy the data. You can do it from the shell: your pool is accessible at /mnt/Store/, and at least cp is available.
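After copying with cp (or any other method), it is worth checking the copy before trusting it. ZFS itself verifies checksums on every read and returns an I/O error rather than silently handing back a block it cannot repair, so the main job of this Python sketch is to confirm that every file arrived on the USB drive intact, and to list files the pool could not read. The folder names in the usage line are placeholders, and /mnt/usb-backup is a hypothetical mount point.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_trees(source: Path, copy: Path) -> dict[str, list[str]]:
    """Compare every regular file under `source` with its counterpart under `copy`."""
    report: dict[str, list[str]] = {"missing": [], "mismatch": [], "unreadable": []}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = copy / rel
        if not dst.is_file():
            report["missing"].append(str(rel))
            continue
        try:
            if sha256_of(src) != sha256_of(dst):
                report["mismatch"].append(str(rel))
        except OSError:                     # e.g. an I/O error from a damaged pool
            report["unreadable"].append(str(rel))
    return report


if __name__ == "__main__":
    # Usage: sudo python3 verify_copy.py /mnt/Store/SomeFolder /mnt/usb-backup/SomeFolder
    if len(sys.argv) == 3:
        for kind, paths in compare_trees(Path(sys.argv[1]), Path(sys.argv[2])).items():
            print(f"{kind}: {len(paths)}")
            for rel in paths:
                print(f"  {rel}")
```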
 
Joined
Feb 2, 2024
Messages
8
I can't even access the pool via /mmt/Store/ in the shell in read-only mode; it just says access denied.

Whatever failed seems to have totally mucked up the pool. No settings were changed, so something has simply failed, and all with very light use. The ease of use of a network NAS was too good to be true.

I may have to try a new HBA card, but this all looks like a lost cause, especially if I can't even copy data from it.

chuck32

Is the typo only on your post or also in your command?

Use the code tags and show us your actual input and the actual output. Did you try with elevated privileges?

Also, can you specify how exactly you flashed the card? I can't promise you anything but I'm quite optimistic that this issue is related to the HBA or your cabling.

Did you swap / reseat the cables?

What power supply do you have?
 
Joined
Feb 2, 2024
Messages
8
Yeah, it's a typo only in the post.

I unplugged everything and reseated the cards and cables.

There are two power supplies in the system, and I've used both. I'm away from the system now, so I can't tell you exactly what they are.


How do I elevate the privileges within the shell?

Cheers for all your replies,

chuck32

Use sudo for elevated privileges (put sudo in front of the command).

Okay, waiting for the info on the PSUs. CRC errors can also come from an insufficient power supply. You don't happen to have a spare PSU (with enough power; I'm thinking at least 500 W) lying around?

You still need to explain what exactly you flashed on that card. This is certainly nothing I have experience in.
This thread discusses confirming the right version. Have a look at that, please.

You could also install a fresh TrueNAS instance on a USB drive and see if you can boot that without disconnecting the drives first.

Unfortunately this is where my experience/ideas end. I wouldn't know who to ping for that.

So let's see if we can get info on the PSUs and check your firmware and work from there.
 
Joined
Feb 2, 2024
Messages
8
The system has two 495 W power supplies.

When I flashed the card, I followed a guide specifically for my card and used the correct version files. I've looked on my main system for the files I used, but I can't find them. The process worked first time.

This was all working fine until I moved files between folders.

A fresh install of TrueNAS boots fully, as no pool is set up, and that's with all drives installed.

Again, as soon as I import the pool, nothing displays on the server screen, and after a few seconds the system reboots.

Importing the pool as read-only again now only shows the datasets; the storage dashboard stays empty. It's basically importing only the dataset info, not the pool itself.

I think the only thing I can do is get another HBA card that's already in IT mode, and if that doesn't help I'll just have to put it down as a lost cause. I do hope, though, that it hasn't all just been a TrueNAS software bug in the latest release rather than the hardware.

I appreciate all the help.
 
Joined
Feb 2, 2024
Messages
8
Well, a new HBA card made no difference. I can't import the pool even though TrueNAS finds it.
All the hardware checks out as working. I just think there was a bug in the build when moving files between folders. It's basically screwed up the entire pool.

Now I can't even import it read-only after booting from a USB install.

The software just isn't robust enough to be useful in my scenario. The whole point was to have redundancy. The trouble is, if it copies errors onto both drives of a mirror, it screws up the whole pool. I would actually have been better off with individual disks as storage, losing only one or two disks rather than the whole pool.

Yes, I have the data backed up, but now I'm down to one backup again. The whole point of this was to have an organised backup available on my network, so that my first backup, on removed disks, could be kept safe.

I'd advise against moving files between directories. Copying large amounts of data in one go was fine, but simply moving data on the pool crashed it. I checked, and the temperatures were all fine too.

chuck32

Good that you have backups! I'm still curious about this error, maybe someone else has an idea? @joeschmuck @Patrick M. Hausen ?

What HBA did you get?