Bad RAM led to kernel panic. Many previous changes made. Assistance in best way to proceed appreciated. Mini XL+ arrives today.

erb

Dabbler
Joined
Oct 9, 2023
Messages
12
I'm very new to TrueNAS. I did have a FreeNAS server years ago, so I thought I knew what I was doing. This is a new build, and I've come to understand that it's important to proceed safely, but I have made so many changes recently I think a new thread might be warranted.

Initial config:
I created a RAIDZ1 pool of 3 WD Red Pro 18TB NAS hard drives (before purchasing, I checked whether they were SMR, and it appeared that they were not).
I created a pool of 2 WD Blue SSDs meant for fast transfers.

Initial symptoms:
CPU was overheating; new thermal paste fixed that.
Server kept rebooting.
Reading the forums, I found mention of a bug with encrypted datasets and replication, so I pulled the WD Blues out. It booted after this.
Encrypted volumes were slow, so I replaced the CPU with an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz for AES-NI.
Server seemed stable, but would occasionally reboot.
Upgraded to Version: TrueNAS-13.0-U5.
Password updates weren't persisting, so I changed the system dataset pool, which was still pointing to the WD Blue SSDs, to boot-pool.
Ran a RAM test and found one stick was faulty, so I pulled it, taking the system from 16GB to 8GB.
More stable, but there were unrecoverable files.
Ran several scrubs, and deleted the files that were listed.
Ran short SMART tests on all drives, all of which came back healthy.
Scrubs were still showing errors in some files. I was able to copy data to the pool, but could not copy those files from the pool.
zpool status -v gave some paths to files I couldn't find in the dataset, but the system seemed otherwise stable.
Created a new encrypted dataset.
While copying data to the encrypted dataset it rebooted, and now gives "panic: Solaris(panic): zfs: adding existent segment to range tree (offset=15400332000 size=04000)".

Current situation:
To remove hardware as potential issue, I bought a Mini XL+ that arrived just minutes ago.
I have created a new install of TrueNAS Core TrueNAS-13.0-U5.3 booting off USB on the old server.
I think the Mini XL+ will have TrueNAS Scale, and am not sure if I can just throw the disks from TrueNAS Core into that new server.

I've read through a lot of posts, and will continue reading through the documentation. Given all of the changes I've made in the last few days, I was hoping someone might be able to give me advice on the best way to proceed with moving to the new hardware and recovery of the pool. I've read some suggest exporting and importing, but I wasn't comfortable with doing that quite yet, as I didn't want to potentially dig myself any deeper into a hole.

Any advice is greatly appreciated.
 

erb

tl;dr - TrueNAS Core is stuck in a boot loop after rebooting while copying data to an encrypted dataset. Bought new hardware to rule out hardware. Unsure whether I should try to fix the pool in the old TrueNAS Core prior to moving to the new system, which I think runs TrueNAS Scale.
 

erb

I was not able to back up the system data before moving to the new hardware. For some reason it isn't finding the boot-pool on the SSD anymore. I moved the disks into the new Mini XL+, but after importing the pool, the system ran into the same boot loop. I've pulled the drives, rebooted without the drives in, then moved the system dataset, which was set to the 3 HDD RAIDZ1 pool, back to boot-pool, then loaded the drives again. It shows the pool is offline. Not sure where to go from here.
 

erb

Update: I was able to mount the pool using the following:

Code:
zpool import -F -f -o readonly=on -R /mnt/temp zpool2


Server still reboots when trying to import r/w. Should I try an export and then an import?
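
For reference, the flags in the read-only import above can be spelled out one by one. This is only a sketch: it prints the command for review instead of running it, the helper name `ro_import_cmd` and its default alternate root are illustrative assumptions, and the flag notes paraphrase the zpool-import man page.

```shell
# Print (do not execute) a read-only recovery import command so it can be
# reviewed before it is run. ro_import_cmd is a hypothetical helper name.
ro_import_cmd() {
    pool=$1
    altroot=${2:-/mnt/temp}   # assumed default, matching the command above
    # -F               recovery mode: may roll back the last few transactions
    # -f               force the import even if the pool looks active elsewhere
    # -o readonly=on   import without writing anything to the pool
    # -R <altroot>     mount the pool's datasets under an alternate root
    printf 'zpool import -F -f -o readonly=on -R %s %s\n' "$altroot" "$pool"
}

ro_import_cmd zpool2
```

While the pool is imported read-only, it is probably safest to copy the data off to other storage before attempting any read/write import again.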

I came across this which seemed to help me at least get the pool imported:

Code:
Recently i hit what i thought was a huge disaster with my ZFS array. Essentially i was unable to import my zpool without causing the kernel to panic and reboot. Still unsure of the exact reason, but it didn’t seem to be due to a hardware fault. (zpool import showed all disks as ONLINE)

When i tried to import with zpool import -f tank the machine would lockup and reboot (panic).

The kernel panic;  (key line)

> genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)

Nothing i could do would fix it… tried both of these options in the system file with no success;

set zfs:zfs_recover=1
set aok=1

After a quick email from a Sun Engineer (kudos to Victor), the zdb command line that fixed it;

zdb -e -bcsvL <poolname>

zdb is a read only diagnostic tool, but seemed to read through the sectors that had the corrupt data and fix things??  (not sure how a read only tool does that) – the run took well over 15hrs.

Apparently if you have set zfs:zfs_recover=1 in your system file the zdb command will operate in a different manner fixing the issues it encounters.

Remember to run a zpool scrub <poolname> if you are lucky enough to get it back online.


I haven't set zfs_recover, because it sounds like you need to set that at boot. Unfortunately the XL+ has a VGA output, and neither of the two VGA to HDMI adapters seem to work.

For now I have it running the following:

Code:
zdb -e -bcsvL hdd-raidz-pool1
 

erb

I have new drives and am building a new pool.

There are some files with permanent errors, and I'd like to recover as much of each file as possible. Does anyone have any suggestions on how to do this?
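
One commonly used approach for this kind of partial recovery, sketched here under assumptions, is to copy the file block by block with `dd` and let unreadable blocks come out as zeros. The helper name `salvage_file` is made up for illustration, and the 131072-byte default assumes the dataset uses the default 128 KiB recordsize. Whether the result is usable depends entirely on the file format.

```shell
# Copy a damaged file block by block, replacing unreadable blocks with zeros.
# salvage_file is a hypothetical helper; 131072 assumes the default 128 KiB
# ZFS recordsize, so adjust it if the dataset uses a different value.
salvage_file() {
    src=$1
    dst=$2
    bs=${3:-131072}
    # conv=noerror  keep going after a read error instead of stopping
    # conv=sync     pad every short or failed block with zeros to a full
    #               block, so the data that follows stays at the right offset
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}
```

Because of `conv=sync`, the final block is also padded, so the copy can be slightly larger than the original. The destination should be on a different, healthy pool or disk.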
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
erb said:
There are some files that have permanent errors that I'd like to recover as much of the file as possible. Does anyone have any suggestions how to do this?

Use backups or recovery services (very expensive!).

erb said:
I haven't set zfs_recover, because it sounds like you need to set that at boot.

You can set tunables in the WebUI.
 

Davvo

Also, I don't think you can simply take a pool from a CORE system and throw it into a SCALE system. Please read the following documentation:
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
As long as the ZFS features match then you can transfer a pool from Core to Scale and import. You will just get the pool, no shares or similar
 

erb

I'm absolutely thrilled with the new Mini XL+. There are a few questions I have though, if anyone might have an idea.


Cooling Question:

I have some Seagate 6TB disks I had running in raid5 on my FreeNAS. The system does run hot when loaded with disks. Images attached. I'm not sure how to address it other than swapping out the 7200 rpm drives with 5400s without possible tunables, or adjusting internals.

1) What might be a suggestion for increasing cooling?


FreeNAS raid devices to TrueNAS Question:

Regarding my FreeNAS disks, zpool import didn't find anything, and I'm thinking that I might have set them up as RAID 5 in the BIOS. The BIOS battery has died, which is a big no-no for motherboard RAID. I'm unsure of the order they were in. I know FreeNAS isn't picky, but the RAID controller might be. I've heard of some having success re-creating the RAID in the BIOS. It warns that disks will be wiped, but only metadata gets rewritten. From there you can run a partition recovery tool that will tell you whether you have the correct order and can get them back online as a cluster presented to FreeNAS (will be TrueNAS now).

2) If it turns out I wasn't using the motherboard RAID controller, can you think of any other reason why the disks may not show up during a zpool import that I could look into first?

Dying SSD devices:
Unfortunately I've had two 1TB SSDs fry out in the last day. I'm not sure if it was the old hardware, or the new hardware, or if perhaps FreeNAS does something the SSD isn't expecting getting it into a strange state. One disk doesn't respond or appear on any machine. The other responds, but as read only.

3) Is anyone aware of anything which might cause SSDs to start failing? Moving to different slots hasn't changed the disk behavior. One never appeared. The other I set up as a raid0, and it threw a fault since that one wasn't writable. That's really all I have to go on with that one.

Current pool setup:

I have 3 18TB raid0
I have 1 18TB single drive for backup of the above striped disks
Then I have a portable drive for offsite, which I'm syncing with just a robocopy to a TrueCrypt-encrypted disk, to remove the need for ZFS in a situation dire enough that I'd need the offsite backup.

I tried two striped mirror pairs, which allows for the loss of 2 disks as long as they're not in the same vdev, but given how much these drives cost, I have 4 now. I still have 2 disk redundancy, but there is the 3 disk failure case I could survive. Makes me a bit uneasy, but I also have the offsite drive for now. 750mb r/w was much better than the 500/600mb r/w I was getting with the mirrors. I'm planning on doing the offsite copy either once a week, with the disk present only for that copy and unplugged otherwise, or maybe once a month. Maybe in a few months I'll pick up another 18TB drive as a mirror for the backup. And if I get a 2nd, I could have 3 vdevs of 2 mirrored disks.
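
To make the "2 disks as long as they're not the same vdev" point concrete, here is a small sketch that enumerates every two-disk failure, assuming the layout meant is two 2-way mirror vdevs striped together. The disk names and the function name `mirror_pair_survival` are invented for illustration.

```shell
# Enumerate every two-disk failure in a pool built from two 2-way mirrors
# (disks a1+a2 form mirror "a", b1+b2 form mirror "b"; names are made up).
mirror_pair_survival() {
    ok=0
    total=0
    set -- a1 a2 b1 b2
    while [ "$#" -gt 1 ]; do
        x=$1
        shift
        for y in "$@"; do
            total=$((total + 1))
            # The pool is lost only when both failed disks sit in the same
            # mirror, i.e. when the names share the same leading letter.
            if [ "${x%?}" != "${y%?}" ]; then
                ok=$((ok + 1))
            fi
        done
    done
    echo "$ok of $total two-disk failures are survivable"
}

mirror_pair_survival
```

Under that assumption, 4 of the 6 possible two-disk failures are survivable, whereas a 4-disk raidz2 would survive all 6, and a plain stripe would survive none.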

I wanted to set up raidz2 with raid0 SSDs as a buffer, but I wasn't getting the behaviour I was looking for using them as a SLOG. So after the 6 full days of data shifting, I'm reasonably happy with the setup for now.

Thank you for your advice.
 

Attachments

  • cool.png
  • hot2.png

erb

I've also used two VGA to HDMI adapters, and can't get video out to any of my monitors. Still looking into that. Any ideas?
 

NugentS

Doesn't the motherboard have a dsub on it? Try that one
 

Davvo

erb said:
1) What might be a suggestion for increasing cooling?

2) If it turns out I wasn't using the Mother Board Raid controller for raid, can you think of anyother reason why the disks may not report during a zpool import for another reason I could try first?

3) Is anyone aware of anything which might cause SSDs to start failing? Moving to different slots hasn't changed the disk behavior. One never appeared. The other I setup as a raid0, and threw a fault as the one wasn't writable. That's really all I have to go on with that one.

1) Adding fans, ramping up existing ones, leaving space between drives, spinning down the drives.

2) You understand you cannot use hardware RAID with ZFS, don't you? Why are you not importing the pool from the WebUI? You should use the WebUI whenever possible.

3) Wear or hardware RAID magic.
 

erb

Davvo said:
1) Adding fans, ramping up existing ones, leaving space between drives, spinning down the drives.

2) You understand you cannot use hardware RAID with ZFS, don't you? Why are you not importing the pool from the WebUI? You should use the WebUI whenever possible.

3) Wear or hardware RAID magic.

1) I'll have to call to confirm which alterations are permissible without voiding the warranty. It's possible the disks won't run so hot now that I'm not copying 20TB of data to it in one go.

2) Yes, I've learned that it was probably a bad choice to have possibly used BIOS RAID. This was 10-15 years ago, so I can't remember how I actually configured my previous FreeNAS system. I'm only guessing that I may have used BIOS RAID, to help explain why TrueNAS didn't recognize the pool from those old disks, based on some things I've read while trying to understand what's required to possibly recover it, or at least move the encrypted dataset onto the new system to free up those disks. I didn't see an option to import the disks as ZFS from the GUI, so that's why I went looking for alternative methods. My impression was that Import Disk was meant for external media, or perhaps ancillary filesystem support, since ZFS wasn't offered as a selection. Also, my pool was pretty badly broken. Once I ran the commands from the command line, it instantly recognized the disks and that they were a pool. It took importing it read-only to get it loaded, but ever since then it's been accessible from the GUI. My SSD which contained the boot-pool died, so I only had the disks of a broken pool coming into the new hardware. No system backups or anything. I was hoping it would be possible to recover the disks, and was pleased to see that it was, albeit needing read-only so as not to put the new system into the same boot loop the old hardware was in. I was actually pretty surprised that importing disks could cause a kernel panic, but much of this is new to me.

3) No hardware RAID has been used. I'm not even positive I used it with FreeNAS years earlier. TrueNAS just didn't seem to notice the ZFS disks and pool identification on the 10-year-old FreeNAS disks like it had with the TrueNAS system I had just built a few months ago. One thing I did do was move the system dataset from the raidz1 HDD pool to the boot-pool SSD. I still need to read through that documentation to see what the implications of that were. I came across something stating, along the lines, that it's better not to move that to the boot pool so as not to wear out your boot pool or lose your metadata, but I may not be using the right terminology or concepts here. I will read up on it soon after I'm able to slim the data a bit further to fit on my external drive. Thankfully the SSDs are still under warranty. I did have them set up as a striped VDEV. One of those is unresponsive. The boot-pool disk is responsive, just unable to write even after reformatting. In a bit I'll try zeroing out the first few MB on the dev in case there's some strange partition state that's causing it.
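
As a rough sketch of the "zero out the first few MB" idea, something like the following could be used. The helper name `zero_head` and the 4 MiB default are assumptions for illustration only. This destroys whatever is at the start of the target, so the device name needs to be double-checked first.

```shell
# Overwrite the first N MiB of a target with zeros, leaving the rest alone.
# zero_head is a hypothetical helper. THIS DESTROYS DATA on the target:
# double-check the device name before pointing it at a real disk.
zero_head() {
    target=$1
    mib=${2:-4}   # assumed default of 4 MiB
    # bs is given in bytes because the 1M / 1m suffix differs between dd
    # implementations; conv=notrunc keeps a regular file from being truncated.
    dd if=/dev/zero of="$target" bs=1048576 count="$mib" conv=notrunc
}
```

On a real system the target would be the SSD's device node (the exact name depends on the system), for example `zero_head /dev/<device> 4`. If the drive has gone read-only at the firmware level, the write will simply fail.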

Thank you for the pointers, I appreciate it.
 

erb

IPMI is great. I'll definitely get that set up. I was mostly trying to identify whether I had a hardware issue with the VGA port. Too often I don't work through issues quickly enough to get items returned by the time I get around to identifying them as hardware issues. I thought maybe it needed to be enabled through the GUI, but I didn't see any option that might indicate that was necessary. Could be the adapters are bad. Could be that my monitors can't support the resolution and refresh rate. I'll bring that up when I contact them in the next couple of days.
 

erb

NugentS said:
Doesn't the motherboard have a dsub on it? Try that one

I have to admit I had to look dsub up. That's what I have the VGA to HDMI adapter connected to. Neither cable/adapter seemed to produce video on any monitor I own.
 

erb

Looks like the SSD problems are firmware related:

Also, it sounds like some motherboards run into the broken VGA issue. Will be following up with iXsystems.
 

erb

Appreciate the help Davvo and NugentS. Given the original purpose of this thread is resolved, I'm going to close it out.
 