TrueNAS-12.0-U8.1 "Pool offline" after Reboot

DAMatson

Cadet
Joined
Dec 29, 2022
Messages
7
Good morning folks.

I am new to TrueNAS and not sure where to start. I have attempted to search the forums for "Pool Offline TrueNAS" but it doesn't appear that anyone is having a similar problem.

I have been running TrueNAS-12.0-U8.1 as a VM under Proxmox for the last 8 months with no problems. My memory was originally set at 8/16 GB, but I have since raised it to 16/32 GiB after reading a thread where someone mentioned that 4 GB was insufficient. Here's the latest config for the TrueNAS VM:

TrueNAS.jpg

My Physical hardware is a Dell R610:
- 2 Intel Xeon X5670 w/ 64 GiB memory
- Booting off of a newer Acer 240GB SSD using the CD-ROM slot.
- LSI 9211-4I 9211-8I SAS/SATA 6Gbps HBA LSI P20 IT Mode for ZFS FreeNAS unRAID
- running 6 Dell Seagate 1TB Constellation 2 ST91000640SS 7.2K SAS 6 Gbps 2.5" HDDs

The PCI device in the screenshot above is the LSI 9211, which is passed through from the Proxmox host to the TrueNAS VM. Until this problem appeared, it had been working for the last 8 months or so. I didn't even notice the issue until a TrueNAS plugin called "PlexMediaServer" failed to launch and I found out my pool was offline.

As mentioned previously I'm a newb, so I'm not sure where to start. I would prefer to recover the data on this pool if at all possible. Any help would be greatly appreciated!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Have you looked at dmesg to see what the pool alarms are? What other alarms are you getting?

I have never used Proxmox so I will ask what might sound like stupid questions.

Were the hard drives passed-through to the VM, or were the drives virtualized?

If the drives were virtualized, and you somehow corrupted them (you or drive failure), odds are you might not be able to get the data back. But I'd try everything possible first to mount the drives to attempt recovery.

If the drives were passed-through then you could boot from a TrueNAS boot drive and import the drives, then recover the data. It's not that simple but that is the idea.

We just really need more information about how you virtualized this and the error messages (screen captures work nicely).
 

DAMatson

Have you looked at dmesg to see what the pool alarms are? What other alarms are you getting?
Sorry, I'm not aware how to check this. Is DMESG a separate utility?

I have never used Proxmox so I will ask what might sound like stupid questions.
I'm not very far ahead of you on Proxmox. I installed it and then installed TrueNAS and both worked perfectly so I haven't really had a chance to play around with it more.

Were the hard drives passed-through to the VM, or were the drives virtualized?
Yes, sorry. I passed the HBA (LSI 9211-4I) and all 6 of the Dell/Seagate drives through to the TrueNAS VM.


If the drives were virtualized, and you somehow corrupted them (you or drive failure), odds are you might not be able to get the data back. But I'd try everything possible first to mount the drives to attempt recovery.
Since they were not virtualized, how would I go about doing this when they are offline?

If the drives were passed-through then you could boot from a TrueNAS boot drive and import the drives, then recover the data. It's not that simple but that is the idea.

We just really need more information about how you virtualized this and the error messages (screen captures work nicely).
So under Disks, I only see the virtualized boot drive. I had thought that the six drives appeared as one ZFS volume, but that was 8 months ago and I'm not entirely certain.

1672351731123.png



When I went to Pool > Add and tried the Import option, I got as far as picking the pool that I wanted to Import and it only gave me a "-" as you can see from the screenshots below:

1672351952837.png
1672351959760.png


How would I check to make sure that the individual drives are good/bad ?
 

joeschmuck

Read the entire message below. I tend to think out of order, and the last thing listed is the first thing I want you to try.
Also, prior to this failure, did something happen to the system? Power outage, reboot, upgrade of Proxmox, anything. Think back over the past two weeks or more, not just the past 2 days.

Since they were not virtualized, how would I go about doing this when they are offline?
Since you passed through the HBA, odds are you saved yourself some bigtime trouble.

First things first, can you bootstrap the TrueNAS VM and get to the GUI?
If yes, let's first try this...
1. At the GUI, go into the Shell window and type:
zpool import
This should provide the pool name. If this fails then maybe your pool is gone.

Post the output message to the forum. We need to see what you are actually dealing with.

If the pool name appears then type this:
zpool import -m -R /mnt/ "poolname" and hopefully it works. (Pay attention to the format!, spaces matter)

Then follow that up with zpool status and post all the data that was generated from the commands.
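
The import sequence above could be wrapped up roughly as follows. This is only a sketch: the function name import_pool and the ZPOOL override variable are made up for illustration (ZPOOL=echo gives a dry run that just prints the commands), and storage0 is the pool name that appears later in this thread.

```shell
#!/bin/sh
# Sketch of the import steps described above. Not an official TrueNAS procedure.
# ZPOOL is a hypothetical override so the sequence can be dry-run (ZPOOL=echo).
ZPOOL="${ZPOOL:-zpool}"

import_pool() {
  pool="$1"
  if [ -z "$pool" ]; then
    echo "usage: import_pool POOLNAME" >&2
    return 2
  fi
  # -m lets the import proceed even if a log device is missing;
  # -R /mnt mounts the pool's datasets under /mnt (alternate root).
  "$ZPOOL" import -m -R /mnt "$pool" || return 1
  # Show state, scan results and per-disk error counters for this pool only.
  "$ZPOOL" status "$pool"
}
```

Running plain zpool import first (with no arguments) lists the pools that are available for import; after that, import_pool storage0 would run the two commands in order and stop if the import itself fails.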

Sorry, I'm not aware how to check this. Is DMESG a separate utility?
In the GUI Shell window, just type dmesg | more and you will likely get a lot of data. You can search for things by typing dmesg | grep "wordtosearchfor" and only lines with that word will print out. Helpful to list one common device.
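
As a hedged example of "one common device" to search for: TrueNAS 12 (CORE) is FreeBSD-based, where the LSI 9211 family is normally handled by the mps driver and SAS disks show up as da0, da1, and so on. The helper name hba_lines below is made up for illustration, and the device names are assumptions about this particular system.

```shell
# Keep only kernel messages that mention the HBA driver (mps0, mps1, ...)
# or a da disk (da0, da1, ...). Names such as ada0 are deliberately not matched.
hba_lines() {
  grep -i -E '(^|[^a-z])(mps[0-9]+|da[0-9]+)'
}

# Usage in the TrueNAS shell:
#   dmesg | hba_lines | more
```

If this prints nothing at all, the VM's kernel never saw the controller or its disks, which would point at the passthrough or the HBA rather than at ZFS.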

So under Disks, I only see the virtualized boot drive. I had thought that the six drives appeared as one ZFS volume, but that was 8 months ago and I'm not entirely certain.
My fear is the HBA is not really being passed through or the HBA has failed. So the very first thing I think you should do is this stuff, not the stuff above. Until the drives are visible in the GUI, nothing above except the dmesg part will work.

1. Backup your configuration file.
2. Shut down your TrueNAS VM. Disable it from automatically starting. We want to keep it around in case you need it later, but don't want it to automatically start up.
3. Reboot your computer & Proxmox.
4. Create a new TrueNAS VM from scratch, this means create a new virtual disk for booting as well. I don't think you need to give it 16 cores, maybe 4 will do, but it's your machine. Pass through the HBA.
5. Install TrueNAS to the VM Disk.
6. Bootstrap the new TrueNAS VM.
7. Log into the GUI and check to see if the "Storage" ->"Disks" lists all the drives now.
8. If it does list all the drives now, you could restore the previously saved configuration file and then reboot the VM.
9. Make sure it's working. The pool "should" be back online.
10. Provide the output of zpool status
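
Step 7 can also be checked from the Shell instead of the GUI. The sketch below assumes a FreeBSD-based TrueNAS where sysctl -n kern.disks prints the disk names the kernel knows about and where disks behind the HBA are named da0, da1, and so on; the helper name count_da_disks is made up for illustration.

```shell
# Count disk names of the form da<N> in a whitespace-separated list.
count_da_disks() {
  # $1 is a list of disk names, e.g. the output of: sysctl -n kern.disks
  for name in $1; do
    printf '%s\n' "$name"
  done | grep -c -E '^da[0-9]+$'
}

# Usage in the TrueNAS shell:
#   count_da_disks "$(sysctl -n kern.disks)"
```

With the six SAS drives described in the first post, the count should come back as 6 once the HBA is passed through correctly; a count of 0 means the VM still only sees its virtual boot disk.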

Cross your fingers this is all that happened; it just looks like the HBA flew the coop.
 

DAMatson

Good Morning Chuck!

I too was suspecting that the controller card was kaput. I was going to follow your 10 steps on Sunday, but then a power failure hit New Year's Eve that outlasted the capacity of my UPS batteries. Sadly, I was unable to gracefully shut down the servers, so they were sitting cold for about 3 hours.

It seems that power was restored around 12:11am New Years Day, and when I turned everything back on, the drives are now visible!

1672563179412.png


So, change of strategies.

I suspect that sitting fallow for 3 hours allowed the HBA to dissipate any residual charge it might have held between previous restarts. Now that the drives are visible, are there any further diagnostics we can run on these drives to see what possible damage may have occurred?

Here's a ZPOOL STATUS output that you requested:

1672563480793.png


Thank you again for your assistance, it is greatly appreciated!
 

joeschmuck

It seems that power was restored around 12:11am New Years Day, and when I turned everything back on, the drives are now visible!
Just curious, did you ever reboot the server before reporting the problem? I'm sure you have read on the forums that Proxmox is not the most stable hypervisor for TrueNAS.
 

DAMatson

Just curious, did you ever reboot the server before reporting the problem? I'm sure you have read on the forums that Proxmox is not the most stable hypervisor for TrueNAS.
Good morning Joe!

Sorry about the Chuck name, I caught my error immediately after posting, but unfortunately, there's no obvious "edit" function to correct such errors. Sigh.

Also, yes, I rebooted the physical server (Proxmox) at least twice. And no, I never saw any reference that stated that Proxmox isn't the most stable hypervisor for the purposes of using TrueNAS, but I know now.
 

joeschmuck

Sorry about the Chuck name
I just thought it was done in humor. No problem.

That is odd that your system would come back online after a power outage. I assume that when you rebooted the server, you powered it down completely? I'm only asking questions to try and make sense of this problem, even though it may be "fixed" for now.

Yes, some people say Proxmox works fine, and some say it doesn't. I've never seen anyone complain about ESXi. I don't want you to think Proxmox caused your issue; we actually do not know what happened, not yet. And I don't like to make assumptions unless it's needed to move forward.
 

DAMatson

OK, so SCRUB is completed. Not sure about the results:

1672601203112.png


It says zero errors, but then on the Pool Status page it still registers as "unhealthy":

1672601298838.png


Are there any further checks to be done to remedy this?
 

joeschmuck

Looks like you had a few checksum errors which were likely fixed.
If you go to the Shell and type in the command zpool status storage0, it will provide output with a little more data. It will tell you whether errors were corrected or not.

Reboot, see if that will now correct the "Unhealthy" condition. If it does not, I need to see the output of the command from above.
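
To pick the affected disks out of that output, a small filter along these lines could help. It is a sketch that assumes the usual zpool status column layout (NAME STATE READ WRITE CKSUM); the helper name cksum_errors is made up for illustration.

```shell
# Print "device count" for every row whose CKSUM column is not zero.
cksum_errors() {
  # Rows of interest have at least five fields with numeric READ/WRITE/CKSUM
  # counters in fields 3-5; header and summary lines do not match.
  awk 'NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ && $5 != "0" { print $1, $5 }'
}

# Usage in the TrueNAS shell:
#   zpool status storage0 | cksum_errors
```

No output means every counter in the CKSUM column is zero. The counters are only a running tally: zpool clear storage0 resets them, so it is worth saving the zpool status output first if the history matters.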
 

DAMatson

So the command ZPOOL STATUS STORAGE0 was functionally no different than just ZPOOL STATUS. That is, I did not get any extra output from it over the base command:

1672684729521.png


Since it specifically mentions that it scrubbed 0B of repairs, I'll presume that to mean that it didn't repair anything.

However, after Shutdown and subsequent restart:

1672698067738.png


So no longer "unhealthy".

Would you be satisfied with this resolution if it was your system, or is there more that you would "dig into"?

And, two related questions for you.

1.) I would like to have a way to copy my TrueNAS configs to a backup location like a cloud drive automatically. Is there a way to do this to help mitigate future problems?
2.) Should I upgrade to TrueNAS Scale, or stick with TrueNAS Core? I only use TrueNAS for storing personal shares for the family, and for running the plugin "PlexMediaServer" and storing the media content recorded by Plex.

Thank you Joe!
 

DAMatson

Also, the checksum count on drives 3 and 4 is now 0 like the rest, but it didn't change until after the system was restarted:

1672698943212.png
 

joeschmuck

Would you be satisfied with this resolution if it was your system
Yes I would be. I figured the reboot would fix it. This kind of stuff happens from time to time. I don't want to speculate what caused it but if it happens often, then further troubleshooting to isolate the cause is required.

So the command ZPOOL STATUS STORAGE0 was functionally no different than just ZPOOL STATUS
Well, some of the data scrolled off the screen. zpool status shows all the pools, including the boot pool, which I didn't want to confuse you with. zpool status poolname shows just the pool specified.
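
For a summary that cannot scroll off the screen, zpool list -H -o name,health prints one tab-separated line per pool. A minimal sketch of filtering that down to the pools that are not ONLINE follows; the helper name unhealthy_pools is made up for illustration.

```shell
# Print "pool: state" for every pool whose health is anything but ONLINE.
unhealthy_pools() {
  # Input: tab-separated "name<TAB>health" lines,
  # as printed by: zpool list -H -o name,health
  awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Usage in the TrueNAS shell:
#   zpool list -H -o name,health | unhealthy_pools
#   zpool status storage0 | more    # full detail for one pool, one page at a time
```

A pool can report ONLINE here and still carry nonzero checksum counters, so this complements zpool status rather than replacing it.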
 