SOLVED Pool restoration journey TrueNAS 12

Joined
Jan 4, 2014
Messages
1,644
I lost a pool on an HP Gen 8 microserver a couple of nights ago. A bit of background first. The server has four NAS drives installed in a RAID-Z1 configuration - 2 x 8TB, 2 x 6TB. Please, no lectures about RAID-Z1. I replicate to mitigate my risk in using this RAID-Z configuration. I wanted to replace the 6TB drives for two reasons. The first was that 2 x 2TB of disk space from the 8TB drives was unavailable because of the 6TB drives I'd thrown into the mix. Secondly, the 6TB drives were SMR drives that I wanted to replace with 8TB CMR drives I had recently purchased. Before anyone asks, I did burn-in test the replacement drives first. The server was running TrueNAS 12.0-U3.1 at the time.
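To put rough numbers on the unavailable space: a RAID-Z vdev sizes every member to its smallest disk, and RAID-Z1 gives up one disk's worth of space to parity. The sketch below is only back-of-the-envelope arithmetic under that assumption. It ignores ZFS metadata, padding and TB/TiB differences, and the function name `raidz1_usable_tb` is made up for illustration.

```shell
# Approximate RAID-Z1 data capacity in TB, ignoring ZFS overhead and padding.
# Arguments: the size of each member disk in TB (whole numbers).
raidz1_usable_tb() {
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    # Every member is limited to the smallest disk; one disk's worth goes to parity.
    echo $(( ($# - 1) * min ))
}

raidz1_usable_tb 8 8 6 6   # mixed vdev: 2 x 8TB + 2 x 6TB
raidz1_usable_tb 8 8 8 8   # after replacing both 6TB drives
```

On these assumptions, the mixed vdev works out to roughly 18TB of data capacity and the all-8TB vdev to roughly 24TB.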

The server rebooted pretty much immediately after I replaced the first drive. When the UI came back online, it had lost connection with the middleware - disk stats were not available. A zpool status did show that the resilver was progressing though. The next day, the resilver completed successfully and the UI had magically reconnected with the middleware. Pool status was healthy, with no disk errors.
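When the UI has lost the middleware, resilver progress can still be followed from a shell, as it was here with zpool status. A minimal sketch follows, assuming a pool named `tank` (hypothetical) and the "% done" wording that zpool status prints while a scan is in progress. The helper name `resilver_progress` is invented for illustration.

```shell
# Print the completion percentage of an in-progress resilver or scrub,
# reading `zpool status` output from stdin.
# Prints nothing if no "% done" figure is present.
resilver_progress() {
    grep -Eo '[0-9]+(\.[0-9]+)?% done' | head -n 1
}

# Example usage (pool name "tank" is an assumption):
#   zpool status tank | resilver_progress
```

Running the example line every few minutes gives a rough progress log without relying on the UI.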

On the evening of 31 May, TrueNAS 12.0-U4 became available. I upgraded, thinking that might resolve the issue with the UI losing connection with the middleware. I waited a further 24 hours as I have a scheduled scrub occurring on the first day of the new month and I didn't want that competing with the resilver. The next evening, I replaced the second drive. Same deal, the server rebooted and the UI lost connection with the middleware again. This time though, the server would crash about 1% into the resilver.

It looked like I had lost the pool. I wasn't particularly worried. As indicated, I replicate all datasets, so recreating the pool and restoring all datasets is more of an inconvenience than a nightmarish process. My main concern is that restoring pools is not something I do often. It only happens by exception. The last time I did it was with FreeNAS 11.x and that was with a backup system, so no biggie. This time around I'm restoring a primary system and I'm dealing with TrueNAS 12. Things look and behave differently. My personal opinion is that TrueNAS 12 still isn't as stable as FreeNAS 11.3-U5. For instance, you only have to look at the ongoing issue with python core dumps (NAS-109709) in 12.0-U4. Was that what I was experiencing during the disk replacement when the UI lost contact with the middleware? I'm not sure.

A couple of questions have arisen for me:
  1. Why did I lose the pool? I wonder if I hadn't upgraded TrueNAS between disk replacements, whether the outcome would have been different? We'll never know. The evidence trail has now been erased. I've built a new pool.
  2. I feel like I'm breaking new ground by restoring a pool under TrueNAS 12. Has anyone else done this yet?
Anyway, I'm going to use this thread as a record of my pool restoration journey under TrueNAS 12, but feel free to chime in along the way with any thoughts.
 
Issue #1 - Unusual pool geometry messages

Here's an excerpt from the TrueNAS boot after having removed the corrupted pool. As indicated within the highlight below, there isn't a pool to import.

ilo4.jpg


A new RAID-Z1 (4 x 8TB WD RED+) pool was successfully created on TrueNAS 12.0-U4. However, during a reboot, I now see messages such as the following. These appear between the highlighted lines in the image above. I've captured these messages through the HP iLO remote console as the system boots. They do not appear in console messages in the TrueNAS GUI.

ilo5.jpg


Should I be concerned? I don't recall seeing these messages under FreeNAS 11.3-U5. They're something that's appeared in the upgrade to TrueNAS 12. Just to be doubly sure, I rebooted another Gen 8 server, which I had recently upgraded from FreeNAS 11.3-U5 to 12.0-U3.1 and, lo and behold, similar messages appeared during the reboot of that server. So, alarming as they look, the assumption might be that they don't really matter. However, I can't help but wonder whether there's something insidious going on here that contributed to the pool failure I experienced while replacing the second disk. Is anyone able to allay or confirm my concerns?
 
Issue #2 - Unable to reload config

When I disconnected the corrupted pool, I hadn't fully appreciated that TrueNAS defaults to removing the configuration of any shares that use the pool.

tn1.jpg


As I was restoring the datasets back to this pool, I probably should have unchecked the highlighted box. Too late. I then thought I'd just reload a saved TrueNAS config over the top to restore the shares. The config is automatically saved and emailed to me weekly as part of regular reporting. For this, I've been using the edgarsuit/FreeNAS-Report script. The last config backup I had for this server was dated 26 May and created while the server was running TrueNAS 12.0-U3.1. I attempted to reload the config on the server now running TrueNAS 12.0-U4.

tn2.jpg


It failed. I also tried rolling back to TrueNAS 12.0-U3.1 and restoring the config there. The result was the same.

tn3.jpg


The issue is likely to be in one of two areas:
  1. The structure of the config db is different under TrueNAS 12 and this hasn't been accounted for in the report script; or,
  2. If the config db structure is unchanged, there's a problem uploading the config.
Thoughts anyone?

In retrospect, I probably should have manually saved the config prior to disconnecting the pool, and not assumed I would be able to restore a config from the automated backup. Lesson learnt.
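For anyone who prefers the shell to the GUI for that manual save, here is a minimal sketch. It assumes the config database lives at /data/freenas-v1.db with the password seed at /data/pwenc_secret, which is where I understand FreeNAS and TrueNAS CORE keep them; verify on your own system. The function name `backup_config` and the destination path in the usage comment are hypothetical. The GUI save remains the supported route.

```shell
# Copy the TrueNAS config database (and password seed, if present) to a dated backup.
# $1 = source directory (assumed to be /data on TrueNAS), $2 = destination directory.
# Prints the path of the copied database on success.
backup_config() {
    src="${1:-/data}"
    dest="$2"
    [ -n "$dest" ] || { echo "usage: backup_config SRC_DIR DEST_DIR" >&2; return 1; }
    [ -f "$src/freenas-v1.db" ] || { echo "no config db in $src" >&2; return 1; }
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest" || return 1
    cp -p "$src/freenas-v1.db" "$dest/freenas-v1-$stamp.db" || return 1
    if [ -f "$src/pwenc_secret" ]; then
        cp -p "$src/pwenc_secret" "$dest/pwenc_secret-$stamp" || return 1
    fi
    echo "$dest/freenas-v1-$stamp.db"
}

# Example usage (destination is an assumption; pick somewhere off the pool being removed):
#   backup_config /data /root/config-backups
```

The destination should not sit on the pool that is about to be disconnected, otherwise the backup goes with it.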

EDIT: I managed to recover my share configuration by changing to an earlier boot environment 12.0-U3, which still had them included and then upgrading to 12.0-U3.1 (manually) and then to 12.0-U4.

I suspect there's an issue with the config backup in the edgarsuit script as I'm able to save and reload a config through the GUI. Further testing required before I report it.
 
Restoring datasets under TrueNAS 12 is a breeze. The assumption is that you have the TrueNAS config for the system on which a pool is being recreated, because the replication tasks are saved in the config.

I'm not sure if this is a new feature for TrueNAS 12, but there is now an option to restore a dataset directly from a replication task. I don't recall seeing this option under FreeNAS 11.

tn4.jpg


Clicking on this option creates a new pull replication task to restore the replicated dataset. Enable that replication task and let it do its thing. Remember to disable the task when it has successfully completed.

When the dataset is restored, it will be in a read-only state.

tn5.jpg


This needs to be switched off. It can be changed through the UI:

tn6.jpg


Issue #3 - No option in the UI to change the read-only state of a ZVOL

Restoring a replicated ZVOL is just as straightforward. The ZVOL I restored was for an Ubuntu VM. However, there's no option in the GUI to change the read-only state of the ZVOL. To do this, I had to execute the following command in the GUI shell: zfs set readonly=off tank/ubuntu20-xh76i
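After a restore it can be handy to list everything that is still read-only, ZVOLs included, before clearing the property. A minimal sketch follows, assuming a pool named `tank`. The helper name `readonly_on` is made up, and it simply filters the tab-separated name/value pairs that `zfs get -H -o name,value` prints.

```shell
# Read tab-separated "name<TAB>value" lines (as printed by
# `zfs get -H -o name,value readonly ...`) from stdin and print
# the names whose readonly property is "on".
readonly_on() {
    awk -F '\t' '$2 == "on" { print $1 }'
}

# Example usage (pool name "tank" is an assumption):
#   zfs get -H -o name,value -r -t filesystem,volume readonly tank | readonly_on
#
# Each dataset or ZVOL listed can then be made writable with:
#   zfs set readonly=off <name>
```

Checking first avoids changing the property on datasets that are meant to stay read-only, such as replication targets.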
 
In summary, I found reconstructing a pool much easier under TrueNAS 12. Replication improvements make restoring a dataset a trivial exercise. There's always room for further improvement though. WRT the issues identified in this thread:

Issue #1 - Unusual pool geometry messages

This one's strange. I've reported it as NAS-110925, but I'm not sure if anything will come of it. There's no visible operational issue apart from a lingering suspicion that it may have played a part in the pool failure.

Issue #2 - Unable to reload config

Looks like this is an issue with the edgarsuit script. I've raised it as an issue here https://github.com/edgarsuit/FreeNAS-Report/issues/29.

Referring to post #3, the following questions spring to mind as possible TrueNAS improvements when deleting a (corrupted) pool:
  1. Should the option to delete the configuration of shares that use the pool be checked by default? The checked option caused all sorts of problems for me during pool reconstruction.
  2. Should there be a (checked?) option to save the TrueNAS config prior to pool deletion?
I've made suggestions here NAS-110926. Please vote for this suggestion if you feel strongly about it.

Issue #3 - No option in the UI to change the read-only state of a ZVOL

This one I feel warrants a UI change. I've reported it here NAS-110927 as a bug rather than an improvement.
 