Basil Hendroff, Wizard (joined Jan 4, 2014; 1,644 messages)
I lost a pool on an HP Gen 8 MicroServer a couple of nights ago. A bit of background first. The server has four NAS drives installed in a RAID-Z1 configuration: 2 x 8TB and 2 x 6TB. Please, no lectures about RAID-Z1; I replicate to mitigate the risk of using this configuration. I wanted to replace the 6TB drives for two reasons. First, 2 x 2TB of space on the 8TB drives was unusable because of the 6TB drives I'd thrown into the mix. Second, the 6TB drives were SMR drives, and I wanted to replace them with 8TB CMR drives I had recently purchased. Before anyone asks, I did burn-in test the replacement drives first. The server was running TrueNAS 12.0-U3.1 at the time.
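The capacity point comes down to simple arithmetic: in a RAID-Z vdev, every member contributes only as much space as the smallest drive. Here is a rough sketch in Python. It assumes decimal TB as marketed and ignores ZFS metadata, padding and reserved space, so real usable figures will be somewhat lower. The function names are hypothetical helpers for illustration only.

```python
def raidz_usable_tb(drive_sizes_tb, parity=1):
    """Approximate usable capacity of one RAID-Z vdev, in TB.

    Each member contributes only as much as the smallest drive, and
    `parity` drives' worth of that goes to parity (1 for RAID-Z1).
    Ignores ZFS metadata, padding and reserved space.
    """
    if len(drive_sizes_tb) <= parity:
        raise ValueError("a RAID-Z vdev needs more drives than parity devices")
    smallest = min(drive_sizes_tb)
    return (len(drive_sizes_tb) - parity) * smallest


def stranded_tb(drive_sizes_tb):
    """Raw space on the larger drives that the vdev cannot use."""
    smallest = min(drive_sizes_tb)
    return sum(size - smallest for size in drive_sizes_tb)


mixed = [8, 8, 6, 6]     # original layout: 2 x 8TB + 2 x 6TB
upgraded = [8, 8, 8, 8]  # after swapping both 6TB drives for 8TB

print(raidz_usable_tb(mixed))     # 18
print(stranded_tb(mixed))         # 4 (2TB on each 8TB drive)
print(raidz_usable_tb(upgraded))  # 24
```

Note that ZFS only grows the vdev once the last smaller drive has been replaced, which is why both 6TB drives have to go before the extra space appears.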
The server rebooted pretty much immediately after I replaced the first drive. When the UI came back up, it had lost its connection to the middleware, and disk stats were not available. Running zpool status did show that the resilver was progressing, though. The next day, the resilver completed successfully and the UI had magically reconnected to the middleware. The pool status was healthy, with no disk errors.
On the evening of 31 May, TrueNAS 12.0-U4 became available. I upgraded, thinking that might resolve the issue with the UI losing its connection to the middleware. I waited a further 24 hours, as I have a scrub scheduled for the first day of each month and I didn't want it competing with the resilver. The next evening, I replaced the second drive. Same deal: the server rebooted and the UI lost its connection to the middleware again. This time, though, the server would crash about 1% into the resilver.
It looked like I had lost the pool. I wasn't particularly worried. As mentioned, I replicate all datasets, so recreating the pool and restoring the datasets is more of an inconvenience than a nightmare. My main concern is that restoring pools is not something I do often; it only happens by exception. The last time I did it was under FreeNAS 11.x, and that was on a backup system, so no biggie. This time around I'm restoring a primary system and I'm dealing with TrueNAS 12, where things look and behave differently. My personal opinion is that TrueNAS 12 still isn't as stable as FreeNAS 11.3-U5. For instance, look at the ongoing issue with Python core dumps in 12.0-U4 (NAS-109709). Was that what I was experiencing during the disk replacements when the UI lost contact with the middleware? I'm not sure.
A couple of questions have arisen for me:
- Why did I lose the pool? I wonder whether the outcome would have been different if I hadn't upgraded TrueNAS between the two disk replacements. We'll never know: the evidence trail has been erased, and I've built a new pool.
- I feel like I'm breaking new ground by restoring a pool under TrueNAS 12. Has anyone else done this yet?