Upgrade to TrueNAS-13.0-U5.2 and now my Storage Pool is OFFLINE

HoneyBadger · Jul 19, 2023

So hardware is being detected, let's move on to partitions. Can you capture the output of gpart show and paste it in [code][/code] tags?

It may be better to temporarily enable SSH (Services -> SSH -> Edit -> Log in as Root with Password) to make the copy-and-paste easier.

Tong_Po · Jul 19, 2023

HoneyBadger said:
So hardware is being detected, let's move on to partitions. Can you capture the output of gpart show and paste it in [code][/code] tags?

It may be better to temporarily enable SSH (Services -> SSH -> Edit -> Log in as Root with Password) to make the copy-and-paste easier.


root@ushpst-san02[~]# gpart show
=>       40  584843184  da0  GPT  (279G)
         40     532480    1  efi  (260M)
     532520   33554432    3  freebsd-swap  (16G)
   34086952  550731776    2  freebsd-zfs  (263G)
  584818728      24496       - free -  (12M)

root@ushpst-san02[~]#

HoneyBadger · Jul 19, 2023

Well, that's not good at all - even an exported/disconnected pool after a reboot should be showing a partition table on its drives akin to below:

Code:

root@core-boot-pool[~]# gpart show
=>      40  33554352  da0  GPT  (16G)
        40      1024    1  freebsd-boot  (512K)
      1064  33521664    2  freebsd-zfs  (16G)
  33522728     31664       - free -  (15M)

=>      40  33554352  da1  GPT  (16G)
        40      2008       - free -  (1.0M)
      2048      1024    1  freebsd-boot  (512K)
      3072  33521664    2  freebsd-zfs  (16G)
  33524736     29656       - free -  (14M)

=>      40  16777136  da2  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da3  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da4  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da5  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da6  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da7  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

Your serial numbers look like how I would expect for unconfigured/passthrough drives rather than the much more artificial ones that result from being a virtual disk, so I'm not sure what would have occurred here. Did you export or disconnect your pool prior to the upgrade?

Can you collect a debug file from System -> Advanced -> Save Debug and attach it to a Report A Bug ticket?
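
The comparison above can also be checked mechanically. Below is a minimal Python sketch, not part of the original thread, that reads pasted `gpart show` text and lists which devices carry a `freebsd-zfs` partition. The function names `parse_gpart_show` and `devices_with` are hypothetical helpers made up for this illustration. Run against Tong_Po's output, it reports only the boot device da0, which is the symptom under discussion: none of the storage disks presents a partition table.

```python
# Tong_Po's output from earlier in the thread, pasted verbatim.
SAMPLE = """\
=>       40  584843184  da0  GPT  (279G)
         40     532480    1  efi  (260M)
     532520   33554432    3  freebsd-swap  (16G)
   34086952  550731776    2  freebsd-zfs  (263G)
  584818728      24496       - free -  (12M)
"""


def parse_gpart_show(text):
    """Map each device named in `gpart show` text to its partition types."""
    disks = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank lines, shell prompts, anything too short
        if fields[0] == "=>":
            # Header line: => start size device scheme (size)
            current = fields[3]
            disks[current] = []
        elif current is not None and fields[2].isdigit():
            # Partition line: start size index type (size)
            # "- free -" rows have "-" in the index column and are skipped.
            disks[current].append(fields[3])
    return disks


def devices_with(disks, part_type):
    """Return the devices that have at least one partition of part_type."""
    return [dev for dev, types in disks.items() if part_type in types]


disks = parse_gpart_show(SAMPLE)
print(devices_with(disks, "freebsd-zfs"))  # prints ['da0']
```

On a healthy system the same call would be expected to list every pool member as well as the boot device.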

Tong_Po · Jul 19, 2023

HoneyBadger said:

Did you export or disconnect your pool prior to the upgrade?

I did not. It pulled the download and I applied it. After the reboot, everything was broken.

NickF · Jul 19, 2023

Ah! This is interesting.

You said your config is as follows:

Hard Drives (Boot): 2 x Dell 300GB 15k SAS Enterprise
Hard Drives (Storage): 16 x Dell 1.92TB SSD Enterprise

But in your screenshot I see that we have some “drives” that are a bunch of RAID0s or JBODs?

What’s more interesting is that you don’t see ANY partitions on those drives.

Did you change anything in the BIOS or iDRAC related to the disks and RAID modes? Did you flash an update? Did you ever reboot the server before this most recent update?

I think the RAID controller wiped the partition headers of the drives for some reason, and I’m not sure those can be recovered. The data is likely still on the disks, but we can’t know for certain, and likely won’t know without some serious data recovery foo.

Tong_Po · Jul 19, 2023

NickF said:
Ah! This is interesting.

You said your config is as follows:

Hard Drives (Boot): 2 x Dell 300GB 15k SAS Enterprise
Hard Drives (Storage): 16 x Dell 1.92TB SSD Enterprise

But in your screenshot I see that we have some “drives” that are a bunch of RAID0s or JBODs?

What’s more interesting is that you don’t see ANY partitions on those drives.

Did you change anything in the BIOS or iDRAC related to the disks and RAID modes? Did you flash an update? Did you ever reboot the server before this most recent update?

I think the RAID controller wiped the partition headers of the drives for some reason, and I’m not sure those can be recovered. The data is likely still on the disks, but we can’t know for certain, and likely won’t know without some serious data recovery foo.

Correct. The controller is in RAID mode. I had to put each individual drive in its own RAID0 so that TrueNAS could pick up the individual drives. Creating one giant virtual disk would have wasted too much space for my liking.

Nothing was changed in the iDRAC or BIOS. The only update that was applied to anything was the TrueNAS update. Yes, the server was bounced a few times before and never had this issue.

jgreco · Jul 19, 2023

NickF said:
I think the RAID controller wiped the partition headers of the drives for some reason, and I’m not sure those can be recovered. The data is likely still on the disks, but we can’t know for certain, and likely won’t know without some serious data recovery foo.

Yeah, the data's very likely to still be out there but you would need to dummy up the right kind of partition table so that ZFS could identify its partitions again. This is the danger I warn against in the HBA/RAID sticky about the stupid partition tricks that a RAID controller may pull. Even if it tried a "quick" format, though, the TrueNAS layout that includes the swap "up front" has probably protected the data from being destroyed. If you let it go for a full erase/format which might take an hour or two, the data is probably gone. Unfortunately, dummying up partition tables to recover data is relatively deep magic and probably beyond what you are likely to get in the forums here, so it may not matter which thing happened.
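
To put rough numbers on the "swap up front" point, here is a small arithmetic sketch in Python. It borrows the da2 layout from HoneyBadger's sample output earlier in the thread as a stand-in, since the OP's actual data-disk layout is unknown, and it assumes 512-byte sectors, which is consistent with 4194304 sectors being reported as 2.0G. The helper name `partition_bytes` is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: 4194304 sectors * 512 bytes = 2 GiB, matching "(2.0G)"


def partition_bytes(start_sector, size_sectors, sector_bytes=SECTOR_BYTES):
    """Return (first byte, one past the last byte) of a partition."""
    start = start_sector * sector_bytes
    return start, start + size_sectors * sector_bytes


# Values copied from the da2 entry in the sample `gpart show` output.
swap_start, swap_end = partition_bytes(128, 4194304)
zfs_start, zfs_end = partition_bytes(4194432, 12582744)

print("swap occupies bytes", swap_start, "to", swap_end)
print("ZFS data begins at byte", zfs_start)
print("bytes on disk ahead of ZFS:", zfs_start)
```

With that layout the ZFS partition begins 2 GiB plus 64 KiB into the disk, so anything that only rewrites the first few megabytes would land in the partition table and swap rather than in pool data. Whether the controller in this thread did that, or something more destructive, is not established.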

Tong_Po · Jul 19, 2023

jgreco said:
Yeah, the data's very likely to still be out there but you would need to dummy up the right kind of partition table so that ZFS could identify its partitions again. This is the danger I warn against in the HBA/RAID sticky about the stupid partition tricks that a RAID controller may pull. Even if it tried a "quick" format, though, the TrueNAS layout that includes the swap "up front" has probably protected the data from being destroyed. If you let it go for a full erase/format which might take an hour or two, the data is probably gone. Unfortunately, dummying up partition tables to recover data is relatively deep magic and probably beyond what you are likely to get in the forums here, so it may not matter which thing happened.

So without paying for some sort of recovery service, I'm pretty much wiping and starting over, with the exception of moving from RAID mode to HBA mode. Is that correct?

Whattteva · Jul 19, 2023

Tong_Po said:
So without paying for some sort of recovery service, I'm pretty much wiping and starting over, with the exception of moving from RAID mode to HBA mode. Is that correct?

I'm not sure whether you have any options left, but you probably want to go to an actual recommended HBA (e.g. H220) instead of just flashing that H730P controller into IT mode.

jgreco · Jul 19, 2023

Whattteva said:
I'm not sure whether you have any options left, but you probably want to go to an actual recommended HBA (e.g. H220) instead of just flashing that H730P controller into IT mode.

There's no "flash to IT mode" for this controller.

jgreco · Jul 19, 2023

Tong_Po said:
So without paying for some sort of recovery service, I'm pretty much wiping and starting over, with the exception of moving from RAID mode to HBA mode. Is that correct?

There are ways to search for the start of a ZFS partition on a physical drive, which is probably the only "safe" recovery option. The problem is that reconstructing a disk label is not a beginner-level activity, and then you really do need to copy the data off and onto a new stable pool. For some of us, we've been doing hot shuffles of disk partitions since the '80s and this kind of recovery might merely be annoying rather than impossible. There actually was a thread in which somebody's Proxmox system killed a TrueNAS ZFS partition, which was (I believe) successfully recovered with disk label twiddling and some other related recovery fun. This is different, but only somewhat. I'm going to opine that it is beyond the assistance you can expect from the forums, but it is not out of the realm of possibility that someone who is intent on a challenge might be able to help you out.
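
As a rough illustration of what "searching for the start of a ZFS partition" can mean, here is a minimal read-only Python sketch. It is not the procedure used in the Proxmox thread and is not a recovery tool. It relies on two facts about the ZFS on-disk format: each 256 KiB vdev label keeps its uberblock ring in its second 128 KiB, and every uberblock begins with the 64-bit magic value 0x00bab10c. The function name `find_uberblock_magic`, the 1 KiB scan step, and the idea of scanning a disk image rather than a live device are assumptions made for this example.

```python
import struct

UBERBLOCK_MAGIC = 0x00BAB10C
LABEL_UBERBLOCK_OFFSET = 128 * 1024  # uberblock ring starts 128 KiB into a label


def find_uberblock_magic(path, step=1024, limit=None, chunk=1024 * 1024):
    """Yield byte offsets in `path` where the uberblock magic sits on a
    `step`-aligned boundary. The file is opened read-only.

    `chunk` must be a multiple of `step` so alignment survives across reads.
    `limit`, if given, stops the scan once that many bytes have been read.
    """
    # Match either byte order; the magic is stored in the writer's native order.
    patterns = (
        struct.pack("<Q", UBERBLOCK_MAGIC),
        struct.pack(">Q", UBERBLOCK_MAGIC),
    )
    with open(path, "rb") as f:
        base = 0
        while limit is None or base < limit:
            data = f.read(chunk)
            if not data:
                break
            for off in range(0, len(data) - 7, step):
                if data[off:off + 8] in patterns:
                    yield base + off
            base += len(data)


def guess_label_start(hit_offset):
    """Guess where the label would begin if the hit were the first slot of
    the uberblock ring. This is only a candidate and must be verified."""
    return hit_offset - LABEL_UBERBLOCK_OFFSET
```

A scan might be driven with something like `for hit in find_uberblock_magic("/path/to/disk.img"): print(hit, guess_label_start(hit))`, where the path is a placeholder for an image copied off the disk. Hits only suggest where a label may sit. The slot that matched is not necessarily the first one in the ring, and a second copy of the label follows 256 KiB after the first, so each candidate needs independent confirmation before anyone writes a partition table.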

Tong_Po · Jul 19, 2023

jgreco said:
There's no "flash to IT mode" for this controller.

It can be migrated to HBA mode but not with the current setup. I would need to delete all the Virtual Disks and then "Switch to HBA mode".

Tong_Po · Jul 19, 2023

Would anyone be able to recommend a good, proven HBA? I'm reading a lot about LSI. I just need something that will support up to 24 SSDs and be compatible with the Dell R730xd.

Again, for those responding, you guys are incredible. Thank you for all the information and help along the way. I'm thinking that I have lost my data and should just build fresh, in a supported manner.

Finding someone with the time to undertake this recovery task along with the associated fees may be just a bit too much at this point.

jgreco · Jul 19, 2023

Tong_Po said:
It can be migrated to HBA mode but not with the current setup. I would need to delete all the Virtual Disks and then "Switch to HBA mode".

The "HBA mode" still relies upon the MRSAS driver, which is not recommended. It's possible that it works fine, but it is the MPR driver based cards that are recommended for use with TrueNAS. More recent LSI cards appear to have transitioned to relying on the MRSAS driver for their HBA support, but most of us here in the forums are cheapskates and haven't tried it. It's like being the first driver of a hydrogen powered car. Probably works. But you also get to discover any problems.

Tong_Po said:
Would anyone be able to recommend a good, proven HBA? I'm reading a lot about LSI. I just need something that will support up to 24 SSDs and be compatible with the Dell R730xd.

I believe that the best HBA for the R730xd is likely to be the Dell PERC HBA330. As far as I know, this is just the Dell version of the LSI 9300-8i HBA. However, there is also some sort of "mini" version of the HBA330 that is apparently intended for an internal expansion slot. The 9300-8i is a pretty good performer -- lots better than the 2008 or 2308 based HBAs -- and I believe other posters have had good luck with it in the R730xd.

Tong_Po said:
Again, for those responding, you guys are incredible. Thank you for all the information and help along the way. I'm thinking that I have lost my data and should just build fresh, in a supported manner.

Finding someone with the time to undertake this recovery task along with the associated fees may be just a bit too much at this point.

Please do take the time to browse through

HELP ZFS Pool data recovery

My TrueNAS setup: Lives on Proxmox VM Three 4TB hard drives passed through to the VM Running in RaidZ1 File system: ZFS Please help! 4 days ago I had one of my 4TB hard drives fail on me. I instantly ordered a new one, it came in yesterday. Threw it into my Proxmox server. Located it and...

www.truenas.com

You may be able to tag some posters who might enjoy the challenge of another messed up pool. I just don't have the time for this right now, but perhaps @joeschmuck or @HoneyBadger will happen on by. In general, no one here likes a lost pool, and some of us like a challenge.

NickF · Jul 19, 2023

The H730 Mini you have looks like this:

No SAS cables plug into it directly; they go to a daughterboard instead, IIRC.

The equivalent Dell part is the HBA330, which can be found on eBay for less than $20 USD.

Hba330 Mini for sale | eBay

Get the best deals for Hba330 Mini at eBay.com. We have a great online selection at the lowest prices with Fast & Free shipping on many items!

www.ebay.com

Do note that there are fakes out there; see @jgreco's resource here:

Fake server cards

This resource was originally created by user: jgreco on the TrueNAS Community Forums Archive. Please DM this account or comment in this thread to claim it. Some time ago, forum frequenter @artlessknave kindly sent me a pair of LSI HBA cards that had failed to work out in a FreeNAS build. Having...

www.truenas.com

I recommend this seller and trust that they are genuine, but you pay for it!!

Dell HBA330 mini monolithic (=9300-8i) w/ P16 IT mode H330 ZFS FreeNAS unRAID | eBay

I have fully tested them using all 12 SAS HDD slots. Listed for sale are Dell PERC HBA330 mini monolithic cards (Dell P/N: P2R3R) for 13th & 14th generation Dell PowerEdge servers. They have been flashed with Dell IT mode firmware equivalent to Avago / LSI 9300-8i IT (Initiator Target) firmware...

www.ebay.com

HoneyBadger · Jul 20, 2023

jgreco said:
You may be able to tag some posters who might enjoy the challenge of another messed up pool. I just don't have the time for this right now, but perhaps @joeschmuck or @HoneyBadger will happen on by. In general, no one here likes a lost pool, and some of us like a challenge.

This one's a little further gone than the Proxmox disk-wipe, because we don't have an easy template to pull from unless someone else happens to have that exact drive and controller configuration; and I'd be really hesitant to recommend that the OP re-partition a drive just to get a sample.

@Tong_Po Does the PERC configuration show the disks as active and online, or are they shown as a foreign disk that needs to be imported? I'm still struggling to think of a reason why a software upgrade and reboot would have suddenly rendered them all as invisible with effectively no valid partition table.

I'm not sure how much logging we would be able to get from a debug file (as the system dataset was on the main pool) but if you can collect one from System -> Advanced -> Save Debug and attach it to a "Report A Bug" ticket at the top of the forums, we can try to dig a little deeper.

HoneyBadger · Jul 20, 2023

@Tong_Po Can you capture the status from the PERC configuration (Ctrl+R, or launch the PERC utility from the BIOS/UEFI)? It should be a similar type of menu to the screenshot below - I don't have an H730P myself to navigate through, but check for the status and health of the virtual disks.

Tong_Po · Jul 20, 2023

Confirmed. All are showing as Online and Ready.

HoneyBadger · Jul 20, 2023

Have you tried booting to a previous environment from the System -> Boot menu?

The only thought here is that somehow the previous environment was using a RAID-aware driver (possibly through a tunable?) and your updated installation has shifted to attempt to communicate with the raw disks, and it's not understanding the PERC-supplied header that it's getting from the raw disks.
