Upgrade to TrueNAS-13.0-U5.2 and now my Storage Pool is OFFLINE

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So hardware is being detected, let's move on to partitions. Can you capture the output of gpart show and paste it in [code][/code] tags?

It may be better to temporarily enable SSH (Services -> SSH -> Edit -> Log in as Root with Password) to make the copy-and-paste easier.
 

Tong_Po

Dabbler
Joined
Jul 19, 2023
Messages
28
Code:
root@ushpst-san02[~]# gpart show
=>       40  584843184  da0  GPT  (279G)
         40     532480    1  efi  (260M)
     532520   33554432    3  freebsd-swap  (16G)
   34086952  550731776    2  freebsd-zfs  (263G)
  584818728      24496       - free -  (12M)

root@ushpst-san02[~]#

1689794423483.png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Well, that's not good at all - even an exported/disconnected pool after a reboot should be showing a partition table on its drives akin to below:

Code:
root@core-boot-pool[~]# gpart show
=>      40  33554352  da0  GPT  (16G)
        40      1024    1  freebsd-boot  (512K)
      1064  33521664    2  freebsd-zfs  (16G)
  33522728     31664       - free -  (15M)

=>      40  33554352  da1  GPT  (16G)
        40      2008       - free -  (1.0M)
      2048      1024    1  freebsd-boot  (512K)
      3072  33521664    2  freebsd-zfs  (16G)
  33524736     29656       - free -  (14M)

=>      40  16777136  da2  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da3  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da4  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da5  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da6  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)

=>      40  16777136  da7  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)
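
For anyone comparing their own output against the layout above, here is a minimal sketch of how `gpart show` text could be checked programmatically. It assumes output shaped like the sample in this post; the names `parse_gpart_show`, `disks_with_zfs`, and `disks_without_table` are made up for illustration, and the list of expected disks has to come from somewhere else (a disk with no partition table simply does not appear in `gpart show`).

```python
import re

# A "=>" line opens a new disk; entry lines are: start, size, index, type.
HEADER = re.compile(r"^=>\s+\d+\s+\d+\s+(\S+)\s+(\S+)")
ENTRY = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\S+)")

# Two disks copied from the sample output above.
SAMPLE = """
=>      40  33554352  da0  GPT  (16G)
        40      1024    1  freebsd-boot  (512K)
      1064  33521664    2  freebsd-zfs  (16G)
  33522728     31664       - free -  (15M)

=>      40  16777136  da2  GPT  (8.0G)
        40        88       - free -  (44K)
       128   4194304    1  freebsd-swap  (2.0G)
   4194432  12582744    2  freebsd-zfs  (6.0G)
"""


def parse_gpart_show(text):
    """Return {disk: [(index, type, start_sector, size_sectors), ...]}."""
    disks = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            disks[current] = []
            continue
        entry = ENTRY.match(line)
        # "- free -" rows have no numeric index, so they do not match ENTRY.
        if entry and current is not None:
            start, size, index, ptype = entry.groups()
            disks[current].append((int(index), ptype, int(start), int(size)))
    return disks


def disks_with_zfs(disks):
    """Disks that carry at least one freebsd-zfs partition."""
    return sorted(
        name for name, parts in disks.items()
        if any(ptype == "freebsd-zfs" for _, ptype, _, _ in parts)
    )


def disks_without_table(disks, expected):
    """Expected disks that gpart did not list at all (no partition table)."""
    return [name for name in expected if name not in disks]


if __name__ == "__main__":
    parsed = parse_gpart_show(SAMPLE)
    print(disks_with_zfs(parsed))
    # "da1" is a hypothetical disk that is attached but has no table.
    print(disks_without_table(parsed, ["da0", "da1", "da2"]))
```

In the situation in this thread, every storage disk would land in the `disks_without_table` list, since only the boot device shows a table.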


Your serial numbers look like how I would expect for unconfigured/passthrough drives rather than the much more artificial ones that result from being a virtual disk, so I'm not sure what would have occurred here. Did you export or disconnect your pool prior to the upgrade?

Can you collect a debug file from System -> Advanced -> Save Debug and attach it to a Report A Bug ticket?
 

Tong_Po

Dabbler
Joined
Jul 19, 2023
Messages
28
I did not export or disconnect the pool beforehand. The system pulled the download and I applied it. After the reboot, everything was broken.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Ah! This is interesting.

You said your config is as follows:

Hard Drives (Boot): 2 x Dell 300GB 15k SAS Enterprise
Hard Drives (Storage): 16 x Dell 1.92TB SSD Enterprise

But in your screenshot I see that we have some “drives” that are a bunch of RAID0s or JBODs?

What’s more interesting is that you don’t see ANY partitions on those drives.

Did you change anything in the BIOS or iDRAC related to the disks and RAID modes? Did you flash an update? Did you ever reboot the server before this most recent update?

I think the RAID controller wiped the partition headers of the drives for some reason, and I’m not sure those can be recovered. The data is probably still on the disks, but we can’t know for certain…and likely won’t know without some serious data recovery fu.
 

Tong_Po

Dabbler
Joined
Jul 19, 2023
Messages
28
Correct. The controller is in RAID mode. I had to put each individual drive in its own RAID0 so that TrueNAS could pick up the individual drives. Creating one giant virtual disk would have wasted too much space for my liking.

Nothing was changed in the iDRAC or BIOS. The only update that was applied to anything was the TrueNAS update. Yes, the server was bounced a few times before and never had this issue.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think the RAID controller wiped the partition headers of the drives for some reason, and I’m not sure those can be recovered. The data is probably still on the disks, but we can’t know for certain…and likely won’t know without some serious data recovery fu.

Yeah, the data's very likely to still be out there but you would need to dummy up the right kind of partition table so that ZFS could identify its partitions again. This is the danger I warn against in the HBA/RAID sticky about the stupid partition tricks that a RAID controller may pull. Even if it tried a "quick" format, though, the TrueNAS layout that includes the swap "up front" has probably protected the data from being destroyed. If you let it go for a full erase/format which might take an hour or two, the data is probably gone. Unfortunately, dummying up partition tables to recover data is relatively deep magic and probably beyond what you are likely to get in the forums here, so it may not matter which thing happened.
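
As a rough illustration of why the swap "up front" matters, the arithmetic can be sketched as below. This assumes the layout from the sample `gpart show` output earlier in the thread (first partition at sector 128, a 2 GiB swap partition, 512-byte sectors); the swap size is configurable, the numbers behind a RAID0 virtual disk may differ, and the function names are hypothetical.

```python
SECTOR_BYTES = 512          # assumption: 512-byte logical sectors
FIRST_SECTOR = 128          # first partition start in the sample output
SWAP_BYTES = 2 * 1024 ** 3  # assumption: 2 GiB swap, as in the sample output


def expected_zfs_start(swap_bytes=SWAP_BYTES, first_sector=FIRST_SECTOR,
                       sector_bytes=SECTOR_BYTES):
    """Sector where the freebsd-zfs partition would begin under this layout."""
    return first_sector + swap_bytes // sector_bytes


def zfs_start_survives(overwritten_bytes, **layout):
    """True if an overwrite of the first N bytes stops before the ZFS partition."""
    sector_bytes = layout.get("sector_bytes", SECTOR_BYTES)
    return overwritten_bytes <= expected_zfs_start(**layout) * sector_bytes


if __name__ == "__main__":
    start = expected_zfs_start()
    print("expected freebsd-zfs start sector:", start)
    print("byte offset:", start * SECTOR_BYTES)
    # A hypothetical "quick" initialise that touches only the first 64 MiB.
    print("survives 64 MiB overwrite:", zfs_start_survives(64 * 1024 ** 2))
```

With these assumptions the start sector works out to 4194432, which matches the `freebsd-zfs` rows in the sample output, so anything that only clobbers the first couple of gigabytes would land in swap rather than in pool data.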
 

Tong_Po

Dabbler
Joined
Jul 19, 2023
Messages
28
So without paying for some sort of recovery service, I'm pretty much wiping and starting over, with the exception of moving from RAID mode to HBA mode? Is that correct?
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I'm not sure whether you have any recovery options left, but you probably want to go to an actual recommended HBA (e.g. H220) instead of just flashing that H730P controller into IT mode.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680

There's no "flash to IT mode" for this controller.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So without paying for some sort of recovery service, I'm pretty much wiping and starting over, with the exception of moving from RAID mode to HBA mode? Is that correct?

There are ways to search for the start of a ZFS partition on a physical drive, which is probably the only "safe" recovery option. The problem is that reconstructing a disk label is not a beginner-level activity, and then you really do need to copy the data off and onto a new stable pool. Some of us have been doing hot shuffles of disk partitions since the '80s, and for us this kind of recovery might merely be annoying rather than impossible. There actually was a thread in which somebody's Proxmox system killed a TrueNAS ZFS partition; the recovery involved disk label twiddling and some other related fun, and (I believe) it succeeded. This is different, but only somewhat. I'm going to opine that it is beyond the assistance you can expect from the forums, but it is not out of the realm of possibility that someone who is intent on a challenge might be able to help you out.
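
To give a feel for what "searching for the start of a ZFS partition" can mean, here is a read-only sketch, not a recovery procedure. It relies on two facts about the on-disk format: each 256 KiB vdev label keeps its uberblock ring in its second 128 KiB, and every uberblock begins with the magic number `0x00bab10c`. The function names and the synthetic test image are hypothetical, and the "first hit minus 128 KiB" guess only holds if the first slot of the first label's ring is populated.

```python
import io

UB_MAGIC = 0x00BAB10C
# The magic is stored in the pool's native byte order, so look for both.
PATTERNS = (UB_MAGIC.to_bytes(8, "little"), UB_MAGIC.to_bytes(8, "big"))
UB_RING_OFFSET = 128 * 1024  # uberblock ring starts 128 KiB into a label
STEP = 1024                  # uberblock slots are aligned to at least 1 KiB


def find_uberblocks(stream, limit_bytes, chunk_bytes=1 << 20, step=STEP):
    """Return byte offsets (multiples of `step`) where an uberblock magic sits.

    Reads sequentially in large chunks and never writes. Assumes short reads
    only happen at end of input, so alignment is preserved between chunks.
    """
    hits = []
    base = 0
    while base < limit_bytes:
        chunk = stream.read(min(chunk_bytes, limit_bytes - base))
        if not chunk:
            break
        for rel in range(0, len(chunk) - 7, step):
            if chunk[rel:rel + 8] in PATTERNS:
                hits.append(base + rel)
        base += len(chunk)
    return hits


def candidate_partition_start(hits):
    """Guess the partition start from the first hit, or None if no hits."""
    if not hits:
        return None
    return hits[0] - UB_RING_OFFSET


if __name__ == "__main__":
    # Synthetic 4 MiB "disk" with a fake label whose partition starts at 3 MiB.
    part_start = 3 * 1024 * 1024
    image = bytearray(4 * 1024 * 1024)
    for slot in range(4):
        pos = part_start + UB_RING_OFFSET + slot * STEP
        image[pos:pos + 8] = PATTERNS[0]
    found = find_uberblocks(io.BytesIO(bytes(image)), len(image))
    print(found[:2], candidate_partition_start(found))
```

On real hardware the stream would be a disk opened read-only (or, better, an image copy of it), and any candidate offset would still need to be confirmed before anyone writes a new partition table.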
 

Tong_Po

Dabbler
Joined
Jul 19, 2023
Messages
28
Would anyone be able to recommend a good, proven HBA? I'm reading a lot about LSI. I just need something that will support up to 24 SSDs and be compatible with the Dell R730xd.

Again, for those responding, you guys are incredible. Thank you for all the information and help along the way. I'm thinking that I have lost my data and should just build fresh, in a supported manner.

Finding someone with the time to undertake this recovery task along with the associated fees may be just a bit too much at this point.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It can be migrated to HBA mode but not with the current setup. I would need to delete all the Virtual Disks and then "Switch to HBA mode".

The "HBA mode" still relies upon the MRSAS driver, which is not recommended. It's possible that it works fine, but it is the MPR driver based cards that are recommended for use with TrueNAS. More recent LSI cards appear to have transitioned to relying on the MRSAS driver for their HBA support, but most of us here in the forums are cheapskates and haven't tried it. It's like being the first driver of a hydrogen powered car. Probably works. But you also get to discover any problems.

Would anyone be able to recommend a good, proven HBA? I'm reading a lot about LSI. I just need something that will support up to 24 SSDs and be compatible with the Dell R730xd.

I believe that the best HBA for the R730xd is likely to be the Dell PERC HBA330. As far as I know, this is just the Dell version of the LSI 9300-8i HBA. However, there is also some sort of "mini" version of the HBA330 that is apparently intended for an internal expansion slot. The 9300-8i is a pretty good performer -- lots better than the 2008 or 2308 based HBAs -- and I believe other posters have had good luck with it in the R730xd.

Again, for those responding, you guys are incredible. Thank you for all the information and help along the way. I'm thinking that I have lost my data and should just build fresh, in a supported manner.

Finding someone with the time to undertake this recovery task along with the associated fees may be just a bit too much at this point.

Please do take the time to browse through


You may be able to tag some posters who might enjoy the challenge of another messed up pool. I just don't have the time for this right now, but perhaps @joeschmuck or @HoneyBadger will happen on by. In general, no one here likes a lost pool, and some of us like a challenge.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
The H730 Mini you have looks like this:
1689817119850.png


No SAS cables plug into it directly; instead they go to a daughterboard, IIRC.

The equivalent Dell part is the HBA330, which can be found on eBay for less than $20 USD.

Do note that there are fakes out there; see @jgreco's resource here:

I recommend this seller and trust that they are genuine, but you pay for it!!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You may be able to tag some posters who might enjoy the challenge of another messed up pool. I just don't have the time for this right now, but perhaps @joeschmuck or @HoneyBadger will happen on by. In general, no one here likes a lost pool, and some of us like a challenge.
This one's a little further gone than the Proxmox disk-wipe, because we don't have an easy template to pull from unless someone else happens to have that exact drive and controller configuration; and I'd be really hesitant to recommend that the OP re-partition a drive just to get a sample.

@Tong_Po Does the PERC configuration show the disks as active and online, or are they shown as a foreign disk that needs to be imported? I'm still struggling to think of a reason why a software upgrade and reboot would have suddenly rendered them all as invisible with effectively no valid partition table.

I'm not sure how much logging we would be able to get from a debug file (as the system dataset was on the main pool) but if you can collect one from System -> Advanced -> Save Debug and attach it to a "Report A Bug" ticket at the top of the forums, we can try to dig a little deeper.
 

Tong_Po

Dabbler
Joined
Jul 19, 2023
Messages
28
1689865791139.png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
@Tong_Po Can you capture the status from the PERC configuration (Ctrl+R, or launch the PERC utility from the BIOS/UEFI)? It should be a similar type of menu to the screenshot below - I don't have an H730P myself to navigate through, but check for the status and health of the virtual disks.

large
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Have you tried booting to a previous environment from the System -> Boot menu?

The only thought here is that somehow the previous environment was using a RAID-aware driver (possibly through a tunable?) and your updated installation has shifted to attempt to communicate with the raw disks, and it's not understanding the PERC-supplied header that it's getting from the raw disks.
 