Reboot produces GPT Table Corrupt or Invalid

Status
Not open for further replies.

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
I am currently in the "playing around with" mode for FreeNAS (and ESXi), so no data is at risk here, but I still ultimately want to put this setup into production. So far, I haven't gained confidence that it will work, though.

I have just installed FreeNAS 9.2.0 on ESXi with the following stats:

4 vCPU
LSI M1015 (IT Mode) Passthrough to VM
12GB of RAM (Host has 24GB)
8GB HDD
2x 750GB Samsung HD753LJ SATA HDDs

I did not have the following problem with FreeNAS 9.1.0 when I was playing with that prior to realizing 9.2.0 was out.

Every time I create a new ZFS volume on the 750GB drives, everything works perfectly. I can create zvols and map them via iSCSI and I'm happy. But then... I reboot the VM. Upon start-up, I see errors scroll by for the 2 drives:

Code:
Dec 31 15:35:22 freenas kernel: da1 at mps0 bus 0 scbus3 target 2 lun 0
Dec 31 15:35:22 freenas kernel: da1: <ATA SAMSUNG HD753LJ 1112> Fixed Direct Access SCSI-6 device 
Dec 31 15:35:22 freenas kernel: da1: 300.000MB/s transfers
Dec 31 15:35:22 freenas kernel: da1: Command Queueing enabled
Dec 31 15:35:22 freenas kernel: da1: 715404MB (1465149168 512 byte sectors: 255H 63S/T 91201C)
Dec 31 15:35:22 freenas kernel: da2 at mps0 bus 0 scbus3 target 3 lun 0
Dec 31 15:35:22 freenas kernel: da2: <ATA SAMSUNG HD753LJ 1112> Fixed Direct Access SCSI-6 device 
Dec 31 15:35:22 freenas kernel: da2: 300.000MB/s transfers
Dec 31 15:35:22 freenas kernel: da2: Command Queueing enabled
Dec 31 15:35:22 freenas kernel: da2: 715404MB (1465149168 512 byte sectors: 255H 63S/T 91201C)
Dec 31 15:35:22 freenas kernel: GEOM: da1: the secondary GPT table is corrupt or invalid.
Dec 31 15:35:22 freenas kernel: GEOM: da1: using the primary only -- recovery suggested.
Dec 31 15:35:22 freenas kernel: GEOM: da2: the secondary GPT table is corrupt or invalid.
Dec 31 15:35:22 freenas kernel: GEOM: da2: using the primary only -- recovery suggested.
Dec 31 15:35:22 freenas kernel: GEOM_MIRROR: Device mirror/system launched (2/2).
Dec 31 15:35:22 freenas kernel: GEOM: mirror/system: corrupt or invalid GPT detected.
Dec 31 15:35:22 freenas kernel: GEOM: mirror/system: GPT rejected -- may not be recoverable.
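
For readers following along, here is a rough sketch of where the secondary GPT should sit on these drives, worked out from the sector count in the probe lines above. It assumes the standard GPT layout (128 partition entries of 128 bytes each, backup header in the last sector), which the log does not confirm for this particular setup.

```shell
#!/bin/sh
# Values copied from the da1/da2 probe lines in the log above.
sectors=1465149168
sector_size=512

# Assumption: standard GPT layout, 128 entries of 128 bytes each.
entry_array_bytes=$((128 * 128))
entry_array_sectors=$((entry_array_bytes / sector_size))

# The backup header occupies the last sector; the backup entry
# array sits in the sectors immediately before it.
backup_header_lba=$((sectors - 1))
backup_entries_first_lba=$((backup_header_lba - entry_array_sectors))

# Sanity check against the size the kernel printed (in MB).
size_mb=$((sectors * sector_size / 1048576))

echo "size_mb=$size_mb"
echo "backup_header_lba=$backup_header_lba"
echo "backup_entries_first_lba=$backup_entries_first_lba"
```

The computed size comes out to 715404, matching the 715404MB in the log, so the sector count and sector size used here are at least consistent with what the kernel reported.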


I have tried the following, without meaningful success:
Every time I do these various things, as soon as I reboot, I get the same result-- no more ZFS volume.
I'm kind of at a loss with this one-- I have no idea what is wrong. I have a feeling it is something to do with the 512B vs 4KB sectors, but why is this suddenly an issue? And what can I do to troubleshoot it? I've tried running 'gpart recover' commands, but they do not work:
Code:
[root@freenas] ~# gpart recover /dev/da1
gpart: arg0 'da1': invalid argument

Any help on this would be greatly appreciated.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Here's a $1,000,000 question.

Are you virtualizing your disks or are you using PCI passthrough?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What motherboard are you using? Can you post your hardware makes/models?

One of the problems with VT-d is that it is very finicky and very temperamental. If you are using a board that isn't from one of the big companies that knows how to properly use VT-d technology, it can blow up in your face. I had a high-end Gigabyte motherboard that had VT-d. It didn't work right. Switched to a Supermicro motherboard and the problems went away.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok.. So I just deleted your post because 2 appeared. I deleted the extra, but now both are gone.

In short, the OP said that he did use PCIe passthrough.

Here's a paste:

Oh, I should have made that more clear...

The disks are attached to the LSI HBA card. So they are via the passthrough PCIe card.
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
The VMware host is a Tyan S7012 with a single Xeon L5520 with 24GB of DDR3 ECC (Might pick up a second down the road)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, Tyan is a "well known" company. Some experienced people here have warned against using them. I used one for a desktop around 2005 and it had odd quirks. I used it for less than 3 months because I got tired of it acting up.

Honestly, I don't have a smoking gun for you to check.

I'd do the normal stuff: update your BIOS, check your BIOS settings, make sure you are using p14 firmware on your M1015 (since the FreeNAS driver is p14). Other than that I have no other recommendations.

I can tell you that on my Supermicro system it has worked flawlessly and still is. :(
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
BIOS is up to date.


Should I try installing FreeNAS on the raw hardware to eliminate possible VT-d/ESXi oddness? I'm going to also re-try 9.1.0 to verify it worked correctly.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's what I would do. I'd be willing to bet good money the problem will magically go away when ESXi goes away. :(
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Is it just me or do we seem to see a lot of LGA1366 boards with what appear to be VT-d related problems? (your Gigabyte, someone else recently, now this)
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
Well, I wouldn't write it off just yet-- Honestly, I was using 9.1.0 for days in a VM and didn't have this problem (but I will admit, I had lots of other problems, so it's possible somehow I didn't realize it-- I don't know how I would miss it, though).

I reverted my VM to 9.1.0 to see if that would work... and I have to eat my hat because the same thing happened with that version, too. So, I'm currently doing as I said before and installing FreeNAS 9.2.0 on the raw hardware. I'll report back as soon as that is done.

It is just very odd to me that FreeNAS works on the very first boot after install, but then as soon as you reboot it, things hit the fan.
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
Well... Installed FreeNAS on the raw hardware, followed the same procedure for my initial configuration, ending in creating a volume, then a nested zvol on that. Reboot, then I still get:

Code:
The volume store1 (ZFS) status is UNKNOWN


So, I don't think this is a VMware issue.

How do I know what "P" my M1015 is running? I got the firmware from this post: http://forums.laptopvideo2go.com/topic/29059-sas2008-lsi92409211-firmware-files/

I'm also running the latest BIOS. Prior to installing ESXi I had to upgrade the BIOS in order to get the installer to even work. I might need to revert the BIOS (I'm not sure what was on it previously, though), or locate the one problematic setting. This is definitely not good in my mind, though.

One other thought, this was definitely working just fine for me before messing with ESXi-- so on the older BIOS on FreeNAS 9.1.0, I did not have any problems like this.

Suggestions? I almost want to fill the drives with random data, checksum, then reboot, re-checksum and compare. I also want to try other OSes to see if I can even blame FreeNAS.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you boot up FreeNAS and run 'dmesg | grep mps' you'll know what version you flashed. It should say something like "mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd". In this example, this is a friend's server with firmware 15 and driver 14 on FreeNAS 9.1.1. You're supposed to keep the 2 on the same version at all times.
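
A minimal sketch of automating that comparison, assuming the boot message has exactly the form quoted above. The helper name check_mps_versions is made up for illustration and is not a FreeNAS tool; it only compares the leading (major) version numbers.

```shell
#!/bin/sh
# Compare the major versions in an mps boot message of the form:
#   mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd
# check_mps_versions is a hypothetical helper name.
check_mps_versions() {
    line=$1
    # Pull out the leading number after "Firmware: " and after "Driver: ".
    fw=$(printf '%s\n' "$line" | sed -n 's/.*Firmware: \([0-9][0-9]*\)\..*/\1/p')
    drv=$(printf '%s\n' "$line" | sed -n 's/.*Driver: \([0-9][0-9]*\)\..*/\1/p')
    if [ -z "$fw" ] || [ -z "$drv" ]; then
        echo "unparsed"
        return 2
    fi
    if [ "$fw" = "$drv" ]; then
        echo "match: firmware $fw, driver $drv"
    else
        echo "mismatch: firmware $fw, driver $drv"
        return 1
    fi
}

# On the FreeNAS box itself (assumes a single controller, mps0):
#   check_mps_versions "$(dmesg | grep 'mps0: Firmware')"
```

Fed the example line from the friend's server, this reports a mismatch between firmware 15 and driver 14.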

It is also possible you messed up the reflashing somehow??? Maybe find the p14 firmware and use that since we've been stuck there for over a year and despite my ticket that is 7 months old asking for the update, there's no rush to update drivers.

I'd keep the latest BIOS and try the correct firmware.
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
OK, I flashed the correct p14 firmware now. It took a little bit of figuring out how to actually downgrade the firmware on the M1015. (For future googlers, run 'sas2flash -o -e 6' to erase the firmware, followed by your normal flash command, in my case 'sas2flash -o -f 2118it.bin -b mptsas2.rom')

Anyway, I rebooted, and created my volumes, then rebooted again... and STILL, the same problem.

This is rather baffling to me. This really seems like a FreeBSD issue to me. I checksummed the GPT data (pri and sec) just after creating my volumes, then rebooted and rechecksummed. The checksums are as follows:

Code:
before reboot:
Primary (first 16K):  2395c60bf610ac948503a83fa3a65db3
Secondary (last 16K): 86ba0a48e81a88190d06b51cbe092b37
after reboot:
Primary (first 16K):  2395c60bf610ac948503a83fa3a65db3
Secondary (last 16K): 20be3600e03f8797501e7d1f06ccc591 (Different??)


(The secondary changed, which might be due to an attempted recovery? It's just a back-up, so this shouldn't affect the GPT)
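
For anyone wanting to repeat this kind of check, here is a rough sketch of one way to hash the first and last 16K of a disk. It is not necessarily how the numbers above were produced. The helper names gpt_region_sums and hash_stdin are made up for illustration, and the disk size has to be passed in as a count of 512-byte sectors (the boot log reports 1465149168 for these drives). It only reads from the device.

```shell
#!/bin/sh
# Hash the first and last 16 KiB of a disk or disk image.
# Usage: gpt_region_sums <device-or-image> <size in 512-byte sectors>
# Both function names are hypothetical helpers for this sketch.

hash_stdin() {
    # Linux ships md5sum; FreeBSD ships md5, which prints a bare
    # digest when it reads from standard input.
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | cut -d' ' -f1
    else
        md5
    fi
}

gpt_region_sums() {
    dev=$1
    sectors=$2
    chunk=32    # 32 sectors * 512 bytes = 16 KiB
    first=$(dd if="$dev" bs=512 count="$chunk" 2>/dev/null | hash_stdin)
    last=$(dd if="$dev" bs=512 skip=$((sectors - chunk)) count="$chunk" 2>/dev/null | hash_stdin)
    echo "Primary (first 16K):  $first"
    echo "Secondary (last 16K): $last"
}

# Example, before and after a reboot:
#   gpt_region_sums /dev/da1 1465149168 > before.txt
#   ... reboot ...
#   gpt_region_sums /dev/da1 1465149168 | diff before.txt -
```

An empty diff would mean neither region changed across the reboot; any output points at the region that did.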

This is rather damning in my mind: FreeBSD is somehow messing up either prior to reboot or after reboot in how it is writing or interpreting (respectively) the GPT. I can't come up with any other explanation for why a reboot could make the drive come back as "invalid". Is it possible there is some sort of write protection somewhere not allowing FreeBSD to write out to certain regions of the drive, which is causing problems for the GPT? Or could this be a sector size issue? Either way, I need suggestions on how it could be fixed. Could MBR tables be used?

Oh yea, Happy New Year!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Any chance we could do a team viewer session so I could check this out? If so pm me and we can setup a time. This is scaring me tbh.

 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
One new interesting tidbit... I just discovered that if I remove the drives (hot-unplug), then reboot the box, then hot-plug the drives again, the md5s come back identical. Then I just have to "Auto Import" them from the webUI, and there is no problem; my data is all fine.

So, something about the shutdown process is messing with the GPT tables maybe?? This is highly suspect at least.
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
Piece of good news (for FreeNAS)... I was unable to replicate this in a VM on my workstation (I replicated things as best as I could, only virtual drives).

Another piece of good news... I've tried downgrading the BIOS on my motherboard to V2.02, and I don't seem to have any problems now. However, I also upgraded back to V3.00 and also am no longer having the problem. (The engineer in me wanted to verify downgrading is what fixed it).

I am now curious whether the very act of hot swapping the drives in and out reset something with the LSI card to make it stop causing problems.

I'm truly baffled, but I don't seem to be having a problem now. I'll be playing even more with this, however, to gain confidence that this issue is now in the past. I'm probably going to reinstall ESXi and start from there again. I will report back here if I have any issues like this again.

Wish me luck!

Also, if anyone has theories on what the problem could have been, please post.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No theories but glad to see you are not just pretending it's fixed. Understanding what blows stuff up is a key to reliability.
 

Todd Marimon

Dabbler
Joined
Dec 31, 2013
Messages
19
This is happening again. I have not yet figured out how to actually "fix it". It is a highly frustrating problem. It makes me very nervous to even consider trusting FreeNAS/ZFS with my data. Something is seriously amiss here. I'm really glad I've wasted my money on enterprise hardware and I can't even keep a filesystem through a reboot. (please disregard my frustrated tone... I really do want help with this problem)
 