Yesterday morning I was playing around with the AES-NI benchmark thread and panicked the machine during a scrub (automated on the 1st and 15th of each month at 0330). The issue may have existed before now, but I only noticed it on the reboot after the panic.
On bootup (and in the dmesg output) I get:
Code:
GEOM: da7: the primary GPT table is corrupt or invalid.
GEOM: da7: using the secondary instead -- recovery strongly advised.
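For context on what GEOM is complaining about: GPT keeps two copies of its metadata, a primary header and partition table at the start of the disk (header in LBA 1) and a backup copy at the very end (header in the last LBA), each protected by CRC32 checksums. GEOM validates the primary first and falls back to the backup, which is exactly what the message above says it did. Before changing anything, the two headers can be inspected read-only; this is just a sketch I put together (the last-LBA number is derived from the 2000398934016-byte media size in the gpart list output below, assuming 512-byte sectors):

```shell
# Primary GPT header lives in LBA 1. A healthy header starts with the
# "EFI PART" signature; a corrupt one typically won't.
dd if=/dev/da7 bs=512 skip=1 count=1 2>/dev/null | hexdump -C | head -4

# Backup GPT header lives in the disk's last LBA:
# 2000398934016 bytes / 512 bytes per sector = 3907029168 sectors,
# so the last LBA is 3907029167.
dd if=/dev/da7 bs=512 skip=3907029167 count=1 2>/dev/null | hexdump -C | head -4
```

Both reads are non-destructive, so they should be safe to run on the live pool.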
I've spent the last eight hours or so reading what others have done. I read every thread I could find, and nobody seems to have a solution they confirmed worked, short of zeroing out the disk (or at least the GPT tables) and re-adding it to the array, or using alternate utilities such as Parted Magic's 'GPT fdisk' (gdisk) tool.
In the spirit of fixing this issue (and learning a little something) without resorting to another boot CD or wiping the drive, how do I fix this?
System specs:
FreeNAS-8.3.1-RELEASE-x64 (r13452)
E5606 with 20GB of RAM
ZFS v28 running 18x2TB on RAIDZ3
I've never had any problems with any of my disks, and reviewing the SMART data for this drive shows nothing to indicate anything is wrong with it.
Here are the outputs that were commonly requested in other threads about this issue...
Code:
# gpart show da7
=>        34  3907029101  da7  GPT  (1.8T) [CORRUPT]
          34          94       - free -  (47k)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)
# gpart list da7
Geom name: da7
modified: false
state: CORRUPT
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da7p1
Mediasize: 2147483648 (2.0G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 65536
Mode: r1w1e1
rawuuid: 762670b2-4a95-11e2-bca4-0015171496ae
rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
label: (null)
length: 2147483648
offset: 65536
type: freebsd-swap
index: 1
end: 4194431
start: 128
2. Name: da7p2
Mediasize: 1998251367936 (1.8T)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 2147549184
Mode: r1w1e2
rawuuid: 763790a1-4a95-11e2-bca4-0015171496ae
rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
label: (null)
length: 1998251367936
offset: 2147549184
type: freebsd-zfs
index: 2
end: 3907029134
start: 4194432
Consumers:
1. Name: da7
Mediasize: 2000398934016 (1.8T)
Sectorsize: 512
Mode: r2w2e5
# zpool status
pool: tank
state: ONLINE
scan: scrub repaired 0 in 17h23m with 0 errors on Mon Apr 1 21:23:24 2013
config:
        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/6fbb91d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/70448fd2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/70c0c7b3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/713de0d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/71e3eea1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/728458d2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/7326aebc-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/73c64f27-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/7468c69a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/75045f96-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/75a0096a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/763790a1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/76d701fa-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/77759c5c-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/78190bd3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/78bb9173-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/795a7052-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/79fbc7b0-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
errors: No known data errors

Several people mentioned running gpart recover /dev/da7 (some wrote it as "gpart recovery"), but everyone who tried it said it didn't work. A few places also mention setting sysctl kern.geom.debugflags=0x10 before running the other commands. In their defense, though, those posters seemed to have other issues that may have prevented the command from fixing everything anyway. The problem also seems to have been most widespread with FreeNAS 0.7 and on USB sticks.
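For reference, here is the sequence those threads describe as I understand it. I haven't tested this myself; it's assembled from the forum posts and the gpart(8) man page, with the device name da7 taken from my outputs above:

```shell
# 1. Back up the sectors gpart will rewrite, in case recovery goes sideways.
#    The first 34 sectors hold the protective MBR plus the primary GPT
#    header and partition table.
dd if=/dev/da7 of=/root/da7-gpt-start.bin bs=512 count=34

# 2. The disk is live (the swap and zfs providers are open read/write), so
#    GEOM refuses writes to the metadata areas unless the "foot-shooting"
#    flag is set:
sysctl kern.geom.debugflags=0x10

# 3. Rebuild the primary table from the valid secondary copy:
gpart recover da7

# 4. Re-enable the safety and confirm the CORRUPT flag is gone:
sysctl kern.geom.debugflags=0
gpart show da7
```

If that's the wrong approach for a disk that's an active member of a RAIDZ3 vdev, I'd rather hear it before running step 3.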
So before I try either of those commands, I'm curious whether they are still the recommended way to repair this issue, or whether I've misunderstood something. The threads I found were sometimes quite old (2011 or earlier), so I'd like someone to validate the correct command to run for this error.
Some places even say this is a known FreeBSD/ZFS quirk and should be ignored. But considering one disk has this issue and the rest don't, I think it's something that should be fixed.
Any input from the FreeBSD wizards?