GPT Corrupted?

Status
Not open for further replies.

thegazer

Cadet
Joined
Oct 29, 2011
Messages
4
Decided to restart my FreeNAS 8.0.2 RC1 box this morning, since my sabnzbd queue had become stuck.

After the restart I can't see my Media volume. The volume is a single-disk striped ZFS volume.

I noticed these messages in dmesg:

Code:
GEOM: da0s1: geometry does not match label (16h,63s != 255h,63s).
GEOM: ada1: the primary GPT table is corrupt or invalid.
GEOM: ada1: using the secondary instead -- recovery strongly advised.
Trying to mount root from ufs:/dev/ufs/FreeNASs1a
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 4
ZFS storage pool version 15
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
g_vfs_done():ufs/FreeNASs1a[READ(offset=537849856, length=4096)]error = 5
g_vfs_done():ufs/FreeNASs1a[READ(offset=537862144, length=45056)]error = 5
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.


ada1 is my Media disk; it is a 2 TB disk.

I tried
Code:
gpart recover /dev/ada1


But it doesn't seem to do anything...
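
One way to check whether the recover actually took effect is to look at the partition table and kernel messages again (a hedged sketch using standard FreeBSD tools; ada1 is the device from the dmesg above):

Code:
# Hedged sketch: verify the GPT state after the recover attempt.
gpart show ada1        # the table should print without complaints
gpart status ada1      # shows CORRUPT while the primary and backup GPT disagree
dmesg | grep -i gpt    # see whether the kernel still flags the table as corrupt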

Help please?

Much appreciated in advance
 

thegazer

Cadet
Joined
Oct 29, 2011
Messages
4
...also, here is the current gpart show output:

Code:
[reza@Enterprise /]$ gpart show
=>     63  7826616  da0  MBR  (3.7G)
       63  1930257    1  freebsd  [active]  (943M)
  1930320       63       - free -  (32K)
  1930383  1930257    2  freebsd  (943M)
  3860640     3024    3  freebsd  (1.5M)
  3863664    41328    4  freebsd  (20M)
  3904992  3921687       - free -  (1.9G)

=>      0  1930257  da0s1  BSD  (943M)
        0       16         - free -  (8.0K)
       16  1930241      1  !0  (943M)

=>       34  488397101  ada0  GPT  (233G)
         34  488397101     1  freebsd-ufs  (233G)

=>        34  3907029101  ada1  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)
 

dmt0

Dabbler
Joined
Oct 28, 2011
Messages
47

thegazer

Cadet
Joined
Oct 29, 2011
Messages
4
Thanks for the reply, dmt0.

After I did the gpart recover /dev/ada1, dmesg no longer complains about a corrupted GPT, so do you think that is still the issue?

Meanwhile I tried these:

Code:
[reza@Enterprise] /# zpool import -D
no pools available to import
[reza@Enterprise] /# zpool import 
  pool: Media
    id: 6173278208893109116
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
	devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

	Media       UNAVAIL  insufficient replicas
	  ada1      UNAVAIL  cannot open


Here's what zdb showed me

Code:
[reza@Enterprise] /# zdb -l /dev/ada1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
    version=15
    name='Media'
    state=0
    txg=599433
    pool_guid=6173278208893109116
    hostid=2211247409
    hostname='Enterprise.local'
    top_guid=8853252902912117650
    guid=8853252902912117650
    vdev_tree
        type='disk'
        id=0
        guid=8853252902912117650
        path='/dev/ada1'
        whole_disk=0
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=2000394125312
        is_log=0
        DTL=57
--------------------------------------------
LABEL 2
--------------------------------------------
    version=15
    name='Media'
    state=0
    txg=599433
    pool_guid=6173278208893109116
    hostid=2211247409
    hostname='Enterprise.local'
    top_guid=8853252902912117650
    guid=8853252902912117650
    vdev_tree
        type='disk'
        id=0
        guid=8853252902912117650
        path='/dev/ada1'
        whole_disk=0
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=2000394125312
        is_log=0
        DTL=57
--------------------------------------------
LABEL 3
--------------------------------------------
    version=15
    name='Media'
    state=0
    txg=599433
    pool_guid=6173278208893109116
    hostid=2211247409
    hostname='Enterprise.local'
    top_guid=8853252902912117650
    guid=8853252902912117650
    vdev_tree
        type='disk'
        id=0
        guid=8853252902912117650
        path='/dev/ada1'
        whole_disk=0
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=2000394125312
        is_log=0
        DTL=57
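
Since the gpart show output above puts the ZFS data in partition 2 of ada1, a possible next diagnostic step is to read the labels from that partition rather than from the raw disk (a hedged sketch; ada1p2 is inferred from the partition layout shown earlier, and nothing here is guaranteed to help):

Code:
# Hedged sketch: read the ZFS labels from the freebsd-zfs partition itself.
zdb -l /dev/ada1p2   # if the pool was created on the partition, its labels live here
ls /dev/gptid        # FreeNAS 8 may also expose the same partition under a gptid name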
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
Is there a solution to this problem?

I have the same issue. I ran gpart recover /dev/ada1, and now my ZFS system, a single-drive system, is in the same state as described in this thread.
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
Any updates? I have the same issue as well; I ran gpart recover as well.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Why would there be updates? The OP's dmesg output is telling him that he has a bad disk (da0) and he had a striped array. There is no redundancy against any disk failure, so naturally he lost all his data. He likely got no response 2 years ago because his problem was his own mistake. After a while the more senior guys stop answering forum posts that should be answered by the owner of the server. He should have been able to identify the errors, what they meant in relation to his problem, and how he could (or couldn't) access his data anymore.

Single-drive systems have the same issue: no redundancy. So if a disk goes bad, say goodbye to your data.
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
Running gpart recover should not destroy a disk. My understanding is that it simply restores the primary partition table at the beginning of the disk from the backup copy.

zdb -l shows all of the ZFS labels intact except for the first one, which makes sense given the above.

I think there should be a way to repair the ZFS labels and recover the ZFS filesystem, as the rest of the disk is untouched.
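
If the pool really is intact apart from the first label, it is sometimes worth asking ZFS to rescan the device nodes explicitly before giving up (a hedged sketch; the pool name and GUID come from the output earlier in the thread, and none of this is guaranteed to work on a damaged vdev):

Code:
# Hedged sketch: ask ZFS to scan /dev and import by name or by GUID.
zpool import -d /dev                      # list pools found by scanning /dev
zpool import -d /dev Media                # import by pool name
zpool import -d /dev 6173278208893109116  # or by the pool GUID shown above
zpool import -f -d /dev Media             # -f if the pool was never exported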
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I think there should be a way to repair the ZFS labels and recover the ZFS filesystem, as the rest of the disk is untouched.

Not really. Once you have even a single bad sector on a drive, you are already losing data. The recoverability of the partition table, the file system, and the data contained in the files themselves depends on what the failure mechanism is. Are there metal bits flying around inside the drive? Is the head failing? Is there a firmware issue causing the head to write junk data to the platters?

This "unknown" failure mode is precisely why every "data recovery shop" website you visit has a message saying "if your data is very important you should turn off the drive and not power it on at all until it has been analyzed" by them in a clean room, where they can open up the drive, see exactly what the problem is, and hopefully recover your data. Just turning the drive on causes it to do some initial head seeks, and the last thing you need is loose debris in the drive being smeared around even more by the heads moving during the power-up diagnostics.

Once you start racking up lots of bad sectors, it is typically smart to assume that virtually no data on the drive should be trusted any more than you absolutely have to. There's no telling how much additional damage will occur if the drive is used further, or how extensive the damage that has already occurred is. Any attempt to recover data that requires you to use gpart recover or other recovery tools is a "Hail Mary" at best, and at worst it is destroying the exact data you are hoping to recover.

So no, while you may not have touched the rest of the disk, the bad sectors (and the cause of those failed/failing sectors) are unknown, and you won't be able to convince me that the damage is limited to "just" the sector used for the partition table.
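
For anyone in the same spot, a quick way to gauge whether the drive itself is failing before attempting any recovery is to look at its SMART data (a hedged sketch using smartctl from smartmontools, which FreeNAS ships; ada1 is the device from this thread):

Code:
# Hedged sketch: check the drive's SMART health before any recovery attempt.
smartctl -H /dev/ada1       # overall health self-assessment
smartctl -a /dev/ada1       # full attributes; watch Reallocated_Sector_Ct,
                            # Current_Pending_Sector and Offline_Uncorrectable
smartctl -t short /dev/ada1 # optionally kick off a short self-test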
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
In my case I know there are no physical issues with the disk. I got into this state by human error, by running gpart recover. I'm simply trying to recover the zpool, since its metadata was corrupted by the gpart recover step. My understanding is that ZFS keeps multiple copies of this information, so it should be recoverable. The zdb output also indicates that only LABEL 0 is missing; the other three labels are intact.

To get into this state I ran the following:

Code:
$ gpart create -s GPT ada3
$ gpart add -a 4k -b 2048 -t freebsd-zfs ada3
$ zpool create -f storage /dev/ada3


Later I had to reinstall my system (the OS was installed on a different drive), and during the re-installation I noticed GPT corruption errors, so I ran gpart recover on the ada3 device. This left the ZFS pool unimportable; I also did not run zpool export before reinstalling the system.
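
For anyone following along, the usual sequence around an OS reinstall is to export the pool first and import it again afterwards (a hedged sketch; 'storage' is the pool name from the commands above, and -f is only needed if the pool still appears in use by the old install):

Code:
# Hedged sketch: cleanly hand a pool between OS installs.
zpool export storage      # before reinstalling the OS
# ...reinstall the OS...
zpool import              # list pools visible to the new install
zpool import storage      # import by name
zpool import -f storage   # -f if the pool was last used by the old hostid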
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sounds like your issues aren't related to the original thread.
 