GPT Corrupted?

Status
Not open for further replies.

thegazer

Cadet
Joined
Oct 29, 2011
Messages
4
Decided to restart my FreeNAS 8.0.2 RC1 box this morning, since my sabnzbd queue had become stuck.

After the restart I can't see my Media volume. The volume is a single-disk striped ZFS volume.

I noticed these messages in dmesg:

Code:
GEOM: da0s1: geometry does not match label (16h,63s != 255h,63s).
GEOM: ada1: the primary GPT table is corrupt or invalid.
GEOM: ada1: using the secondary instead -- recovery strongly advised.
Trying to mount root from ufs:/dev/ufs/FreeNASs1a
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 4
ZFS storage pool version 15
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 10 7 df 0 0 58 0 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
g_vfs_done():ufs/FreeNASs1a[READ(offset=537849856, length=4096)]error = 5
g_vfs_done():ufs/FreeNASs1a[READ(offset=537862144, length=45056)]error = 5
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada1.


ada1 is my Media disk; it is a 2 TB disk.

I tried
Code:
gpart recover /dev/ada1


But it doesn't seem to do anything...
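
One way to check whether the recover actually took effect is to look at the partition table and kernel messages again (a hedged sketch using standard FreeBSD tools; ada1 is the device from the dmesg above):

Code:
# Hedged sketch: verify the GPT state after the recover attempt.
gpart show ada1        # the table should print without complaints
gpart status ada1      # shows CORRUPT while the primary and backup GPT disagree
dmesg | grep -i gpt    # see whether the kernel still flags the table as corrupt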

Help please?

Much appreciated in advance
 

thegazer

Cadet
Joined
Oct 29, 2011
Messages
4
...also, here is the current gpart show output:

Code:
[reza@Enterprise /]$ gpart show
=>     63  7826616  da0  MBR  (3.7G)
       63  1930257    1  freebsd  [active]  (943M)
  1930320       63       - free -  (32K)
  1930383  1930257    2  freebsd  (943M)
  3860640     3024    3  freebsd  (1.5M)
  3863664    41328    4  freebsd  (20M)
  3904992  3921687       - free -  (1.9G)

=>      0  1930257  da0s1  BSD  (943M)
        0       16         - free -  (8.0K)
       16  1930241      1  !0  (943M)

=>       34  488397101  ada0  GPT  (233G)
         34  488397101     1  freebsd-ufs  (233G)

=>        34  3907029101  ada1  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)
 

dmt0

Dabbler
Joined
Oct 28, 2011
Messages
47

thegazer

Cadet
Joined
Oct 29, 2011
Messages
4
Thanks for the reply, dmt0.

After I did the gpart recover /dev/ada1, dmesg no longer complains about a corrupted GPT, so do you think that is still the issue?

Meanwhile I tried these:

Code:
[reza@Enterprise] /# zpool import -D
no pools available to import
[reza@Enterprise] /# zpool import 
  pool: Media
    id: 6173278208893109116
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
	devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

	Media       UNAVAIL  insufficient replicas
	  ada1      UNAVAIL  cannot open


Here's what zdb showed me

Code:
[reza@Enterprise] /# zdb -l /dev/ada1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
    version=15
    name='Media'
    state=0
    txg=599433
    pool_guid=6173278208893109116
    hostid=2211247409
    hostname='Enterprise.local'
    top_guid=8853252902912117650
    guid=8853252902912117650
    vdev_tree
        type='disk'
        id=0
        guid=8853252902912117650
        path='/dev/ada1'
        whole_disk=0
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=2000394125312
        is_log=0
        DTL=57
--------------------------------------------
LABEL 2
--------------------------------------------
    version=15
    name='Media'
    state=0
    txg=599433
    pool_guid=6173278208893109116
    hostid=2211247409
    hostname='Enterprise.local'
    top_guid=8853252902912117650
    guid=8853252902912117650
    vdev_tree
        type='disk'
        id=0
        guid=8853252902912117650
        path='/dev/ada1'
        whole_disk=0
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=2000394125312
        is_log=0
        DTL=57
--------------------------------------------
LABEL 3
--------------------------------------------
    version=15
    name='Media'
    state=0
    txg=599433
    pool_guid=6173278208893109116
    hostid=2211247409
    hostname='Enterprise.local'
    top_guid=8853252902912117650
    guid=8853252902912117650
    vdev_tree
        type='disk'
        id=0
        guid=8853252902912117650
        path='/dev/ada1'
        whole_disk=0
        metaslab_array=23
        metaslab_shift=34
        ashift=9
        asize=2000394125312
        is_log=0
        DTL=57
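
Since the gpart show output above puts the ZFS data in partition 2 of ada1, a possible next diagnostic step is to read the labels from that partition rather than from the raw disk (a hedged sketch; ada1p2 is inferred from the partition layout shown earlier, and nothing here is guaranteed to help):

Code:
# Hedged sketch: read the ZFS labels from the freebsd-zfs partition itself.
zdb -l /dev/ada1p2   # if the pool was created on the partition, its labels live here
ls /dev/gptid        # FreeNAS 8 may also expose the same partition under a gptid name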
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
Is there a solution to this problem?

I have the same issue. I ran gpart recover /dev/ada1, and now my ZFS system, a single-drive system, is in the same state as described in this thread.
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
Any updates? I have the same issue as well; I ran gpart recover as well.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Why would there be updates? The OP's dmesg output is telling him that he has a bad disk (da0) and he had a striped array. There is no redundancy against any disk failure, so naturally he lost all his data. He likely got no response 2 years ago because his problem was his own mistake. After a while the more senior guys stop answering forum posts that should be answered by the owner of the server. He should have been able to identify the errors, what they meant in relation to his problem, and how he could (or couldn't) access his data anymore.

Single-drive systems have the same issue: no redundancy. So if a disk goes bad, say goodbye to your data.
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
Running gpart recover should not destroy a disk. My understanding is that it simply restores the primary partition table at the beginning of the disk from the backup copy.

zdb -l shows all of the ZFS labels intact except for the first one, which makes sense given the above.

I think there should be a way to repair the ZFS labels and recover the ZFS filesystem, as the rest of the disk is untouched.
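
If the pool really is intact apart from the first label, it is sometimes worth asking ZFS to rescan the device nodes explicitly before giving up (a hedged sketch; the pool name and GUID come from the output earlier in the thread, and none of this is guaranteed to work on a damaged vdev):

Code:
# Hedged sketch: ask ZFS to scan /dev and import by name or by GUID.
zpool import -d /dev                      # list pools found by scanning /dev
zpool import -d /dev Media                # import by pool name
zpool import -d /dev 6173278208893109116  # or by the pool GUID shown above
zpool import -f -d /dev Media             # -f if the pool was never exported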
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I think there should be a way to repair the ZFS labels and recover the ZFS filesystem, as the rest of the disk is untouched.

Not really. Once you have even a single bad sector on a drive, you are already losing data. The recoverability of the partition table, the file system, and the data contained in the files themselves depends on what the failure mechanism is. Are there metal bits flying around inside the drive? Is the head failing? Is there a firmware issue causing the head to write junk data to the platters?

This "unknown" failure mode is precisely why every "data recovery shop" website you visit has a message saying "if your data is very important you should turn off the drive and not power it on at all until it has been analyzed" by them in a clean room, where they can open up the drive, see exactly what the problem is, and hopefully recover your data. Just turning the drive on causes it to do some initial head seeks, and the last thing you need is loose debris in the drive being smeared around even more by the heads moving during the power-up diagnostics.

Once you start racking up lots of bad sectors, it is typically smart to assume that virtually no data on the drive should be trusted any more than you absolutely have to. There's no telling how much additional damage will occur if the drive is used further, or how extensive the damage that has already occurred is. Any attempt to recover data that requires you to use gpart recover or other recovery tools is a "Hail Mary" at best, and at worst it is destroying the exact data you are hoping to recover.

So no, while you may not have touched the rest of the disk, the bad sectors (and the cause of those failed/failing sectors) are unknown, and you won't be able to convince me that the damage is limited to "just" the sector used for the partition table.
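
For anyone in the same spot, a quick way to gauge whether the drive itself is failing before attempting any recovery is to look at its SMART data (a hedged sketch using smartctl from smartmontools, which FreeNAS ships; ada1 is the device from this thread):

Code:
# Hedged sketch: check the drive's SMART health before any recovery attempt.
smartctl -H /dev/ada1       # overall health self-assessment
smartctl -a /dev/ada1       # full attributes; watch Reallocated_Sector_Ct,
                            # Current_Pending_Sector and Offline_Uncorrectable
smartctl -t short /dev/ada1 # optionally kick off a short self-test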
 

abienkow

Cadet
Joined
Mar 13, 2013
Messages
4
In my case I know there are no physical issues with the disk. I got into this state by human error, by running gpart recover. I'm simply trying to recover the zpool, since its metadata was corrupted by the gpart recover step. My understanding is that ZFS keeps multiple copies of this information, so it should be recoverable. The zdb output also indicates that only LABEL 0 is missing; the other three labels are intact.

To get into this state I ran the following:

Code:
$ gpart create -s GPT ada3
$ gpart add -a 4k -b 2048 -t freebsd-zfs ada3
$ zpool create -f storage /dev/ada3


Later I had to reinstall my system (the OS was installed on a different drive), and during the re-installation I noticed GPT corruption errors, so I ran gpart recover on the ada3 device. This left the ZFS pool unimportable; I also did not run zpool export before reinstalling the system.
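
For anyone following along, the usual sequence around an OS reinstall is to export the pool first and import it again afterwards (a hedged sketch; 'storage' is the pool name from the commands above, and -f is only needed if the pool still appears in use by the old install):

Code:
# Hedged sketch: cleanly hand a pool between OS installs.
zpool export storage      # before reinstalling the OS
# ...reinstall the OS...
zpool import              # list pools visible to the new install
zpool import storage      # import by name
zpool import -f storage   # -f if the pool was last used by the old hostid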
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sounds like your issues aren't related to the original thread.
 