Played stupid games, won a stupid prize (AKA: pool metadata is corrupted)

Status
Not open for further replies.

malcolmputer

Explorer
Joined
Oct 28, 2013
Messages
55
Let's get this out of the way: this hardware is so unsupported it's not even funny.

Problems with the build:
  • No ECC (standard DDR3 desktop RAM)
  • AMD processor (Athlon 5350)
  • Consumer motherboard (AM1 with onboard sound, no IPMI)
  • Unsupported RAID card in pass-through mode (ARC-1220). Support was added in 9.10, it's relatively untested, and while it claims to have a pass-through mode, who knows how close to an HBA it actually is.
  • Desktop case (drives run hot)
  • No UPS (there was a rainstorm last night; power stayed up, but after a reboot the pool won't import)
With all of that in mind, after rebooting the FreeNAS box this morning I get:

Code:
[root@freenas] ~# zpool import
  pool: storage
  id: 11770584371817827292
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
  The pool may be active on another system, but can be imported using
  the '-f' flag.
  see: http://illumos.org/msg/ZFS-8000-72
 config:

  storage  FAULTED  corrupted data
    raidz2-0  FAULTED  corrupted data
      gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/fdf03ba4-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/fec02a28-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/fef7d8c4-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/ff4d01fc-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/ffd40d94-ee2b-11e4-a89f-d05099952c84  ONLINE


If the pool is hosed, I'm OK with that. I expected serious data loss given how unsupported this hardware is, but I'm curious to see if I can get it back.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
That looks lovely sir.

You should probably wait for a second opinion, but I think you ought to try the -f option, and expect about a 20% chance of success.
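To be clear, that's the lowercase -f, which just overrides the "may be active on another system" check; it's a different animal from the capital -F recovery option that comes up below. Roughly, using the pool name from the output above:

Code:
# Force import: skips the "pool may be active on another system" check.
# It does not repair anything on disk.
zpool import -f storage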
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
You can try clearing the error on the pool:
zpool clear -F <pool>
And then import it again:
zpool import <pool>
Should that fail, you can attempt a rewind/recovery import:
zpool import -F <pool>

With that said, if you do manage to get it mounted, it won't be long before it implodes again. Back up your data.
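For reference, the whole sequence in one place, as a rough sketch (substitute your own pool name; the -Fn dry run isn't mentioned above, but it lets you check whether the rewind would succeed before anything is actually discarded):

Code:
# Try recovery-mode clear, then a normal import
zpool clear -F storage
zpool import storage

# If that fails, see whether a rewind import would work (dry run only),
# then do it for real and scrub afterwards
zpool import -Fn storage
zpool import -F storage
zpool scrub storage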
 

malcolmputer

Explorer
Joined
Oct 28, 2013
Messages
55
@m0nkey_ Looks like that did it. Don't worry, this is just me goofing off with plugins on my "test" system. It's not going into production, and no real data is going onto it.

@DrKK It's my favorite FreeNAS box. It's so unsupported that it forces you to learn how FreeNAS works, because it's going to break on you. Also, while you're cringing, the RAID adapter isn't battery-backed and is in write-back mode.

Code:
[root@freenas] ~# zpool clear -F storage
cannot open 'storage': no such pool
[root@freenas] ~# zpool import storage
cannot import 'storage': I/O error
  Recovery is possible, but will result in some data loss.
  Returning the pool to its state as of Sun Sep 25 02:20:26 2016
  should correct the problem.  Approximately 619 minutes of data
  must be discarded, irreversibly.  Recovery can be attempted
  by executing 'zpool import -F storage'.  A scrub of the pool
  is strongly recommended after recovery.
[root@freenas] ~# zpool import -F storage
[root@freenas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

  NAME  STATE  READ WRITE CKSUM
  freenas-boot  ONLINE  0  0  0
    mirror-0  ONLINE  0  0  0
      gptid/76bb85a8-ee28-11e4-b3f2-d05099952c84  ONLINE  0  0  0
      gptid/76e7be40-ee28-11e4-b3f2-d05099952c84  ONLINE  0  0  0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 25 00:00:24 2016
config:

  NAME  STATE  READ WRITE CKSUM
  storage  ONLINE  0  0  0
    raidz2-0  ONLINE  0  0  0
      gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fdf03ba4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fec02a28-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fef7d8c4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ff4d01fc-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ffd40d94-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0

errors: No known data errors


Thanks for the help, guys.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Try running a scrub of storage, and see if it reports any errors. I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.
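A minimal sketch of that, assuming the pool name from earlier in the thread:

Code:
# Start a scrub, then check progress and any per-file errors
zpool scrub storage
zpool status -v storage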
 

malcolmputer

Explorer
Joined
Oct 28, 2013
Messages
55
Try running a scrub of storage, and see if it reports any errors. I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.

I think it helps that there's only 3.6GB used in the pool, but yeah, it completed just fine:

Code:
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 25 15:39:53 2016
config:

  NAME  STATE  READ WRITE CKSUM
  storage  ONLINE  0  0  0
    raidz2-0  ONLINE  0  0  0
      gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fdf03ba4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fec02a28-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fef7d8c4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ff4d01fc-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ffd40d94-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0

errors: No known data errors

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Given that, I'm not sure I agree with @m0nkey_ that the death of your pool is imminent. Since it's only a test system anyway, I'd say carry on and see what else breaks.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Try running a scrub of storage, and see if it reports any errors. I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.

Well, it's CoW.

Basically, it's a big stack of changes starting with the uberblock. Just disassemble the top layers of the tower and you end up 10 hours earlier.

Where it gets dicey is when you overwrite freed space, which of course doesn't happen if that 'free' space is still referenced by a snapshot.
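If you want to see that "top of the tower" for yourself, a rough sketch with zdb (read-only inspection; the /dev/gptid/... path is just an example built from one of the gptids shown in the pool output above):

Code:
# Show the active uberblock: its txg number and timestamp are what a
# 'zpool import -F' rewind walks back from
zdb -u storage

# Dump the on-disk vdev labels (pool config) from one member disk
zdb -l /dev/gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84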
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.

The answer is it doesn't. But if that was the amount of time between the final, partially written TXG and the previous good TXG, you can get output like that.

If there is a single problem that might have caused this, I would guess the RAID card being in write-back mode.
 