Played stupid games, won a stupid prize (AKA: pool metadata is corrupted)

Status
Not open for further replies.

malcolmputer

Explorer
Joined
Oct 28, 2013
Messages
55
Let's get this out of the way: this hardware is so unsupported it's not even funny.

Problems with the build:
  • No ECC (standard DDR3 desktop RAM)
  • AMD processor (Athlon 5350)
  • Consumer motherboard (AM1 with onboard sound, no IPMI)
  • Unsupported RAID card in pass-through mode (ARC-1220). Support was added in 9.10, it's relatively untested, and while it claims to have a pass-through mode, who knows how close to an HBA it actually is.
  • Desktop case (drives run hot)
  • No UPS (there was a rainstorm last night; power stayed up, but after a reboot the pool won't import)
With all of that in mind, after rebooting the FreeNAS box this morning I get:

Code:
[root@freenas] ~# zpool import
  pool: storage
  id: 11770584371817827292
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
  The pool may be active on another system, but can be imported using
  the '-f' flag.
  see: http://illumos.org/msg/ZFS-8000-72
 config:

  storage  FAULTED  corrupted data
    raidz2-0  FAULTED  corrupted data
      gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/fdf03ba4-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/fec02a28-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/fef7d8c4-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/ff4d01fc-ee2b-11e4-a89f-d05099952c84  ONLINE
      gptid/ffd40d94-ee2b-11e4-a89f-d05099952c84  ONLINE


If the pool is hosed, I'm OK with that. I expected serious data loss given how unsupported this hardware is, but I'm curious to see if I can get it back.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
That looks lovely sir.

You should probably wait for a second opinion, but I think you ought to try the -f option, and expect about a 20% chance of success.
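To be clear, that's the lowercase -f, which just overrides the "may be active on another system" check; it's a different animal from the capital -F recovery option that comes up below. Roughly, using the pool name from the output above:

Code:
# Force import: skips the "pool may be active on another system" check.
# It does not repair anything on disk.
zpool import -f storage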
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
You can try clearing the error on the pool:
zpool clear -F <pool>
And then import it again:
zpool import <pool>
Should that fail, you can attempt a rewind/recovery import:
zpool import -F <pool>

With that said, if you do manage to get it mounted, it won't be long before it implodes again. Back up your data.
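For reference, the whole sequence in one place, as a rough sketch (substitute your own pool name; the -Fn dry run isn't mentioned above, but it lets you check whether the rewind would succeed before anything is actually discarded):

Code:
# Try recovery-mode clear, then a normal import
zpool clear -F storage
zpool import storage

# If that fails, see whether a rewind import would work (dry run only),
# then do it for real and scrub afterwards
zpool import -Fn storage
zpool import -F storage
zpool scrub storage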
 

malcolmputer

Explorer
Joined
Oct 28, 2013
Messages
55
@m0nkey_ Looks like that did it. Don't worry, this is just me goofing off with plugins on my "test" system. It's not going into production, and no real data is going onto it.

@DrKK It's my favorite FreeNAS box. It's so unsupported that it forces you to learn how FreeNAS works, because it's going to break on you. Also, while you're cringing, the RAID adapter isn't battery-backed and is in write-back mode.

Code:
[root@freenas] ~# zpool clear -F storage
cannot open 'storage': no such pool
[root@freenas] ~# zpool import storage
cannot import 'storage': I/O error
  Recovery is possible, but will result in some data loss.
  Returning the pool to its state as of Sun Sep 25 02:20:26 2016
  should correct the problem.  Approximately 619 minutes of data
  must be discarded, irreversibly.  Recovery can be attempted
  by executing 'zpool import -F storage'.  A scrub of the pool
  is strongly recommended after recovery.
[root@freenas] ~# zpool import -F storage
[root@freenas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

  NAME  STATE  READ WRITE CKSUM
  freenas-boot  ONLINE  0  0  0
    mirror-0  ONLINE  0  0  0
      gptid/76bb85a8-ee28-11e4-b3f2-d05099952c84  ONLINE  0  0  0
      gptid/76e7be40-ee28-11e4-b3f2-d05099952c84  ONLINE  0  0  0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 25 00:00:24 2016
config:

  NAME  STATE  READ WRITE CKSUM
  storage  ONLINE  0  0  0
    raidz2-0  ONLINE  0  0  0
      gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fdf03ba4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fec02a28-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fef7d8c4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ff4d01fc-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ffd40d94-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0

errors: No known data errors


Thanks for the help, guys.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Try running a scrub of storage, and see if it reports any errors. I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.
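A minimal sketch of that, assuming the pool name from earlier in the thread:

Code:
# Start a scrub, then check progress and any per-file errors
zpool scrub storage
zpool status -v storage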
 

malcolmputer

Explorer
Joined
Oct 28, 2013
Messages
55
Try running a scrub of storage, and see if it reports any errors. I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.

I think it helps that there's only 3.6GB used in the pool, but yeah, it completed just fine:

Code:
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep 25 15:39:53 2016
config:

  NAME  STATE  READ WRITE CKSUM
  storage  ONLINE  0  0  0
    raidz2-0  ONLINE  0  0  0
      gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fdf03ba4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fec02a28-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/fef7d8c4-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ff4d01fc-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0
      gptid/ffd40d94-ee2b-11e4-a89f-d05099952c84  ONLINE  0  0  0

errors: No known data errors

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Given that, I'm not sure I agree with @m0nkey_ that the death of your pool is imminent. Since it's only a test system anyway, I'd say carry on and see what else breaks.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Try running a scrub of storage, and see if it reports any errors. I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.

Well, it's CoW.

Basically, it's a big stack of changes starting with the uberblock. Just disassemble the top layers of the tower and you end up 10 hours earlier.

Where it gets dicey is when you overwrite freed space, which of course doesn't happen if that 'free' space is still referenced by a snapshot.
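If you want to see that "top of the tower" for yourself, a rough sketch with zdb (read-only inspection; the /dev/gptid/... path is just an example built from one of the gptids shown in the pool output above):

Code:
# Show the active uberblock: its txg number and timestamp are what a
# 'zpool import -F' rewind walks back from
zdb -u storage

# Dump the on-disk vdev labels (pool config) from one member disk
zdb -l /dev/gptid/fdb4f6fa-ee2b-11e4-a89f-d05099952c84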
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I'm frankly surprised that the pool keeps enough history to revert over 10 hours to get to a clean state.

The answer is it doesn't. But if that was the amount of time between the final, partially written TXG and the previous good TXG, you can get output like that.

If there is a single problem that might have caused this, I would guess the RAID card being in write-back mode.
 