SOLVED ZPool Cache switched to Stripe

Soljia · Mar 9, 2015

So, a little back story. I got a new HBA (M1015 in IT Mode), yay. I shutdown all services relying on the ZPOOL, and exported it. I then did my shutdown and installed the new card and hooked my drives up to the new HBA. Fired it back up, and did an Auto Import. The ZPool seemed to be exactly the way it was before. Did some testing and was getting terrible performance from before. So, fast forward to today. I was doing some searching for performance issues, and when I did

Code:

zpool status

I noticed that my cache device looks like its not longer a cache device.

Code:


[root@freenas] ~# zpool status -v
  pool: Storage
state: ONLINE
  scan: resilvered 682G in 7h6m with 0 errors on Fri Feb  6 05:38:27 2015
config:

  NAME  STATE  READ WRITE CKSUM
  Storage  ONLINE  0  0  0
  gptid/c3a1dbf7-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  raidz3-1  ONLINE  0  0  0
  gptid/c4442745-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/c4fd106d-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/ba1b044f-adb0-11e4-a29c-0015175b2e90  ONLINE  0  0  0
  gptid/c666e636-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/c716e721-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0

So, did some googling and found another thread with someone with a similar issue (thread). If you don't want to read it, the general basis is...I used a debug code. (Still a noob at this stuff)

Code:


[root@freenas] ~# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
[root@freenas] ~#
[root@freenas] ~#
[root@freenas] ~# zpool add Storage cache da0

and ended up with this.

Code:


[root@freenas] ~# zpool status -v
  pool: Storage
state: ONLINE
  scan: resilvered 682G in 7h6m with 0 errors on Fri Feb  6 05:38:27 2015
config:

  NAME  STATE  READ WRITE CKSUM
  Storage  ONLINE  0  0  0
  gptid/c3a1dbf7-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  raidz3-1  ONLINE  0  0  0
  gptid/c4442745-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/c4fd106d-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/ba1b044f-adb0-11e4-a29c-0015175b2e90  ONLINE  0  0  0
  gptid/c666e636-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/c716e721-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  cache
  da0  ONLINE  0  0  0

This still looked super weird to me. However, it seemed to be working. I reconnected my ESX host via iSCSI. VM's wouldn't boot. Interesting. Checked the zpool status -v and I saw the my Storage/VMs dataset was in permanent error. Ok, fine. I have snapshots, restored from a snapshot. Tried again with the VMs (unmounted, detached, turned off iscsi, and then reverse of course). Still got boot errors.

So, I decided to run a scrub. Ya, that didn't go so hot. I was up to 2900 errors. I stopped the scrub. So, far the only affected things really aren't that important. Honestly, I wouldn't mind losing the whole pool, but I don't have to, then bonus. Anyways, I removed da0 from the cache device. I looked up how to fix the cache device showing up as it is above. I took a look on the interface and saw something even more weird. The drive is now showing up as being part of a stripe. (see attached). At this point, I'm mother fucking myself. So... I tried anyway I could to remove the drive from the pool. However, it all failed, which I'm assuming is for good reason. My guess at this point, is I probably need to copy everything I can to another location outside the zpool and just kill it with fire. However, again, if I don't need to then awesome.

Also, I still have access to all the data (except for my VM dataset), so... it's still alive. Help appreciated.

Please feel free to call me a moron.

cyberjock · Mar 9, 2015

Yeah, you are a moron.

Can you provide the output for "zpool history" and a debug file? Do *not* reboot this box. I'd also recommend you stop all file sharing services and hold tight.

You are using da0 for a strip AND a cache device simultaneously. If that stripe disk ends up with corruption that prevents mounting, your whole pool is gone forever.

I can promise you two things:

1. You did this by adding the cache device from the CLI. The WebGUI does NOT add cache devices as whole disks.
2. You are going to have to rebuild the pool, so better find free space somewhere.

Soljia · Mar 9, 2015

The debug is coming...

Code:

[root@freenas] /mnt/Storage/DVR/humes/Videos# zpool history
History for 'Storage':
2014-08-16.11:48:38 zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /Storage -o altroot=/mnt Storage /dev/gptid/c3a1dbf7-255c-11e4-bcf1-0015175b2e90 raidz3 /dev/gptid/c4442745-255c-11e4-bcf1-0015175b2e90 /dev/gptid/c4fd106d-255c-11e4-bcf1-0015175b2e90 /dev/gptid/c5b20182-255c-11e4-bcf1-0015175b2e90 /dev/gptid/c666e636-255c-11e4-bcf1-0015175b2e90 /dev/gptid/c716e721-255c-11e4-bcf1-0015175b2e90
2014-08-16.11:48:43 zfs inherit mountpoint Storage
2014-08-16.11:48:43 zpool set cachefile=/data/zfs/zpool.cache Storage
2014-08-16.11:52:07 zfs create -o volblocksize=64k -V 1TB Storage/VMs
2014-08-31.12:06:33 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 6609984130854588888
2014-08-31.12:06:33 zpool set cachefile=/data/zfs/zpool.cache Storage
2014-09-21.00:00:09 zpool scrub Storage
2014-09-30.20:11:03 zfs create -V 1T Storage/Samba
2014-09-30.20:11:38 zfs create Storage/CIFS
2014-11-02.00:00:09 zpool scrub Storage
2014-12-07.00:00:14 zpool scrub Storage
2015-01-09.18:38:55 zfs create -V 500g Storage/DVR
2015-01-09.18:41:07 zfs destroy Storage/DVR
2015-01-09.18:41:45 zfs create Storage/DVR
2015-01-18.00:00:30 zpool scrub Storage
2015-01-20.09:00:08 zfs snapshot Storage/VMs@auto-20150120.0900-2w
2015-01-21.09:00:08 zfs snapshot Storage/VMs@auto-20150121.0900-2w
2015-01-22.09:00:10 zfs snapshot Storage/VMs@auto-20150122.0900-2w
2015-01-23.09:00:06 zfs snapshot Storage/VMs@auto-20150123.0900-2w
2015-01-24.08:33:44 zfs set compression=lz4 Storage
2015-01-24.08:33:47 zfs inherit atime Storage
2015-01-24.08:33:50 zfs inherit dedup Storage
2015-01-24.08:33:50 zfs set refquota=none Storage
2015-01-24.08:33:55 zfs set refreservation=0 Storage
2015-01-26.09:00:24 zfs snapshot Storage/VMs@auto-20150126.0900-2w
2015-01-27.09:00:10 zfs snapshot Storage/VMs@auto-20150127.0900-2w
2015-01-28.09:00:07 zfs snapshot Storage/VMs@auto-20150128.0900-2w
2015-01-29.09:00:08 zfs snapshot Storage/VMs@auto-20150129.0900-2w
2015-01-30.09:00:08 zfs snapshot Storage/VMs@auto-20150130.0900-2w
2015-02-02.18:38:30 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 6609984130854588888
2015-02-02.18:38:30 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-02-02.18:55:34 zpool clear Storage
2015-02-02.21:23:44 zfs create Storage/CIFS/rick
2015-02-03.09:00:02 zfs snapshot Storage/VMs@auto-20150203.0900-2w
2015-02-03.09:00:16 zfs destroy -r -d Storage/VMs@auto-20150120.0900-2w
2015-02-03.15:58:19 zfs create Storage/Jails
2015-02-03.16:09:25 zfs destroy Storage/Samba
2015-02-03.16:15:36 zfs create -o mountpoint=/Storage/Jails/.warden-template-standard -p Storage/Jails/.warden-template-standard
2015-02-03.16:16:32 zfs snapshot Storage/Jails/.warden-template-standard@clean
2015-02-03.16:16:35 zfs clone Storage/Jails/.warden-template-standard@clean Storage/Jails/owncloud
2015-02-04.09:00:01 zfs snapshot Storage/VMs@auto-20150204.0900-2w
2015-02-04.09:00:16 zfs destroy -r -d Storage/VMs@auto-20150121.0900-2w
2015-02-05.09:00:02 zfs snapshot Storage/VMs@auto-20150205.0900-2w
2015-02-05.09:00:16 zfs destroy -r -d Storage/VMs@auto-20150122.0900-2w
2015-02-05.22:04:01 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 6609984130854588888
2015-02-05.22:04:01 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-02-05.22:04:19 zfs set mountpoint=none Storage/Jails/.warden-template-standard
2015-02-05.22:04:19 zfs rename -f Storage/Jails/.warden-template-standard Storage/Jails/.warden-template-standard-9.2-RELEASE-x64
2015-02-05.22:04:24 zfs set mountpoint=/mnt/Storage/Jails/.warden-template-standard-9.2-RELEASE-x64 Storage/Jails/.warden-template-standard-9.2-RELEASE-x64
2015-02-05.22:29:37 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 6609984130854588888
2015-02-05.22:29:37 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-02-05.22:29:39 zfs inherit -r mountpoint Storage
2015-02-05.22:29:44 zfs set aclmode=restricted Storage/CIFS/rick
2015-02-05.22:29:50 zfs create Storage/.system
2015-02-05.22:29:50 zfs create Storage/.system/cores
2015-02-05.22:29:50 zfs create Storage/.system/samba4
2015-02-05.22:29:50 zfs create Storage/.system/syslog-4955d2f387c34d78ab543df9f9349c7e
2015-02-05.22:29:53 zfs create Storage/.system/rrd-4955d2f387c34d78ab543df9f9349c7e
2015-02-05.22:32:19 zpool replace Storage 10749019550730847106 gptid/ba1b044f-adb0-11e4-a29c-0015175b2e90
2015-02-06.09:00:02 zfs snapshot Storage/VMs@auto-20150206.0900-2w
2015-02-06.09:00:12 zfs destroy -r -d Storage/VMs@auto-20150123.0900-2w
2015-02-06.21:00:06 zfs snapshot Storage/VMs@auto-20150206.2100-2w
2015-02-07.09:00:06 zfs snapshot Storage/VMs@auto-20150207.0900-2w
2015-02-07.21:00:07 zfs snapshot Storage/VMs@auto-20150207.2100-2w
2015-02-08.09:00:07 zfs snapshot Storage/VMs@auto-20150208.0900-2w
2015-02-08.12:55:39 zfs create -o mountpoint=/Storage/Jails/.warden-template-pluginjail -p Storage/Jails/.warden-template-pluginjail
2015-02-08.13:08:52 zfs snapshot Storage/Jails/.warden-template-pluginjail@clean
2015-02-08.13:08:55 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/crashplan_1
2015-02-08.13:12:28 zfs destroy -fr Storage/Jails/crashplan_1
2015-02-08.13:17:54 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/crashplan_1
2015-02-08.13:22:23 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/crashplan_2
2015-02-08.13:25:33 zfs destroy -fr Storage/Jails/crashplan_1
2015-02-08.21:00:05 zfs snapshot Storage/VMs@auto-20150208.2100-2w
2015-02-09.19:29:53 zfs destroy -fr Storage/Jails/crashplan_2
2015-02-09.19:32:33 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/crashplan_1
2015-02-09.20:13:41 zfs destroy -fr Storage/Jails/crashplan_1
2015-02-10.00:00:02 zfs snapshot Storage/VMs@auto-20150210.0000-2w
2015-02-10.00:00:12 zfs destroy -r -d Storage/VMs@auto-20150126.0900-2w
2015-02-10.09:00:02 zfs destroy -r -d Storage/VMs@auto-20150127.0900-2w
2015-02-10.12:00:04 zfs snapshot Storage/VMs@auto-20150210.1200-2w
2015-02-11.00:00:06 zfs snapshot Storage/VMs@auto-20150211.0000-2w
2015-02-11.09:00:13 zfs destroy -r -d Storage/VMs@auto-20150128.0900-2w
2015-02-11.12:00:06 zfs snapshot Storage/VMs@auto-20150211.1200-2w
2015-02-12.00:00:06 zfs snapshot Storage/VMs@auto-20150212.0000-2w
2015-02-12.09:00:14 zfs destroy -r -d Storage/VMs@auto-20150129.0900-2w
2015-02-12.12:00:06 zfs snapshot Storage/VMs@auto-20150212.1200-2w
2015-02-13.00:00:06 zfs snapshot Storage/VMs@auto-20150213.0000-2w
2015-02-13.09:00:17 zfs destroy -r -d Storage/VMs@auto-20150130.0900-2w
2015-02-13.12:00:09 zfs snapshot Storage/VMs@auto-20150213.1200-2w
2015-02-14.00:00:06 zfs snapshot Storage/VMs@auto-20150214.0000-2w
2015-02-14.12:00:05 zfs snapshot Storage/VMs@auto-20150214.1200-2w
2015-02-15.00:00:02 zfs snapshot Storage/VMs@auto-20150215.0000-2w
2015-02-15.12:00:06 zfs snapshot Storage/VMs@auto-20150215.1200-2w
2015-02-17.00:00:08 zfs snapshot Storage/VMs@auto-20150217.0000-2w
2015-02-17.09:00:13 zfs destroy -r -d Storage/VMs@auto-20150203.0900-2w
2015-02-17.12:00:07 zfs snapshot Storage/VMs@auto-20150217.1200-2w
2015-02-18.00:00:06 zfs snapshot Storage/VMs@auto-20150218.0000-2w
2015-02-18.09:00:21 zfs destroy -r -d Storage/VMs@auto-20150204.0900-2w
2015-02-18.12:00:07 zfs snapshot Storage/VMs@auto-20150218.1200-2w
2015-02-19.00:00:07 zfs snapshot Storage/VMs@auto-20150219.0000-2w
2015-02-19.09:00:21 zfs destroy -r -d Storage/VMs@auto-20150205.0900-2w
2015-02-19.12:00:07 zfs snapshot Storage/VMs@auto-20150219.1200-2w
2015-02-20.00:00:05 zfs snapshot Storage/VMs@auto-20150220.0000-2w
2015-02-20.09:00:13 zfs destroy -r -d Storage/VMs@auto-20150206.0900-2w
2015-02-20.12:00:07 zfs snapshot Storage/VMs@auto-20150220.1200-2w
2015-02-20.21:00:17 zfs destroy -r -d Storage/VMs@auto-20150206.2100-2w
2015-02-21.00:00:07 zfs snapshot Storage/VMs@auto-20150221.0000-2w
2015-02-21.09:00:17 zfs destroy -r -d Storage/VMs@auto-20150207.0900-2w
2015-02-21.09:54:38 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/crashplan_1
2015-02-21.09:59:48 zfs create -o mountpoint=/Storage/Jails/.warden-template-standard -p Storage/Jails/.warden-template-standard
2015-02-21.10:02:23 zfs snapshot Storage/Jails/.warden-template-standard@clean
2015-02-21.10:02:28 zfs clone Storage/Jails/.warden-template-standard@clean Storage/Jails/Crashplan
2015-02-21.10:29:14 zfs destroy -fr Storage/Jails/Crashplan
2015-02-21.10:29:44 zfs clone Storage/Jails/.warden-template-standard@clean Storage/Jails/Crashplan
2015-02-21.11:52:33 zfs destroy -fr Storage/Jails/Crashplan
2015-02-21.11:58:09 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/Crashplan
2015-02-21.12:00:06 zfs snapshot Storage/VMs@auto-20150221.1200-2w
2015-02-21.12:23:00 zfs create Storage/Jails/CrashplanStorage
2015-02-21.12:35:19 zfs destroy -fr Storage/Jails/crashplan_1
2015-02-21.12:37:35 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/Crashplan2
2015-02-21.12:37:44 zfs destroy -fr Storage/Jails/Crashplan
2015-02-21.12:50:34 zfs destroy -fr Storage/Jails/Crashplan2
2015-02-21.12:51:03 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/Crashplan
2015-02-21.12:56:40 zfs destroy -fr Storage/Jails/Crashplan
2015-02-21.12:57:04 zfs clone Storage/Jails/.warden-template-pluginjail@clean Storage/Jails/Crashplan
2015-02-21.21:00:05 zfs destroy -r -d Storage/VMs@auto-20150207.2100-2w
2015-02-22.00:00:01 zfs snapshot Storage/VMs@auto-20150222.0000-2w
2015-02-22.09:00:13 zfs destroy -r -d Storage/VMs@auto-20150208.0900-2w
2015-02-22.12:00:05 zfs snapshot Storage/VMs@auto-20150222.1200-2w
2015-02-22.21:00:04 zfs destroy -r -d Storage/VMs@auto-20150208.2100-2w
2015-02-23.17:52:55 zpool export -f Storage
2015-02-23.21:00:37 zpool import -f -R /mnt 6609984130854588888
2015-02-23.21:00:41 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-02-23.21:00:41 zfs set aclmode=passthrough Storage
2015-02-23.21:00:46 zfs set aclinherit=passthrough Storage
2015-02-25.17:57:59 zfs set compression=lz4 Storage
2015-02-25.17:57:59 zfs inherit atime Storage
2015-02-25.17:58:00 zfs inherit dedup Storage
2015-02-25.17:58:00 zfs set refquota=none Storage
2015-02-25.17:58:05 zfs set refreservation=0 Storage
2015-03-07.11:29:44 zpool export -f Storage
2015-03-07.11:57:14 zpool import -f -R /mnt 6609984130854588888
2015-03-07.11:57:20 zfs inherit -r mountpoint Storage
2015-03-07.11:57:20 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-03-07.11:57:21 zfs set aclmode=passthrough Storage
2015-03-07.11:57:26 zfs set aclinherit=passthrough Storage
2015-03-08.10:58:26 zfs create -o secondarycache=metadata -o primarycache=metadata -o compression=off Storage/.system/perftest
2015-03-08.11:27:48 zfs destroy Storage/.system/perftest
2015-03-09.09:06:32 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 6609984130854588888
2015-03-09.09:06:32 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-03-09.09:21:40 zpool add Storage cache da0
2015-03-09.09:56:57 zfs snapshot Storage/VMs@manual-20150309
2015-03-09.09:58:06 zfs clone Storage/VMs@auto-20150222.1200-2w Storage/VMs2
2015-03-09.10:01:08 zfs snapshot Storage/VMs2@auto-20150309.1001-8h
2015-03-09.10:02:08 zfs snapshot Storage/VMs2@auto-20150309.1002-1w
2015-03-09.10:03:08 zfs snapshot Storage/VMs2@auto-20150309.1003-1m
2015-03-09.10:26:08 zpool clear Storage
2015-03-09.10:26:29 zpool scrub Storage
2015-03-09.10:43:12 zpool scrub -s Storage
2015-03-09.10:48:02 zpool remove Storage da0
2015-03-09.10:57:24 zpool export -f Storage
2015-03-09.11:08:10 zpool import -f -R /mnt 6609984130854588888
2015-03-09.11:08:16 zfs inherit -r mountpoint Storage
2015-03-09.11:08:16 zpool set cachefile=/data/zfs/zpool.cache Storage
2015-03-09.11:08:16 zfs set aclmode=passthrough Storage
2015-03-09.11:08:21 zfs set aclinherit=passthrough Storage

Soljia · Mar 9, 2015

Without a doubt. I realized this after saw a partition number after the other disks.
Figured, meh.

cyberjock said:
Yeah, you are a moron.

Can you provide the output for "zpool history" and a debug file? Do *not* reboot this box. I'd also recommend you stop all file sharing services and hold tight.

You are using da0 for a strip AND a cache device simultaneously. If that stripe disk ends up with corruption that prevents mounting, your whole pool is gone forever.

I can promise you two things:

1. You did this by adding the cache device from the CLI. The WebGUI does NOT add cache devices as whole disks.
2. You are going to have to rebuild the pool, so better find free space somewhere.

Soljia · Mar 9, 2015

Well, I hope there isn't any confidential information or anything personal in there :)

I attached it to the previous post for some reason.

https://forums.freenas.org/index.php?attachments/debug-freenas-20150309125419-tgz.6992/

Soljia · Mar 9, 2015

Also, just to confirm I don't think I'm using it as both strip and cache any more. I removed da0 from cache before this post. Here is the latest zpool status

Code:

[root@freenas] /mnt/Storage/DVR/humes/Videos# zpool status
  pool: Storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
  corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
  entire pool from backup.
  see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub canceled on Mon Mar  9 10:43:07 2015
config:

  NAME  STATE  READ WRITE CKSUM
  Storage  ONLINE  0  0  6
  gptid/c3a1dbf7-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  159
  raidz3-1  ONLINE  0  0  0
  gptid/c4442745-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/c4fd106d-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/ba1b044f-adb0-11e4-a29c-0015175b2e90  ONLINE  0  0  0
  gptid/c666e636-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0
  gptid/c716e721-255c-11e4-bcf1-0015175b2e90  ONLINE  0  0  0

errors: 2902 data errors, use '-v' for a list

cyberjock said:
Yeah, you are a moron.

Can you provide the output for "zpool history" and a debug file? Do *not* reboot this box. I'd also recommend you stop all file sharing services and hold tight.

You are using da0 for a strip AND a cache device simultaneously. If that stripe disk ends up with corruption that prevents mounting, your whole pool is gone forever.

I can promise you two things:

1. You did this by adding the cache device from the CLI. The WebGUI does NOT add cache devices as whole disks.
2. You are going to have to rebuild the pool, so better find free space somewhere.

cyberjock · Mar 9, 2015

See this entry:

2015-03-09.09:21:40 zpool add Storage cache da0

That command was done from the CLI, and is another example of why we tell people that doing stuff from the CLI is extremely dangerous.

:/

Nothing confidential, just usernames, passwords, your first born's name, your SSN, your address... see? nothing serious. ;)

So here's what you need to do.

1. In the WebGUI you need to offline the cache device. You need to do this first so you don't have more writes going to the da0 device.
2. As you could see with "zpool status -v Storage" you've already got corruption, so data loss is already happening. All of the files that are listed are corrupted. There will no doubt be more files, but zfs won't know which ones until you touch those blocks and ZFS realizes the checksums don't match. :/
3. You need to backup all of your data elsewhere, then destroy and recreate the pool.
4. Promise yourself to never ever do the CLI again. The WebGUI has "seat belts" that prevent you from doing stupid things like you did from the CLI.

I can't tell people enough times to not use the CLI as you can do incredibly stupid and unrecoverable things, even with just a simple typo. Unfortunately you just did one. Everyone thinks they are an expert, and 99% of them are nothing of the sort. :(

You need to do copy operations *only*. If you start moving data that's going to require transactions to be written to the pool, which increases the chance of the pool crashing and you never accessing the data again.

If you have a full backup of all of this data, simply destroy the pool and restore from backup. Statistically, you probably don't since 90%+ of users that have a bad pool don't do backups. :/

There's a *really* good chance you're only going to be able to copy a small amount of data off the pool before the system crashes or the pool unmounts. So grab the most important data first. I've only seen someone do this one other time.. it didn't go well for him and he had to pay me to do data recovery and it took 2 solid weeks to get the data back. I don't do that kind of thing anymore and unless your data is worth more than $10,000 you don't really have any options for data recovery either. Even if I wanted to do it, you'd be looking at a 2 week wait as I wouldn't be able to do much until 2 weekend from now. You should expect many files to be corrupt though. You can figure out which files are corrupt after you finish copying the data by doing the command "zpool status -v Storage" again and looking at the list. It's gonna be a *lot* of files most likely. Since the pool was originally created incorrectly (you can see this from the first line of the zpool history output) it's theoretically possible that 100% of your files are corrupted. That's not likely as you probably didn't consume the entire da0 device as L2ARC, so not all of the data was overwritten.

Good luck!

Soljia · Mar 9, 2015

Just to be sure, the cache device is no longer showing up under cache in the status. Do you want me to offline the cache drive that is attached as Stripe? I'm assuming not.
The file that are listed in zpool status -v right now are very very unimportant
What do you suggest to offload the files without causing issues? Is a normal cp operation to an external drive ok?
I promise to not do CLI, unless I'm told by you or another expert.
You are correct, I do not backup my FreeNAS device. It is the backup device (and obviously a target for VMs). So, while it is being rebuilt, I will be vulnerable. The VM's were mostly test VMs or something that can be rebuilt easily. The worst part of this will be a pissy wife, as our TV will cease to function :)

Thanks for your help.

cyberjock said:
See this entry:

That command was done from the CLI, and is another example of why we tell people that doing stuff from the CLI is extremely dangerous.

:/

Nothing confidential, just usernames, passwords, your first born's name, your SSN, your address... see? nothing serious. ;)

So here's what you need to do.

1. In the WebGUI you need to offline the cache device. You need to do this first so you don't have more writes going to the da0 device.
2. As you could see with "zpool status -v Storage" you've already got corruption, so data loss is already happening. All of the files that are listed are corrupted. There will no doubt be more files, but zfs won't know which ones until you touch those blocks and ZFS realizes the checksums don't match. :/
3. You need to backup all of your data elsewhere, then destroy and recreate the pool.
4. Promise yourself to never ever do the CLI again. The WebGUI has "seat belts" that prevent you from doing stupid things like you did from the CLI.

I can't tell people enough times to not use the CLI as you can do incredibly stupid and unrecoverable things, even with just a simple typo. Unfortunately you just did one. Everyone thinks they are an expert, and 99% of them are nothing of the sort. :(

You need to do copy operations *only*. If you start moving data that's going to require transactions to be written to the pool, which increases the chance of the pool crashing and you never accessing the data again.

If you have a full backup of all of this data, simply destroy the pool and restore from backup. Statistically, you probably don't since 90%+ of users that have a bad pool don't do backups. :/

There's a *really* good chance you're only going to be able to copy a small amount of data off the pool before the system crashes or the pool unmounts. So grab the most important data first. I've only seen someone do this one other time.. it didn't go well for him and he had to pay me to do data recovery and it took 2 solid weeks to get the data back. I don't do that kind of thing anymore and unless your data is worth more than $10,000 you don't really have any options for data recovery either. Even if I wanted to do it, you'd be looking at a 2 week wait as I wouldn't be able to do much until 2 weekend from now. You should expect many files to be corrupt though. You can figure out which files are corrupt after you finish copying the data by doing the command "zpool status -v Storage" again and looking at the list. It's gonna be a *lot* of files most likely. Since the pool was originally created incorrectly (you can see this from the first line of the zpool history output) it's theoretically possible that 100% of your files are corrupted. That's not likely as you probably didn't consume the entire da0 device as L2ARC, so not all of the data was overwritten.

Good luck!

cyberjock · Mar 9, 2015

Soljia said:
Just to be sure, the cache device is no longer showing up under cache in the status. Do you want me to offline the cache drive that is attached as Stripe? I'm assuming not.

I don't even know what you are trying to say. The cache device is removed. There is no offlining stripes.

Copying via cp to an external drive would work.

Soljia · Mar 9, 2015

I'm confused.

You said..

1. In the WebGUI you need to offline the cache device. You need to do this first so you don't have more writes going to the da0 device.

However, da0 is no longer attached as a cache device. I'm asking if you meant to offline da0p2 (https://forums.freenas.org/index.php?attachments/2015-03-09_11-21-36-png.6990/), however, as you stated there is no "offlining" a stripe drive. Which answers the question. I didn't know if there was some magic voodoo that would give me the ability to offline a stripped device without foobaring stuff.

cyberjock said:
I don't even know what you are trying to say. The cache device is removed. There is no offlining stripes.

Copying via cp to an external drive would work.

Soljia · Mar 9, 2015

Well, I would say there was about 40-50% corruption. I just blew out the volume and started over. You live and learn. Very little is irreplaceable data. Thanks for the help.

Important Announcement for the TrueNAS Community.

SOLVED ZPool Cache switched to Stripe

Soljia

Dabbler

Attachments

cyberjock

Inactive Account

Soljia

Dabbler

Soljia

Dabbler

Attachments

Soljia

Dabbler

Soljia

Dabbler

cyberjock

Inactive Account

Soljia

Dabbler

cyberjock

Inactive Account

Soljia

Dabbler

Soljia

Dabbler

Similar threads

Important Announcement for the TrueNAS Community.

SOLVED ZPool Cache switched to Stripe

Dabbler

Attachments

Inactive Account

Dabbler

Dabbler

Attachments

Dabbler

Dabbler

Inactive Account

Dabbler

Inactive Account

Dabbler

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "ZPool Cache switched to Stripe"

Similar threads