What a disaster the replication task has done!

mathcrowd

Dabbler
I'm running a small startup team working on math education, and for the last three years we have stored all our files on Nextcloud, which mounts an NFS share from TrueNAS.

I'm setting up a new server today and want to migrate some datasets between the servers.

Unfortunately, I used a replication task.

Source setting:

Server A/Pool 1/{dataset a, dataset b, dataset c}

Target setting:

Server B/Pool 2

Pool 2 already contained older versions of datasets a, b, c, as well as datasets d, e, f.

So I started the replication task. After about 100 GB of data had been copied, I finally realized that all the datasets in Pool 2 had disappeared.

I quickly shut down the server (it seems the task cannot be stopped).
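
For anyone trying to picture what the task does: as far as I understand it, replication to a pool root boils down to a recursive send received with a forced rollback, roughly like the sketch below (pool, dataset, and snapshot names are just placeholders, not my real ones):

```
# Rough sketch of the kind of send/receive the replication performs.
# The -F on the receiving side forces the target to match the incoming
# stream, so existing datasets and snapshots on the target side can be
# rolled back or destroyed as part of the receive.
zfs send -R pool1/datasetA@auto-2023-04-18_00-00 | \
    ssh serverB zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint pool2/datasetA
```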

----------------------------------------------------------------


Datasets d, e, f total about 30 TB; I think the data is still on my disks.

All my snapshots are gone.

---------------------------------------------------------------

I have no backup of this data.

I thought RAID 1 was safe enough (how silly I was), and I lacked the budget to buy more disks.

-----------------------------------------------------------------

From googling, I found that a txg rollback might save my life.

I have run `zpool history -il`,

but most of the entries are not for the related pool and have the wrong date (a day earlier).
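
For reference, `zpool history` accepts a pool name, so the output can be narrowed to just the affected pool and filtered for the destructive entries (which is what I try next):

```
# Long (-l) and internal (-i) history for one pool only; the internal
# entries also record middleware-driven destroys and their txg numbers
zpool history -il DataCenter | grep -Ei 'destroy|recv'
```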

Can anyone help? Thanks.
 

mathcrowd

Dabbler
Update:

I found the related command history.

But it's weird that datasets d, e, f (mentioned above, which were not the target) are not listed below.

So how can I find my missing datasets?

```
2023-04-18.21:51:01 zpool import 11985382682913978068 DataCenter
2023-04-18.21:51:10 zfs inherit -r DataCenter
2023-04-18.22:00:43 zfs destroy -r DataCenter/static
2023-04-18.22:11:05 zfs create -o aclmode=passthrough -o casesensitivity=sensitive -o org.freenas:description=video collection -o copies=1 -o org.truenas:managedby=192.168.30.50 -o quota=none -o refquota=none -o refreservation=none -o reservation=none -o xattr=sa -o special_small_blocks=0 DataCenter/video
2023-04-18.22:52:49 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/video
2023-04-18.22:52:50 zfs set readonly=on DataCenter/video
2023-04-18.22:52:50 zfs set readonly=on DataCenter/video
2023-04-18.22:52:50 zfs destroy DataCenter@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00,auto-2023-04-17_00-00
2023-04-18.22:52:50 zfs destroy DataCenter/apt-cache@auto-2023-04-18_00-00
2023-04-18.22:52:50 zfs destroy DataCenter/calibre@auto-2023-04-18_00-00
2023-04-18.22:52:51 zfs destroy DataCenter/config@auto-2023-04-18_00-00
2023-04-18.22:52:51 zfs destroy DataCenter/gitlab@auto-2023-04-18_00-00
2023-04-18.22:52:51 zfs destroy DataCenter/tug@auto-2023-04-18_00-00
2023-04-18.22:54:21 zpool remove DataCenter /dev/gptid/531c733a-d53e-11ed-8c24-005056808ef7
2023-04-18.22:54:58 zfs destroy DataCenter/apt-cache@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00
2023-04-18.22:54:58 zfs destroy DataCenter/calibre@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00
2023-04-18.22:54:58 zfs destroy DataCenter/config@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00
2023-04-18.22:54:58 zfs destroy DataCenter/davinci@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00,auto-2023-04-17_00-00
2023-04-18.22:57:04 zfs destroy DataCenter/gitlab@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00
2023-04-18.22:59:01 zfs destroy DataCenter/minio@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00,auto-2023-04-17_00-00
2023-04-18.22:59:49 zfs destroy DataCenter/registry@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00,auto-2023-04-17_00-00
2023-04-18.23:00:07 zfs destroy DataCenter/softwares@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00,auto-2023-04-17_00-00
2023-04-18.23:00:07 zfs destroy DataCenter/tug@auto-2023-04-05_00-00,auto-2023-04-06_00-00,auto-2023-04-07_00-00,auto-2023-04-08_00-00,auto-2023-04-09_00-00,auto-2023-04-10_00-00,auto-2023-04-11_00-00,auto-2023-04-12_00-00,auto-2023-04-13_00-00,auto-2023-04-14_00-00,auto-2023-04-15_00-00
2023-04-18.23:13:04 zpool remove DataCenter /dev/gptid/53052171-d53e-11ed-8c24-005056808ef7
2023-04-18.23:24:51 zpool add DataCenter special /dev/gptid/f4186fa9-ddfb-11ed-a4c0-00505680d330
2023-04-18.23:32:37 zpool remove DataCenter /dev/gptid/454be33a-d0a8-11ed-aa8b-00505680d00f
2023-04-18.23:43:35 zpool attach DataCenter /dev/gptid/f4186fa9-ddfb-11ed-a4c0-00505680d330 /dev/gptid/a8a8513c-ddff-11ed-a4c0-00505680d330
2023-04-18.23:46:03 zpool remove DataCenterNone
2023-04-19.00:00:39 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:00:49 zfs set readonly=on DataCenter/apt-cache
2023-04-19.00:01:04 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:01:19 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:01:38 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:02:48 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:03:07 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:04:17 zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint DataCenter/apt-cache
2023-04-19.00:10:40 zpool import 11985382682913978068 DataCenter
2023-04-19.00:10:40 zpool set cachefile=/data/zfs/zpool.cache DataCenter

```
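
To compare what the history shows with what the pool still reports, a read-only listing changes nothing and shows everything the pool still knows about:

```
# Every filesystem, volume and snapshot the pool still contains
zfs list -r -t all DataCenter
```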
 

mathcrowd

Dabbler
Update:

I finally found the destroy commands using `zpool history -i`.

To my despair, I was doing vdev remove/add operations while the replication task was running.

So, when I try:

Code:
zpool import -N -o readonly=on -F -T 17032157 DataCenter


I got "one or more devices is currently unavailable".

Original disk info:

1 vdisk in ESXi for the special vdev
2 physical disks in ESXi for the L2ARC cache
2 vdisks in ESXi for the log vdevs

What I've done:

1. Removed the L2ARC cache.
2. Added one of the removed physical disks as a special vdev.
3. Deleted the original vdisk special vdev.
4. Added the other physical disk as a mirror.
5. Removed the 2 log vdevs.
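
For reference, the state of these remove/add operations can be checked read-only, and as far as I know a top-level device removal can only be cancelled while its evacuation is still in progress (just a sketch):

```
# Read-only: shows the vdev layout, resilver state, and any in-progress
# or completed device removals/evacuations
zpool status -v DataCenter
# Cancels a top-level device removal, but only while it is still running
zpool remove -s DataCenter
```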

Is it possible to get my datasets back? It's really important to our team.
We've uploaded all of our work onto Nextcloud.
crying....
 
Forum member (joined Oct 22, 2019)
Wait, you removed a special vdev? That basically takes down the entire pool.

It's very difficult to follow your posts at this point in time.
 

NugentS

MVP
My read from that is that he has destroyed his target pool in several ways by flailing around doing the equivalent of pressing random buttons.
And no backup.
AND it's virtual.

I don't see any way back from that

"delete original vdisk special vdev" - thats fatal for whichever pool that used to be.
 

mathcrowd

Dabbler
Wait, you removed a special vdev? That basically takes down the entire pool.

It's very difficult to follow your posts at this point in time.
Actually, I replaced my special vdev with two physical disks which had been the L2ARC cache.

The whole process completed with no warnings or errors.

I can still import the pool, but all the datasets are lost.
 

mathcrowd

Dabbler
My read from that is that he has destroyed his target pool in several ways by flailing around doing the equivalent of pressing random buttons.
And no backup.
AND it's virtual.

I don't see any way back from that

"delete original vdisk special vdev" - thats fatal for whichever pool that used to be.
I still have the vdisk files; I'm only losing the L2ARC caching data.

I thought the metadata would be migrated to the new disks.

However, it made it impossible for me to roll back.
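
As far as I understand it, a completed device removal copies the data onto the remaining vdevs and leaves an indirect mapping behind, which would explain why a rewind to a txg from before the removal no longer matches the current vdev layout. The result can at least be inspected read-only:

```
# Completed removals show up in the pool status; zdb -C dumps the cached
# pool configuration, where removed top-level vdevs appear as type 'indirect'
zpool status -v DataCenter
zdb -C DataCenter
```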
 

mathcrowd

Dabbler
Actually, I replaced my special vdev with two physical disks which had been the L2ARC cache.

The whole process completed with no warnings or errors.

I can still import the pool, but all the datasets are lost.
Furthermore, I run ESXi on a 2-node vSAN, but only one node is running.

So I think the original vdisk can be recovered from the other node, as long as I don't sync the data between the two nodes.
 

Ericloewe

Server Wrangler
Moderator
I think you need to a) illustrate what is going on, b) boil this down to a list of specific actions taken, and c) stop touching the server because this is impossible to follow.

Frankly, your setup is a complete disaster and it's only gotten worse by carrying out seemingly random actions with no clear plan and certainly not a bit of input from someone knowledgeable.
 

Patrick M. Hausen

Hall of Famer
Back to the start - replication means replication. If you replicate dataset X on machine A to machine B, then dataset X on machine B will look exactly like the one on machine A. This happens on the block level. Everything that might have been on machine B in a dataset named X will be gone. If you don't have another backup, that's bad but fact. That's all that can be said from the TrueNAS point of view.

Now with the rest of your posts describing various actions on your part ... you seem to be running TrueNAS virtualized on virtual disk images. This is a configuration that is in itself strongly discouraged and known to be dangerous. That being said there might be a chance you can recover that data if e.g. VMFS snapshots are available.
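
If you want to check whether such snapshots exist, the ESXi shell can list them; something along these lines (the VM ID will differ on your host):

```
# List registered VMs with their IDs, then list any snapshots for one VM
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.get 12   # replace 12 with the TrueNAS VM's ID
```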

But with even our most experienced regular @jgreco getting nausea from your rapid unordered retelling of events, to help you we will need a complete and precise description of your setup on the target machine to even begin thinking about useful suggestions.

Anyone providing technical support via some remote medium, be it a forum or the phone, needs to construct a complete mental image of the state of your machine. This is currently not possible from your posts alone. Please provide more structured information.

Kind regards, keeping my fingers crossed for your data,
Patrick
 

mathcrowd

Dabbler
I think you need to a) illustrate what is going on, b) boil this down to a list of specific actions taken, and c) stop touching the server because this is impossible to follow.

Frankly, your setup is a complete disaster and it's only gotten worse by carrying out seemingly random actions with no clear plan and certainly not a bit of input from someone knowledgeable.
A more simplified illustration:

I accidentally triggered the commands that destroyed all the datasets, without noticing.

And while the destroy commands were running, I was still replacing vdevs:

Original state:

* 4 data vdevs, each a mirrored group of 2 disks
* 1 special vdev, which is an ESXi VMDK object
* 2 log devs, which are ESXi VMDK objects

What I did:

1. Removed the 2 L2ARC cache vdevs (which are physical disks).
2. Added one of the disks mentioned above to the pool as a special vdev.
3. Deleted the original special vdev.
4. Added one more disk as a mirror in the special vdev.
5. Deleted the log devs.

While I was replacing the vdevs, the server was still running the replication task, which triggered the destroy commands.
About 900 GB of data was written to the pool, and the whole pool is about 30 TB.

6. I rebooted my server to stop the task.
7. I found the txg of the destroy command:

Code:
2023-04-18.23:13:41 [txg:17032225] destroy DataCenter/nextcloud (143) (bptree, mintxg=1)


8. But I cannot do the rollback after exporting the pool:
Code:
zpool import -N -o readonly=on -F -T 17032157 DataCenter

err: one or more devices is currently unavailable

9. I tried these settings and tried importing again:
Code:
set vfs.zfs.recover=1
set vfs.zfs.debug=1
sysctl vfs.zfs.spa.load_verify_metadata=0
sysctl vfs.zfs.spa.load_verify_data=0
zpool import -N -o readonly=on -f -R /mnt -F -T 17032157 DataCenter

Monitoring the dbgmsg:
Code:
spa.c:6249:spa_tryimport(): spa_tryimport: importing DataCenter, max_txg=17032157
spa_misc.c:419:spa_load_note(): spa_load($import, config trusted): LOADING
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa.c:8369:spa_async_request(): spa=$import async request task=1
spa_misc.c:404:spa_load_failed(): spa_load($import, config untrusted): FAILED: no valid uberblock found
spa_misc.c:419:spa_load_note(): spa_load($import, config untrusted): UNLOADING
spa.c:6110:spa_import(): spa_import: importing DataCenter, max_txg=17032157 (RECOVERY MODE)
spa_misc.c:419:spa_load_note(): spa_load(DataCenter, config trusted): LOADING
spa_misc.c:404:spa_load_failed(): spa_load(DataCenter, config untrusted): FAILED: no valid uberblock found
spa_misc.c:419:spa_load_note(): spa_load(DataCenter, config untrusted): UNLOADING
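
As far as I can tell, the rewind can only succeed if an uberblock at or below that txg still survives in the device labels, which can be checked read-only with zdb (the device path below is only an example; the real gptid paths are the ones in the history above):

```
# Dump the label plus all surviving uberblocks on one pool member; each
# uberblock entry shows the txg it points to. If nothing at or below
# txg 17032157 survives, the import with -T 17032157 cannot find a
# valid uberblock, which matches the dbgmsg output above.
zdb -ul /dev/gptid/f4186fa9-ddfb-11ed-a4c0-00505680d330
```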


----------------------------------------------------------------------------------------------------

I am still keeping all my ESXi VMDK files.

I hope somebody can help.

I was so silly to have no backups.

It's a disaster for our team.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Honestly, your situation is unrecoverable, and a complete comedy of errors. Why didn't you stop digging when you realized you were in a hole?

The only hope you have of recovering your data is to use a ZFS rescue utility like Klennet ZFS Recovery. Honestly, you need a stable bare-metal Windows rescue environment to even attempt a rescue. Make multiple copies of your VMDKs, and keep a set untouched as your gold masters, used only as sources for copies.
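
From the ESXi shell, a VMDK can be cloned with vmkfstools, for example (datastore paths are placeholders for your own layout):

```
# Clone a source VMDK to a thin-provisioned backup copy before touching it
vmkfstools -i /vmfs/volumes/datastore1/truenas/special0.vmdk \
    /vmfs/volumes/datastore1/backup/special0-copy.vmdk -d thin
```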
 

mathcrowd

Dabbler
Honestly, your situation is unrecoverable, and a complete comedy of errors. Why didn't you stop digging when you realized you were in a hole?

The only hope you have of recovering your data is to use a ZFS rescue utility like Klennet ZFS Recovery. Honestly, you need a stable bare-metal Windows rescue environment to even attempt a rescue. Make multiple copies of your VMDKs, and keep a set untouched as your gold masters, used only as sources for copies.
Do you mean running the ZFS recovery on the data vdevs or on the special vdev?

Actually, I can import the pool, but barely anything remains on it.
 

Ericloewe

Server Wrangler
Moderator
I accidentally triggered the commands that destroyed all the datasets, without noticing.
How on Earth did that happen?

And while the destroy commands were running, I was still replacing vdevs:
What do those things have to do with each other?

1. Removed the 2 L2ARC cache vdevs (which are physical disks).
That's fine.

Added one of the disks mentioned above to the pool as a special vdev.
That's a lot less fine.

Deleted the original special vdev.
That is impossible to do on ZFS. If you mean you destroyed the virtual disks or something, that would destroy the pool.

Added one more disk as a mirror in the special vdev.
Uncontroversial

Deleted the log devs.
SLOG? Fine.

While I was replacing the vdevs, the server was still running the replication task, which triggered the destroy commands.
About 900 GB of data was written to the pool, and the whole pool is about 30 TB.
And you didn't get any serious warnings? I'm assuming you configured this from the GUI?
 

mathcrowd

Dabbler
I accidentally triggered the commands that destroyed all the datasets, without noticing.

How on Earth did that happen?

I just set the pool root as the target when setting up the replication task, so boom!!!!

Deleted the original special vdev.
That is impossible to do on ZFS. If you mean you destroyed the virtual disks or something, that would destroy the pool.

Since I had added one physical disk as a special vdev, TrueNAS allowed me to remove the former one, leaving one disk as the special vdev.
It actually took a while to replace the disk.

The ESXi vdisk file was kept, and it was backed up before the whole disaster happened.
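
In hindsight, I think the safer route would have been to mirror the new disk onto the existing special vdev, wait for the resilver, and only then detach the old vdisk, instead of a top-level device removal; a sketch with placeholder device names:

```
# Attach the new disk as a mirror of the existing special-vdev device,
# wait for the resilver to finish, then detach the old virtual disk
zpool attach DataCenter gptid/old-special-vdisk gptid/new-special-disk1
zpool status DataCenter            # wait until the resilver completes
zpool attach DataCenter gptid/new-special-disk1 gptid/new-special-disk2
zpool detach DataCenter gptid/old-special-vdisk
```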
 

mathcrowd

Dabbler
Honestly, your situation is unrecoverable, and a complete comedy of errors. Why didn't you stop digging when you realized you were in a hole?

The only hope you have of recovering your data is to use a ZFS rescue utility like Klennet ZFS Recovery. Honestly, you need a stable bare-metal Windows rescue environment to even attempt a rescue. Make multiple copies of your VMDKs, and keep a set untouched as your gold masters, used only as sources for copies.
I'm trying the software now. Should I point it at the new special vdev (physical disk, can be imported normally) or at the original vdev (ESXi VMDK)?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
I'm trying the software now. Should I point it at the new special vdev (physical disk, can be imported normally) or at the original vdev (ESXi VMDK)?
Sorry, I don't know. You'll have to ask Klennet support.
 