disk pool throwing alert

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The trouble with the compare you are running is that it is not a compare but a sync. A compare would use -n.
I was starting to think I had somehow given the wrong switches or something...

That -uav will indeed compare (and rectify any missing) files.

However, if the 2 sides match (and were put there by rsync in the first place) it just outputs an empty list.

I think what's going on here is that we're rsyncing over a structure that has multiple child datasets and rsync isn't set to follow the links correctly, so we're forking off a directory under each child dataset and filling that up with the contents of the source, since that directory is found to be empty.

I suspect the best answer to compare the content is using zfs again:

zfs diff tank/media@media bunker/media@media

I would suggest rolling back to the @media snapshot on bunker first.
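A minimal sketch of that rollback step, assuming no snapshots newer than @media exist on bunker/media (zfs rollback discards anything written to the dataset after the snapshot):
Code:
# make bunker/media match the replicated snapshot exactly before comparing
zfs rollback bunker/media@media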
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
So, as per above, I deleted bunker/media and am starting over.
But I can add: tank only has one dataset, media.
bunker had "home" and "timemachine":
bunker/home
bunker/timemachine
nothing else.
Everything else is directories on that dataset; no nested datasets.
I will run the above command once the new "send" has completed.
G
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
Better not to use -uav; better to use -nav when you expect nothing to be copied, so you don't get caught with a bunch of new files. Also, src and dst should probably be /mnt/tank/media/ and /mnt/bunker/media/ (note the trailing slashes); that way you avoid creating /mnt/bunker/media/media and duplicating into it.
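Something like this, as a minimal sketch of that dry-run form (with -n nothing is actually copied):
Code:
# dry-run "compare": trailing slashes on both sides avoid creating /mnt/bunker/media/media
rsync -nav /mnt/tank/media/ /mnt/bunker/media/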
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
This is the command set that has been followed:

Create snapshot - I created a new one just before executing the zfscopy, although there were no changes from the previous one:
/usr/sbin/zfs snapshot -r tank/media@media

Copy snapshot to target
tmux new-session -d -s zfscopy '/usr/sbin/zfs send -R tank/media@media | pv | zfs recv -Fv bunker/media'
--> takes 12 hours

Upsert - this is the "upsert" command used, in line with your recommendation and also in line with sretalla's. If we're saying this should only report deltas... then here is where we have a problem, as a bucket load of files are shown/reported:
tmux new-session -d -s rsync 'rsync -anv /mnt/tank/media /mnt/bunker/media'

And then regarding your last comment about source and target: we have been using fully qualified paths in the rsync command.

G
 
Joined
Jun 15, 2022
Messages
674
Think I can probably improve on this badblocks check... speed...
It's just under 24 hours and still busy, with zero errors on this drive so far; this is also not one of the drives that previously threw errors.
command used:
tmux new-session -d -s badblocks '/usr/sbin/badblocks -c 1024 -e 64 -p 1 -svw -o /root/badblocks/sda.out /dev/sda'
Maybe. Figure out your transfer rate and see if it matches your drive transfer rate. If you're doing X% of the drive in Y hours convert that all to MB/second and compare it to the rated drive MB/s and there you go.
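For example, a rough back-of-envelope calculation (the drive size and progress figures here are made up, just to show the arithmetic):
Code:
# say 30% of a 10 TB drive done in 24 hours:
# 10 TB ~= 10,000,000 MB; 30% of that is 3,000,000 MB; 24 h = 86,400 s
echo $(( 10000000 * 30 / 100 / 86400 ))   # prints 34 -> ~34 MB/s, compare to the drive's rated MB/s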

Older SAS HDDs can do about 120 MB/s for a long sequential write which is what badblocks does. SATA can get kind of mucked up and slowed down to 80 or even 60 MB/s when using multiple drives on older systems, but it all depends on the hardware.

Newer SAS HDDs do a solid 150 MB/s, though when I test 12 drives using a Gen2 PCIe HBA I see a bottleneck in the PCIe bus (the server being Gen3), whereas a more efficient Gen3 HBA in the same system handles 12 drives without slowdown. 16 drives may trip it up, but at that point so might CPU overhead on Read/Verify, so "it depends."

Newer SATA "seems" to get "a little" mucked up with multiple drives, but I haven't done enough testing to say anything conclusive, so if you see slowdowns when testing multiple SATA drives with SATA controllers you'll have an answer. They should do 150 MB/s with 1 or 2 drives, 3 might slow some systems down, and 5+ seems to be where things get squirrelly, though take that as speculation as I run SAS on larger systems and have not tested this.

SATA SSDs...wow, I've seen 256MB/s continuous sequential write on consumer drives all the way to over 500 MB/s on Enterprise drives. I run them off SAS controllers because SATA controllers seem to bottleneck on sequential transfers in parallel, though are probably fine for keeping up with random transfers--this is all highly subjective and outside my use-case. Since SSDs have limited write cycles, they generally either work or don't, and badblocks write testing stresses the drive, most people don't do write testing. If I were building a high-speed array with 30+ SSDs I'd probably buy 2 and run badblocks for 50 full runs (4 full drive writes per test) with great cooling as they do heat up and see how it goes. If they last I'd buy 30 drives, shelve 2, and throw the 2 test units into the array as they should fail first and indicate when my whole array is going to start puking SSDs and fail spectacularly.
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Maybe. Figure out your transfer rate and see if it matches your drive transfer rate. If you're doing X% of the drive in Y hours convert that all to MB/second and compare it to the rated drive MB/s and there you go.

Older SAS HDDs can do about 120 MB/s for a long sequential write which is what badblocks does. SATA can get kind of mucked up and slowed down to 80 or even 60 MB/s when using multiple drives on older systems, but it all depends on the hardware.

Newer SAS HDDs do a solid 150 MB/s, though when I test 12 drives using a Gen2 PCIe HBA I see a bottleneck in the PCIe bus (the server being Gen3), whereas a more efficient Gen3 HBA in the same system handles 12 drives without slowdown. 16 drives may trip it up, but at that point so might CPU overhead on Read/Verify, so "it depends."

Newer SATA "seems" to get "a little" mucked up with multiple drives, but I haven't done enough testing to say anything conclusive, so if you see slowdowns when testing multiple SATA drives with SATA controllers you'll have an answer. They should do 150 MB/s with 1 or 2 drives, 3 might slow some systems down, and 5+ seems to be where things get squirrelly, though take that as speculation as I run SAS on larger systems and have not tested this.

SATA SSDs...wow, I've seen 256MB/s continuous sequential write on consumer drives all the way to over 500 MB/s on Enterprise drives. I run them off SAS controllers because SATA controllers seem to bottleneck on sequential transfers in parallel, though are probably fine for keeping up with random transfers--this is all highly subjective and outside my use-case. Since SSDs have limited write cycles, they generally either work or don't, and badblocks write testing stresses the drive, most people don't do write testing. If I were building a high-speed array with 30+ SSDs I'd probably buy 2 and run badblocks for 50 full runs (4 full drive writes per test) with great cooling as they do heat up and see how it goes. If they last I'd buy 30 drives, shelve 2, and throw the 2 test units into the array as they should fail first and indicate when my whole array is going to start puking SSDs and fail spectacularly.
I mostly have 2 drives I want to make sure are 100%: the 2 that have thrown alerts. The others have been well behaved, so if it takes a bit of time, then I'm good. The plan is to check these 2 drives and then join them with the 3 that are in the above "tank" disk pool to form a new tank disk pool, hence me trying to move the data off the pool at the moment...

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
First badblocks run completed. While watching it on a regular basis I never saw any errors, so I doubt the last hour produced any, but:
the below was the command executed.

Any idea why sda.out was 0 bytes in size, i.e. empty?

tmux new-session -d -s badblocks '/usr/sbin/badblocks -c 1024 -e 64 -p 1 -svw -o /root/badblocks/sda.out /dev/sda'
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
So where are we with commands to execute once the initial zfs send has completed?

???? tmux new-session -d -s rsync 'rsync -anv /mnt/tank/media /mnt/bunker/media' ???

G
 
Joined
Jun 15, 2022
Messages
674
First badblocks run completed. While watching it on a regular basis I never saw any errors, so I doubt the last hour produced any, but:
the below was the command executed.
tmux new-session -d -s badblocks '/usr/sbin/badblocks -c 1024 -e 64 -p 1 -svw -o /root/badblocks/sda.out /dev/sda'
Any idea why sda.out was 0 bytes in size, i.e. empty?
The output file is empty because there were no errors.
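If you want extra reassurance that the empty file really means a clean run, a couple of quick checks (assuming smartmontools is available, as it is on TrueNAS):
Code:
wc -c /root/badblocks/sda.out   # 0 bytes: badblocks logged no bad blocks
# these SMART counters should all still be 0 after a clean write test
smartctl -A /dev/sda | grep -iE 'reallocated|pending|uncorrect'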

So where are we with commands to execute once the initial zfs send has completed?

???? tmux new-session -d -s rsync 'rsync -anv /mnt/tank/media /mnt/bunker/media' ???
I have a script I use to capture drive data and screen output, but it's beyond the scope of this thread. Instead of using tmux I have a test rig I put the drives into and run Gnome, make 4 virtual screens, set the mouse to change screens when it hits a border, then put 6 terminal windows per screen, each running its own badblocks, so I can keep an eye on all drives by changing screens. I also have a script to log drive temperatures, so it is fairly simple and works pretty well with one drive per terminal window. It's not as smooth as it could be, but that's kind of the way life works.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
So where are we with commands to execute once the initial zfs send has completed?

???? tmux new-session -d -s rsync 'rsync -anv /mnt/tank/media /mnt/bunker/media' ???

G
I think you should use tmux new-session -d -s rsync 'rsync -anv /mnt/tank/media/ /mnt/bunker/media/' - note the trailing / on both media directories. This is the syntax I normally use to ensure I don't introduce spurious directories at the remote end. Since you are using -n, nothing will be changed anyway.

However, if you have done a zfs send/recv of tank/media@backup, then for the comparison you should use /mnt/tank/media/.zfs/snapshot/media/ as the src for rsync, as that is what was replicated by zfs, not the current version of tank/media which you were otherwise going to compare against.

Maybe @sretalla's idea of zfs diff is better; it removes rsync issues entirely.

zfs diff tank/media@media bunker/media@media
If that returns no differences as I expect then you can try
zfs diff tank/media@backup bunker/media
and see if there is a difference between the original and what is now on /mnt/bunker/media.
You could also do
zfs diff tank/media@media tank/media
to see if there is a difference between the snapshot and the current file system image.

Then maybe use rsync -anv /mnt/tank/media/.zfs/snapshot/media/ /mnt/bunker/media/ and see what rsync thinks is different between src snapshot and destination, but that won't be comparing content, only file attributes. To compare content you would need to use rsync -anvc and I expect that would be significantly slower.

Personally I would just be replicating the snapshot and be done with it, as I am happy that snapshots work as I expect, and I don't need rsync to confirm it for me. I might do another snapshot and then do a zfs send/recv incremental snapshot if I thought there were any changes between @media and now, but zfs diff tank/media@media tank/media should tell me that anyway.
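For reference, a sketch of that incremental update; the second snapshot name @media2 is made up here:
Code:
# take a new snapshot and send only the changes since @media across to bunker
/usr/sbin/zfs snapshot -r tank/media@media2
/usr/sbin/zfs send -R -i tank/media@media tank/media@media2 | zfs recv -Fv bunker/media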
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
A little example of rsync -auv path1 path2 vs rsync -auv path1/ path2/, just with empty files, so the stats aren't that interesting, but look what happens to the paths:
Code:
$ find d?
d1
d1/a
d1/a/c
d1/a/b
d2
d2/a
d2/a/c
d2/a/b
$ rsync -auv d1/a d2/a
sending incremental file list
a/
a/b/
a/c/

sent 118 bytes  received 28 bytes  292.00 bytes/sec
total size is 0  speedup is 0.00
$ find d2
d2
d2/a
d2/a/a
d2/a/a/c
d2/a/a/b
d2/a/c
d2/a/b
$

It starts off the same, but after rsync -auv, d1/a has been duplicated into d2/a/a, which is what I think happened to you, @georgelza.

On the other hand, this is what happens when using rsync -auv path1/ path2/:
Code:
$ find d?
d1
d1/a
d1/a/c
d1/a/b
d2
d2/a
d2/a/c
d2/a/b
$ rsync -auv d1/a/ d2/a/
sending incremental file list

sent 97 bytes  received 14 bytes  222.00 bytes/sec
total size is 0  speedup is 0.00
$ find d2
d2
d2/a
d2/a/c
d2/a/b
$ 

Using a trailing / there is no duplication.
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
... can't help but look and see a haze...

All I want is to make sure bunker/media == the source, tank/media,
before I switch Plex to point to bunker/media and destroy tank/media.
All I want to do is run some form of comparison and have it output nothing, i.e. no files found on tank/media that are not on bunker/media.
At times it seems a technology gets too powerful and has too many options to do the same thing.

You guys know ZFS a lot better than me.
I've now copied tank/media to bunker/media by replicating the snapshot.
How do I now confirm, without worry, that bunker/media = tank/media?
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
You are using an enterprise software solution for a home NAS; complexity is to be expected. And I can understand you seeing a haze of stuff you are not really interested in and just wanting a solution.

If you want to compare the two snapshots, then the zfs diffs I posted above in post #130 will do that. If you are sure you haven't modified the data after taking the snapshots, then you are done. If you want to compare the tank/media@media snapshot to the current tank/media dataset, there is also a zfs diff above for that. You could run all the zfs diffs and the rsync -anv, and they shouldn't produce much output unless something has changed. So if you start seeing lots of output, that is the time to stop and re-examine what has happened, not to say "I'm seeing a couple of files" and have both @sretalla and me think those were the only differences, which would seem to be consistent with an in-progress update. You have much more visibility than us as to what is happening, while we have a much better understanding of how to do things, even if we might disagree on the fine details.

You have my opinions about how to verify with both zfs diff and rsync -anv, and also why I think your previous run with rsync -auv basically started duplicating the copy and filling the disk. I suggest you run all those commands from post #130 and see what happens; there should not be copious output unless you have changed data. If there is changed data, then an incremental snapshot update should allow a relatively quick update of bunker without requiring the initial 6 TB copy again, just the changed data.

You can also wait for @sretalla to chime in with his opinion. Easier for him to have something to comment on if you have already run the zfs diff and rsync -anv commands and posted the results.
 
Last edited:

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
To understand how we got here...

I created the snapshot and then applied the snapshot to bunker/ creating bunker/media.
I then said lets make sure all is good and asked for a command to check/verify it was all copied correctly, before I delete the source.
I executed the command given and had a large number of files returned as not on bunker/media and the wheels came off.

I will run the diff command; hopefully it returns zero differences.

G
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
I agree. And I think I understand how the wheels came off, and how to stop it happening again. None of us are perfect, and while running the commands on our own systems in front of us it would be easy to detect something wrong and stop and fix it. It is not so easy with the delays and visibility differences inherent in using a forum for communication.

I suggest you run all the zfs/rsync commands in post 130. That will verify multiple different situations, and allow fixing if necessary.
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Commands to be executed, in order... and wait XXXX hours ;)

tmux new-session -d -s dif 'zfs diff tank/media@media bunker/media@media'

tmux new-session -d -s dif 'zfs diff tank/media@backup bunker/media'

tmux new-session -d -s dif 'zfs diff tank/media@media tank/media'


Then maybe use: rsync -anv /mnt/tank/media/.zfs/snapshot/media/ /mnt/bunker/media/
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Something is wrong here; the command exits immediately.

tmux new-session -d -s dif '/usr/sbin/zfs diff tank/media@media bunker/media@media'


G
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
You might find that zfs diff is significantly faster than you expect, as I believe it is operating on metadata, not reading all the data and comparing it. If the internal metadata is consistent then the data is good, and if it is the same, then the data is the same. If the internal metadata is bad, then there is pool corruption somewhere. That is just the way zfs works: all the data and metadata is checksummed all the time, and if there is a checksum error on read it is corrected from redundant data and updated, or marked as broken.

If zfs diff isn't throwing an error or producing output, then they are the same.
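If you want proof the command ran (a detached tmux session simply vanishes once the command finishes), you could capture the output and exit status to files; the file names here are just examples:
Code:
tmux new-session -d -s dif \
  '/usr/sbin/zfs diff tank/media@media tank/media > /root/zfsdiff.out 2>&1; echo $? > /root/zfsdiff.rc'
# afterwards:
cat /root/zfsdiff.rc      # 0 means zfs diff completed without error
wc -l /root/zfsdiff.out   # 0 lines means no differences were found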

rsync -anv will be looking at the file metadata (size, mtime, etc.), not reading the actual blocks and comparing them; even so it will probably take much longer, as it has to traverse the file tree and look up each file's metadata.

If you really want to take a long time and compare the data, you could use rsync -anvc, which should checksum the data blocks on each side, but I haven't tested it; I'm just going from the manual.

If you want to do ad hoc verification, then pick some files, verify the size and modify time, and run md5sum on the files at each end; the checksums should match on both sides, just like I had you do earlier.
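A spot-check sketch of that; the file path is just a placeholder, pick any real file that exists on both sides (stat -c assumes GNU stat, as on TrueNAS SCALE):
Code:
F="some-folder/some-file.mkv"                                    # placeholder, pick a real file
stat -c '%s %Y %n' /mnt/tank/media/"$F" /mnt/bunker/media/"$F"   # size and mtime should match
md5sum /mnt/tank/media/"$F" /mnt/bunker/media/"$F"               # checksums should match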
 
Last edited:

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
/usr/sbin/zfs diff tank/media@media bunker/media@media

Not an earlier snapshot from the same fs: invalid name

/usr/sbin/zfs snapshot -r tank/media@media

I only have that one snapshot created above that was used as the driver for the zfs send.

G
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
OK, so it is saying that it will only diff snapshots on the same filesystem, something I missed even though it is explained in the manual, but it doesn't sound unreasonable.

In general I wouldn't be bothering with verifying the replication because if the replication failed, then the destination snapshot would not exist.

Also, we previously did zfs list -o name,guid,creation tank/media@media bunker/media@media, which showed the same guid and creation; that is usually what I look at when I want to know if a snapshot came from the same place.

Try the 3rd zfs diff to see if there is any change between the @media snapshot and the current data on tank/media; if there is, that would mean we need to make another snapshot and update bunker/media from it.
 