disk pool throwing alert

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Will run the first badblocks tonight/overnight.
G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
zfs snapshot -r <old pool>/<dataset>@<snapshot name>
zfs send -R <old pool>/<dataset>@<snapshot name> | zfs recv -Fv <New Pool>/<dataset>

Once you're happy the data is all there and safe:
zfs destroy -f <old pool>/<dataset>

You could also set up a replication task and use it once, then do the delete.
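The steps above can be sketched end to end, with a couple of sanity checks before anything is destroyed (pool/dataset names here follow the thread; the snapshot name "migrate" is arbitrary):

```shell
# Recursive snapshot of the source dataset and all children
zfs snapshot -r tank/media@migrate

# Replicate the snapshot stream (with children and properties) to the new pool
zfs send -R tank/media@migrate | zfs recv -Fv bunker/media

# Sanity checks before destroying anything:
zfs list -o name,used,refer tank/media bunker/media   # sizes should match closely
zfs list -t snapshot -r bunker/media                  # the snapshot should exist on the target

# Only once you are satisfied the data is intact:
zfs destroy -r tank/media
```

This is a sketch, not a recipe: run the two `zfs list` checks (and spot-check files on the target) before the destroy, since `zfs destroy` is not recoverable.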
Once I've moved the media (from tank/media to bunker/media)... guessing (as this is only used by Plex) I need to update the Plex config to then point to bunker/media?

G
 
Joined
Jun 15, 2022
Messages
674
So, how much CPU does badblocks eat up?

I have one drive connected/unused already on which I can start badblocks.

I have a screen/keyboard available, so I was thinking of running it there... allowing me to disconnect.

Command planned:

badblocks -b 4096 <searching for block size for ST4000VN008-2DR166> -c 256 -e 64 -p 1 -svw -o <where ?>

Please explain a bit more: -svw to see what's going on (this does wipe the drive; a read won't detect future errors)?
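For reference, a filled-in version of that command might look like the following. The device path and log location are placeholders, not the poster's actual values; substitute your own:

```shell
# Hypothetical device and log path -- substitute your own.
DEV=/dev/sdX
LOG=/root/badblocks-sdX.log

# -b 4096  block size to test with (omit to let badblocks pick a default)
# -c 256   number of blocks tested at once (larger can be faster)
# -e 64    abort after this many bad blocks are found
# -p 1     number of passes
# -s -v    show progress, verbose output
# -w       DESTRUCTIVE write-mode test: this wipes the drive
# -o       write the list of bad blocks found to a file
badblocks -b 4096 -c 256 -e 64 -p 1 -svw -o "$LOG" "$DEV"
```

Double-check `$DEV` against `lsblk` or serial numbers before running: `-w` destroys everything on the target drive.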

G
badblocks CPU usage depends on many factors. I have an old server (about 15 years old, with a PCI HBA) where it easily uses 50% with 4 drives, but the HP server (PCIe) runs 12 drives at something like 6% total CPU usage until a verify pass, which runs about 7% CPU usage per drive. Your 1 drive shouldn't impact server performance.

The block size you can leave default if you want (don't specify -b).

-c 256 should be fine, though 1024 might be far faster. Anything over 2048 seems to be only a slight gain, so 4096 and above is probably not needed.

-o I run to a USB drive and analyze the results in my office, but you could just as easily | tee <network_path> and watch the file with tail.

man badblocks shows help for -s (show progress) and -v (verbose).

-w does a destructive write with bit patterns that make "stressful" magnetic encoding, so if the block is getting weak one of the patterns ought to show it:
0xaa is binary: 10101010
0x55: 01010101
0xff: 11111111
0x00: 00000000

Those patterns "cover the worst-case possibilities." Once I had just one of those patterns fail, but the rest stuck without issue, so the block's magnetic ability was "getting weak."

There's a lot of theory on magnetic storage. Years back, disks were magnetic film on glass platters and bad blocks were expected; there was actually a bad block list from the manufacturer on the label, which was then manually entered into the MFM or RLL controller so the controller wouldn't use those blocks (generally there were 3 to 9 bad blocks). From what I last heard (and I could be incorrect), modern platters are cut from a long aluminum cylinder that is squashed, sawed, and polished, because a way was found to pack data tighter and more reliably.

In modern industrial-quality drives, 1 bad block is often considered "the beginning of the end." I'd argue it depends on the size of the drive and how quickly blocks are going bad, as some of my drives developed 3 bad blocks years back and are still running with those 3 bad blocks. However, I've had other drives show 0 bad blocks on a Short SMART test and 1 or 2 bad blocks on a Long SMART test (both are read tests), and after pulling them from service, a badblocks write test started throwing tens of bad blocks, and a retest hundreds. This is why I keep several spare drives on the shelf to chuck into systems that look like they might have a drive going bad.
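The Short/Long SMART tests and the reallocated-block counts discussed here can be checked with smartctl (the device path is an example):

```shell
# Show the SMART attributes that track failing blocks
smartctl -A /dev/sdX | grep -Ei 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'

# Kick off the read-only self-tests mentioned above
smartctl -t short /dev/sdX
smartctl -t long  /dev/sdX

# View the results once a test finishes
smartctl -l selftest /dev/sdX
```

A rising Reallocated_Sector_Ct or a non-zero Current_Pending_Sector is the kind of climb described later in this thread, and a reasonable trigger for a full badblocks write test on a spare.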
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
use tmux from ssh...

tmux new-session -d -s badblocks '/usr/sbin/badblocks -b.....'

Then at any time:

tmux attach -t badblocks

When in that session, CTRL + B, then D to disconnect. Also CTRL + C when in that session to terminate the running code.
I ran this last night.

How would I confirm it's completed? Is there a way to check/confirm it was all copied?

G
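One way to answer the "how would I confirm it's completed" question: tee the output to a log so it survives the session, then check whether the session still exists (log and device paths here are examples):

```shell
# Start badblocks detached in a named tmux session, teeing output to a file
# so the results survive even after the session exits (badblocks writes
# progress to stderr, hence the 2>&1).
tmux new-session -d -s badblocks \
  '/usr/sbin/badblocks -svw -o /root/bb-sdX.log /dev/sdX 2>&1 | tee /root/bb-sdX.out'

tmux attach -t badblocks        # watch it live (Ctrl+B then D to detach)
tmux has-session -t badblocks   # exit status 0 while it is still running
tail -f /root/bb-sdX.out        # or just follow the log from any shell
```

When `tmux has-session` reports the session is gone, the run has finished; the tail of the `.out` file shows whether it completed 100% and the `.log` file lists any bad blocks found.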
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
FYI, I've had 24 bad blocks for a while on one of the disks... another had 16, and suddenly the reported count increased and started to climb; it's been semi-stable at 26, I think...

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Trying to reconfigure my NFS and SMB shares, it shows me a permission error on bunker/media.
On closer look, there are no permissions associated with it, whereas on tank/media there are?

G
 

Attachments

  • Screenshot 2023-08-08 at 07.40.07.png

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
The dataset eventually ("now") showed on the new disk pool with the same permissions... well, that gave me an indication of how long it took to move it allllll.
My initial update of the shares was incomplete; I forgot my apps don't point to the share, but to the actual mount point, which has changed...
<would have been nice if the app referenced the dataset, wherever it lived, so if you moved it, it understood/followed>.

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
rsync -avnc /mnt/pool/original /mnt/pool/new

Any idea how long a 6TB dataset check might take on an i5?

Realise this could be "how long is a piece of string".

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
I executed the rsync inside a tmux session... allowing me to walk away.


G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
how long a 6TB dataset check might take on an i5.

realise this could be how long is the string.
Probably more important to know would be how many files/directories... I don't know how to equate that to a rate per processor type or pool, though, so it takes as long as it takes.
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
With the above rsync, how would I monitor it... guessing that the tmux session lasts only as long as it's running; when done it will close, and with it any output?

I've attached to the rsync session and I'm seeing a lot of file names scroll past?
I sort of expected it to show not much, or maybe just files checked or files found lacking ;)
G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
That list should be (as it would say at the top) the "incremental file list"... meaning it exists in the first structure but is missing in the second.

A long list isn't what you would want to see there if they are equal.
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Then I would also be concerned: why would there be differences?
terminal/tmux is not allowing me to scroll up... so I can't see how long this list is.
G

 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
A question would then be: can I run the rsync again (overwrite where the files exist and add where they don't)... to try and recopy the data, aka get rid of the delta?

G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Change the switches to remove the n and the copy will happen (only missing or changed files will copy).
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
As in

change :
tmux new-session -d -s rsync 'rsync -avnc /mnt/tank/media /mnt/bunker/media'
To:
tmux new-session -d -s rsync 'rsync -avc /mnt/tank/media /mnt/bunker/media'

G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
change :
tmux new-session -d -s rsync 'rsync -avnc /mnt/tank/media /mnt/bunker/media'
To:
tmux new-session -d -s rsync 'rsync -avc /mnt/tank/media /mnt/bunker/media'
That should do it.

It will probably go faster if you drop the c as well, since it won't do a checksum verify of the files to see if they are different and will trust timestamps instead.

Also add u to skip files that may have been modified on the target... maybe it was my mistake not to have that in there already. (I usually use -uav)
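The effect of -u can be demonstrated on a toy pair of directories (requires rsync; paths are made up for illustration):

```shell
# Toy demonstration of -u: a file that is newer on the target side
# is left alone rather than overwritten from the source.
src=$(mktemp -d); dst=$(mktemp -d)
echo original > "$src/f.txt"
echo edited   > "$dst/f.txt"
touch -t 200001010000 "$src/f.txt"   # backdate the source copy

rsync -uav "$src/" "$dst/"
cat "$dst/f.txt"                     # still "edited"
```

Without -u, rsync would have replaced the target's newer f.txt with the older source version, which is exactly the risk -u guards against once you start using the new dataset.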
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
What does the "u" do?
Will have to go "man" a bit.
G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
What does the "u" do?
It skips files that are newer on the target side (meaning you copied them already and then changed them... don't overwrite with the original from the source).

Probably not much of an impact for you if you haven't started using the new dataset yet.
 