Pool Layout Question

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
Wow - SMR disks blow.

This pool now has no SMR disks (detached a CMR 4TB from backuptank and put it into bigtank):

[screenshot: bigtank pool status]


This pool has 2 SMR disks (the SMR 6TB disk I detached from bigtank is the one being resilvered into backuptank):

[screenshot: backuptank pool status, resilver in progress]


And I am getting this wonderful news from TrueNAS:

[screenshot: TrueNAS resilver status]


Happy I will be rid of all SMR disks soon. The 20TB drives are just starting their disk-burnin.sh journey... Should take a while (weeks??). The long SMART test alone will take ~24 hours. A big thank you to whoever invented TMUX!!!
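
For anyone curious, the tmux workflow for a multi-day burn-in is roughly this (session name and device are just examples):

tmux new -s burnin      # start a named session
./disk-burnin.sh da5    # run the burn-in inside it
# detach with Ctrl-b d; the script keeps running
tmux attach -t burnin   # reattach later to check on it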

Cheers,
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
The resilver speed for your SMR drives is still relatively decent. I remember reading a test (I think on STH - Serve The Home) where a resilver with SMR drives took more than a week.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Just for a note, the new 13.0-U2 got a number of scrub improvements, some of which should make the process a bit more sequential, which may help SMR a bit, if that is possible at all. Though the main direction of the improvements was to reduce scrub CPU usage.
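
For anyone mid-scrub and curious, watching or pausing one is simple (pool name is an example):

zpool status tank    # shows scrub progress and estimated completion
zpool scrub -p tank  # pause a running scrub; "zpool scrub tank" resumes it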
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It's a bit late to do it now, but some newer (WD and Seagate) SMR disks understand TRIM commands to "refresh" their shingle layout. Blasting the drive with zeroes (manually or via ATA_SECURE_ERASE) seems to also do the trick for the older models.
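
For a drive that is already out of the pool, a rough FreeBSD sketch (da5 is an example device, and this is destructive, so triple-check the target):

trim -f /dev/da5                                    # whole-device TRIM, if the drive honors it
dd if=/dev/zero of=/dev/da5 bs=1M status=progress   # zero-fill, for the older models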

Put another one on my list of "things to do should I ever get the time" - try to tune enough ZFS tunables so that it never writes anything smaller than a 256MB full-zone allocation.
 

souporman

Explorer
Joined
Feb 3, 2015
Messages
57
I would have to agree that with drives larger than say 10TB, and certainly 20TB, the risk of unrecoverable read error during disk replacement starts to get high. Thus, 3 way mirrors or RAID-Z2.
Maybe I am wrong, but I've always thought UREs didn't affect mirrors like they do RAIDZ1 and RAIDZ2, because a mirror does not use a parity disk. I don't think a URE affects mirrors during the re-mirroring process. Isn't that one of the biggest selling points of mirrors, besides better random IO?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I don't think a URE affects mirrors during the re-mirroring process.
Of course it does--how could it not? If you have a two-disk mirror, one disk dies, you're trying to replace it, and you have a read error on the only remaining copy of your data, how would that not cause data loss?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Maybe I am wrong, but I've always thought UREs didn't affect mirrors like they do RAIDZ1 and RAIDZ2, because a mirror does not use a parity disk. I don't think a URE affects mirrors during the re-mirroring process. Isn't that one of the biggest selling points of mirrors, besides better random IO?
Of course it does--how could it not? If you have a two-disk mirror, one disk dies, you're trying to replace it, and you have a read error on the only remaining copy of your data, how would that not cause data loss?
Yes, it is a matter of the disk's error rate. Nothing to do with RAID-5/6 or RAID-Z1/2/3 specifically.

If a disk's error rate is 1 in X bits, and you have to read X bits from a disk to recover from a different failed disk, then you have a real chance of hitting a URE. With newer disks exceeding X in size, the possibility of a URE is much higher.

The problem is compounded by RAID-5/6 or RAID-Z1/2/3 because at times you have to read much more than X, because of the disk stripe. Thus, people have been suggesting RAID-6 or RAID-Z2/3 as a way to overcome the problem.
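
To put rough numbers on it, assuming the common consumer datasheet rate of 1 URE per 10^14 bits read: a 20TB disk is about 1.6 x 10^14 bits, so reading the whole thing expects ~1.6 UREs, and the chance of hitting at least one is about 1 - e^-1.6, or roughly 80%. Drives rated at 1 per 10^15 bits drop that to roughly 15% for the same full read.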


It is possible to have hundreds, thousands, even millions of UREs during recovery and still get 100% data recovery. As long as there is enough redundancy in the individual stripes, ZFS can complete the resilver.

However, this is much less likely with 2-way mirrors. If a disk fails completely in a 2-way mirror, and the data has a single copy (aka ZFS "copies=1"), then any URE on a data block will result in failure of recovery for that block / file.

Note that ZFS by default keeps an extra copy of metadata, and even more copies of critical metadata. So if a directory entry hits a URE, another copy, ON THE SAME DISK, is available for recovery purposes. This is because metadata is in some respects more important than raw data: losing it can cause a much larger amount of data loss.
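
You can see, or tune, this behaviour through dataset properties; pool / dataset names below are just examples:

zfs get redundant_metadata tank/data   # "all" by default: ZFS keeps extra copies of metadata
zfs set copies=2 tank/data             # duplicate data blocks too; only affects newly written data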
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
Just for a note, the new 13.0-U2 got a number of scrub improvements, some of which should make the process a bit more sequential, which may help SMR a bit, if that is possible at all. Though the main direction of the improvements was to reduce scrub CPU usage.
Given I am mid-scrub, and mid-testing of the new drives, I will upgrade to 13.0-U2 once everything is done. I plan on getting rid of all of my SMR drives ASAFP.

:)
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
[screenshot: resilver progress]


Yeah, SMR can rot in the bowels of hell.
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
So, yeah, this is almost done...

[screenshot: disk burn-in progress]


Looks like a 20TB disk will take a little over 9 days to do a full burn-in, as there is still a 24-to-26-hour SMART extended test to do.
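
For reference, that last step is just (device name is an example):

smartctl -t long /dev/da5   # start the extended self-test
smartctl -a /dev/da5        # progress shows under "Self-test execution status"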

But confidence should be high that these manufacturer-recertified drives will last.

Then we move on to moving all of the data across...
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
Finally done at about 4pm today. So, yeah, 9+ days.

Here goes the copy :)
 