Resilver in Progress - Wonky Drives

sypher

Cadet
Joined
May 7, 2019
Messages
5
Hi guys,
bit of a problem here.

A drive failed, and other drives have subsequently shown signs of bad health.

I don't, however, really understand the current resiliency of the tank, and I'd like some help understanding it.

pool: tank
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Dec 4 09:20:33 2021
11.7T scanned at 9.28G/s, 7.19T issued at 1.25G/s, 32.9T total
994G resilvered, 21.88% done, 05:51:26 to go
config:

NAME                                            STATE     READ WRITE CKSUM
tank                                            ONLINE       0     0     0
  raidz1-0                                      ONLINE       0     0     0
    gptid/5c023dac-7ed1-11e7-b44d-645106d8d754  ONLINE       0     0     0
    gptid/0c404deb-54db-11ec-af93-000c2918b177  ONLINE       0     0     0  (resilvering)
    gptid/5d9af434-7ed1-11e7-b44d-645106d8d754  ONLINE       0     0     0
    gptid/5e46a3dd-7ed1-11e7-b44d-645106d8d754  ONLINE       0     0     0
  raidz1-1                                      ONLINE       0     0     0
    gptid/8ada6537-8031-11e7-a999-000c2907ef12  ONLINE       0     0     0
    gptid/8c05a1b5-8031-11e7-a999-000c2907ef12  ONLINE       0     0     0
    gptid/8d23771a-8031-11e7-a999-000c2907ef12  ONLINE       0     0     0
    gptid/e8fd3923-5159-11ec-92fa-000c2918b177  ONLINE       0     0     0
logs
  gptid/d58f7ae3-cf25-11e8-9dd2-000c2918b177    ONLINE       0     0     0
cache
  gptid/0d898ed7-51e2-11ec-82e4-000c2918b177    ONLINE       0     0     0

errors: No known data errors


The problem is the following: the resilver is taking *a long* time. You can see about 5 hours on the timer, but that's only because I just reset the system; otherwise it would not show any estimate at all.
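
In case it helps, this is roughly how I've been keeping an eye on the progress from the shell - nothing fancy, just zpool status in a loop (the 5-minute interval is arbitrary):

# Pool name "tank" as in the output above; adjust the interval to taste.
while true; do
    zpool status tank | grep -E 'scanned|issued|resilvered'
    sleep 300
done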

I have problems with 2 other disks, as can be seen here:

Device: /dev/da10 [SAT], failed to read SMART Attribute Data.

Device: /dev/da8 [SAT], Self-Test Log error count increased from 5 to 6.


SMART data for the devices is as follows:

DA10 (this takes a few seconds to even come up, on the occasions it comes up at all) - the disk is new

=== START OF INFORMATION SECTION ===
Device Model: WDC WD60EDAZ-11U78B0
Serial Number: WD-
LU WWN Device Id: 5 0014ee 214059afd
Firmware Version: 80.00A80
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Dec 12 23:57:42 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 253 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 220 220 021 Pre-fail Always - 4000
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 9
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 206
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 8
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 2
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 8
194 Temperature_Celsius 0x0022 113 112 000 Old_age Always - 37
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 90% 182 -
# 2 Short offline Completed without error 00% 158 -
# 3 Short offline Completed without error 00% 134 -
# 4 Short offline Completed without error 00% 110 -
# 5 Short offline Completed without error 00% 87 -
# 6 Short offline Completed without error 00% 63 -
# 7 Short offline Completed without error 00% 39 -
# 8 Extended offline Interrupted (host reset) 10% 29 -

DA8
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue
Device Model: WDC WD60EZRZ-00RWYB1
Serial Number: WD-
LU WWN Device Id: 5 0014ee 262d0d748
Firmware Version: 80.00A80
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Dec 13 00:01:16 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 190 184 021 Pre-fail Always - 9466
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 109
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 050 050 000 Old_age Always - 36834
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 108
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 100
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 1412871
194 Temperature_Celsius 0x0022 114 102 000 Old_age Always - 38
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 252
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 36810 620535024
# 2 Short offline Completed: read failure 90% 36786 620535024
# 3 Short offline Completed without error 00% 36762 -
# 4 Short offline Completed: read failure 90% 36738 620535024
# 5 Short offline Completed without error 00% 36714 -
# 6 Short offline Completed: read failure 90% 36690 620535024
# 7 Short offline Completed: read failure 90% 36666 620535024
# 8 Extended offline Completed: read failure 10% 36652 620535024
# 9 Short offline Completed without error 00% 36619 -
#10 Short offline Completed without error 00% 36595 -

Resilvering drive is DA10.

Any help, recommendations, suggestions, even hugs - everything's accepted.
Thanks!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Hardware specs, please. Include all drive models and what each one is being used for (on the assumption they may be different).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Any help, recommendations, suggestions, even hugs - everything's accepted.

Intended as constructive criticism -- you're fine so far ("no lost data"), but be more than a bit worried.

RAIDZ1 only covers you for one error. You can lose an HDD and you're fine as long as you can still read EVERY REMAINING block in the pool (because any unreadable sector on the remaining drives would be a second error, which RAIDZ1 cannot recover).
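
If you want to know whether the remaining disks can actually still deliver every block, the only real test is to read them all, which is what a scrub does. Purely as an illustration (using your pool name), once the resilver has settled down:

zpool scrub tank        # reads and verifies every allocated block in the pool
zpool status -v tank    # shows read/checksum errors and lists any affected files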

Your new WD60EDAZ looks to me like it is probably an SMR drive. These are not compatible with ZFS (some may quibble over this).


Your drive, being less than 8TB and having the "A" in the model number, suggests SMR. These drives are susceptible to failure when used with ZFS, because they are meant for extremely low volumes of mostly sequential I/O. They work fine on Windows, but not with ZFS. You can Google "ZFS" and "SMR" for many links on the topic, or read links derived from the blog post above. Be aware that the SMR-with-ZFS situation has been a bit of a black eye for both iXsystems and the drive manufacturers, so there is some attempt at positive spin in corporate statements. Read corporate posts skeptically. Also read end-user blog posts skeptically, since end users feel like they've been screwed by WD and Seagate, and are more than a bit angry in many cases.

Your *best* course of action, if you can, and I realize this sucks, would be to build a new RAIDZ2 pool on CMR drives and copy your data to it. There is no way to "fix" a RAIDZ1 pool's inherently limited redundancy. It looks to me like you have two four-disk RAIDZ1 vdevs; while that offers somewhat better performance than a single eight-disk RAIDZ2 vdev, it leaves you exposed whenever you lose a drive in either RAIDZ1.
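
Just as a rough sketch of what that migration looks like (the device names and the "newtank" pool name are made up, and on TrueNAS you'd normally build the pool through the GUI rather than raw zpool create):

# Hypothetical six-disk RAIDZ2 on new CMR drives (da20..da25 are invented names).
zpool create newtank raidz2 da20 da21 da22 da23 da24 da25

# Snapshot everything on the old pool and replicate it, datasets and properties included.
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -F newtank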
 

sypher

Cadet
Joined
May 7, 2019
Messages
5
Thanks jgreco, extremely informative and on point (as usual - I've been a bit of a lurker, but I value your insight on every post).

Alright, I'll have to check the rest of my pool, see if the resilver is actually working, then buy a 10-12 TB drive and use it to move all the data off.

This does really suck.

Any easy way to check for SMR drives?

My drives are the following:

2x ATA WDC WD60EZRZ-00R
5x ATA WDC WD60EZRZ-00G
1x ATA WDC WD60EDAZ-11U

As for RAIDZ2 vs RAIDZ1: I started with a RAIDZ1, then got another 4 disks and expanded the pool with them.
In the end I wanted capacity, not redundancy; it's a mistake that now, 3 years later, I guess I'll be paying for.

Either way, I've learnt my lesson.

Now off to get the actual data off it, and we'll see ...
 

sypher

Cadet
Joined
May 7, 2019
Messages
5
Also, just a question - will a single SECTOR failing on a second hard drive bring the whole pool down, with no chance of getting the data out?

If that's a yes, is there any way to stop the resilvering now and copy the data out while I can still access it?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thanks jgreco, extremely informative and on point (as usual - I've been a bit of a lurker, but I value your insight on every post).

You just lucked out that I'm a bit insomniac tonight. :smile: Thanks for the compliment though. It's my goal to demystify the complicated.

Any easy way to check for SMR drives?

Yorick posted a guide to "known" ones.


However, this is not comprehensive, and I believe yours falls under the "various" guideline (entry #9). My guess at yours being SMR is based on the 6TB size and the letter "A" in the model number, but I could be wrong. The actual judging of SMR is left as an exercise for the reader (sorry), but you did arrive here with "issues" with that drive. Just sayin'. :-/
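
The quickest sanity check is just to pull the model strings off every drive and compare them against the known-SMR lists. Something along these lines (the da0..da10 range is only an example, adjust to your device names):

for d in da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10; do
    echo "=== $d ==="
    smartctl -i /dev/$d | grep -E 'Device Model|Model Family'
done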

will a single SECTOR failing on a second hard drive bring the whole pool down, with no chance of getting the data out?

No. So here's the deal.

Critical ZFS metadata is redundant within a pool, above and beyond the block-level redundancy normally meant when we speak of "redundancy". This means multiple COPIES of critical ZFS metadata are stored. If you manage to wipe out critical metadata (meaning damage to or loss of all copies), hope is lost for some or all of the pool (depending on which specific metadata). ZFS will normally try to store the copies on different vdevs if possible.

Noncritical ZFS metadata, such as, for example, the list of blocks that make up a file, is not so thoroughly protected. Loss or corruption of such metadata blocks results in the loss of related files, or, if the "file" is actually a directory (remembering that a UNIX directory is just a bizarre file of sorts), then the loss of that directory, along with no longer having a way to access the things within it, even though those files or subdirectories might be perfectly intact in their on-disk representation.

ZFS data blocks, which make up files, are not normally protected beyond RAIDZ/mirroring. It is possible to ask ZFS to store multiple copies of blocks through use of the "copies=" property, which is supposed to attempt to store copies on different vdevs if possible, but it isn't always going to be able to. Therefore you should not rely on that. Loss of a vdev is typically fatal to ZFS anyways.
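
If you want to experiment with it anyway, it's a per-dataset property; "tank/important" below is a made-up dataset name, and the extra copies only apply to data written after the change:

zfs set copies=2 tank/important    # store two copies of each data block going forward
zfs get copies tank/important      # verify the setting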

A single sector failing when you have already lost an entire hard drive may render a ZFS block irretrievable, as there is neither the original block contents nor enough redundancy available to rebuild it.

An irretrievable block within a file results in data loss within the file.

An irretrievable metadata block for a file results in loss of the file.

An irretrievable critical metadata block (meaning all copies) results in damage to the pool, which could be loss of a directory or even loss of the pool.

The failure that everyone is terrified of when using RAIDZ1 is when a drive fails and then, due to the strain placed on the system during the resilver, a second drive starts to lose sectors, possibly from overheating caused by all the unusual activity. This absolutely has a good chance of making some files irretrievable/truncated/damaged, and if enough sectors start vanishing from the remaining drives, metadata may go too, and then, much sadness.
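
One small thing you can do while the resilver grinds away is keep an eye on drive temperatures. A crude spot-check (device names are examples, adjust to yours):

for d in da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10; do
    printf '%s: ' $d
    smartctl -A /dev/$d | grep -i temperature_celsius
done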

ZFS has no fsck mechanism, because "repair" is such a weird concept when you're putting all your effort into maintaining the integrity of the stored data to begin with. Filesystems like FAT and UFS benefit from fsck/chkdsk because writes can be interrupted mid-stream, before metadata is committed. With ZFS, metadata updates are committed to the pool transactionally (with synchronous writes covered by the ZIL), so inconsistent states shouldn't be possible. This is what allows the incredibly complex structures behind things like snapshots and deduplication to work, because you're always supposed to be able to trust the pool.

But it also underlines why pool integrity is a concern with ZFS, and why some of us don't mind throwing "RAIDZ3 plus a warm spare" at pools where we'd prefer not to lose our data. In reality, loss of a single sector in addition to another drive's failure is not likely to result in an ACTUAL significant data loss, but, then again, who's to define what "significant" is? If you're storing your crypto for a million dollars worth of bitcoin and it becomes irretrievable, loss of that file could be ... annoying. Therefore, I protect anything important with a minimum of RAIDZ2 or mirrors, preferably RAIDZ3 or three-way mirrors, and typically backups as well.
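
For what it's worth, "RAIDZ3 plus a spare" at pool creation time looks roughly like this (all device and pool names invented; ZFS calls an assigned spare a hot spare, while a "warm spare" can also just be a drive sitting in the chassis unassigned):

zpool create bigpool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 spare da8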
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The accumulated knowledge of the forums on SMR is here:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The accumulated knowledge of the forums on SMR is here:

*cough* already posted above *cough* :smile:
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
*cough* already posted above *cough*
Right... I did a scan for links and didn't see it... the new way the dark theme is flattening them makes it easier (for me at least) to miss that it's a link.

Anyway, better twice than not at all.
 

sypher

Cadet
Joined
May 7, 2019
Messages
5
*cough* already posted above *cough* :smile:


Just a quick update - I replaced the SMR drive with an 8TB CMR drive.

The difference is *HUGE*.

The resilver before did not finish even after 20 days.
The resilver now is finishing in about 24 hours (20 times faster!!)

I'm stunned.

Now onwards to try and find a way to host 12 drives ...
 