Resilvering notifications for no reason. WD60EFAX to blame?

metalliqaz

Dabbler
Joined
Aug 25, 2016
Messages
13
So, overnight I received a notification that the system was resilvering ("Pool SATA_ARRAY state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state."). Only a few minutes later the notification went away. I obviously did not replace any drives.

This system has a WD60EFAX drive in it. (I bought it before it was generally known to be DM-SMR) Could that have something to do with this behavior?
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Yup. There's a reason there's so much stink about WD sneaking them in. STH measured resilver times of around 9 days on SMR drives vs. around 12 hours on CMR. Any time a drive responds too slowly for normal pool performance and gets out of sync, ZFS will resilver it automatically by design. You probably want to seriously consider replacing it.
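If you want to catch it in the act next time, checking the pool directly will show it (using the pool name from your alert):

zpool status SATA_ARRAY

While one of these events is running you'd see something like "scan: resilver in progress" with the laggy disk knocked out of ONLINE; once it catches up, the scan line changes to "resilvered ... with 0 errors" and the alert clears.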

Also, your sig seems to indicate you have a stripe pool, and thus no redundancy anyway...?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Also, your sig seems to indicate you have a stripe pool, and thus no redundancy anyway...?

For those on mobile, here's the line:

1x WD Red 6TB + 3x WD Red 3TB RAIDZ (12TB)

Assuming you mean RAIDZ1, the only way you're getting "12TB usable" from this collection of drives is if you cut the 6TB into two 3TB partitions, and then built a RAIDZ1 out of "all five" of the 3TB chunks. Please tell me I'm misunderstanding the situation here.
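For the math behind that guess: RAIDZ1 spends one member's worth of space on parity, so a 5-wide RAIDZ1 of 3TB chunks would give (5 − 1) × 3TB = 12TB usable, which is the only way the sig's numbers work out.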
 

metalliqaz

Dabbler
Joined
Aug 25, 2016
Messages
13
Yup. There's a reason there's so much stink about WD sneaking them in. STH measured resilver times of around 9 days on SMR drives vs. around 12 hours on CMR. Any time a drive responds too slowly for normal pool performance and gets out of sync, ZFS will resilver it automatically by design. You probably want to seriously consider replacing it.

Also, your sig seems to indicate you have a stripe pool, and thus no redundancy anyway...?

It is a normal RAIDZ1 pool, initially four 3TB drives, but one failed. I replaced it with the WD60EFAX, which resilvered fine. So in reality, I'm only using half the drive for now. I originally intended to eventually replace all the drives, thus doubling the final capacity. Obviously I'm not going to buy any more WD60EFAX crap.
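To put numbers on it (my arithmetic, same RAIDZ1 rule as above): right now it's (4 − 1) × 3TB = 9TB usable, with the 6TB drive only contributing 3TB, and once all four members are 6TB drives the vdev can expand (with autoexpand enabled) to (4 − 1) × 6TB = 18TB.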
 

metalliqaz

Dabbler
Joined
Aug 25, 2016
Messages
13
Assuming you mean RAIDZ1, the only way you're getting "12TB usable" from this collection of drives is if you cut the 6TB into two 3TB partitions, and then built a RAIDZ1 out of "all five" of the 3TB chunks. Please tell me I'm misunderstanding the situation here.

I updated the signature. I explained it above. It's only 9TB usable.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I updated the signature. I explained it above. It's only 9TB usable.

Okay, that makes more sense. I figured either I was reading something wrong or you'd accidentally written the raw space before parity.

And yes, you definitely want to avoid SMR in the future. There's a (partial) list and a statement from iXsystems here:


A potentially relevant part is bullet point 4:

At least one of the WD Red DM-SMR models (the 4TB WD40EFAX with firmware rev 82.00A82) does have a ZFS compatibility issue which can cause it to enter a faulty state under heavy write loads, including resilvering. This was confirmed in our labs this week during testing, causing this drive model to be disqualified from our products. We expect that the other WD Red DM-SMR drives with the same firmware will have the same issue, but testing is still ongoing to validate that assumption.

If your SMR drive had a hiccup and went into a "took too long to respond" sort of state, that could have caused ZFS to kick it out momentarily; when it reappeared, ZFS resilvered the data and everything was good again. But in your case I'd pull a smartctl -a of all drives and take a look for any signs of early failure (e.g. non-zero values in Reallocated_Sector_Ct, Current_Pending_Sector, or Offline_Uncorrectable). If it was one of your other drives that choked up, you might need to buy two new (non-SMR) drives to get yourself back to full health and full speed.
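If you want to sweep all the disks in one go, something like this does it (a sketch; da0 through da3 are placeholders for your actual device names):

for disk in da0 da1 da2 da3; do
  echo "=== /dev/${disk} ==="
  # full SMART dump, filtered down to the three early-failure attributes above
  smartctl -a /dev/${disk} | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
done

Anything non-zero in the RAW_VALUE column for those attributes is worth a closer look.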
 