WD Reds - Load Cycle Count weirdness

Status
Not open for further replies.

philiplu

Explorer
Joined
Aug 10, 2014
Messages
58
I've got eight 4TB WD Reds that have been spinning for around 3 months now, running FreeNAS 9.2.1.7. They're all hooked up to a SuperMicro X10SL7-F. Six of the drives are in a RAIDZ2 attached to the on-board LSI HBA. The other two sit in a SuperMicro CSE-M35T-1B 5-in-3 hot-swap bay, connected through the board's SATA 3Gbps chipset ports.

The two hot-swap drives have had their Load Cycle Counts (LCC) increasing by roughly 100 a day (which works out to an unload about every 15 minutes). The six LSI-connected drives have LCCs that have stayed basically constant. I don't understand why the hot-swap drives are seeing ~100 unloads a day, but I didn't worry about it too much.

But I just noticed that about 12 days ago, when the drives had about 1,750 hours on them, the six LSI-connected drives started unloading 100 times a day as well. The system has been powered up for 43 days, so there wasn't a reboot to explain the change in behavior. I checked /var/log/messages for that period, and nothing out of the ordinary appears there. I also looked back through the Reporting graphs for Disk & CPU and don't see anything weird there either. I don't recall anything strange happening on the system 12 days ago (I wish I'd noticed sooner, when anything out of the ordinary would have been easier to remember).

So does anyone have any idea why WD Reds sometimes go days, even weeks, without unloading, unload regularly at other times, and switch from one state to the other for no discernible reason?

I run a daily script that saves the output of smartctl -x for all the drives, and I've attached three of those logs. The one from 2014/11/04 is the last log before the weird LCC increases started showing up everywhere. The 2014/11/05 log is the first one where the LCC increases appear on the LSI-connected drives. The 2014/11/16 log is last night's, included just to show that the LCC has kept increasing consistently on all the WD Reds since it started. In the logs, da0 to da5 are the six LSI-connected drives, da6 and da7 are two mirrored SSDs, and ada0 and ada1 are the hot-swap bay drives that have had their LCC increasing all along at the ~100/day rate.
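For anyone who wants to keep the same kind of history, a small daily cron job along these lines is enough (the log path is just an example, not necessarily where mine lives):

Code:
#!/bin/sh
# Daily SMART dump: append `smartctl -x` output for every drive to a
# timestamped file (example path).
LOG="/mnt/tank/smartlogs/smartlog-$(date +%Y%m%d%H%M).txt"
for dev in da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1; do
    echo "===== /dev/${dev} =====" >> "$LOG"
    smartctl -x "/dev/${dev}" >> "$LOG" 2>&1
done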
 

Attachments

  • smartlog-201411042355.txt
    104.3 KB · Views: 290
  • smartlog-201411052355.txt
    106.8 KB · Views: 254
  • smartlog-201411162355.txt
    117.8 KB · Views: 241

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
This is very interesting. You got me. I just checked all of my WD Reds. They've been in service for about 13 months, and I have precisely 68 unloads logged for each.

I have every power-saving feature turned off: no drive standby, no power/acoustic management, etc. Always on. How are yours set?
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
There has been a "bad" batch of Reds whose firmware doesn't keep the drives running 24/7. I don't know if that applies to you, but I would start the RMA process or reflash those drives.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't know what to say...
 

philiplu

Explorer
Joined
Aug 10, 2014
Messages
58
Hah, I figured this would turn out to be my fault. I looked back at file modification times, and something did happen back on Nov 4th. I've got a script that runs 'smartctl -l scttempsts' to log the temperatures of all my drives, for occasional loading into a spreadsheet. Early on I was running it every minute, but I backed that down to every 10 minutes on, yep, the 4th, about half an hour before the first log attached above was written. The log is written to my main RAIDZ2 pool, so running it once a minute was enough to keep the pool drives from spinning down; once it dropped to every 10 minutes, the idle timer started kicking in. That logging never touched the two hot-swap drives, which is why they've been unloading ~100 times a day all along.
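For context, such a poll is just a small cron-driven script along these lines (the log path is an example); the key detail is that the output lands on the main pool:

Code:
#!/bin/sh
# Periodic temperature poll (example path). Because the CSV lives on
# the main RAIDZ2 pool, every run writes to the pool drives and resets
# their idle timer.
LOG="/mnt/tank/drive-temps.csv"
for dev in da0 da1 da2 da3 da4 da5 ada0 ada1; do
    temp=$(smartctl -l scttempsts "/dev/${dev}" | awk '/Current Temperature/ {print $3}')
    echo "$(date +%Y-%m-%dT%H:%M:%S),${dev},${temp}" >> "$LOG"
done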

I did some more experimenting and looked through old logs, and it's clear now that all my WD Reds have a 300-second unload timer. I'm seeing 15-minute unload intervals because that's my snapshot and replication period (the main pool gets replicated locally to the disks in the hot-swap bays). It looks like running "smartctl -A" is enough to keep the drives from unloading: when I do that in a loop, sleeping 300 seconds between smartctl runs, the Load Cycle Count never increments; bump the sleep to 310, and I start seeing LCC increments.
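The test is easy to reproduce with a loop like this (da0 is just one example drive); run it with a 300-second sleep and attribute 193 never moves, run it at 310 and it climbs:

Code:
#!/bin/sh
# Poll SMART attributes at a fixed interval and print the Load Cycle
# Count (attribute 193). Compare a run at 300 seconds against 310.
DEV=/dev/da0          # example drive
SLEEP=${1:-300}       # interval in seconds, override on the command line
while true; do
    lcc=$(smartctl -A "$DEV" | awk '$1 == 193 {print $NF}')
    echo "$(date +%H:%M:%S)  LCC=${lcc}"
    sleep "$SLEEP"
done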

So now the question is why my WD Reds unload after 300 seconds, but the answer to that really isn't that interesting. I'll probably play around with camcontrol's idle/standby timers or the like, but since I don't really want the drives to spin down at all, if that doesn't work (per marbus90's comment above) I can set up a 4-minute cron job that just hits every drive with a smartctl -A.
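If it comes to that, a one-line cron entry along these lines would do it (system-crontab style; the device list is an example):

Code:
# Touch every spinning drive with a SMART attribute read every 4
# minutes, staying inside the 300-second idle window.
*/4 * * * * root for d in da0 da1 da2 da3 da4 da5 ada0 ada1; do /usr/local/sbin/smartctl -A /dev/$d > /dev/null 2>&1; done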

Note that I don't have any APM or standby timers enabled in the GUI: Storage/Volumes/View Disks shows HDD Standby set to Always On and APM/Acoustic Level set to Disabled for all drives.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
philiplu said:
So now the question is why my WD Reds unload after 300 seconds, but the answer to that really isn't that interesting. I'll probably play around with camcontrol's idle/standby timers or the like, but since I don't really want the drives to spin down at all, if that doesn't work (per marbus90's comment above) I can set up a 4-minute cron job that just hits every drive with a smartctl -A.

Or you just change the interval for SMART monitoring to 4 minutes. ;)

In all seriousness, parking the drives when there's no work for them to do is what I do. My Greens are set to 300 seconds, and I'm okay with that. Remember that a count incrementing at a reasonable rate isn't bad. The problem is, and has always been, that 350k cycles can come pretty damn fast with the default 8-second park timer. At 300 seconds it will take you far longer to reach a worrying count than you'll actually be using the drives, so it's not the end of the world to leave it like it is.
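For reference, the knob behind that GUI setting is smartd's check interval; from the shell the equivalent is roughly the following (whether the FreeNAS field maps straight onto smartd's -i flag is an assumption on my part):

Code:
# Have smartd poll the disks every 240 seconds instead of the
# 1800-second default; per the experiment above, a SMART read inside
# the 300-second window keeps the drives from parking.
smartd -i 240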
 

philiplu

Explorer
Joined
Aug 10, 2014
Messages
58
What, use FreeNAS like an appliance and take advantage of the built-in functionality, rather than as an excuse for hacking around? Weird.

You're right, I can just live with the default unload timer and not worry about it, since 100 unloads a day translates to about 10 years before hitting 350K. I doubt that's a very hard limit, either; I've got a couple of 3TB WD Greens in a Windows Home Server 2011/HP MicroServer N34L box (which the FreeNAS system is slowly replacing). Those drives have about 3 1/4 years on them, and I didn't run wdidle3 until well over a year in. They're currently sitting at about 1 million load/unload cycles each and have had no trouble under WHS, with no complaints in the SMART data.
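(For anyone checking the math on the 10-year figure, at ~100 cycles a day:)

Code:
# 350,000 rated cycles at ~100 cycles/day:
$ echo "scale=1; 350000 / 100 / 365" | bc
9.5
# call it roughly 10 years of continuous service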
 