SOLVED Unexpected HDD behaviour

Nogtail · Dec 29, 2022

I recently built myself a system which uses 6 x 8TB IronWolf drives in a RAID-Z2 configuration. I have created the pool and vdev, but I have not shared it or put any files on it. The system dataset is located on the boot pool so it should not be accessing the drives either. All other settings are installation defaults.

The hard drives are making constant noise as though they are accessing/writing data. Disk I/O graphs and gstat show no activity on anything but the boot drive. The SMART data shows 25 power on hours, slightly over 1 hour of head flying time and a load cycle count of over 240. I ran the command a few minutes later and the load cycle count had increased by another 4. The SMART values for all disks are almost identical.

What is causing all the disk activity? Is there any way I can keep the load cycle count down? At this rate they'll be reaching the rated 600,000 in only a few years.
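
A rough way to check the "few years" estimate is to extrapolate from the SMART figures quoted above (about 240 load cycles in 25 power-on hours, against a rated 600,000). The sketch below assumes the rate stays constant and the system runs around the clock; the function name is made up for illustration.

```shell
#!/bin/sh
# Extrapolate how long it takes to reach a rated load cycle count.
# Assumes a constant rate and 24/7 operation (8760 hours per year).
# Usage: years_to_rated_cycles <observed_cycles> <power_on_hours> <rated_cycles>
years_to_rated_cycles() {
    awk -v c="$1" -v h="$2" -v r="$3" 'BEGIN {
        rate = c / h                      # load cycles per power-on hour
        printf "%.1f\n", r / rate / 8760  # hours to rated count, in years
    }'
}

# Figures from the post above: 240 cycles in 25 hours, rated for 600,000.
years_to_rated_cycles 240 25 600000
```

With those figures the estimate comes out at roughly 7 years, though the early count probably includes cycles from initial setup and reboots.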

winnielinnie · Dec 29, 2022

Did you enable any power-saving or acoustic features?

If TrueNAS Core, did you create any jails? If SCALE, did you install any Apps?

Nogtail · Dec 29, 2022

winnielinnie said:
Did you enable any power-saving or acoustic features?

If TrueNAS Core, did you create any jails? If SCALE, did you install any Apps?

APM/AAM are both set to the default disabled state for all drives. My drives don't support APM so I can't imagine that would be the problem.

I'm using Core, probably should have specified that in the question! No jails, plugins, VMs or anything. I have a pair of SSDs that I plan on using for that sort of stuff eventually, but wanted to work out what my HDDs were doing before setting them up.

winnielinnie · Dec 29, 2022

Nogtail said:
The SMART data shows 25 power on hours, slightly over 1 hour of head flying time and a load cycle count of over 240. I ran the command a few minutes later and the load cycle count had increased by another 4.

That seems worrying. There shouldn't be a reason for your drives to keep load cycling like that.

Especially since you're using the defaults (No idle standby, no APM, no acoustic management.)

Maybe the GUI's options didn't take effect?

Check via SSH or the Shell:
camcontrol identify /dev/ada0

Replace ada0 with your relevant drive(s).

Nogtail · Dec 29, 2022

winnielinnie said:
That seems worrying. There shouldn't be a reason for your drives to keep load cycling like that. Especially since you're using the defaults (No idle standby, no APM, no acoustic management.)

Maybe the GUI's options didn't take effect?

Check via SSH or the Shell:
camcontrol identify /dev/ada0

Replace ada0 with your relevant drive(s).

Anything in particular I should be looking for? The start stop count and power cycle count are both low so it seems to be keeping the drives spinning but constantly parking the heads. I'd expect it to park the heads once it has been idle for a while but I'm not sure why it feels the need to unpark them then park them again every few minutes.

Nogtail · Dec 29, 2022

One possible explanation I've found for the HDD activity is that by default TrueNAS creates a 2GB swap partition on each drive. I'm not sure if this is required in my case, as I have a 16GB swap partition on my boot SSD. I'm not sure why it would be using swap as I have 32GB of RAM which is mostly unused. Strangely the swap partition on my boot SSD isn't showing up in swapinfo.

winnielinnie · Dec 30, 2022

Nogtail said:
Anything in particular I should be looking for?

Do you see any values using camcontrol that are not reflected by the GUI? Such as if acoustic management, advanced power management, and sleep/suspend times are enabled?

Nogtail said:
One possible explanation I've found for the HDD activity is that by default TrueNAS creates a 2GB swap partition on each drive.

You can manually or automatically disable this swap, but I doubt that a fresh system is already using it. If swapinfo doesn't show any swap being used, then you can rule that out as the reason for the drives' behavior.

If it means anything, for comparison, all my WD Red Plus drives have a "Load_Cycle_Count" that works out to about once per day. Perhaps it's an internal mechanism to keep the drives healthy by making sure they can park and unpark the heads. That means about 365 load cycles per year.

joeschmuck · Dec 30, 2022

@Nogtail You should post your system specs per the forum rules. You have people guessing at what is going on, and it will take a long time to provide you with a solution. I for one hate to guess randomly.

Also, when someone asks you to perform a command, generally we are looking for you to post the output of that command. Leaving it up to you to decipher it could be problematic.

Nogtail said:
Is there any way I can keep the load cycle count down? At this rate they'll be reaching the rated 600,000 in only a few years.

Yes, do not try to sleep / spin the drives down, but it doesn't look like you are trying to do that.

Another thing to try: disconnect the Ethernet cable from your NAS and see if the head loading stops. Often any system on the LAN can poll the devices and that will trigger the NAS to load the heads. If this is not helping, plug the NAS cable back in and go to the next step.

Post the output of smartctl -a /dev/ada0 where I'm assuming drive ada0 is listed. If you are using an HBA it could be /dev/da0 or da1 for example. I need to see the output in full. Post it in code brackets please. Then wait about 30 minutes to 1 hour and generate another smartctl output and post that. Please note in the posting the duration between the data points since the Power On Hours is only in whole hours based on power on time.
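
For anyone wanting to collect the two data points described above, a minimal sketch follows. It assumes the attribute table layout that smartctl prints for ATA drives (raw value in the tenth column); the function names are illustrative, and /dev/ada0 is a placeholder.

```shell
#!/bin/sh
# Print the raw Load_Cycle_Count from `smartctl -A` output read on stdin.
# Assumes the standard ATA attribute table, where RAW_VALUE is column 10.
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Take two readings <seconds> apart and print the increase.
# Usage: load_cycle_delta /dev/ada0 1800
load_cycle_delta() {
    before=$(smartctl -A "$1" | load_cycle_count)
    sleep "$2"
    after=$(smartctl -A "$1" | load_cycle_count)
    echo "$1: $((after - before)) load cycles in $2 seconds"
}
```

Running `load_cycle_delta /dev/ada0 1800` as root would report the increase over 30 minutes. This only summarises one attribute; the full `smartctl -a` output is still what was asked for.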

winnielinnie · Dec 30, 2022

joeschmuck said:
@Nogtail
Yes, do not try to sleep / spin the drives down, but it doesn't look like you are trying to do that.

Another thing to try: disconnect the Ethernet cable from your NAS and see if the head loading stops. Often any system on the LAN can poll the devices and that will trigger the NAS to load the heads. If this is not helping, plug the NAS cable back in and go to the next step.

After reading older posts in this forum, as well as all over the internet, it appears that this (constant parking/unparking of heads) is by design for the Seagate IronWolf drives. In order to disable the behavior, you might need to use third-party tools.

You'll notice a pattern of complaints:
* Not good for my NAS setup
* Constant "clicking" sound every 5-10 seconds
* Load_Cycle_Count is climbing very fast
* Not using any advanced power management features

joeschmuck · Dec 30, 2022

winnielinnie said:
it appears that this (constant parking/unparking of heads) is by design for the Seagate IronWolf drives. In order to disable the behavior

No kidding. You know I'm going to have to research this after I get back from a trip to town. I just find it hard to believe. I'm curious if the Seagate tools (what we used for the Red line) are capable of changing the behavior.

NugentS · Dec 30, 2022

They should be able to - but I don't have an Ironwolf to test with

joeschmuck · Dec 30, 2022

I'm curious if Seagate has a tool like Western Digital's WDIDLE3 to adjust the head parking timer. I did not find many sources that said why the head parks every 3 minutes (that was the value I found) but it then at some point will load the heads again. This is assuming the drive is doing this without any data interaction.

You can test whether the drive is doing the loading/unloading all on its own if you want to. Just power on the system to the BIOS screen and see if the problem persists. That is one option and it keeps the data cable connected.

Here is a suggestion and I'm actually curious if it works.

In the GUI, select Storage -> Disks -> Edit each of your drives to change the Advanced Power Management to 254 - Maximum performance. After saving the changes, if the drives do not stop clicking, reboot and monitor. I hope this will fix the issue but honestly, I have no idea why Seagate would park the heads and then load them again so frequently. And I'm still leaning towards the fact that the drive is being polled which is waking it up and loading the heads again, but I'm not certain. I hate taking some internet advice that doesn't come from a reputable place or someone that I trust.

Good luck!

joeschmuck · Dec 30, 2022

More research... I don't know which drive you have but this is for an IronWolf drive. Note the power modes. While I think this is what is happening, I also think something is asking the drive for data after it enters one of these power modes. I hope that selecting APM 254 still solves this.

Nogtail · Dec 30, 2022

joeschmuck said:
I'm curious if Seagate has a tool like Western Digital's WDIDLE3 to adjust the head parking timer. I did not find many sources that said why the head parks every 3 minutes (that was the value I found) but it then at some point will load the heads again. This is assuming the drive is doing this without any data interaction.

You can test whether the drive is doing the loading/unloading all on its own if you want to. Just power on the system to the BIOS screen and see if the problem persists. That is one option and it keeps the data cable connected.

Here is a suggestion and I'm actually curious if it works.

In the GUI, select Storage -> Disks -> Edit each of your drives to change the Advanced Power Management to 254 - Maximum performance. After saving the changes, if the drives do not stop clicking, reboot and monitor. I hope this will fix the issue but honestly, I have no idea why Seagate would park the heads and then load them again so frequently. And I'm still leaning towards the fact that the drive is being polled which is waking it up and loading the heads again, but I'm not certain. I hate taking some internet advice that doesn't come from a reputable place or someone that I trust.

Good luck!

I don't think my drives support APM but I made the change anyway to see if there's any difference. So far it doesn't seem to have affected the rate at which the load cycle count is increasing. From memory the HDDs spin down when sitting on the BIOS screen for a significant period of time so I'm not sure if that would be too useful but I might give it a go.

I ended up doing a clean install with swap on the data drives disabled to see if that would make any difference, but if anything it increased the frequency of head parks - seems to be parking around 10x per hour. I think there may be a bug in TrueNAS where the swap partition on the boot SSD is unused if swap is configured on data drives. It seems when a pool is created it replaces any swap already configured.

openSeaChest is installed by default so I used it to read the power EPC settings:

Code:

===EPC Settings===
        * = timer is enabled
        C column = Changeable
        S column = Savable
        All times are in 100 milliseconds

Name       Current Timer Default Timer Saved Timer   Recovery Time C S
Idle A     *1            *1            *1            1             Y Y
Idle B     *1200         *1200         *1200         4             Y Y
Idle C      0             6000          6000         60            Y Y
Standby Z   0             9000          9000         150           Y Y

This seems to mostly match up with the table you found. I assume 0 means the state is disabled. I'm not sure if I want to make any changes as I assume the defaults are there for a reason.
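
Since the table reports times in units of 100 milliseconds, the values are easier to read once converted. A small sketch (the helper name is made up for illustration):

```shell
#!/bin/sh
# Convert an EPC timer value (units of 100 ms) to seconds.
epc_units_to_seconds() {
    awk -v u="$1" 'BEGIN { printf "%.1f\n", u / 10 }'
}

# Timer values taken from the table above (current or default column).
while read -r name units; do
    echo "$name: $(epc_units_to_seconds "$units") s"
done <<EOF
Idle_A 1
Idle_B 1200
Idle_C 6000
Standby_Z 9000
EOF
```

So the enabled Idle B timer is 120 seconds (2 minutes), while the default Idle C and Standby Z timers would be 10 and 15 minutes.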

I've come across this thread which seems to have the same issue: https://www.truenas.com/community/t...een-epc-idle_a-and-idle_b-power-states.90751/

It'd be great to know what keeps waking the drives. I'd prefer to keep the drive settings at default and allow them to enter the idle_b state.

joeschmuck · Dec 30, 2022

Nogtail said:
From memory the HDDs spin down when sitting on the BIOS screen for a significant period of time so I'm not sure if that would be too useful but I might give it a go.

If that is true, then the drives are working as I would expect them to, spinning down and staying down.

Nogtail said:
I'd prefer to keep the drive settings at default and allow them to enter the idle_b state.

I can understand that. But if you know that the drives are loading the heads every 5 minutes then you could change the Idle_B value to 6 minutes (3600). This would allow the heads to stay loaded until the 6 minute mark instead of the 2 minute mark. And if something is happening every 5 minutes, well, the heads remain loaded until after a 6 minute period of time.
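
The 3600 figure follows from the EPC timer unit of 100 milliseconds shown in the openSeaChest output earlier in the thread. A quick check, with a made-up helper name:

```shell
#!/bin/sh
# Convert minutes to EPC timer units (1 unit = 100 ms, so 600 units per minute).
minutes_to_epc_units() {
    echo $(( $1 * 600 ))
}

minutes_to_epc_units 6   # proposed Idle_B timer
minutes_to_epc_units 2   # current Idle_B timer
```

This prints 3600 and 1200, matching the proposed value and the current Idle B setting.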

Nogtail said:
It'd be great to know what keeps waking the drives.

I agree. I'm thinking it could be that the System Dataset is still on the hard drives, if the problem still happens when the Ethernet cable is disconnected. The System Dataset I think is accessed every 5 minutes; I'm not positive it is still that time period. That's another thing I haven't heard back from you on.

joeschmuck said:
Post the output of smartctl -a /dev/ada0 where I'm assuming drive ada0 is listed.

Still waiting on this info. Can't help much more without knowing exactly the drives you have.

winnielinnie · Dec 30, 2022

Nogtail said:
I think there may be a bug in TrueNAS where the swap partition on the boot SSD is unused if swap is configured on data drives. It seems when a pool is created it replaces any swap already configured.

You can rule out swap causing this behavior by outright disabling all swap:
swapoff -a

Confirm no swap is active:
swapinfo

winnielinnie · Dec 30, 2022

joeschmuck said:
I'm thinking it could be the System Dataset is still on the hard drives if the problem still happens when the Ethernet cable is disconnected. The System Dataset I think is accessed every 5 minutes.

Yes, please confirm this. Even if it might seem like a waste of time, it helps to rule things out sooner rather than later.

@joeschmuck it's almost like clockwork: every 5 minutes there is a notable write operation to the drives that house the System Dataset. (Not an issue if it's on the boot-pool or a separate SSD-only pool.)

Nogtail · Dec 30, 2022

joeschmuck said:
If that is true, then the drives are working as I would expect them to, spinning down and staying down.

I can understand that. But if you know that the drives are loading the heads every 5 minutes then you could change the Idle_B value to 6 minutes (3600). This would allow the heads to stay loaded until the 6 minute mark instead of the 2 minute mark. And if something is happening every 5 minutes, well, the heads remain loaded until after a 6 minute period of time.

I agree. I'm thinking it could be that the System Dataset is still on the hard drives, if the problem still happens when the Ethernet cable is disconnected. The System Dataset I think is accessed every 5 minutes; I'm not positive it is still that time period. That's another thing I haven't heard back from you on.

Still waiting on this info. Can't help much more without knowing exactly the drives you have.

I might have been wrong about the drives spinning down in BIOS, it doesn't make sense for them to do that if Standby Z is disabled. I don't hear any activity so the heads seem to stay parked though.

I've done a bit of further testing and it seems the drives wake up almost exactly on the 5 minute mark. I assume this is a side effect of the drive monitoring, as SMART commands appear to wake the drive. As the drive is idle it parks the heads 2 minutes later, only for the process to repeat in another 3 minutes. Disabling SMART appears to fix the issue, although I'd rather not do that.
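
The pattern described above can be put into a very simplified model: if something touches the drive at a fixed interval and the idle timer is shorter than that interval, the heads park and reload once per interval. This is only a sketch under that assumption, with illustrative names; it ignores any other source of disk access.

```shell
#!/bin/sh
# Estimate load cycles per hour for a drive that is woken at a fixed interval.
# Usage: cycles_per_hour <wake_interval_seconds> <idle_timer_seconds>
# An idle timer of 0 means the timer is disabled, so the heads never park.
cycles_per_hour() {
    if [ "$2" -gt 0 ] && [ "$2" -lt "$1" ]; then
        echo $(( 3600 / $1 ))   # one park/load pair per wake-up
    else
        echo 0                  # timer never expires between wake-ups
    fi
}

cycles_per_hour 300 120   # 5 minute wake-up, 2 minute Idle_B timer
cycles_per_hour 300 360   # same wake-up, Idle_B raised to 6 minutes
```

With the default 2 minute timer this gives 12 cycles per hour, in the same range as the roughly 10 per hour observed earlier; a timer longer than the wake-up interval gives none.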

winnielinnie said:
Yes, please confirm this. Even if it might seem like a waste of time, it helps to rule things out sooner rather than later.

@joeschmuck it's almost like clockwork: every 5 minutes there is a notable write operation to the drives that house the System Dataset. (Not an issue if it's on the boot-pool or a separate SSD-only pool.)

The system dataset is located on the boot pool. I did a reinstall with swap disabled on the HDDs and it still has the issue so swap wasn't the problem.

Here is the output of smartctl -a /dev/ada0:

Code:

root@truenas[~]# smartctl -a /dev/ada0
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST8000VN004-2M2101
Serial Number:    (Removed for privacy)
LU WWN Device Id: 5 000c50 0e09d2a3d
Firmware Version: SC60
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Dec 30 18:12:48 2022 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 701) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   044    Pre-fail  Always       -       649667
  3 Spin_Up_Time            0x0003   083   083   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       39
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       34558
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       52
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       39
 18 Head_Health             0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   053   040    Old_age   Always       -       36 (Min/Max 34/37)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       456
194 Temperature_Celsius     0x0022   036   047   000    Old_age   Always       -       36 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       649667
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2h+00m+29.737s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       310037
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       339630

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

ChrisRJ · Dec 31, 2022

The timeout for unloading the heads can be changed using camcontrol. I had a similar behavior for my Seagate Exos X16 drives. A short OOTB timeout makes sense there, given that they are data center drives and in that environment any idle phase is unlikely to last longer than a few seconds. For NAS drives I am not sure I like a 5 minute timeout, though.

As an example only, here is the command I run on all my HDDs:

Code:

camcontrol epc /dev/ada2 -c state -d -p Idle_b -s

You will need to adjust this to your system. Please consult the man page for more information.
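
To apply the same setting to several drives, for example from a startup script, a loop along these lines could work. The device names are placeholders and the function name is made up; the function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print the camcontrol command from the post above for each named drive.
# Nothing is executed here; pipe the output to sh to actually apply it.
# Usage: idle_b_disable_cmds ada0 ada1 ...
idle_b_disable_cmds() {
    for disk in "$@"; do
        echo "camcontrol epc /dev/$disk -c state -d -p Idle_b -s"
    done
}

# Placeholder device names; adjust to the drives in your pool.
idle_b_disable_cmds ada0 ada1 ada2
```

Running `idle_b_disable_cmds ada0 ada1 ada2 | sh` as root would then execute the printed commands.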

I have decided to configure this as part of a startup script. The reason being that whenever a drive needs replacement, I would otherwise forget to change that value.

joeschmuck · Dec 31, 2022

ChrisRJ said:
For NAS drives I am not sure I like a 5 minute timeout, though.

I personally agree, which is why I prefer to disable the timer altogether and let my drives spin and heads remain loaded.

@Nogtail I see a few things that concern me about the Hard Drive data.
1) The head fly hours is only 2 hours over a 52 hour power on time. This means the heads are definitely unloaded a lot.
2) As you have noticed, the head loading counter is incrementing at a fairly fast rate. The whole reason for this posting.
3) You have not run any SMART tests on the drive. You should run one SMART long test now just to verify the drive is going to pass the test without errors. Then in the TrueNAS GUI, set up routine SMART Short and Long tests. I recommend a weekly Long Test and a daily Short Test. I have my Short Test start at 2:00 AM and my Long Test start at 2:05 AM. This will cover any overlap and make scheduling less complicated. You can schedule all the drives to do it at the same time, or if you really want to stagger the Long Test with one drive per day, that is fine. I've seen people do this; I do them all on the same day. The Long Test will take at least 12 hours for your drives. This test also takes a backseat to any requested drive activity, meaning the test will take longer if you have drive activity.

@ChrisRJ The command you specified above, does that survive a power cycle or does it need to be applied each time? It sounds like the "-s" would allow it to survive. I understand why you would desire a script if you were replacing hard drives often and I understand that sending the command multiple times has no ill effect.
