Aggressive disk activity every 30-60s

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
Hello,

First of all, sorry for my inexperience and my poor English.
Motherboard: H77H2-em v1.0 (I don't know the brand)
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
RAM: 24GB (2x8GB + 2x4GB)
Disks:
  • Pool 1 (name: Hector_v2): RAIDZ1 -> 3x HDD Barracuda 2TB (ST2000DM008-2UB102). One of them is currently missing because the manufacturer is replacing it (this has been going on for too long, as the manufacturer doesn't want to repair my disk, but that's not the reason for this thread). One of my problems existed before the disk went missing; the second appeared recently.
  • Pool 2 (name: SSD Hector Pool): MIRROR -> 2x SSD Crucial MX500 500GB (CT500MX500SSD1).
  • boot-pool: on a 120GB SSD connected by USB, directly on the motherboard (I couldn't find the model in TrueNAS. If needed, I can dismantle it to find the model.)
You can find the debug file in the attachments.

So !

I've had my NAS (Initially TrueNAS Core, upgraded to TrueNAS Scale when I upgraded my disks) for nearly 2 years.

When I switched to TrueNAS Scale (and upgraded my disks to 2TB), I started hearing a recurring noise that I've seen mentioned a few times on the forum (like a write burst every 5 seconds or so, but not very loud), but I wasn't able to find a solution for it (probably because I didn't fully understand what I read).
A few days ago, I switched to Cobia, and now I have a new problem on top of the previous one, which makes my NAS experience rather noisy and unpleasant.

As you can hear in this video (sorry for the Teams notification at the end), my disks all spin up at once and then stop after a few seconds.
This happens every 30 to 60 seconds.

I tried stopping all of my apps and VMs, and the problem persists.

Everything seems to work fine apart from this noise.
I don't have any errors related to the disks apart from this one:
'boot-pool' is consuming USB devices 'sde' which is not recommended.
That's because my system is installed on an SSD connected by USB (directly on the motherboard).
Maybe that could cause this? I don't know.

But apart from my discomfort, I'm afraid this will shorten my disks' lifetime.

If anyone has experienced something similar and found a solution, or if anyone knows what to do, I'd love to hear from you, thanks!

Zareon
 

Attachments

  • debug-truenas-20240122125559.tgz (2.3 MB)

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You can safely ignore that USB boot-pool warning... what you're doing is perfectly OK, there's just no effort put into figuring out if your USB disk is a "USB stick" (which is bad) or a real SSD attached over USB (which is fine).

Where is your system dataset? That certainly can generate periodic, but relatively frequent writing and will move to your first data pool if you don't do something about that.

Moving the system dataset to your boot pool looks like a reasonable solution to that issue if that's the case:

System Settings | Advanced | "Storage" section | Configure...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
ST2000DM008
Hello @Zareon

These disks use Shingled Magnetic Recording (SMR) which is generally considered not suitable for use with ZFS. They also appear to have power savings set, which is causing the spinup/spindown cycles you're hearing.

Please check this community list of SMR drives and use it as an "avoidance" list:

 

Zareon

Thanks for your fast replies !
Where is your system dataset? That certainly can generate periodic, but relatively frequent writing and will move to your first data pool if you don't do something about that.

Moving the system dataset to your boot pool looks like a reasonable solution to that issue if that's the case:

System Settings | Advanced | "Storage" section | Configure...
Currently it's on boot-pool. But this morning it was on Hector_V2 (my hard drive pool), and I changed it following a thread I saw on this forum.
But that changed exactly... nothing :(
These disks use Shingled Magnetic Recording (SMR) which is generally considered not suitable for use with ZFS. They also appear to have power savings set, which is causing the spinup/spindown cycles you're hearing.
Does this mean that if I don't want to hear those sounds anymore, I have to change disks? That's not very good news :(

But even so, before the Cobia update I didn't have this heavy disk activity every 30-60 seconds. Does this mean there's another problem?
 

Zareon

I also forgot to mention that the SSD pool is affected by this activity too, according to these graphs:

HDD pool:
[Graph: HDD pool disk activity]


SSD pool (I have some services running here, so it's a bit less relevant, but I'm pretty sure the peaks come at the same time as the HDD ones):
[Graph: SSD pool disk activity]
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Does this mean that if I don't want to hear those sounds anymore, I have to change disks? That's not very good news :(
Not necessarily. While your drives are SMR and the noise could be coming from them writing data (optimizing and reorganizing), the more important part is that a drive can drop offline while waiting for the SMR drive to write data, hence the incompatibility with SMR drives.

It is also possible that when you changed to SCALE, you coincidentally have filled up your SMR drives with more data (not related to the upgrade) and you are just feeling the effects of SMR drive optimization, which is done by the drive electronics, not TrueNAS. I don't know if that is your specific case but it is quite possible.

You should replace your SMR drives if you want to use TrueNAS, not because of the noise but for data reliability. I know it's not good news to have to buy CMR drives when you already have hard drives in your system. When (if) you replace your drives, you are in a predicament: you cannot safely resilver anything else until the missing drive has been replaced, so do not replace any drive except the one you pulled. Once that drive has been resilvered, you can start replacing the other drives one at a time.

And if you have a RAIDZ1 with one drive missing due to failure (RMA), your pool is at great risk. I'm curious why your drive was RMA'd: did it fall out of the pool? If yes, that is most likely an SMR issue.

Good luck.
 

Zareon

Hey! Thank you for your reply!
As I said in my first post, I'm not particularly well versed in this area. I don't know exactly what happened to my drive, but the scrub was no longer OK, so I looked into replacing it. I didn't detect any particular problems.

As for replacing the hard drives with CMR ones, could I do it gradually? I'll start by replacing the missing drive, but can the other two wait for a while, or do they have to be replaced straight away?

Thanks a lot :)
 

joeschmuck

Don't worry about not being familiar, we all start out somewhere.

I strongly suspect that your failed drive was not actually bad; it was knocked offline or degraded due to being SMR. As for replacing the drives, I recommend you install only CMR drives of course, starting with one to replace the failed drive. How urgent it is to replace the other SMR drives depends on how important your data is. I recommend you back up your data, and then there is no rush; however, should the drives drop offline, you could damage the data in your pool. I'd have to say that 99.9% of the experienced users here would tell you to replace the drives as soon as possible.

Good luck
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
TLDR,
From the video's audio, the HDDs start spinning, and a few seconds later, after a few clicks (head actuation for reads or writes), they spin down.
Either one of the disks has become faulty and is trying to initialize itself, or the power-down/sleep/standby time is too short.
I think this is caused by very aggressive power management that only allows the drive to be up and running when data needs to be accessed.
I would look at extending the timeout or disabling the feature altogether.
One thing I can't make sense of is the Windows-like jingle which can be heard just before the drive shuts down. It seems to me an SMB connection is dropped.
 

Zareon

Hey,
Thanks a lot for your answer.
Now I know what's next on my shopping list!
 

joeschmuck

If you can post the output of smartctl -x /dev/sda and smartctl -x /dev/sdb, that would show the head loading count and any other physical problems with the drives. But I suspect the drive noises are normal for this drive.

When you post this output, take note of the head loading value. Check the data again after you hear the drive make that noise and see if the count has gone up. But yes, the shopping list needs 3 hard drives added. :wink:
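If you'd rather not eyeball two long outputs, a small Python sketch like the one below can pull the raw counters out of the attribute table and show how much they grew between an idle snapshot and one taken after the noise. It assumes the compact `smartctl -x` attribute layout (ID, name, flags, value, worst, threshold, fail, raw value); the plain `smartctl -a` table has extra columns and would need different parsing. The function names are hypothetical.

```python
# Hypothetical helpers for comparing two `smartctl -x` outputs.
# Assumes the compact -x attribute layout:
#   ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
WATCHED = {4: "Start_Stop_Count", 193: "Load_Cycle_Count"}

def parse_attributes(text):
    """Return {attribute_id: raw_value_string} for every attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID; other lines are skipped.
        if len(fields) >= 8 and fields[0].isdigit():
            attrs[int(fields[0])] = " ".join(fields[7:])
    return attrs

def counter_deltas(before_text, after_text):
    """How much each watched counter grew between two snapshots."""
    before = parse_attributes(before_text)
    after = parse_attributes(after_text)
    return {
        name: int(after[attr_id]) - int(before[attr_id])
        for attr_id, name in WATCHED.items()
        if attr_id in before and attr_id in after
    }

sample = """4 Start_Stop_Count -O--CK 078 078 020 - 22651
193 Load_Cycle_Count -O--CK 087 087 000 - 26298"""
print(parse_attributes(sample))  # {4: '22651', 193: '26298'}
```

A non-zero Start_Stop_Count or Load_Cycle_Count delta across a single "noise" event would point at spin-downs or head parking rather than ordinary reads and writes.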
 

Zareon

Hello, sorry for my late response!
for /dev/sda: smartctl sda.txt
for /dev/sdb: smartctl sdb.txt

But actually, these are my SSDs. I'm pretty sure they aren't the ones making the sound, haha.

So, for /dev/sdd (idle): smartctl sdd (idle).txt
and for the same drive while making the sound: smartctl sdd (not idle).txt

And for /dev/sdc (idle): smartctl sdc (idle).txt
and with the sound: smartctl sdc (not idle).txt

I'm not really sure what I need to check in this information. I checked, but the values seem relatively similar between the two outputs.

Thank you for your help!
I'm going to buy some disks, haha

EDIT: Removed the pastebin links; the .txt files are attached instead.
 

Attachments

  • smartctl sda.txt (14.7 KB)
  • smartctl sdb.txt (16.4 KB)
  • smartctl sdd (idle).txt (14.6 KB)
  • smartctl sdd (not idle).txt (14.7 KB)
  • smartctl sdc (idle).txt (14.1 KB)
  • smartctl sdc (not idle).txt (14.1 KB)

joeschmuck

Arg, pastebin again. Not again by you. Please post the data into our forum. In 6 months the pastebin could be gone and those links don't work. And there are so many damn popups on pastebin, drives me nuts.

Drive sda:
4 Start_Stop_Count -O--CK 078 078 020 - 22651
193 Load_Cycle_Count -O--CK 087 087 000 - 26298
195 Hardware_ECC_Recovered -O-RC- 081 064 000 - 115272675

0x03 0x008 4 5921 --- Spindle Motor Power-on Hours

This tells me that the drive is spinning down and up a lot. You only have just over 6000 hours on it. And the ECC Recovery, while I'm not sure about SMR drives, this would be a huge warning flag for a CMR drive.

Drive sdb looks fine.

Drive sdc is an SSD; I doubt any noise is coming from it.

Drive sdd looks the same as sda, with the same items I listed above.

I suspect your noise is probably the constant spinning up and down of your drives.
 

Zareon

Arg, pastebin again. Not again by you. Please post the data into our forum. In 6 months the pastebin could be gone and those links don't work. And there are so many damn popups on pastebin, drives me nuts.
Arf, sorry ! I will migrate them when I'm on my computer !

This tells me that the drive is spinning down and up a lot. You only have just over 6000 hours on it. And the ECC Recovery, while I'm not sure about SMR drives, this would be a huge warning flag for a CMR drive.
This is one of my SSDs. On this pool (the mirror pool), I'm running some services such as Home Assistant, Crafty4, and some other services that run all the time. I don't know if that could explain these values.
But for sdd, I don't have many things that could use it that much...

So, all in all, my "only" viable solution is to replace all of my HDDs with CMR ones?
But that won't help my SSD (drive sda)?

Thanks a lot for your help and your patience
 

joeschmuck

This is one of my ssd's.
Nope, the data you provided is a 7200 RPM SMR drive. You provided data for two HDDs and two SSDs. The HDDs are spinning up and down like crazy.

The math for sda: Power_On_Hours = 6073 hours, Spindle Motor Power-on Hours = 5921. This leaves 152 hours during which the drive was not spinning. Now, Start_Stop_Count = 22651, which is a lot. If we average this data out (which is not very scientific, just an example), the drive spins down about every 15 to 16 minutes (5921 / 22651 ≈ 0.26 hours), sleeps for roughly 24 seconds (152 / 22651 ≈ 0.00671 hours, where 0.017 hours = 1 minute), and then spins up again. I'm fairly certain that is what you are hearing. Also, this does not take into account any data being written, since that would not allow the drive to sleep, so the real interval between spin-downs is likely significantly shorter than 15 minutes given the amount of data you have written and read on this drive. Conveyance tests are only 2 minutes, but if you were to run routine SMART Long/Extended tests (197 minutes), those would skew the single-data-point results too.
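The same single-snapshot averaging can be written as a small Python sketch, using the sda figures quoted above. The function name is hypothetical, and, like the hand calculation, it assumes every start/stop cycle is identical, which real reads, writes and SMART tests will not honour.

```python
def average_cycle(power_on_hours, spindle_hours, start_stop_count):
    """Average spinning time (minutes) and stopped time (seconds) per cycle.

    Naive average: assumes every start/stop cycle looks the same.
    """
    stopped_hours = power_on_hours - spindle_hours
    spin_minutes = spindle_hours * 60 / start_stop_count
    stop_seconds = stopped_hours * 3600 / start_stop_count
    return spin_minutes, stop_seconds

spin, stop = average_cycle(6073, 5921, 22651)
print(f"spinning ~{spin:.1f} min, stopped ~{stop:.0f} s per cycle")
# spinning ~15.7 min, stopped ~24 s per cycle
```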

The scientific way is to pull a SMART report using smartctl -a /dev/sda at a specific time, wait exactly 1 hour, and grab another report. Then we can compare the data. (Any whole multiple of 60 minutes is preferred. Your drive actually records minutes and seconds, so the exact interval is not that critical here; it would be for a drive that records whole hours only.)

In fact, you already have one sample posted; we just need another one for sda and the data can be compared. The reason we can't use just one data point is that we would be assuming the drive has always behaved this way. Maybe it has, but it could also have started when you migrated to SCALE, which is why we take a second data point.
 

Zareon

Nope, the data you provided is a 7200 RPM SMR drive. You provided data for two HDDs and two SSDs. The HDDs are spinning up and down like crazy.
According to the screenshots I posted earlier in the thread, I thought it was one of my SSDs. Maybe I just mixed the two up. I can't check now because I'm not at home.

Same for the second data point, I can't check that now. But how can you tell whether it was doing that before the update? Is it too late now?
 

Zareon

Re,
Nope, the data you provided is a 7200 RPM SMR drive. You provided data for two HDDs and two SSDs. The HDDs are spinning up and down like crazy.
I'm back, and I can confirm that I was blind! This is indeed my HDD (which couldn't be otherwise, given what you saw in the log).

The scientific way is to pull a SMART report using smartctl -a /dev/sda at a specific time, wait exactly 1 hour (or any increments of a full 60 minutes is preferred however your drive actually records minutes and seconds so this is not that critical, it would be for a drive which records hours only) since the drive only records power on hours in and grab another report. Now we can compare the data.
So I ran smartctl -a /dev/sda (attached as the first file), waited an hour, and ran it a second time (attached as the second file).

I hope this helps to pin down the exact problem.
In the meantime, I've ordered 2 CMR disks. (I'll still need to buy one more, but that will have to wait a bit for financial reasons.)

Thanks again to everyone who participated in this thread!
 

Attachments

  • smartctl(firstOne).txt (7.6 KB)
  • smartctl(secondOne).txt (7.4 KB)

joeschmuck

Thanks for posting.

All the data we have now is listed below for drive "sda":

Power_On_Hours = 6100h+51m
Start_Stop_Count = 24186

Power_On_Hours = 6101h+56m
Start_Stop_Count = 24250

The results: over 65 minutes the drive spun down and back up 64 times. That is a rate of 0.9846 spin-ups per minute, or roughly one spin-up every minute (65 / 64 ≈ 1.016 minutes per cycle). That is crazy!
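The two-snapshot comparison above can be scripted as a minimal Python sketch. It assumes the Power_On_Hours raw value is printed in the "6100h+51m" style quoted in this post (optionally followed by "+SS.SSSs"); the regex and function names are illustrative, not part of smartctl.

```python
import re

# Assumes raw values like "6100h+51m" or "6100h+51m+12.345s".
POWER_ON_RE = re.compile(r"(\d+)h\+(\d+)m")

def power_on_minutes(raw):
    """Convert a Power_On_Hours raw value such as '6100h+51m' to total minutes."""
    match = POWER_ON_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognised Power_On_Hours format: {raw!r}")
    hours, minutes = int(match.group(1)), int(match.group(2))
    return hours * 60 + minutes

def spin_rate(first, second):
    """Each snapshot is (power_on_raw, start_stop_count).

    Returns (elapsed minutes, start/stop cycles, minutes per cycle).
    """
    elapsed = power_on_minutes(second[0]) - power_on_minutes(first[0])
    cycles = second[1] - first[1]
    if cycles <= 0:
        raise ValueError("no start/stop cycles between the two snapshots")
    return elapsed, cycles, elapsed / cycles

elapsed, cycles, per_cycle = spin_rate(("6100h+51m", 24186), ("6101h+56m", 24250))
print(elapsed, cycles, round(per_cycle, 3))  # 65 64 1.016
```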

Your drive wants to sleep. I did a quick internet search and there are complaints about Barracuda drives having this property in many systems but not all.

Question Time:
1) What version of SCALE are you running?
2) In the TrueNAS GUI, take a screen shot of the disk drive power management (I want all the data for the drive). GUI -> Storage (left side of screen) -> Disks (right side of screen) -> locate drive "sda" and click on the down arrow (right side of screen). Snap a shot of all that data to the left. I want to know if you have some power management setup issue.

EDIT: Updated as I apparently can't do simple (and obvious) math while tired.
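For reference when reading that screenshot: under the Advanced Power Management (APM) levels documented for `hdparm -B`, levels 1-127 permit spin-down, 128-254 do not, and 255 disables APM. The tiny Python helper below just encodes those ranges; the function name is hypothetical, and whether the TrueNAS "Advanced Power Management" field maps one-to-one onto these levels is an assumption here.

```python
def describe_apm(level):
    """Classify an ATA APM level using the ranges documented for hdparm -B."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    if level == 255:
        return "APM disabled"
    if level <= 127:
        return "spin-down permitted"
    return "spin-down not permitted"

for level in (1, 127, 128, 254, 255):
    print(level, describe_apm(level))
```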
 

Zareon

The results: over 65 minutes the drive spun down and back up 64 times. That is a rate of 0.9846 spin-ups per minute, or roughly one spin-up every minute. That is crazy!
That's a lot!
1) What version of SCALE are you running?
I'm currently on this version : TrueNAS-SCALE-23.10.1.1

2) In the TrueNAS GUI, take a screen shot of the disk drive power management (I want all the data for the drive). GUI -> Storage (left side of screen) -> Disks (right side of screen) -> locate drive "sda" and click on the down arrow (right side of screen). Snap a shot of all that data to the left. I want to know if you have some power management setup issue.
[Screenshot: disk details and power management settings for sda]

And that's the data of my disk :)

Thanks a lot for your help, really
 