Aggressive disk activity every 30-60s

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Well, the problem looks to be a configuration error (I hope). Edit your Adv. Power Management setting and change the "1", which spins the drive down all the time, to either "Disabled" or "Level 128". Either should fix your problem. As for the amount of power saved with Level 128: not much, but if you had a laptop it would make a difference.

Do this for all your drives. If the problem doesn't stop immediately, power down your system and power back on. Report your results.
 

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
Well, the problem looks to be a configuration error (I hope). Edit your Adv. Power Management setting and change the "1", which spins the drive down all the time, to either "Disabled" or "Level 128". Either should fix your problem. As for the amount of power saved with Level 128: not much, but if you had a laptop it would make a difference.

Do this for all your drives. If the problem doesn't stop immediately, power down your system and power back on. Report your results.
OK, that seems to have fixed my problem of big bursts. But that leaves me with my first problem: small bursts of reading/writing. I'm gonna do the same thing as today, with a one-hour interval, to see if there is any interesting information to retrieve.
It's very late here right now, so I'm gonna sleep and make these logs tomorrow.

And thank you so much, I'm (finally) gonna work in a quieter environment, haha
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What setting did you use? If it was not "Disabled" then I would set it to "Disabled" to see if that fixes it. Disabled is the default setting for TrueNAS.

Enjoy the quiet.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
The results: Over 65 minutes the drive spin down and back up 64 times. That is at a rate of one spinup every .9846 seconds (slightly faster than once a second). That is crazy!
Joe, isn't that 1 per minute, not one per second? Matching the OP's reported frequency?
 

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
What setting did you use? If it was not "Disabled" then I would set it to "Disabled" to see if that fixes it. Disabled is the default setting for TrueNAS.
I used "Disabled"!
I'm gonna post the results of my new tests in a few hours :)

Joe, isn't that 1 per minute, not one per second? Matching the OP's reported frequency?
According to the first part of his sentence, I think he meant .9846 minutes, but actually I think it's 1.0156 minutes per spin-up (65 minutes / 64 spin-ups). That doesn't change much, haha (he only swapped 65 and 64).
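
To make the arithmetic explicit, here is a minimal sketch using only the two numbers quoted above (65 minutes, 64 spin-ups). The variable names are just for illustration.

```python
# Numbers taken from the quoted post: 64 spin-ups observed over 65 minutes.
minutes = 65
spinups = 64

# Interval between spin-ups (what "one spin-up every X" should report).
minutes_per_spinup = minutes / spinups

# Rate of spin-ups (the inverse, which is where .9846 comes from).
spinups_per_minute = spinups / minutes

print(f"{minutes_per_spinup:.4f} minutes per spin-up")   # 1.0156
print(f"{spinups_per_minute:.4f} spin-ups per minute")   # 0.9846
print(f"{minutes_per_spinup * 60:.1f} seconds per spin-up")
```

Either way it works out to roughly one spin-up per minute, which matches the 30-60s pattern from the first post.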
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Joe, isn't that 1 per minute, not one per second? Matching the OP's reported frequency?
I was tired at the time, but looking at it after sleeping, absolutely. Once per minute. It looks so obvious right now :wink:

According to the first part of his sentence, I think he meant .9846 minutes, but actually I think it's 1.0156 minutes per spin-up (65 minutes / 64 spin-ups). That doesn't change much, haha (he only swapped 65 and 64).
Same comment.

I'm gonna post the results of my new tests in a few hours :)
Hope it looks good. I depart for work in an hour from this posting, hopefully the results are in and all looks 100% better. The Start_Stop_Count hopefully does not increase over that hour period.
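
For anyone who wants to compare two saved smartctl logs without reading them by eye, here is a rough sketch. It assumes the usual `smartctl -A` attribute table layout, where the raw value is the tenth whitespace-separated column. The function name and the sample row (including its counts) are made up for illustration and are not taken from the attached logs.

```python
def start_stop_count(smart_text: str) -> int:
    """Return the raw Start_Stop_Count from `smartctl -A` style output."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
        # WHEN_FAILED, RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Start_Stop_Count":
            return int(fields[9])
    raise ValueError("Start_Stop_Count not found in the given text")


# Illustrative row only; real logs would be read from the saved files.
SAMPLE_ROW = (
    "  4 Start_Stop_Count        0x0032   076   076   020"
    "    Old_age   Always       -       {raw}"
)

first_log = SAMPLE_ROW.format(raw=1000)   # placeholder count, start of the hour
second_log = SAMPLE_ROW.format(raw=1000)  # placeholder count, end of the hour

delta = start_stop_count(second_log) - start_stop_count(first_log)
print(f"Start_Stop_Count increased by {delta} over the interval")
```

A delta of zero over the hour means the drive did not spin down and back up during the test.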
 

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
OK, so:
The first log said 24555 for Start_Stop_Count.
The second one is coming in an hour, in this same message. (I misclicked the send button, whoops.)
 

Attachments

  • smartctl (firstOne).txt
    7.6 KB · Views: 24

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You had me hanging, LOL. Okay, well hopefully the count is the same but if the drive spins down a few times, it's not the end of the world, you are replacing them.

Last thing: when your new drives come in, I highly recommend you set up SMART tests for a daily Short test and a weekly Long/Extended test. Maybe set up the Short test on all drives to start at 1 AM every day, and the Long test on all drives to start at 1:15 AM on Sunday or Monday (once a week).

Cheers,
-Joe
 

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
You had me hanging, LOL. Okay, well hopefully the count is the same but if the drive spins down a few times, it's not the end of the world, you are replacing them.

Last thing: when your new drives come in, I highly recommend you set up SMART tests for a daily Short test and a weekly Long/Extended test. Maybe set up the Short test on all drives to start at 1 AM every day, and the Long test on all drives to start at 1:15 AM on Sunday or Monday (once a week).

Cheers,
-Joe
AAAAAND, that's it, I have the answer...
****drums*****
24555 on the second take!
I still have a fairly recurrent read/write noise (maybe every 30-45s), much lower this time though. It was already there before I upgraded to Cobia.

But maybe it's due to the SMR disks, maybe it'll be sorted out with the change to CMRs!

For S.M.A.R.T. tests, I have a short one once a day, and I'll set up an extended one once a week :)
(But they haven't shown any errors so far)

Once again (for the 25th time), a very big thank you. At last I can work more peacefully, and that's quite pleasant! It's hard to concentrate with a lawnmower next to your ears!
 

Attachments

  • smartctl (secondOne).txt
    7.6 KB · Views: 22

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
My two new disks are now up and ready!
But my pool is still tagged as degraded on the dashboard, and I don't see anything wrong when I go to the pool page.
1707317785637.png
1707317801593.png
1707317813233.png

Do you have any idea why? I ran a manual Short S.M.A.R.T. test after I installed the new disks, but the pool still shows as degraded :(
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
24555 on the second take !
I still have a fairly recurrent read/write noise (maybe every 30-45s)
While I don't know this for certain, I suspect it is the SMR drive cleaning house. When the drive must change data on an inner track, it needs to rewrite at least 3 tracks of data, and odds are it is much more. This is why SMR drives are great for archiving (write once, read many) but not for lots of writing. The drive stores the new data on the drive first and, when it has the opportunity, reorganizes it. That is a slow process, even slower if the drive holds a lot of data. So if you have the System Dataset on the HDD pool, and it is written to every 5 minutes (I'm not sure how often for SCALE), then the drive is constantly rewriting data. Anyway, I could be completely wrong; it could be something else going on with the drive.

A test if you desire.
1) Power down the computer.
2) Disconnect the Data Cables to the drives, leaving only the power cable.
3) Power up.
4) Listen: is there still that same noise? If yes, and you can leave your system in that state for a few hours to let it clean up, the issue might go away. But the problem is likely to return once you reconnect the data cables and start TrueNAS again.
5) Power off.
6) Reconnect the data cables.
7) Power on and you are done.

For S.M.A.R.T. tests, I have a short one once a day, and I'll set up an extended one once a week :)
(But they haven't shown any errors so far)
You are running a Conveyance Test, not a Short Test. Not the same thing although close. As for lack of errors, it means the drives do not recognize any physical errors with the hard drive. A good thing.

But my pool is still tagged as degraded on the dashboard, and I don't see anything wrong when I go to the pool page.
Odds are you have some ZFS errors. Run the commands:
1) Run zpool status -v. Odds are you will see some errors.
2) If you have errors, run zpool scrub Hector_v2 -w and wait for the scrub to complete. The command line will be unusable until the scrub is completed. Remove the "-w" if you want the command prompt back immediately, with the scrub continuing in the background. I'm suggesting "-w" so you will know when the scrub is complete: the prompt will return.
3) Run another status check (step 1).
4) Do you still have errors? If the only errors are non-zero READ/WRITE/CKSUM values, with no files listed as corrupt, go to step 5. Otherwise you will need to delete the files listed as corrupt first. Post the output of the status command if you have any questions about what you are doing.
5) We should be able to clear the errors now by running zpool clear Hector_v2 and then performing step 1 again. All should be good. If not, post the results of the above steps.
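
The steps above can be sketched as a small dry-run script. This is only an illustration: the function names are hypothetical, the `-w` flag is placed before the pool name (the conventional option order), and it only prints the commands unless `dry_run=False` is passed. Step 4 is a human decision, so the script deliberately does not try to make it for you.

```python
import subprocess

POOL = "Hector_v2"  # pool name from this thread


def recovery_commands(pool: str, wait: bool = True) -> list[list[str]]:
    """Build the command sequence from steps 1-5 (hypothetical helper)."""
    scrub = ["zpool", "scrub"] + (["-w"] if wait else []) + [pool]
    status = ["zpool", "status", "-v"]
    return [
        status,                    # step 1: look for errors
        scrub,                     # step 2: scrub, optionally waiting
        status,                    # step 3: check again
        # step 4 is manual: only continue if no files are listed as corrupt
        ["zpool", "clear", pool],  # step 5: clear the error counters
        status,                    # step 5: final check
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; actually run it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all(recovery_commands(POOL))  # prints the plan, runs nothing
```

Running the commands by hand, one at a time, is still the safer route, since you need to read the status output before deciding whether to clear.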

Let's say we clear the errors. Odds are these will return as long as you are using the SMR drives and writing data.

Why does this happen? The drive is asked for data, and if it fails to return it fast enough, an error is recorded. It does not mean your data is corrupt in this situation.

Please understand that I generalized some of the stuff I said above, enough that a person can understand what is basically going on.

Best of luck,
-Joe
 

Zareon

Dabbler
Joined
Jan 22, 2024
Messages
17
1) Power down the computer.
2) Disconnect the Data Cables to the drives, leaving only the power cable.
3) Power up.
4) Listen: is there still that same noise? If yes, and you can leave your system in that state for a few hours to let it clean up, the issue might go away. But the problem is likely to return once you reconnect the data cables and start TrueNAS again.
5) Power off.
6) Reconnect the data cables.
7) Power on and you are done.
Maybe I'll try this if I have the time, but for now I think I'll leave it as it is until I buy a new disk to replace the last SMR in my pool!

1) Run zpool status -v. Odds are you will see some errors.
2) If you have errors, run zpool scrub Hector_v2 -w and wait for the scrub to complete. The command line will be unusable until the scrub is completed. Remove the "-w" if you want the command prompt back immediately, with the scrub continuing in the background. I'm suggesting "-w" so you will know when the scrub is complete: the prompt will return.
3) Run another status check (step 1).
4) Do you still have errors? If the only errors are non-zero READ/WRITE/CKSUM values, with no files listed as corrupt, go to step 5. Otherwise you will need to delete the files listed as corrupt first. Post the output of the status command if you have any questions about what you are doing.
5) We should be able to clear the errors now by running zpool clear Hector_v2 and then performing step 1 again. All should be good. If not, post the results of the above steps.
I was going to TrueNAS to do what you said here, and I saw that the error had gone. Maybe it just needed to quietly do a few scans or something.

Thanks for everything, now that my main problem is solved, I'm going to leave the hardware alone for a while until I change my next disk.

In a few days, I'll be able to upgrade the CPU, Motherboard and RAM, which will be great!

Thank you all for your help, this forum is really superb.
 