prevent frequent reads from waking up HDDs

Hafnernuss

Dabbler
Joined
Nov 9, 2020
Messages
14
Today, I got up, only to find that my HDDs have been awake longer than me. The strange thing: only three of them, the other five were still sleeping.
A check in the graphs confirms that:

[Attached image: 1664516579442.png, disk activity graphs]

The read behaviour on the first one is especially strange. Since the other disks are still asleep, this tells me that this was not caused by a normal "file access", because then all eight drives would have spun up, right?
SMART is disabled for those 8 drives, and it can't be any scrubbing or syncing task.

I have no ideas. Is there any way to check which process caused this? Another hint is that the CPU load peaked quite a bit at this time, sitting at 100% for almost 4 minutes...
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Hafnernuss said: "Today, I got up, only to find that my HDDs have been awake longer than me. The strange thing: only three of them, the other five were still sleeping."
Unfortunately we do not have the entire picture of what is going on with your system. What else could be running? And those three drives, are they all part of the same vdev? Separate vdevs? It's a guessing game on our part without data.

Have you looked at any of the log data (e.g. by running less /var/log/messages) to see if there is something happening during that time?
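If the log is large, narrowing it to the minutes around the spin-up can help. This is only a rough sketch: it assumes traditional syslog lines ("Oct  1 07:15:02 host ...") where the time is the third field, and log_window is a made-up helper name, not a TrueNAS tool.

```shell
# Print syslog-style lines whose HH:MM:SS timestamp (3rd field) lies in [FROM, TO].
# Assumption: classic syslog format; ISO-style timestamps would need a different field.
# Usage: log_window FILE FROM TO
#   e.g. log_window /var/log/messages 07:10:00 07:20:00
log_window() {
    # Zero-padded HH:MM:SS strings sort correctly as plain strings.
    awk -v from="$2" -v to="$3" '$3 >= from && $3 <= to' "$1"
}
```

Note that it matches on the time of day only, so the same window from other days in the file shows up as well.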

Do you have jails/VMs that could be doing something?
 

Hafnernuss

Hmm yes, sorry for that, maybe a little more info would have been helpful.
All eight HDDs are part of the same RAIDZ2 pool, and that's what makes me curious. By my logic, there is no "normal" condition where some drives in a pool get accessed and others do not.

I have some apps, but the applications pool is on a single SSD, and none of them have access to the HDD pool. The system pool is also not located on the HDDs. (If it were, all HDDs would report the same r/w activity; at least, that's what I saw back when this was the case.)

No VMs whatsoever, SMART turned off (for the eight HDDs), and the only other active service is SMB, but all possible clients were powered off completely.

I am not at home, but I will definitely look into those logs tomorrow and post my findings. Thanks for the suggestion.
 

joeschmuck

The logs are a long shot but worth at least looking at.

Hafnernuss said: "...but all possible clients were powered off completely."
This makes me wonder whether it is really true that nothing is accessing the NAS. The only way to prove it is to find a pattern in when the drives power on, then disconnect the network and see if it stops. So if some of the drives spin up at 0713 every Thursday morning, you should disconnect the NAS Wednesday night and, sometime after the expected event, check whether the drives spun up or not.
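One way to look for such a pattern, sketched under the assumption that the NAS runs Linux (so SCALE, not Core) and exposes /proc/diskstats: take two snapshots of the kernel's per-disk counters and print the disks whose read or write counts changed in between. disks_with_io is a hypothetical helper name, and the sdX naming is an assumption about how the drives show up.

```shell
# Compare two snapshots of /proc/diskstats and print whole-disk devices (sdX)
# whose completed-read (field 4) or completed-write (field 8) counters changed.
# Usage: disks_with_io BEFORE_FILE AFTER_FILE
#   cat /proc/diskstats > /tmp/before
#   sleep 600
#   cat /proc/diskstats > /tmp/after
#   date; disks_with_io /tmp/before /tmp/after
disks_with_io() {
    awk '
        $3 !~ /^sd[a-z]+$/ { next }                 # skip partitions, loop, md, etc.
        NR == FNR { r[$3] = $4; w[$3] = $8; next }  # first file: remember counters
        ($3 in r) && ($4 != r[$3] || $8 != w[$3]) {
            print $3, "reads=" ($4 - r[$3]), "writes=" ($8 - w[$3])
        }
    ' "$1" "$2"
}
```

Run from a loop or a cron job with the output appended to a file on the boot device, this should give a timestamped record of which disks were touched. Reading /proc/diskstats only reads kernel counters, so it should not wake the drives by itself.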

Troubleshooting a problem like this could take considerable time unless you find a smoking gun. It's obvious there was a small write operation that occurred.

So I like to think outside the box because sometimes it actually works. What hard drives are you using? Are they all the same model? Are the affected drives plugged into a different controller or a different cable set? I can't figure that out myself because you have not provided enough data about your system and its physical and TrueNAS configuration. And that is okay, because in reality you should be exploring all possibilities and troubleshooting this if it bothers you enough.

I hope the log file provides you some indication.

Hafnernuss said: "In my logic, there is no "normal" condition, where some drives get accessed and some not... In the same pool."
Actually I'm pretty sure I've seen someone post about this before, years ago. But I could be wrong or just not remembering it correctly. I mean, it did happen to you so there must be a valid reason. It may have nothing to do with TrueNAS, it could be your hardware, or the underlying OS, but I don't recall you stating if you are running Core or Scale (a very important distinction since Core is based on FreeBSD and Scale is based on Debian).

I'm actually curious what the cause is too.

Well time to grill some pork chops! Dinner is going to be good tonight.
 

Hafnernuss

joeschmuck said: "Well time to grill some pork chops! Dinner is going to be good tonight."
Hah, great idea for tonight's dinner, thanks ;)

Same game this morning, a short r/w peak at 07:15, same three drives, same heavy cpu load.

The facts:
TrueNAS SCALE 22.02.3
8x Seagate Exos X16 12TB SAS on a Broadcom SAS 3008.
Generic Asus Mainboard with an i5-4k cpu
some generic DDR3 sticks
An nvidia P400 gpu

The logs seem to be clean, apart from kube spamming logs with this:
[Attached image: 1664605657046.png, log excerpt]

There is already a thread for that, though. Since I am all out of ideas, I will simply shut down all apps tonight and see how that goes...

 

joeschmuck

One thing to note: if you haven't rebooted the machine yet, doing so may make the problem go away. But if it does go away, you may never know what caused it in the first place.

Good luck troubleshooting the issue. I look forward to hearing if you are able to identify the cause.
 

Hafnernuss

Some update from me:

It appears that k3s is indeed responsible for waking up *some* HDDs.
I have observed that it's always the same three disks. k3s seems to read from them every 10-20 minutes (and then, for one to three minutes, every 20 seconds or so).

All apps are completely stopped; the command that causes the read is k3s server. Unfortunately, it seems very hard to tell which file is actually written to...
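One thing that might narrow it down, sketched here for Linux only: list what a process currently holds open under the pool's mount point by walking its /proc/PID/fd links. open_files_under is a hypothetical helper and /mnt/tank below is a placeholder pool path, not something taken from this system.

```shell
# List files a process currently holds open under a given path prefix.
# Usage: open_files_under PID PREFIX [PROC_ROOT]   (PROC_ROOT defaults to /proc)
#   for pid in $(pgrep -f 'k3s server'); do open_files_under "$pid" /mnt/tank; done
open_files_under() {
    for fd in "${3:-/proc}/$1"/fd/*; do
        # Each fd entry is a symlink to the open file; skip ones we cannot read.
        target=$(readlink "$fd") || continue
        case $target in
            "$2"/*) printf '%s\n' "$target" ;;
        esac
    done | sort -u
}
```

This needs root to see another user's descriptors and only catches files that are open at the moment it runs, so a short burst of I/O can easily be missed. If the reads do not go through files on the pool at all, it will simply print nothing.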
 

mindbug

Cadet
Joined
Jan 25, 2023
Messages
1
Hafnernuss said: "Another hint, is, that the CPU load at this time peaked quit a bit, sitting at 100% for almost 4 minutes..."
Hafnernuss said: "I have observed, that it's always the same three disks"
Do you have swap enabled?

By default TrueNAS creates a 2 GB swap partition on every data drive (which is really stupid, if you ask me), and I have observed that it randomly splits these partitions into groups of 3 drives each and makes RAID 5 arrays of them, or something.
I also noticed that only one of these arrays was used on my machine, severely degrading performance of 3 particular drives while other drives were unaffected.
With 100% CPU utilization your system probably runs some tasks that are rarely used, so they are stored in swap.

You can run the following command to check: lsblk -o +kname
You will see which drives, if any, are part of the SWAP, and how they are united.
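As a complement to lsblk, here is a small sketch that reads the kernel's own tables. It assumes Linux (SCALE) and that the swap groups show up as md arrays in /proc/mdstat, which is a guess based on the behaviour described above; swap_in_use and md_members are made-up helper names.

```shell
# List active swap devices and how much of each is in use (KiB).
# Usage: swap_in_use [FILE]   (FILE defaults to /proc/swaps)
swap_in_use() {
    # Skip the header line; column 1 is the device, column 4 is "Used".
    awk 'NR > 1 { print $1, "used_kib=" $4 }' "${1:-/proc/swaps}"
}

# List each md array with its member partitions.
# Usage: md_members [FILE]   (FILE defaults to /proc/mdstat)
md_members() {
    awk '/^md/ {
        line = $1 ":"
        for (i = 2; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) {       # members look like sda1[0]
                sub(/\[.*$/, "", $i)       # strip the [N] suffix
                line = line " " $i
            }
        print line
    }' "${1:-/proc/mdstat}"
}
```

If swap_in_use reports a nonzero value for a device that sits on an array whose members are exactly the three drives that keep waking up, that would support the swap theory.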

To disable swap in truenas you can do the following:
1. Disable swap creation in the future: midclt call system.advanced.update '{"swapondrive": 0}' (but this will not affect already existing swap partitions)
2. Export your current pool, and delete swap partitions manually.
If you simply run swapoff -a, the change will be temporary; TrueNAS will recreate the swap on the next reboot.
 