BOOT-Pool continuous writing [HELP]

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
  • We had this choice before, but now the choice is being removed. (It's becoming too "Apple-ly" where more and more flexibility is being removed).
That's one way of putting it :tongue:. On the other hand, maybe they did it to take unexpected variables out of the equation. We do have this outstanding issue with Core plugins that iXsystems still refuses to remove, which really should be removed so new installations don't keep using them only to find out they're not supported.

I suppose those of us who are still on Core are lucky in two ways:
  1. We still have this choice in the settings.
  2. We don't have k3s, which loves to constantly write data.
I run Enterprise SSDs, so I don't really care where the syslogs get written, but I do highly appreciate the lack of k3s on my system.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I linked to another discussion. (See my edited post.) I don't agree with the decision to remove the ability to choose where to write your syslog. :confused:

Sorry to say, but it looks like this is just how it's going to be for SCALE. I really hope it doesn't make its way to Core.
I don't know that I agree either; at least make it configurable. I get the idea, though, for debugging. But yes, it was an intentional change by iX for Cobia.

If I were you, I wouldn't keep writing this:

"I really hope it doesn't make its way to Core.".

Someone from IX might read it! :grin:
 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
Someone from IX might read it! :grin:
Yup... :oops:


I run Enterprise SSDs, so I don't really care where the syslogs get written, but I do highly appreciate the lack of k3s on my system.
It's not just that one reason. I listed four reasons... for a reason. Hey. That's kinda cool. "I got four reasons, for a reason! Throw your hands in the air! My name's M.C. Winnie!"

Sorry, where was I? Right.

So users who wish to spare ports on their motherboard (to be used for their storage pool drives) may still benefit from just using mirrored USB sticks for their boot-pool, and from placing their syslogs and System Dataset on a storage pool (whether that's HDDs or an SSD pool that houses their Apps, VMs, logs, system dataset, etc.).

What's the problem? They made the choice to use USB sticks for their boot-pool, and they are decreasing the writes and overall I/O by placing their syslogs and System Dataset elsewhere. Worst-case scenario, their boot device dies. Okay, reinstall and reupload the config. Done. (Or if it's a mirrored boot-pool, just replace the dead stick and continue like normal.) This is a valid use-case. It's kind of odd to tell them "Nope. Sorry. We're going to make it so your boot device undergoes a lot of I/O. Don't like it? Well, you'll just have to sacrifice two extra SATA ports for your boot-pool. Hope you got enough ports and don't plan to expand your storage capacity lol."

And what's with this pretense about system stability? Can't have the syslogs on a storage pool, because if something happens to the storage pool, everything crumbles under its own weight? If that's the case, then don't implement this half-way: go ahead and enforce the System Dataset to reside on the boot-pool too, since it's critical to the live system. The same rationale applies: "If your System Dataset is on a storage pool, but something happens to your storage pool... your system could violently crash! We must enforce the System Dataset to reside on the boot-pool only."

And why this mandatory enforcement for the syslogs? Why not make it the default, and if the user wants to store their syslog on the System Dataset (to avoid using their boot-pool), just hit them with a big red disclaimer telling them how dangerous it is? Let them decide.

I'm seeing this trend in development of "can't do this, must be like that", not for reasons of performance, and not because anyone asked for it (as far as I'm aware), but because it's easier for iXsystems to maintain. There are a lot of things that could be changed or removed if you want a product that's easier to maintain...
 
Last edited:

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Can confirm on a fresh install (configured from scratch) of TrueNAS-SCALE-23.10.0.1. I'm certain I didn't have this issue on Bluefin. I haven't had time to investigate. No apps installed.

It's an average of 230 KiB/s for the boot pool and 2 MiB/s for the pool containing the system dataset (and some VMs, which I'm almost certain don't write continuously).

I pulled the debug file, and syslog doesn't seem to write a lot of data. It's just after midnight here; the last record is from 15:00. Yet the writing doesn't stop at all.

Code:
Nov 20 13:00:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 13:00:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 115 to 114
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 115 to 114
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 115 to 114
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 115 to 114
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sdg [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Nov 20 13:09:55 truenas smartd[65370]: Device: /dev/sdg [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Nov 20 13:10:12 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 13:10:12 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 13:10:12 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 13:17:01 truenas CRON[1241970]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 20 13:17:01 truenas nscd[1241971]: 1241971 monitoring file `/etc/hosts` (1)
Nov 20 13:17:01 truenas nscd[1241971]: 1241971 monitoring directory `/etc` (2)
Nov 20 13:17:01 truenas nscd[1241971]: 1241971 monitoring file `/etc/resolv.conf` (3)
Nov 20 13:17:01 truenas nscd[1241971]: 1241971 monitoring directory `/etc` (2)
Nov 20 13:17:01 truenas nscd[1241971]: 1241971 monitoring file `/etc/nsswitch.conf` (4)
Nov 20 13:17:01 truenas nscd[1241971]: 1241971 monitoring directory `/etc` (2)
Nov 20 13:20:05 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 13:20:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 13:20:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 13:30:05 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 13:30:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 13:30:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 13:39:56 truenas smartd[65370]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 113
Nov 20 13:39:56 truenas smartd[65370]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 113
Nov 20 13:40:12 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 13:40:12 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 13:40:12 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 13:50:05 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 13:50:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 13:50:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 14:00:01 truenas CRON[1245693]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" midclt call cloudsync.sync 1 > /dev/null 2> /dev/null)
Nov 20 14:00:05 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 14:00:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 14:00:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 14:09:56 truenas smartd[65370]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 114
Nov 20 14:09:56 truenas smartd[65370]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 114
Nov 20 14:09:56 truenas smartd[65370]: Device: /dev/sdg [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 67
Nov 20 14:09:56 truenas smartd[65370]: Device: /dev/sdg [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 67
Nov 20 14:10:12 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 14:10:12 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 14:10:12 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 14:17:01 truenas CRON[1247136]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 20 14:17:01 truenas nscd[1247137]: 1247137 monitoring file `/etc/hosts` (1)
Nov 20 14:17:01 truenas nscd[1247137]: 1247137 monitoring directory `/etc` (2)
Nov 20 14:17:01 truenas nscd[1247137]: 1247137 monitoring file `/etc/resolv.conf` (3)
Nov 20 14:17:01 truenas nscd[1247137]: 1247137 monitoring directory `/etc` (2)
Nov 20 14:17:01 truenas nscd[1247137]: 1247137 monitoring file `/etc/nsswitch.conf` (4)
Nov 20 14:17:01 truenas nscd[1247137]: 1247137 monitoring directory `/etc` (2)
Nov 20 14:20:05 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 14:20:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 14:20:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 14:30:03 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 14:30:03 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 14:30:03 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 14:37:05 truenas systemd[1]: Starting apt-daily.service - Daily apt download activities...
Nov 20 14:37:05 truenas apt.systemd.daily[1248894]: /usr/lib/apt/apt.systemd.daily: 325: apt-config: Permission denied
Nov 20 14:37:05 truenas systemd[1]: apt-daily.service: Deactivated successfully.
Nov 20 14:37:05 truenas systemd[1]: Finished apt-daily.service - Daily apt download activities.
Nov 20 14:39:55 truenas smartd[65370]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 115
Nov 20 14:39:55 truenas smartd[65370]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 115
Nov 20 14:39:56 truenas smartd[65370]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 115
Nov 20 14:39:56 truenas smartd[65370]: Device: /dev/sdj [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 114 to 115
Nov 20 14:40:12 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 14:40:12 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 14:40:12 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 14:50:05 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 14:50:05 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 14:50:05 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 15:00:01 truenas CRON[1251240]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" midclt call cloudsync.sync 6 > /dev/null 2> /dev/null)
Nov 20 15:00:01 truenas CRON[1251241]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" midclt call cloudsync.sync 5 > /dev/null 2> /dev/null)
Nov 20 15:00:01 truenas CRON[1251242]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" midclt call cloudsync.sync 4 > /dev/null 2> /dev/null)
Nov 20 15:00:12 truenas systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Nov 20 15:00:12 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Nov 20 15:00:12 truenas systemd[1]: Starting logrotate.service - Rotate log files...
Nov 20 15:00:12 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Nov 20 15:00:12 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Nov 20 15:00:12 truenas systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Nov 20 15:00:12 truenas systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Nov 20 15:00:12 truenas systemd[1]: logrotate.service: Deactivated successfully.
Nov 20 15:00:12 truenas systemd[1]: Finished logrotate.service - Rotate log files.
Nov 20 15:06:56 truenas middlewared[1252049]: [2023/11/21 00:06:56] ([1;32mINFO[1;m) ZFSSnapshot.do_create():205 - Snapshot taken: boot-pool/ROOT/Initial-Install@for-debug
Nov 20 15:06:56 truenas middlewared[1251991]: [2023/11/21 00:06:56] ([1;32mINFO[1;m) ZFSSnapshot.clone():42 - Cloned snapshot boot-pool/ROOT/Initial-Install@for-debug to dataset boot-pool/for-debug
Nov 20 15:06:56 truenas systemd[1]: tmp-tmpy1c4yn_g.mount: Deactivated successfully.


I had hoped this "bug" would be fixed by 23.10.1. I just skimmed over the thread and the one @winnielinnie linked; is this the expected behavior from now on? Both the VM pool and boot pool are mirrors of 2.5" SSDs [no noise, which is most important to me], so I am not *too* concerned, but still..
I just realized I'm writing 170 GB per day, now I am concerned.
You can add the columns "M_SWAP" and "IO_WRITE_RATE". Then you can sort by those values from highest to lowest to find the culprit.
edit: Oh, you have to run it with elevated privileges:

[Attached screenshot: 1700523064198.png]

[Attached screenshot: 1700523125357.png]
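If iotop happens to be available on the system, a rough shell equivalent of those htop columns would be something like this (just a sketch using standard iotop flags; I'm not sure it ships with SCALE by default):

Code:
# Show only processes currently doing I/O, one line per process,
# with totals accumulated since iotop was started (needs root).
iotop -o -P -a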
 
Last edited:

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Nevermind. Your SSD is clearly being slammed by wayyyyyyyyyy too many writes. Nearly 500 GB total written already? For a boot device without a System Dataset...
That was my puzzlement as well. After basic troubleshooting I immediately thought of a bug, but SCALE not really being my thing, I suggested asking here first. Looks like some kind of write amplification to me.

but it's definitely not advisable for a long term solution.
Here I kinda disagree, since moving the dataset out of the boot pool basically removes the main cause of concern; granted, USB being there could be an issue, but with a quality stick I don't really see any problems, especially when using an internal port.
 
Last edited:

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I had hoped this "bug" would be fixed by 23.10.1. I just skimmed over the thread and the one @winnielinnie linked; is this the expected behavior from now on? Both the VM pool and boot pool are mirrors of 2.5" SSDs [no noise, which is most important to me], so I am not *too* concerned, but still..
I just realized I'm writing 170 GB per day, now I am concerned
Writing to the boot pool is not a bug; read the whole thread, especially the parts by iXsystems, where they tell you why they made the change.

170 GB is a *lot* of data; there is either some log spam happening, maybe due to hardware, or another issue there. Yes, it should be writing to the boot pool, but there's no way it should be that much.

You'll need to track down what is being written; surely your boot pool would fill up with 170 GB/day? Doesn't seem right. Mine would have filled in an hour or so.
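A couple of generic commands might help narrow it down (nothing TrueNAS-specific, just standard ZFS/Linux tooling; I'm assuming the pool is literally named boot-pool):

Code:
# Per-device write activity on the boot pool, refreshed every 5 seconds
zpool iostat -v boot-pool 5

# Which logs are actually taking up space
du -sh /var/log/* | sort -h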
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
Writing to the boot pool is not a bug; read the whole thread, especially the parts by iXsystems, where they tell you why they made the change.
Will do!
No, I didn't mean the writes in general, but the amount.
170 GB is a *lot* of data; there is either some log spam happening, maybe due to hardware, or another issue there. Yes, it should be writing to the boot pool, but there's no way it should be that much.
To be clear, the writes to the boot pool are somewhat tolerable:
[Attached screenshot: 1700549339591.png]

The systemdataset worries me more:
[Attached screenshot: 1700549377365.png]


No, the data does not seem to be written persistently; otherwise my pools would be full already. However, when I first posted here 7 hours ago, the used space on the dataset pool was 263.41 GiB, and now it's 263.98 GiB.
I'm not aware of any VM that should write huge amounts of data during normal operation / idling overnight.
You'll need to track down what is being written; surely your boot pool would fill up with 170 GB/day? Doesn't seem right. Mine would have filled in an hour or so.

From htop, these processes seem to be filling the dataset pool:
Code:
/usr/bin/qemu-system-x86_64 -name guest=5_pfsense,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-21-5_pfsense/master-key.aes"} -blockdev {"driver":"file","filename":"/usr1292163 libvirt-qe 
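One way to confirm it would be to watch the kernel's per-process I/O counters for that qemu process and see whether write_bytes keeps climbing (a sketch; <PID> stands for the process ID shown in htop):

Code:
# Accumulated I/O counters for a single process; write_bytes is what actually hits storage
cat /proc/<PID>/io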

My first idea was that it is somehow related to pool encryption, although Home Assistant, for example, uses an unencrypted dataset.
The pool itself is not encrypted, but most (not all) VM zvols are stored under an encrypted dataset.

Any idea which logs / where to check for log spam? Syslog didn't seem suspicious, but that's on the boot pool anyway.

Here is the output of zfs list -r -t filesystem -o space
The dataset deimos is encrypted.
Code:
NAME                                                                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
boot-pool                                                           96.3G  2.51G        0B     96K             0B      2.51G
boot-pool/ROOT                                                      96.3G  2.50G        0B     96K             0B      2.50G
boot-pool/ROOT/23.10.0.1                                            96.3G  2.50G     8.08M   2.49G             0B         0B
boot-pool/ROOT/Initial-Install                                      96.3G     8K        0B      8K             0B         0B
boot-pool/grub                                                      96.3G  8.21M        0B   8.21M             0B         0B
mars                                                                 184G   264G      288K     96K             0B       264G
mars/.system                                                         184G   552M     2.23M   2.92M             0B       547M
mars/.system/configs-ae32c386e13840b2bf9c0083275e7941                184G     1M       64K    960K             0B         0B
mars/.system/cores                                                  1024M    96K        0B     96K             0B         0B
mars/.system/ctdb_shared_vol                                         184G    96K        0B     96K             0B         0B
mars/.system/glusterd                                                184G   104K        0B    104K             0B         0B
mars/.system/netdata-ae32c386e13840b2bf9c0083275e7941                184G   545M      131M    414M             0B         0B
mars/.system/rrd-ae32c386e13840b2bf9c0083275e7941                    184G    96K        0B     96K             0B         0B
mars/.system/samba4                                                  184G   336K       88K    248K             0B         0B
mars/.system/services                                                184G    96K        0B     96K             0B         0B
mars/.system/webui                                                   184G    96K        0B     96K             0B         0B
mars/deimos                                                          184G   160G      392K    192K             0B       160G
mars/ix-applications                                                 184G  1.58G     1.57M    184K             0B      1.57G
mars/ix-applications/catalogs                                        184G   204M      126M   78.1M             0B         0B
mars/ix-applications/default_volumes                                 184G    96K        0B     96K             0B         0B
mars/ix-applications/docker                                          184G   699M     64.0M    635M             0B         0B
mars/ix-applications/k3s                                             184G   707M     51.4M    653M             0B      2.53M
mars/ix-applications/k3s/kubelet                                     184G  2.53M     2.28M    256K             0B         0B
mars/ix-applications/releases                                        184G  1.03M      192K     96K             0B       768K
mars/ix-applications/releases/prometheus                             184G   768K        0B    104K             0B       664K
mars/ix-applications/releases/prometheus/charts                      184G   224K        0B    224K             0B         0B
mars/ix-applications/releases/prometheus/volumes                     184G   440K        0B     96K             0B       344K
mars/ix-applications/releases/prometheus/volumes/ix_volumes          184G   344K        0B    104K             0B       240K
mars/ix-applications/releases/prometheus/volumes/ix_volumes/config   184G    96K        0B     96K             0B         0B
mars/ix-applications/releases/prometheus/volumes/ix_volumes/data     184G   144K       32K    112K             0B         0B


zfs list -r -t volume -o space
Code:
NAME                                      AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
mars/deimos/debian-pihole                  184G  5.86G     1.05G   4.81G             0B         0B
mars/deimos/nextcloud-main                 184G  9.78G     1.53G   8.25G             0B         0B
mars/deimos/pfSenseDisk                    184G  10.8G     4.76G   6.00G             0B         0B
mars/deimos/ubuntu_server-wwn8dg           259G   124G     13.2G   35.7G          75.4G         0B
mars/deimos/wg_vol                         184G  9.79G      330M   9.47G             0B         0B
mars/homeassistant                         184G  35.7G     19.2G   16.5G             0B         0B
mars/ubuntu_playground                     184G  15.2G     3.06G   12.1G             0B         0B

Not related, but I wouldn't be sad if I could get rid of USEDREFRESERV for the Ubuntu server zvol. It's the only one that isn't written sparsely.
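In case it helps: as far as I know the reservation can be dropped after the fact, which effectively turns the zvol into a sparse one (sketch only, using the zvol name from the listing above; be aware the zvol can then fail writes if the pool ever fills up):

Code:
zfs set refreservation=none mars/deimos/ubuntu_server-wwn8dg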
 
Last edited:

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
I run Enterprise SSDs, so I don't really care where the syslogs get written, but I do highly appreciate the lack of k3s on my system.
Sorry, but that's not something to take into consideration: so let's write 1TB/h because I use 2x 2 TB, 2,500 TBW WD Red SSDs and they'd last 6 months...
Not everyone can use enterprise-level SSDs on all pools, especially the boot pool, which uses little to no space. Not everyone has large budgets at their disposal. I preferred to spend a little more on the HDDs...
My fault for not reading all the changelogs carefully and instead relying on feedback read on the forum (about an older version :frown: )

As I was saying, I have about 1 GB/h of written data; it's a lot, but more tolerable. Now I have activated Kubernetes, let's see how it behaves over the next few days (I'll report every 24 h).

I can't find the TBW of the SSD I use, but compared to others of the same size it should be at least 40 TBW; at the current rate that is about 40,000 h. 24 h * 365 = 8,760 h per year, so roughly 4.5 years of runtime, which is also fine.
But the multi_report "wear level" is dropping fast, suggesting the end of the SSD within 365 days.
A Kingston A400 480 GB seems a good choice, 160 TBW = 160,000 h. It will break for some other reason first :smile:
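For reference, the rough endurance math behind those numbers, assuming a constant ~1 GB/h of writes:

Code:
 40 TBW  =  40,000 GB / (1 GB/h) =  40,000 h = ~4.5 years
160 TBW  = 160,000 GB / (1 GB/h) = 160,000 h = ~18 years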

Taking into account this choice by iX, what is the correct procedure to replace a boot SSD? Do I mirror it and then remove the old one? Is there a tutorial?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Taking into account this choice by iX, what is the correct procedure to replace a boot SSD? Do I mirror it and then remove the old one? Is there a tutorial?
What you are experiencing is likely unusual behaviour, not something iX would consider standard (or acceptable): I suggest making a bug report.
For replacing it you just need to have a config backup: install the version you want on the new drive, import your pools, import the config, done.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
You'll need to track down what is being written

This is one of the commands (this time in full) that I suspect of being responsible for a lot of the writes.

Code:
Command of process 16927 - /usr/bin/qemu-system-x86_64 -name guest=1_homeassistant,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-2-1_homeassistant/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-st...

/usr/bin/qemu-system-x86_64 -name guest=1_homeassistant,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-2-1_homeassistant/master-key.aes"} -blockdev
 {"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev
 {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/1_homeassistant_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine
 pc-i440fx-7.2,usb=off,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format -accel kvm -cpu host,migratable=on,host-cache-info=on,l3-cache=off -m 2048 -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648} -overcommit mem-lock=off -smp
 4,sockets=1,dies=1,cores=2,threads=2 -uuid be41d0ff-481a-4844-8daf-e6e107a3df32 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=35,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device {"driver":"nec-usb-xhci","id":"usb","bus":"pci.0","addr":"0x4"}
 -device {"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.0","addr":"0x5"} -blockdev {"driver":"host_device","filename":"/dev/zvol/mars/homeassistant","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev
 {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device {"driver":"virtio-blk-pci","bus":"pci.0","addr":"0x6","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"} -netdev
 {"type":"tap","fd":"37","vhost":true,"vhostfd":"39","id":"hostnet0"} -device {"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"00:a0:98:4e:6e:1e","bus":"pci.0","addr":"0x3"} -chardev pty,id=charserial0 -device {"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0} -chardev
 spicevmc,id=charchannel0,name=vdagent -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"com.redhat.spice.0"} -chardev socket,id=charchannel1,fd=34,server=on,wait=off -device
 {"driver":"virtserialport","bus":"virtio-serial0.0","nr":2,"chardev":"charchannel1","id":"channel1","name":"org.qemu.guest_agent.0"} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"4"} -audiodev {"id":"audio1","driver":"spice"} -spice port=5900,addr=0.0.0.0,seamless-migration=on -device
 {"driver":"qxl-vga","id":"video0","max_outputs":1,"ram_size":67108864,"vram_size":67108864,"vram64_size_mb":0,"vgamem_mb":16,"xres":1024,"yres":768,"bus":"pci.0","addr":"0x2"} -device {"driver":"usb-host","hostdevice":"/dev/bus/usb/002/005","id":"hostdev0","bus":"usb.0","port":"1"} -device
 {"driver":"usb-host","hostdevice":"/dev/bus/usb/002/061","id":"hostdev1","bus":"usb.0","port":"2"} -device {"driver":"usb-host","hostdevice":"/dev/bus/usb/002/064","id":"hostdev2","bus":"usb.0","port":"3"} -device {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x7"} -sandbox
 on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

I can't make out what exactly here is triggering the writes.

Would the ~230 KiB/s be a reasonable amount of writes for the boot pool without the dataset?

I don't want to hijack the thread; I could create a separate one, but I guess it's somewhat related to the OP's problem?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Everything reading "debug-something=on" is at least suspicious.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
@Patrick M. Hausen thanks!
Any idea how to check why it's set to on, or where I can find the relevant logs?

When I go to download logs under Virtualization, the logs don't show anything interesting / aren't flooded with anything. I'm on mobile, I can attach one later.
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
I suggest making a bug report.
Can you guide me in making one?
E.g., a link to get started? What should I attach?
I have something like 250 KiB/s, which is gonna use up a standard 128 GiB SSD in like 5 years. Dunno why multi_report says less?
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
My goodness SCALE is quite "chatty". :oops:

10-15% nonstop CPU usage?
Well, at least we begin to find out what Kubernetes is doing with all these cycles…
Some angry SCALE users call out "FUD" when one dares to quote the "10-15%" figure and question whether their preferred platform might possibly have an issue of excessively high activity at idle.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I have something like 250 KiB/s, which is gonna use up a standard 128 GiB SSD in like 5 years. Dunno why multi_report says less?
I'd guess it has something to do with the scheduling of the script or the SMART tests, but let me summon @joeschmuck.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
E.g., a link to get started? What should I attach?
I used my lunch break to create a ticket. As I suspect our issues are related, feel free to provide your information / debug as well.
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
Open Ticket 155319
I opened a separate ticket because the problem doesn't seem exactly the same to me. If it turns out to be the same, the triage team will merge them.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I have something like 250 KiB/s, which is gonna use up a standard 128 GiB SSD in like 5 years. Dunno why multi_report says less?
Your SSD may report differently since the SMART output is really not very standardized. But I agree, 88% left and ~400GB written sounds way off for a normal SSD. If you run the script using -dump email it will email me a copy of your drive SMART data and I can look into it. If I need to update the script then I will send you an updated version of what I plan to release hopefully in the next month.

As for your original problem, have you looked into your SWAP file? Is any being used? If yes, then where is your SWAP file located? I believe I've heard that it can be on the boot drive if the drive is of sufficient size. But there have been many issues with the latest release of SCALE. Not sure if you can roll back to an earlier version to see if the problem goes away.
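To check whether swap is involved and what the drive itself reports as written, something like this should do (standard Linux/smartmontools commands; /dev/sda is only an example device, and the relevant SMART attribute names vary by vendor):

Code:
# Is any swap in use, and on which devices?
swapon --show
free -h

# Lifetime writes as reported by the drive
smartctl -a /dev/sda | grep -iE 'written|wear'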
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Well, at least we begin to find out what Kubernetes is doing with all these cycles…
Some angry SCALE users call out "FUD" when one dares to quote the "10-15%" figure and question whether their preferred platform might possibly have an issue of excessively high activity at idle.
Because it's not normal, and as near as I can tell, there is no 10-15% here; if there is, which graph?! As the OP pointed out, that was at a busy moment, not endless! See post 13, where he stated so. More FUD. :confused: What is not FUD is that he has a lot of data being written to his boot pool, too much.

And now I see another user has come in and started posting stuff; I assumed it was the OP, but actually looking now, it's not. So, back to the OP... I was confused and thought it was the same guy. That's what happens when you just randomly read and don't pay enough attention. :smile:

Here's what I read from OP:

And then 1GB/h of log
so let's write 1TB/h (perhaps a typo?)

So, the problem is you have 1GB/h being written to the boot pool, is that correct? That's your sda graph, and sda is your boot pool? And there's the wear level report. I got all that, but then I started reading other posts and got mixed up. What drive is your application pool on?

If you have 1GB/h being written, can you not determine which files with some monitoring, as a hint? All logging goes to the /var/log directory, so I would think the file(s) are there. It might help the ticket. And that was before you had Kubernetes active, if I understand correctly from post 28. That's a lot of data and cannot possibly be expected.
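For example, something as simple as this (plain GNU find, nothing SCALE-specific) would show which files under /var/log were modified in the last hour and how big they are:

Code:
find /var/log -type f -mmin -60 -exec ls -lh {} +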

While logging is now done on the boot pool, 1GB/h cannot be expected, especially with no apps running. Of course, this is Cobia, and it has definitely had its share of troubles. Glad you submitted a ticket; something is wrong, and hopefully with your help they can track it down. I'm still holding off until the .2 release. And I have to rethink my boot device!

To answer the other question, since I don't see it answered: there isn't really a need per se to mirror the boot pool. You can do it for uptime, which I do because I don't want downtime. As long as you download the config every so often, you can just reinstall SCALE on a new boot drive, restore the config file, and you are back the same as before.
 