Boot-pool has more snapshots than recommended

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
Good Day,
Hope you are all keeping well! This is my first post; hopefully I've followed the etiquette appropriately. Please ask if you need more details from me. I appreciate the hardware I have isn't the perfect selection, but I'm incrementally upgrading as I find money wedged in the sofa cushions, and learning from my mistakes.

Problem I'm having​

I've run into an issue that I can't find any directly related information on, and no controls or options in the web GUI that I'm aware of to resolve it.
Dataset boot-pool/ROOT/22.02.3/***** has more snapshots (767) than recommended (512). Performance or functionality might degrade.

Steps I've taken so far​

  • A good afternoon of googling and forum trawling
  • Checked I'm all upgraded to the latest version of the OS
  • Confirmed this is the case by opening the shell and running 'zfs list -t snapshot'; I didn't count all the results, but can confirm there is a significant list of snapshots relating to the boot pool.
  • Removed all but the most recent archived boot environments.
  • Checked the 'Data Protection > Periodic Snapshots' menu, however all snapshot settings in here relate to data pools and it is not possible to target the boot pool with these setup wizards.
  • Checked the 'System Settings > Boot' menu for any snapshot-related controls for the pool, however I can't find any.

Some musings to sanity check​

  • My first thought is: will an SSD fix this? However, the pool isn't reporting that capacity is the issue; it is just suggesting that there are too many snapshots. I'm guessing the index/database it uses to track the snapshots could be at capacity. The flash drive still reports 3.7GB of remaining space available.
  • I've been attempting to get Apps up and running to reduce the resource consumption of running VMs; however, the Apps (or more specifically Docker images) are spending an incredible amount of time sat in the deploying phase (36hrs+). Are boot-pool snapshots part of the app deployment process? Are the endless deployment attempts constantly generating snapshots?
Any help that you can provide would be much appreciated, thank you in advance.
Thanks again
Aaron
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Confirmed this is the case by opening the shell and running 'zfs list -t snapshot'; I didn't count all the results, but can confirm there is a significant list of snapshots relating to the boot pool.
To count lines, pipe the output into wc, e.g. zfs list -t snapshot -r boot-pool | wc -l. Not that we have any reason not to believe the GUI...
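And to see which dataset is actually accumulating them, something along these lines should work (a rough sketch, assuming the SCALE pool name boot-pool):
Code:
# snapshots grouped by parent dataset, most numerous first
zfs list -H -t snapshot -o name -r boot-pool | awk -F@ '{print $1}' | sort | uniq -c | sort -rn | head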

My first thought is: will an SSD fix this? However, the pool isn't reporting that capacity is the issue; it is just suggesting that there are too many snapshots. I'm guessing the index/database it uses to track the snapshots could be at capacity. The flash drive still reports 3.7GB of remaining space available.
The reasoning is even simpler: Listing a lot of snapshots can be slow. Beyond that, they don't have an impact. Of course, TrueNAS needs to list snapshots for various reasons, so the location of the performance hit may be unexpected.

At this point, I think we need to see an excerpt of the snapshots list, to figure out what created them. From there, we can decide how to proceed.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
IMO something is definitely wrong, this is what I have now:
Code:
# zfs list -r boot-pool
NAME                                                         USED  AVAIL     REFER  MOUNTPOINT
boot-pool                                                   5.39G  93.5G       96K  none
boot-pool/.system                                           7.53M  93.5G      112K  legacy
boot-pool/.system/configs-c859517f6d7a497888d06ed58c52b7a6    96K  93.5G       96K  legacy
boot-pool/.system/cores                                       96K  1024M       96K  legacy
boot-pool/.system/ctdb_shared_vol                             96K  93.5G       96K  legacy
boot-pool/.system/glusterd                                    96K  93.5G       96K  legacy
boot-pool/.system/rrd-c859517f6d7a497888d06ed58c52b7a6      6.19M  93.5G     6.19M  legacy
boot-pool/.system/samba4                                     320K  93.5G      220K  legacy
boot-pool/.system/services                                    96K  93.5G       96K  legacy
boot-pool/.system/syslog-c859517f6d7a497888d06ed58c52b7a6    364K  93.5G      364K  legacy
boot-pool/.system/webui                                       96K  93.5G       96K  legacy
boot-pool/ROOT                                              5.37G  93.5G       96K  none
boot-pool/ROOT/22.02.2.1                                    2.61G  93.5G     2.61G  legacy
boot-pool/ROOT/22.02.3                                      2.75G  93.5G     2.75G  legacy
boot-pool/grub                                              8.17M  93.5G     8.17M  legacy

# zfs list -t snapshot -r boot-pool
NAME                                      USED  AVAIL     REFER  MOUNTPOINT
boot-pool/.system/samba4@wbc-1660346089   100K      -      152K  -

The flash drive still reports 3.7GB of remaining space available.
That's very low; you should not use a USB drive to start with. I recommend a 128GB Lexar SSD as a replacement (excellent cost/reliability vs. other models), connected with an SKL Tech connector to internal USB (save the SATA port for an additional spinner). The connector is the only model that worked flawlessly with other devices in my tests, including the Raspberry Pi, which is notorious for connectivity issues.

You might ask why you need 128GB. First, the larger the SSD, the longer it will last. Second, Linux requires additional space for various operations, so it's best to allocate a fair amount. For SCALE, 128GB is a great size, even if the OS uses only 6GB. $18 for a reliable SSD will not hurt your budget.

I've been attempting to get Apps up and running to reduce the resource consumption of running VMs; however, the Apps (or more specifically Docker images) are spending an incredible amount of time sat in the deploying phase (36hrs+).
You should run the apps on SSDs; it will make a dramatic difference: a few seconds to deploy/update an app, no video stuttering, etc. Create one default pool with all your slow spinners in a RAIDZ2 array (for storage purposes) and one software pool with dual SSDs (for apps only). Don't forget, you are running a Kubernetes cluster with all your apps on the software pool, so you want ultra-fast disks.
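If it helps to picture the layout, this is roughly what those two pools look like from the shell (illustration only, on TrueNAS you build pools through the GUI, and the device names here are placeholders):
Code:
# bulk storage pool: slow spinners in a RAIDZ2 array
zpool create default raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

# apps pool: two SSDs, mirrored here for redundancy
zpool create software mirror /dev/sdg /dev/sdh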

This is the SSD model I use for software pool (Samsung 870 EVO, best cost/reliability, vs other models):
Code:
# fdisk -l /dev/sdj
Disk /dev/sdj: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: Samsung SSD 870
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: [removed]

Device       Start       End   Sectors   Size Type
/dev/sdj1      128   4194304   4194177     2G Linux swap
/dev/sdj2  4194432 976773134 972578703 463.8G Solaris /usr & Apple ZFS

Make sure you move your ix-applications dataset to software pool, to take advantage of the SSDs (Apps > Available Applications > Settings > Choose Pool):

[screenshot attached]


With the current situation, where thousands of snapshots are generated on the software pool, 500GB is a fair size, especially if you run apps that generate a lot of storage use, like Plex, etc. I strongly recommend you use TrueCharts apps with the default PVC (Simple) storage, unless you want to have constant problems. I had a difficult time building a reliable backup solution for apps; that's why I say not to mess with the default storage. Read more in the TrueCharts documentation.

Time will tell if I need to upgrade to 1TB; I have 657 undetectable snapshots generated by 8 apps:
Code:
# zfs list -t snapshot -r software | wc
    657

My current storage:

[screenshot attached]


Many people do not know that you need to enable Auto TRIM for SSDs; don't forget to enable the option for the software pool:

[screenshot attached]
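The shell equivalent, if you want to verify it, since autotrim is a standard OpenZFS pool property and the GUI checkbox should map to the same thing:
Code:
# check the current setting
zpool get autotrim software

# turn it on
zpool set autotrim=on software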


You can also check that the fstrim service is running in the OS; it will trim your USB-connected OS SSD:
Code:
# systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/lib/systemd/system/fstrim.timer; enabled; vendor preset: enabled)
     Active: active (waiting) since Sat 2022-08-20 18:49:40 EDT; 2h 57min ago
    Trigger: Mon 2022-08-22 01:27:45 EDT; 1 day 3h left
   Triggers: ● fstrim.service
       Docs: man:fstrim

Aug 20 18:49:40 uranus systemd[1]: Started Discard unused blocks once a week.
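For the ZFS pools themselves you can also kick off a manual pass with zpool (handy right after enabling Auto TRIM):
Code:
# trim the pool now and watch the progress
zpool trim software
zpool status -t software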

The only services I run are SMB and SSH. I only keep one additional boot image, in case I need to revert back.

 

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
At this point, I think we need to see an excerpt of the snapshots list, to figure out what created them. From there, we can decide how to proceed.

Here are some excerpts from the list, as requested:
Code:
# zfs list -t snapshot -r boot-pool
NAME USED AVAIL REFER MOUNTPOINT
boot-pool/ROOT/22.02.2.1/017839dddbc0566556cd8402d905007c0c7386757a23235e022abd93ac22f53f-init@40176105 8K - 21.0M -
boot-pool/ROOT/22.02.2.1/02463460995ad20c95f729c595fae9af92e57b22e500e2609f6d0bed6a39281f-init@738480219 8K - 14.3M -
boot-pool/ROOT/22.02.2.1/11e68dc1c17b37b677f85589ef97246b261ddd032dfacab616c2c059992c288e@230215847 8K - 4.38M -
boot-pool/ROOT/22.02.2.1/145be091aaf407441fdacba595860ecd5cd84158fe90417e5aca2926839691c6@679229378 8K - 219M -
boot-pool/ROOT/22.02.2.1/20a218db8b71197d38bf5419af34c03a26c898d8c8c17fc6a8d24779b24f0509-init@31829158 8K - 516K -
boot-pool/ROOT/22.02.2.1/22367ac05477871785a481ce6710d0216cb1316ccfb45558b3cfea5ce9ec5108@313129712 88K - 44.7M -
boot-pool/ROOT/22.02.2.1/2bd03e6dc0bf123668d3414ce6fc389883e5f79e3e76e29c629a7f561b3fa0d8@308664044 8K - 96.5M -
boot-pool/ROOT/22.02.2.1/2e2e1fcfd0c740a91e3ce0ef0c1ef0238c37382ad1768f9fc55a76240b492b73-init@151303776 8K - 27.4M -
*** more of the same ***
boot-pool/ROOT/22.02.2.1/e62d43710a62566602108f2f8fce73c19cbefdd90c887f3a1978e116648de2a4@190781142 0B - 29.0M -
boot-pool/ROOT/22.02.2.1/e8501eb770036ac6d9f4f70ba6f9ab26d343608227a96a74e9d85d6fa158dad8-init@757991193 8K - 516K -
boot-pool/ROOT/22.02.2.1/e937319f0eadadf51eea88dfb1b431a7d0e00812ed5ee5e856bb567b3aa678e7@271148348 8K - 49.9M -
boot-pool/ROOT/22.02.2.1/eb8190127f55a5c14216559c42774ffc6976237978a0af9dead923ccfd1a7e2a@982664762 8K - 52.3M -
boot-pool/ROOT/22.02.2.1/f208d5261755e80405bdae2ddd3aad54f8b13c8a78cef4805699d75c7dd000d2-init@39205498 8K - 29.0M -
boot-pool/ROOT/22.02.2.1/f5a9f63555aea67c6b7520420ad892ad3a78dd305f7d5ff73d54af2078c1e7a9@471920395 0B - 32.4M -
boot-pool/ROOT/22.02.2.1/f5a9f63555aea67c6b7520420ad892ad3a78dd305f7d5ff73d54af2078c1e7a9@366304349 0B - 32.4M -
boot-pool/ROOT/22.02.2.1/f784d8ab5c48a5e910454bece9342abe9d52544255436ddc8428fd53cdad089e@596595857 8K - 44.7M -
boot-pool/ROOT/22.02.3/0025bc6bc3f4d35b0074f69f5fc65eedc0173db99b1b5cad1834d60d17aba53f-init@289760693 0B - 516K -
boot-pool/ROOT/22.02.3/00357aeb22773ff3474cbe9c8eeab3688176c5c97f4d97f3a61de77365bdc446-init@500659479 8K - 516K -
boot-pool/ROOT/22.02.3/008c39ba53a162f674891c74f2a132558e50ce5159ecf912a0ec290927555f6b-init@931518409 0B - 516K -
boot-pool/ROOT/22.02.3/009dbe6b1bc24eba48c1ce1380b54e939e5524ee294784eea2530ef0453347e4-init@936241160 0B - 516K -
boot-pool/ROOT/22.02.3/00c69a20e8f2ff7b5f35dfad498243123e4bdc9b927208abe8396d37f3422a76-init@869884068 8K - 516K -
boot-pool/ROOT/22.02.3/0110f127916951f70347ab96cd305a2583db1cbddf956dda484b1df28e148cd0-init@624963741 8K - 516K -
boot-pool/ROOT/22.02.3/0161e9b077059fc39edd77c652b3e4d5349012bdc45a99d5955be3997d9db7c7-init@574828341 8K - 516K -
boot-pool/ROOT/22.02.3/01f91c192aa5c851c00e74a8c35b83e6f164a527b7effd6d4cce53731dd26d3a-init@370262585 8K - 516K -
*** more of the same ***
boot-pool/ROOT/22.02.3/de6cfb2d14140f88c154980f373a0e62d01f3988418d7af9d03b753f9e1684ba-init@595133048 8K - 516K -
boot-pool/ROOT/22.02.3/dea50124fb2779cbe871f403ccff7060916d16bb5a683de52da40a689e83bd99-init@800591410 8K - 516K -
boot-pool/ROOT/22.02.3/dec887f47e99a54eba9e1e3435fe623cbdf5a946d8f678f9f665ae5714781f90-init@342767531 0B - 516K -
boot-pool/ROOT/22.02.3/df400224d4131ea1e57e0cc4f13c2e365373397e0ffebe8bbc2261e45537e07c-init@950064991 8K - 516K -
boot-pool/ROOT/22.02.3/df4daf7bed90063d743d92b3ebc35a8b75fdcb5c0c7232d34373268944849804-init@588703273 0B - 516K -
boot-pool/ROOT/22.02.3/dfb9ca67bf2c18b7445537d79bd4281ca2312448e5bd3c030b9c1eb6a5c7d4d4-init@892126707 8K - 516K -
boot-pool/ROOT/22.02.3/e011371fb8dabc609c6b0ebec457ba3ab4ea3264c63fda41b506671d1a359b5e-init@795160923 0B - 516K -
boot-pool/ROOT/22.02.3/e0403140d892c07e519559ba5f983955ea4ffcdc2985aff621b775807d13d6f4-init@657807443 8K - 516K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@321230712 8K - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@950008066 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@46476846 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@192037070 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@828799270 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@664818332 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@144427001 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@456819590 0B - 500K -
*** more of the same ***
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@236456254 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@517629562 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@525688946 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@368507572 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@334063432 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@281581304 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@75563045 0B - 500K -
boot-pool/ROOT/22.02.3/e04a0230dc536981d2018923ba0f8a46518cad10ec5a829e16fab640240a8ade@131484857 0B - 500K -
boot-pool/ROOT/22.02.3/e06dc36b0fa086b648065f6ef9ec64d09074b0570e75733077d3a80a13d196da-init@636416905 0B - 516K -
boot-pool/ROOT/22.02.3/e075eebb693bbf55339d38e7a0c4dac2fa657ce74bcb0124efb690375f4b61cc-init@88610311 8K - 516K -
boot-pool/ROOT/22.02.3/e0c47a940048ab6a5e3bc2e31965b17975cb0d16409899a84b4dca04c64f4ba6-init@343881544 8K - 516K -
boot-pool/ROOT/22.02.3/e0f61ee8a73560eae495dc195ae68238666cb5a6894199269356de7c1af4332f-init@658089837 0B - 516K -
boot-pool/ROOT/22.02.3/e156b0df52f913ee57fcf83c77a79ada6ad30e2a5cf2af4df11f7eed879ada1a-init@204278265 0B - 516K -
boot-pool/ROOT/22.02.3/e16d909444586f20006db8be3225b27a0e1ea670777af80840d0b62d4b7fee2e-init@649932155 8K - 516K -
boot-pool/ROOT/22.02.3/e1e0370936e92ad82352d0539a1f6390565168330dc44140120a930159468f09-init@379625080 0B - 516K -
boot-pool/ROOT/22.02.3/e1eb834563dc9bba45bae217ba9e52044b89ea62e9868a736b41436017ce1ca1-init@766138156 0B - 516K -
*** more of the same ***
boot-pool/ROOT/22.02.3/febacc0b1117e635eb5a72ec5648ab1a6335708632acfa5930f2476d006b16ac-init@924660874 0B - 508K -
boot-pool/ROOT/22.02.3/ff46f342665905dc74d8bcf897219ed026137365da5ccaff7357c2a75614d708-init@494654459 8K - 516K -
boot-pool/ROOT/22.02.3/ff492dbf8c806881847242c4c6c655451bbbfc3c7b077be11130c805da049d46-init@393651198 8K - 516K -
boot-pool/ROOT/22.02.3/ff5e947a38c3fdc046f8b7d57273c009edd7483b64837f73a361a734906b7783-init@258911230 8K - 508K -
boot-pool/ROOT/22.02.3/ff7a1b95667c2886473cafe83e081a6ec36cc89b050cf6ae3e1b2c7cf0d75698-init@420211879 8K - 516K -
boot-pool/ROOT/22.02.3/ff7f02da08496142349b3a618372d260b1875ffed09f9aa853931538906cedb0-init@507275586 0B - 516K -
boot-pool/ROOT/22.02.3/ffc80e5a274237935294df1a3b14df75e848d80b1cc22c1630ae8943e6cc031d-init@78707456 0B - 516K -
boot-pool/ROOT/22.02.3/ffe85d868478b27cf6149886ba8737e039e35dd4ae59f8c5f903c0d99c1161b6-init@932087613 8K - 516K -

TECK:
That's very low; you should not use a USB drive to start with. I recommend a 128GB Lexar SSD as a replacement (excellent cost/reliability vs. other models), connected with an SKL Tech connector to internal USB (save the SATA port for an additional spinner). The connector is the only model that worked flawlessly with other devices in my tests, including the Raspberry Pi, which is notorious for connectivity issues.
Understood. Using up the SATA ports was part of my reservation about doing it; I hadn't considered the idea of using a USB-to-SATA adapter on one of the internal headers. Like you say, an SSD isn't going to break the budget, so I will implement that this week.

TECK:
You should run the apps on SSDs; it will make a dramatic difference: a few seconds to deploy/update an app, no video stuttering, etc. Create one default pool with all your slow spinners in a RAIDZ2 array (for storage purposes) and one software pool with dual SSDs (for apps only).
This is already the case: I currently have a pair of Samsung 830s striped as my apps and VMs pool, so read and write speed for those shouldn't be the issue. The VMs are running fine and I am still relying on them; however, I have set up a couple of Docker images in the last few weeks to test with, one being Unifi Controller and another being Budibase, and these are having the endless deploying issues. I'm not too concerned with resolving those immediately, as the VMs still work fine and I can tinker with apps; I was just mentioning it in case it is relevant to the issue.

TECK:
Many people do not know that you need to enable Auto TRIM for SSDs; don't forget to enable the option for the software pool:
Also, thank you for the heads-up; apologies if I missed that step in a guide. Now done!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Ok, those are some terribly unwieldy dataset names. Is that a TrueNAS Core thing? And the snapshot names are about as bad, but that does not look like the zettarepl format TrueNAS uses, so I don't know who is taking them.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
These must be docker-related automatic snapshots.
Running docker system prune --all --force --volumes from time to time may mitigate the issue, but there's no solution.
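For example, with a before/after look at what it frees (prune removes anything not attached to a running container, so treat it as destructive):
Code:
docker system df                                # space used by images, containers, volumes
docker system prune --all --force --volumes
docker system df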
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
These must be docker-related automatic snapshots.
Running docker system prune --all --force --volumes from time to time may mitigate the issue, but there's no solution.
They sort of look like it, but on the boot pool? Did someone forget to move /var/lib/docker to a very separate dataset?
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Here are some excerpts from the list, as requested
I've never seen something like that before. Are you using the TrueNAS-SCALE-Angelfish train? How did you install the OS, and are you mirroring the boot-pool? I'm no ZFS expert, but something is generating these snapshots.

Did someone forget to move /var/lib/docker to a very separate dataset?
If I'm not mistaken, /var is not mounted separately; all Docker apps run on <pool>/ix-applications/docker.
Code:
# df -ahT /var
Filesystem             Type  Size  Used Avail Use% Mounted on
boot-pool/ROOT/22.02.3 zfs    97G  2.8G   94G   3% /

# df -ahT /mnt/software/ix-applications/docker
Filesystem                      Type  Size  Used Avail Use% Mounted on
software/ix-applications/docker zfs   404G   33M  404G   1% /mnt/software/ix-applications/docker
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What about /var/lib/docker? On my systems (not TrueNAS), I typically just mount a separate dataset at /var/lib/docker to avoid this sort of thing.
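On a generic Linux box that's just something like the following (a sketch, not TrueNAS-specific; the dataset name is a placeholder and existing Docker data is not migrated by it):
Code:
systemctl stop docker
zfs create -o mountpoint=/var/lib/docker tank/docker   # empty dataset becomes Docker's data root
systemctl start docker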
 

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
TECK:
I've never seen something like that before. Are you using the TrueNAS-SCALE-Angelfish train? How did you install the OS, and are you mirroring the boot-pool? I'm no ZFS expert, but something is generating these snapshots.
Yes, I am running the 22.02.3 Angelfish train.
This is a config from a previous Core install; the 'Upgrade Install' option was used during the ISO install process.
I'm not mirroring the boot-pool.

I have stopped all running Docker images and left it for a day. There are no additional snapshots, so I can only assume it is related to this Docker deploying problem I'm having.
 

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
Etorix:
Running docker system prune --all --force --volumes from time to time may mitigate the issue, but there's no solution.
I have run this; some items were pruned, however the number of snapshots in the boot-pool alert has not reduced, so at the moment I am assuming these were pruned specifically from the pool for VMs and applications. I will give it a little bit of time, though, because I'm not sure whether the alert statistics/numbers are dynamically updated.
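In the meantime I can at least watch the raw number from the shell, which should show whether the prune touched the boot pool at all:
Code:
# total boot-pool snapshots (header suppressed with -H)
zfs list -H -t snapshot -o name -r boot-pool | wc -l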
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So, my questions here are:
  1. Where does Docker come into play, given that TrueNAS Scale was supposed to be using Kubernetes?
  2. Assuming Docker was installed or otherwise enabled through user interaction, is there a guide somewhere which neglects to mention the need to mount /var/lib/docker in its own dataset?
  3. If Docker is available through semi-official means, should TrueNAS automatically set up this dataset to prevent foot shooting, given that there’s zero cost to having it present even if unused?
 

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
Ericloewe:
So, my questions here are:
  1. Where does Docker come into play, given that TrueNAS Scale was supposed to be using Kubernetes?
  2. Assuming Docker was installed or otherwise enabled through user interaction, is there a guide somewhere which neglects to mention the need to mount /var/lib/docker in its own dataset?
  3. If Docker is available through semi-official means, should TrueNAS automatically set up this dataset to prevent foot shooting, given that there’s zero cost to having it present even if unused?
  1. For Docker, the only thing I did in the setup process was go to the Apps tab, click 'Launch Docker Image' and set the ix-applications pool on first setup. So I am using the built-in system, which uses Kubernetes but is able to pull images from Docker Hub.
  2. The guide that I followed mentions nothing to do with /var/lib/docker.
I can report that since putting in an SSD boot drive, the Docker images have booted near-instantly and I have had none of the deploying issues I'd been having. What I've surmised from reading through a bunch of the logs is that, with the lack of performance from the USB boot drive, the Kubernetes or Docker implementation struggles to get the virtual OS environments up and running before hitting a timeout due to the slow device data rates. Because of the timeout, the process starts again, causing a new snapshot to be created in an endless cycle. It might be worth adding an addendum to the Hardware Recommendation Guide suggesting that for a workload other than basic file sharing, an SSD needs to be used to meet the requirements of the apps/Docker service.
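For anyone else trying to confirm the same loop, the restart counts and timeout events are visible through the bundled k3s client (this assumes SCALE's k3s setup; adjust if yours differs):
Code:
# pod states and restart counts across all app namespaces
k3s kubectl get pods -A

# recent events, oldest first; repeated failures/timeouts show the redeploy loop
k3s kubectl get events -A --sort-by='.metadata.creationTimestamp' | tail -n 40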
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
From the sound of it, this needs a bug report to have /var/lib/docker mounted elsewhere; that's the rational solution to the immediate issue.

What I've surmised from reading through a bunch of the logs is that, with the lack of performance from the USB boot drive, the Kubernetes or Docker implementation struggles to get the virtual OS environments up and running before hitting a timeout due to the slow device data rates. Because of the timeout, the process starts again, causing a new snapshot to be created in an endless cycle. It might be worth adding an addendum to the Hardware Recommendation Guide suggesting that for a workload other than basic file sharing, an SSD needs to be used to meet the requirements of the apps/Docker service.
It might be worth clarifying some of the language around boot devices in light of Docker using the boot pool (which is not a bad thing to do, in general, since all persistent data should be mounted outside of Docker images). That said, and without checking, I'm pretty sure the recommendation already is to use SSDs for reliability and - to some extent - performance.
 

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
# find / -name docker
/mnt/Tungsten System/ix-applications/docker
/mnt/Tungsten\ System/ix-applications/docker
/mnt/Tungsten\ System/ix-applications/docker/zfs/graph/2e101e2915bf3af1855322becfdc0d970a2c8a466e114e46b7a8b90f3e9331a8/etc/dpkg/dpkg.cfg.d/docker
/run/docker
/etc/init.d/docker
/etc/docker
/etc/default/docker
/usr/libexec/docker
/usr/bin/docker
/usr/share/bash-completion/completions/docker
/usr/lib/python3/dist-packages/docker
Sorry, maybe I'm being dense, but I'm struggling to find the /var/lib/docker that you are referencing.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That's kind of weird. It'd be interesting to see in which directory Docker is storing its stuff; is it that ix-applications dataset? But then why would the snapshots be taken of everything else?
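Docker will report its data root directly, which would settle it (standard docker info output):
Code:
docker info --format '{{ .DockerRootDir }}'
# or just eyeball it
docker info | grep -i 'root dir'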
 

aza9156

Cadet
Joined
Aug 20, 2022
Messages
7
root@tungsten[/mnt/Tungsten\ System/ix-applications/docker]# ls
buildkit containers network runtimes tmp volumes
containerd image plugins swarm trust zfs
Correct, all the images and containers etc. relating to the Docker instances are based in the ix-applications dataset.

Does the Kubernetes backend, which is presumably based in the boot-pool since it uses the Linux OS as the host, trigger a snapshot of the boot-pool as a safety measure every time an instance is deployed?
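One way I could check that is to look at the snapshot creation times and see whether they line up with the deploy attempts:
Code:
# newest boot-pool snapshots, sorted by creation time
zfs list -t snapshot -o name,creation -r boot-pool -s creation | tail -n 20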
 