FYI, while it's not an official solution: some SCALE users have suggested, for example, running
Code:
docker image prune --all --force --filter "until=24h"
once in a while to "manually" clean the system of unneeded images, and some reported the situation improving just from running
Code:
docker container prune --force --filter "until=24h"
now and then (though if you want to combine both, you need to prune containers first and then images, and then run
Code:
docker volume prune --force
because volumes do not support the until filter; see the chained sketch just below), I went with a docker-specific but more combined/thorough solution...
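For reference, chaining those per-domain prunes in the order just described would look roughly like this (24h is just the window from the examples above):
Code:
# containers first, then images (both honour the until filter), then volumes (no until support there)
docker container prune --force --filter "until=24h" && \
docker image prune --all --force --filter "until=24h" && \
docker volume prune --force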
I've used
Code:
docker system prune --all --force --volumes
(unfortunately, if I want to clean volumes within this command I cannot use the until filter, since the same limitation as in the volume-specific command applies here as well) in a daily cron job (though I think I'll move it to something like a bi-weekly or monthly job after some testing period). Sure, it's a workaround and it's rough around the edges: it doesn't allow combining the volumes and date filters, and most importantly it somewhat "breaks" the applications' instant rollback capability (TrueNAS seems to tie an app's "previous versions" entries to on-filesystem/on-dataset sets of snapshots/clones that get removed by the docker prune). But I can live with that, since I'm usually testing app (re)deployments between 16:00 and 24:00 and the daily prune is set to run at 03:00, so there's plenty of time to test and roll back if needed before the prune runs.
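For completeness, a crontab-style sketch of that daily 03:00 run (the docker binary path and log file here are just examples; on SCALE you can also add this through the web UI's cron job settings):
Code:
# run the combined prune every day at 03:00 and keep a log of what got removed
0 3 * * * /usr/bin/docker system prune --all --force --volumes >> /var/log/docker-prune.log 2>&1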
Of course, one could also combine a full stack of
docker * prune commands, one per subdomain - container, image, volume, network, build cache - chained together, using the best options/switches for each to clean more safely and with finer granularity, but I went with the single basic command instead. Whether you prefer the single but limited command or a combination of domain-specific ones is up to you, of course.
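A sketch of the extra per-domain steps such a chain could include, slotted in after the container/image prunes and before the final volume prune (the exact switches are a matter of taste):
Code:
# also clean up unused networks and the docker build cache
docker network prune --force --filter "until=24h" && \
docker builder prune --all --force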
In effect, with the daily docker system prune in place:
- all the stopped/finished containers were removed on the first run, and app restarts for the remaining active containers/pods are noticeably snappier again (back to how it used to be) - the overall reported container count dropped from 2k+ to roughly 100...
- thousands upon thousands of snaps, clones and filesystems/volumes were removed along with the containers - I'm down from 46.6k snaps to 0.6k, and in storage terms that's nearly 100GB freed...
- hundreds of dangling/older image versions were deleted - I'm down from well over 2k to fewer than 20 now...
- network routing to the cluster and through it, to and from the respective containers, has also improved...
- the docker build cache footprint has shrunk dramatically - from over 3GB used to less than 1GB...
- my CPU now reports under 10% at *idle* (idle meaning no active compute from running containers), with temps quickly dropping to around 40°C at idle - previously, even at *idle*, the CPU in my server was hovering around 70% usage with temps at a similar number, only in °C...
Overall, I think I'll be a somewhat happier user with this primitive workaround until a proper, smarter approach is offered in TrueNAS SCALE (I'm thinking of something like an option to mark a container/pod as test, dev or prod, to e.g. keep the containers and snapshots for debug analysis, or have them pruned on container failure or after a few hours/daily for already tested and presumably working PROD ones). But since docker support's stance is basically
"this is by-design behaviour, to enable analysis of docker volumes/layers of failing/broken containers (which I honestly did and still do use a lot when building/testing my own docker images against my own docker repository), and any maintenance is to be organised elsewhere", and the TrueNAS team's stance currently remains at
"this is a docker dogmatic issue" (and they're right about that: it's how the docker devs designed docker, more in line with ad hoc, quickly started/stopped apps than ones running for longer periods of time, and it doesn't properly clean up after itself in the long run), I think this will do for now, until a better solution is devised/offered in some future version of TrueNAS (as in, periodic housekeeping of any leftover docker/kubernetes trash).
I've also replaced all the aliases for plain docker run with docker run --rm for any stable docker image/container for my users (to cut down on stopped/finished containers lingering in docker's lists and to reduce the chances of my users generating noise/trash from impromptu/ad hoc container runs), and left the regular, not-cleaning-up-after-itself-on-failure docker run command for the small subset of build/test deployments kept for debug purposes.
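For illustration (the alias names here are made up - use whatever fits your setup):
Code:
# stable/PROD workloads: --rm removes the container automatically once it exits
alias drun='docker run --rm'
# build/test/debug workloads: plain docker run keeps the stopped container around for inspection
alias drun-debug='docker run'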
Hopefully my set of workarounds will help others.
Please bear in mind that this workaround clears anything docker created for containers that are not currently running, so if you have some stopped app that you start only now and then, it needs to be running when the prune command analyses the containers; otherwise the containers/images/snaps/volumes/networks docker created for it will get purged. I currently have only 2 docker images (and corresponding pods) that I build from scratch/source in my CI/CD pipeline for one-time, fairly short-running docker apps that are stopped most of the time; all my other apps are constantly running/in use, so this solution works for me. But your mileage may vary...
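If you want to see what a prune run would hit before scheduling it, listing the currently stopped containers is a quick sanity check, and docker system df gives a rough idea of the reclaimable space:
Code:
# anything listed here is fair game for the next prune
docker ps --all --filter "status=exited"
# summary of images/containers/volumes/build cache and how much is reclaimable
docker system df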
Just FYI, this is not a complete solution; the result is still about 500 snaps remaining, some of which are the ones from the Data Protection feature - i.e. the automated daily snaps, which are cleared correctly on their own.
There are still droves of snaps, seemingly taken on reboot/kubernetes restart, that are non-removable due to the
snapshot has dependent clones
message returned when attempting to zfs destroy them by hand (these live on
<pool>/ix-applications/<docker>/<dockerimagelayerfilesystem>
), while multiple others are removable (these live on the
<pool>/ix-applications/release/<appname>/<numberedversion>
subdatasets); both kinds are snaps taken by docker/kubernetes for the apps in the environment. The latter refer to specific versions of the app deployments (historically) in use on the machine/server (I've had more than 50 release snaps for some of the deployed apps) - these are not in use per se, but are "recipes/snaps" for specific application deployment rollbacks, as used by the
Installed Applications -> App -> Rollback functionality, if I recall/understand correctly. I'm not sure which docker/kubernetes mechanism manages this, but over time even a handful of running apps will grow the number of these 2 types of snaps to over a hundred, or easily over 500. Sure, that's still a reduction compared to the multiple thousands before the daily docker prune runs, but these are hundreds of snaps not taken by user-managed/deployed automation, nor are they pruned by the aforementioned cron job commands - they are snaps taken for the kubernetes/docker environment by those application tools themselves.
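To see which of these snapshots are actually blocked by dependent clones (and which could be destroyed), something along these lines should do it - the pool name is a placeholder, of course:
Code:
# list all snapshots under ix-applications together with any clones that depend on them
zfs list -H -t snapshot -o name,clones -r <pool>/ix-applications
# anything with a non-empty clones column will fail a plain zfs destroy with "snapshot has dependent clones"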
Perhaps in the future the TrueNAS SCALE dev team will offer some intelligent cleaning utility to take care of these automatically in a smart manner, or an option in the app deployment forms to limit how many of these get created. Sure, they don't take up a lot of storage space per se, thanks to the snapshotting nature, but these are still hundreds of snapshots that should be managed better by the system.