FYI, while it's not an official solution: some SCALE users have suggested, for example, running
Code:
docker image prune --all --force --filter "until=24h"
once in a while to "manually" clean the system of unneeded images, and some reported the situation improving just from running
Code:
docker container prune --force --filter "until=24h"
now and then (though if you want to combine both, you need to prune containers first and then images, and then run
Code:
docker volume prune --force
because volumes do not support the until filter; see the chained sketch just below), I went with a docker-specific but more combined/thorough solution...
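For reference, chaining those per-domain prunes in the order just described would look roughly like this (24h is just the window from the examples above):
Code:
# containers first, then images (both honour the until filter), then volumes (no until support there)
docker container prune --force --filter "until=24h" && \
docker image prune --all --force --filter "until=24h" && \
docker volume prune --force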
I've used
Code:
docker system prune --all --force --volumes
(unfortunately, if I want to clean volumes within this command I cannot use the until filter, since the same limitation as in the volume-specific command applies here as well) in a daily cron job (though I think I'll move it to something like a bi-weekly or monthly job after some testing period). Sure, it's a workaround and it's rough around the edges: it doesn't allow combining the volumes and date filters, and most importantly it somewhat "breaks" the applications' instant rollback capability (TrueNAS seems to tie an app's "previous versions" entries to on-filesystem/on-dataset sets of snapshots/clones that get removed by the docker prune). But I can live with that, since I'm usually testing app (re)deployments between 16:00 and 24:00 and the daily prune is set to run at 03:00, so there's plenty of time to test and roll back if needed before the prune runs.
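For completeness, a crontab-style sketch of that daily 03:00 run (the docker binary path and log file here are just examples; on SCALE you can also add this through the web UI's cron job settings):
Code:
# run the combined prune every day at 03:00 and keep a log of what got removed
0 3 * * * /usr/bin/docker system prune --all --force --volumes >> /var/log/docker-prune.log 2>&1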
Of course, one could also combine a full stack of
docker * prune commands, one per subdomain - container, image, volume, network, build cache - chained together, using the best options/switches for each to clean more safely and with finer granularity, but I went with the single basic command instead. Whether you prefer the single but limited command or a combination of domain-specific ones is up to you, of course.
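A sketch of the extra per-domain steps such a chain could include, slotted in after the container/image prunes and before the final volume prune (the exact switches are a matter of taste):
Code:
# also clean up unused networks and the docker build cache
docker network prune --force --filter "until=24h" && \
docker builder prune --all --force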
In effect, with the daily docker system prune in place:
- all the stopped/finished containers were removed on the first run, and app restarts for the remaining active containers/pods are noticeably snappier again (back to how it used to be) - the overall reported container count dropped from 2k+ to roughly 100...
- thousands upon thousands of snaps, clones and filesystems/volumes were removed along with the containers - I'm down from 46.6k snaps to 0.6k, and in storage terms that's nearly 100GB freed...
- hundreds of dangling/older image versions were deleted - I'm down from well over 2k to fewer than 20 now...
- network routing to the cluster and through it, to and from the respective containers, has also improved...
- the docker build cache footprint has shrunk dramatically - from over 3GB used to less than 1GB...
- my CPU now reports under 10% at *idle* (idle meaning no active compute from running containers), with temps quickly dropping to around 40°C at idle - previously, even at *idle*, the CPU in my server was hovering around 70% usage with temps at a similar number, only in °C...
Overall, I think I'll be a somewhat happier user with this primitive workaround until a proper, smarter approach is offered in TrueNAS SCALE (I'm thinking of something like an option to mark a container/pod as test, dev or prod, to e.g. keep the containers and snapshots for debug analysis, or have them pruned on container failure or after a few hours/daily for already tested and presumably working PROD ones). But since docker support's stance is basically
"this is by-design behaviour, to enable analysis of docker volumes/layers of failing/broken containers (which I honestly did and still do use a lot when building/testing my own docker images against my own docker repository), and any maintenance is to be organised elsewhere", and the TrueNAS team's stance currently remains at
"this is a docker dogmatic issue" (and they're right about that: it's how the docker devs designed docker, more in line with ad hoc, quickly started/stopped apps than ones running for longer periods of time, and it doesn't properly clean up after itself in the long run), I think this will do for now, until a better solution is devised/offered in some future version of TrueNAS (as in, periodic housekeeping of any leftover docker/kubernetes trash).
I've also replaced all the aliases for plain docker run with docker run --rm for any stable docker image/container for my users (to cut down on stopped/finished containers lingering in docker's lists and to reduce the chances of my users generating noise/trash from impromptu/ad hoc container runs), and left the regular, not-cleaning-up-after-itself-on-failure docker run command for the small subset of build/test deployments kept for debug purposes.
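For illustration (the alias names here are made up - use whatever fits your setup):
Code:
# stable/PROD workloads: --rm removes the container automatically once it exits
alias drun='docker run --rm'
# build/test/debug workloads: plain docker run keeps the stopped container around for inspection
alias drun-debug='docker run'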
Hopefully my set of workarounds will help others.
Please bear in mind that this workaround clears anything docker created for containers that are not currently running, so if you have some stopped app that you start only now and then, it needs to be running when the prune command analyses the containers; otherwise the containers/images/snaps/volumes/networks docker created for it will get purged. I currently have only 2 docker images (and corresponding pods) that I build from scratch/source in my CI/CD pipeline for one-time, fairly short-running docker apps that are stopped most of the time; all my other apps are constantly running/in use, so this solution works for me. But your mileage may vary...
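If you want to see what a prune run would hit before scheduling it, listing the currently stopped containers is a quick sanity check, and docker system df gives a rough idea of the reclaimable space:
Code:
# anything listed here is fair game for the next prune
docker ps --all --filter "status=exited"
# summary of images/containers/volumes/build cache and how much is reclaimable
docker system df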
Just FYI, this is not a complete solution; the result is still about 500 snaps remaining, some of which are the ones from the Data Protection feature - i.e. the automated daily snaps, which are cleared correctly on their own.
There are still droves of snaps, seemingly taken on reboot/kubernetes restart, that are non-removable due to the
snapshot has dependent clones
message returned when attempting to zfs destroy them by hand (these live on
<pool>/ix-applications/<docker>/<dockerimagelayerfilesystem>
), while multiple others are removable (these live on the
<pool>/ix-applications/release/<appname>/<numberedversion>
subdatasets); both kinds are snaps taken by docker/kubernetes for the apps in the environment. The latter refer to specific versions of the app deployments (historically) in use on the machine/server (I've had more than 50 release snaps for some of the deployed apps) - these are not in use per se, but are "recipes/snaps" for specific application deployment rollbacks, as used by the
Installed Applications -> App -> Rollback functionality, if I recall/understand correctly. I'm not sure which docker/kubernetes mechanism manages this, but over time even a handful of running apps will grow the number of these 2 types of snaps to over a hundred, or easily over 500. Sure, that's still a reduction compared to the multiple thousands before the daily docker prune runs, but these are hundreds of snaps not taken by user-managed/deployed automation, nor are they pruned by the aforementioned cron job commands - they are snaps taken for the kubernetes/docker environment by those application tools themselves.
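To see which of these snapshots are actually blocked by dependent clones (and which could be destroyed), something along these lines should do it - the pool name is a placeholder, of course:
Code:
# list all snapshots under ix-applications together with any clones that depend on them
zfs list -H -t snapshot -o name,clones -r <pool>/ix-applications
# anything with a non-empty clones column will fail a plain zfs destroy with "snapshot has dependent clones"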
Perhaps in the future the TrueNAS SCALE dev team will offer some intelligent cleaning utility to take care of these automatically in a smart manner, or an option in the app deployment forms to limit how many of these get created. Sure, they don't take up a lot of storage space per se, thanks to the snapshotting nature, but these are still hundreds of snapshots that should be managed better by the system.