Published: 2025-11-15, Revised: 2025-11-15

TL;DR Running apt upgrade on hosts with rootless Docker services can break them. A version mismatch occurs between the running user daemon and the newly upgraded system binaries, causing containers to fail on restart. This post provides an Ansible playbook that detects critical package changes and automatically restarts only the necessary rootless user daemons, preventing downtime and manual intervention.
Info
This playbook is the result of a deep-dive into a specific failure mode of the rootless Docker architecture. For context on the initial setup, please see my previous posts on setting up rootless Docker for services like Mastodon.
Motivation
As detailed in my rootless Docker setup guide, this architecture provides good security isolation. However, it has a weak spot: when the system-level Docker packages (containerd.io, docker-ce-rootless-extras and friends, which ship containerd and its shims) are upgraded via apt, the running user-level Docker daemons keep using the old code.
This leads to a version mismatch. When a container is restarted, the old daemon tries to use the new on-disk shim, which causes a fatal error (e.g. unsupported shim version (3): not implemented). The fix is to restart each user's Docker daemon after every critical update, which is a perfect task to automate with Ansible.
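For reference, this is roughly the manual fix the playbook below automates, sketched for a single service user (mastodon, from my setup); the second variant assumes systemd-machined/machinectl is available on the host:

# Logged in as the affected rootless user:
systemctl --user restart docker

# ...or from a root shell, via a proper user session:
machinectl shell mastodon@ /usr/bin/systemctl --user restart docker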
Ansible Playbook
This playbook automates the entire update and remediation process. It's designed to be run daily via a cron job, staying silent unless it needs to take action.
The Playbook: apt.yaml
---
- hosts: ubuntu, debian
  become: yes
  become_method: sudo
  vars:
    ansible_pipelining: true
    ansible_ssh_common_args: '-o ControlMaster=auto -o ControlPersist=60s'
    # This forces Ansible to use /tmp for its temporary files, avoiding
    # any permission issues when becoming a non-root user.
    ansible_remote_tmp: /tmp
    # Critical packages that require a Docker daemon restart
    critical_docker_packages:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-ce-rootless-extras
      - docker-buildx-plugin
      - docker-compose-plugin
      - systemd
  tasks:
    - name: Ensure en_US.UTF-8 locale is present on target hosts
      ansible.builtin.locale_gen:
        name: en_US.UTF-8
        state: present

    - name: "Update cache & Full system update"
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist
        cache_valid_time: 3600
        force_apt_get: true
        autoremove: true
        autoclean: true
      environment:
        # needrestart expects the single-letter modes l/i/a;
        # 'a' restarts outdated system services automatically.
        NEEDRESTART_MODE: a
      register: apt_result
      changed_when: "'0 upgraded, 0 newly installed, 0 to remove' not in apt_result.stdout"
      no_log: true

    - name: Report on any packages that were kept back
      ansible.builtin.debug:
        msg: "WARNING: Packages were kept back on {{ inventory_hostname }}. Manual review may be needed. Output: {{ apt_result.stdout }}"
      when:
        - apt_result.changed
        - "'packages have been kept back' in apt_result.stdout"

    - name: Find all users with lingering enabled
      ansible.builtin.find:
        paths: /var/lib/systemd/linger
        file_type: file
      register: lingered_users_find
      when:
        - apt_result.changed
        # This checks if any of the critical package names appear in the apt output
        - critical_docker_packages | select('in', apt_result.stdout) | list | length > 0
      no_log: true

    - name: Create a list of lingered usernames
      ansible.builtin.set_fact:
        lingered_usernames: "{{ lingered_users_find.files | map(attribute='path') | map('basename') | list }}"
      when: lingered_users_find.matched is defined and lingered_users_find.matched > 0
      no_log: true

    - name: Check for existence of rootless Docker service for each user
      ansible.builtin.systemd:
        name: docker
        scope: user
      become: true
      become_user: "{{ item }}"
      loop: "{{ lingered_usernames }}"
      when: lingered_usernames is defined and lingered_usernames | length > 0
      register: service_checks
      ignore_errors: true
      no_log: true

    - name: Identify which services were actually found
      ansible.builtin.set_fact:
        restart_list: "{{ service_checks.results | selectattr('status.LoadState', 'defined') | selectattr('status.LoadState', '!=', 'not-found') | map(attribute='item') | list }}"
      when: lingered_usernames is defined and lingered_usernames | length > 0
      no_log: true

    - name: Restart existing rootless Docker daemons
      ansible.builtin.systemd:
        name: docker
        state: restarted
        scope: user
      become: true
      become_user: "{{ item }}"
      loop: "{{ restart_list }}"
      when: restart_list is defined and restart_list | length > 0
      register: restart_results
      no_log: true

    - name: Report on restarted services
      ansible.builtin.debug:
        msg: "Successfully restarted rootless Docker daemon for user '{{ item.item }}'."
      loop: "{{ restart_results.results }}"
      when:
        - restart_list is defined and restart_list | length > 0
        - item.changed

    - name: Check if reboot required
      ansible.builtin.stat:
        path: /var/run/reboot-required
      register: reboot_required_file

    - name: Reboot if required
      ansible.builtin.reboot:
      when: reboot_required_file.stat.exists
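Before wiring the playbook into a schedule, a dry run from the management host is a cheap sanity check; note that check mode can only approximate what the apt upgrade step would actually do:

# Show what would change without applying anything (run from /srv/ansible)
ansible-playbook -i inventories/hosts --check --diff apt.yaml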
Configuration: ansible.cfg
For clean, quiet output, update your ansible.cfg:
[defaults]
stdout_callback = yaml
display_skipped_hosts = no
display_ok_hosts = no
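Ansible picks up ansible.cfg automatically when it sits in the directory you run the playbook from (which is why the update.sh wrapper below changes into /srv/ansible first); it can also be forced explicitly via the ANSIBLE_CONFIG environment variable:

# Point ansible-playbook at a specific config file explicitly
ANSIBLE_CONFIG=/srv/ansible/ansible.cfg ansible-playbook apt.yaml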
Summary of tasks
The playbook is designed to be relatively intelligent and robust:
- Update packages: Runs a full apt dist-upgrade.
- Check for change: Determines whether any packages actually changed.
- Check for criticality: If changes occurred, checks whether any of the changed packages are in the critical_docker_packages list.
- Find targets: Only if a critical package update happened, proceeds to find users who have rootless services enabled via systemd-linger.
- Act selectively: Checks which of those users actually run a docker.service and restarts only those specific daemons.
- Short report: The playbook is silent by default. It only prints a one-line report for each daemon it restarted, or if it detects that apt has kept critical packages back.
Result
On a day with no relevant changes, the output is minimal:
PLAY RECAP *********************************************************************
docker_services : ok=3 changed=0 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0
hass_iotdocker : ok=4 changed=0 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0
...
On another day, when a Docker package update happens, the playbook reports the daemon restarts:
TASK [Report on restarted services] ********************************************
ok: [docker_services] => (item=...) => {
"msg": "Successfully restarted rootless Docker daemon for user 'mastodon'."
}
...
This is a small automation, but it prevents a critical failure mode (one that hit me recently).
Ansible workflow
For security and ease of management, I run Ansible from a dedicated management VM in my local network. This host contains the Ansible installation, the playbook files, and the inventory of servers to manage.
This setup allows me to trigger multi-host updates with a single command from anywhere.
Inventory
The heart of the setup is the inventory file (inventories/hosts), which tells Ansible which servers to target.
# /srv/ansible/inventories/hosts
[ubuntu]
# Local network VMs
hass_iotdocker ansible_host=192.168.60.15
nextcloud ansible_host=192.168.40.22
node_gitlab ansible_host=192.168.70.11
docker_services ansible_host=192.168.40.81
# A public cloud VM that requires a specific user to connect
aws_vm ansible_host=130.61.20.105 ansible_user=ubuntu
[debian]
iot_influx ansible_host=192.168.60.34
# The management host targets itself to stay updated
management ansible_host=localhost
[all:vars]
# Explicitly set the python interpreter for compatibility
ansible_python_interpreter=/usr/bin/python3
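A quick way to verify the inventory and SSH reachability before running the full playbook is an ad-hoc ping from the management host:

# Confirm that every host in the inventory is reachable and has a working Python
ansible -i inventories/hosts all -m ping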
Authentication via SSH Agent Forwarding
One important part of this workflow is authentication. I use SSH Agent Forwarding instead of storing my private keys on the management host.
By connecting to the management host with ssh -A, I allow Ansible to use my local machine's SSH agent keys for the duration of the connection. Of course, this means the management host itself must be properly protected and isolated, since anyone with root on it could use the forwarded agent while I am connected.
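Instead of typing -A every time, the forwarding can also be pinned in the local ~/.ssh/config; a sketch, where the host alias and address are placeholders for your own entry:

# Placeholder entry for the management host
Host mgmt
    HostName <management-host>
    User alex
    ForwardAgent yes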
Running the Playbook
To make this a daily one-liner, I use an alias in my local machine's shell configuration (~/.bashrc or ~/.zshrc):
alias daily='ssh -A alex@<management-host> "sh /srv/ansible/update.sh"'
The update.sh script on the management host is a simple wrapper that ensures the playbook runs from the correct directory:
#!/bin/sh
# Purpose: CD into local directory of script
# and run Ansible playbook with local configuration
SCRIPT=$(readlink -f "$0")
SCRIPTPATH=$(dirname "$SCRIPT")
cd "$SCRIPTPATH" || exit
/root/.local/bin/ansible-playbook -vv "apt.yaml"
This setup means I can type daily in my local terminal, and my entire fleet of VMs (local and public) will be updated.
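If you prefer a fully unattended daily run instead of (or in addition to) the alias, the same wrapper can go into the management host's root crontab; a sketch, where the schedule and log path are my choices, not requirements:

# m h dom mon dow  command
30 6 * * * /bin/sh /srv/ansible/update.sh >> /var/log/ansible-daily.log 2>&1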
Here is a simple diagram of the flow:
+--------------+      +--------------------+      +-----------------+
|              |      |                    |      |                 |
|   Laptop     |----->|  Management Host   |----->|   Target VMs    |
| (SSH Agent)  |      | (Ansible + Playb.) |      |(nextcloud, etc.)|
+--------------+      +--------------------+      +-----------------+
         ssh -A alex@...           ansible-playbook ...
Changelog
2025-11-15: Initial publication.