Show HN: Docker Compose for virtual machines (VMs)
Show HN: Holos – QEMU/KVM with a compose-style YAML, GPUs and health checks

Original link: https://github.com/zeroecco/holos

## Holos: Simplified KVM Management

Holos simplifies running multi-VM stacks on a single host with KVM, offering a Docker Compose-like experience *without* the complexity of libvirt, XML configuration, or a distributed control plane. It treats the VM as the basic unit, giving each one a dedicated kernel, a disk overlay (qcow2), and a cloud-init seed.

You define a stack in a `holos.yaml` file, specifying services (VMs) with images, resource allocations (vCPU, memory), dependencies, port mappings, and cloud-init configuration for customization. Commands such as `holos up`, `down`, `ps`, `start`, `stop`, `console`, and `exec` manage the stack lifecycle.

Key features include automatic networking over a dedicated internal network, persistent volumes, health checks for gating service dependencies, SSH access via auto-generated keys, and passthrough of PCI devices such as GPUs. Holos can also build images from a Dockerfile and integrates with systemd so stacks survive reboots.

**Importantly, Holos is *not* Kubernetes.** It focuses on making single-host KVM simple, avoiding the complexity of orchestration and clustering.

## Holos: A Simplified VM Runtime

A new project called Holos aims to simplify single-host VM management by layering a Compose-style YAML configuration directly on top of QEMU/KVM. Its creator built it as an alternative to libvirt XML and Vagrant, focusing on ease of use and modern features.

Key features include simplified GPU passthrough, health checks (probed over SSH) that gate VM dependencies, direct L2 networking between VMs without root privileges, and provisioning via cloud-init or a Dockerfile.

Notably, Holos is *not* a Kubernetes replacement — it is designed only for single-host setups and lacks clustering and live migration. It is currently a prototype being tested on real hardware, and the developer is seeking feedback. Users are asking about compatibility with existing tools such as virt-manager and about potential integration with Proxmox.

## Original post

Docker compose for KVM. Define multi-VM stacks in a single YAML file. No libvirt, no XML, no distributed control plane.

The primitive is a VM, not a container. Every workload instance gets its own kernel boundary, its own qcow2 overlay, and its own cloud-init seed.

Write a holos.yaml:

name: my-stack

services:
  db:
    image: ubuntu:noble
    vm:
      vcpu: 2
      memory_mb: 1024
    cloud_init:
      packages:
        - postgresql
      runcmd:
        - systemctl enable postgresql
        - systemctl start postgresql

  web:
    image: ubuntu:noble
    replicas: 2
    depends_on:
      - db
    ports:
      - "8080:80"
    volumes:
      - ./www:/srv/www:ro
    cloud_init:
      packages:
        - nginx
      write_files:
        - path: /etc/nginx/sites-enabled/default
          content: |
            server {
                listen 80;
                location / { proxy_pass http://db:5432; }
            }
      runcmd:
        - systemctl restart nginx

Bring it up:

holos up

That's it. Two nginx VMs and a postgres VM, all on the same host, all talking to each other by name.

holos up [-f holos.yaml]             start all services
holos down [-f holos.yaml]           stop and remove all services
holos ps                             list running projects
holos start [-f holos.yaml] [svc]    start a stopped service or all services
holos stop [-f holos.yaml] [svc]     stop a service or all services
holos console [-f holos.yaml] <inst> attach serial console to an instance
holos exec [-f holos.yaml] <inst> [cmd...]
                                     ssh into an instance (project's generated key)
holos logs [-f holos.yaml] <svc>     show service logs
holos validate [-f holos.yaml]       validate compose file
holos pull <image>                   pull a cloud image (e.g. alpine, ubuntu:noble)
holos images                         list available images
holos devices [--gpu]                list PCI devices and IOMMU groups
holos install [-f holos.yaml] [--system] [--enable]
                                     emit a systemd unit so the project survives reboot
holos uninstall [-f holos.yaml] [--system]
                                     remove the systemd unit written by `holos install`

The holos.yaml format is deliberately similar to docker-compose:

  • services - each service is a VM with its own image, resources, and cloud-init config
  • depends_on - services start in dependency order
  • ports - "host:guest" syntax, auto-incremented across replicas
  • volumes - "./source:/target:ro" for bind mounts, "name:/target" for top-level named volumes
  • replicas - run N instances of a service
  • cloud_init - packages, write_files, runcmd -- standard cloud-init
  • stop_grace_period - how long to wait for ACPI shutdown before SIGTERM/SIGKILL (e.g. "30s", "2m"); defaults to 30s
  • healthcheck - test, interval, retries, start_period, timeout to gate dependents
  • top-level volumes block - declare named data volumes that persist across holos down
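The replica and port behavior above can be sketched in a single fragment. The exact host port numbers are an assumption of how the auto-increment lands:

```yaml
services:
  web:
    image: ubuntu:noble
    replicas: 3          # instances web-0, web-1, web-2
    ports:
      - "8080:80"        # auto-incremented per replica: presumably 8080, 8081, 8082 on the host
```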

holos stop and holos down send QMP system_powerdown to the guest (equivalent to pressing the power button), then wait up to stop_grace_period for QEMU to exit on its own. If the guest doesn't halt in time — or QMP is unreachable — the runtime falls back to SIGTERM, then SIGKILL, matching docker-compose semantics.

services:
  db:
    image: ubuntu:noble
    stop_grace_period: 60s    # flush DB buffers before hard stop

Top-level volumes: declares named data stores that live under state_dir/volumes/<project>/<name>.qcow2 and are symlinked into each instance's work directory. They survive holos down — tearing down a project only removes the symlink, never the backing file.

name: demo
services:
  db:
    image: ubuntu:noble
    volumes:
      - pgdata:/var/lib/postgresql

volumes:
  pgdata:
    size: 20G

Volumes attach as virtio-blk devices with a stable serial=vol-<name>, so inside the guest they appear as /dev/disk/by-id/virtio-vol-pgdata. Cloud-init runs an idempotent mkfs.ext4 + /etc/fstab snippet on first boot so there's nothing to configure by hand.
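As a rough sketch — not the actual generated snippet — the first-boot behavior described above amounts to something like this cloud-init fragment, keyed off the stable virtio serial (Holos generates this for you; the exact commands are an assumption):

```yaml
cloud_init:
  runcmd:
    # idempotent: format only if the volume has no filesystem yet
    - "blkid /dev/disk/by-id/virtio-vol-pgdata || mkfs.ext4 /dev/disk/by-id/virtio-vol-pgdata"
    - "grep -q vol-pgdata /etc/fstab || echo '/dev/disk/by-id/virtio-vol-pgdata /var/lib/postgresql ext4 defaults 0 2' >> /etc/fstab"
    - "mount -a"
```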

Healthchecks and depends_on

A service with a healthcheck blocks its dependents from starting until the check passes. The probe runs via SSH (same key holos exec uses):

services:
  db:
    image: postgres-cloud.qcow2
    healthcheck:
      test: ["pg_isready", "-U", "postgres"]
      interval: 2s
      retries: 30
      start_period: 10s
      timeout: 3s
  api:
    image: api.qcow2
    depends_on: [db]     # waits for db to be healthy

test: accepts either a list (exec form) or a string (wrapped in sh -c). Set HOLOS_HEALTH_BYPASS=1 to skip the actual probe — handy for CI environments without in-guest SSHD.
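The string form would look like this, equivalent to the exec-form example above but wrapped in `sh -c`:

```yaml
services:
  db:
    image: postgres-cloud.qcow2
    healthcheck:
      test: "pg_isready -U postgres"   # string form, run as: sh -c "pg_isready -U postgres"
      interval: 2s
      retries: 30
```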

Every holos up auto-generates a per-project SSH keypair under state_dir/ssh/<project>/ and injects the public key via cloud-init. A host port is allocated for each instance and forwarded to guest port 22, so you can:

holos exec web-0                 # interactive shell
holos exec db-0 -- pg_isready    # one-off command

-u <user> overrides the login user (defaults to the service's cloud_init.user, or ubuntu).
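Putting that resolution order together — a service with a custom login user (`deploy` is a hypothetical name for illustration):

```yaml
services:
  web:
    image: ubuntu:noble
    cloud_init:
      user: deploy    # holos exec logs in as this user by default
```

With this config, `holos exec web-0` logs in as `deploy`, while `holos exec -u root web-0` overrides it.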

Emit a systemd unit so a project comes back up after the host reboots:

holos install --enable           # per-user, no sudo needed
holos install --system --enable  # host-wide, before any login
holos install --dry-run          # print the unit and exit

User units land under ~/.config/systemd/user/holos-<project>.service; system units under /etc/systemd/system/. holos uninstall reverses it (and is idempotent — safe to call twice).

Every service can reach every other service by name. Under the hood:

  • Each VM gets two NICs: user-mode (for host port forwarding) and socket multicast (for inter-VM L2)
  • Static IPs are assigned automatically on the internal 10.10.0.0/24 segment
  • /etc/hosts is populated via cloud-init so db, web-0, web-1 all resolve
  • No libvirt. No bridge configuration. No root required for inter-VM networking.
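Assuming addresses on the 10.10.0.0/24 segment (the exact assignments are an assumption), the cloud-init-populated /etc/hosts in each guest amounts to something like:

```
10.10.0.10   db
10.10.0.11   web-0
10.10.0.12   web-1
```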

Pass physical GPUs (or any PCI device) directly to a VM via VFIO:

services:
  ml:
    image: ubuntu:noble
    vm:
      vcpu: 8
      memory_mb: 16384
    devices:
      - pci: "01:00.0"       # GPU
      - pci: "01:00.1"       # GPU audio
    ports:
      - "8888:8888"

What holos handles:

  • UEFI boot is enabled automatically when devices are present (OVMF firmware)
  • kernel-irqchip=on is set on the machine for NVIDIA compatibility
  • Per-instance OVMF_VARS copy so each VM has its own EFI variable store
  • Optional rom_file for custom VBIOS ROMs

What you handle (host setup):

  • Enable IOMMU in BIOS and kernel (intel_iommu=on or amd_iommu=on)
  • Bind the GPU to vfio-pci driver
  • Run holos devices --gpu to find PCI addresses and IOMMU groups

Use pre-built cloud images instead of building your own:

services:
  web:
    image: alpine           # auto-pulled and cached
  api:
    image: ubuntu:noble     # specific tag
  db:
    image: debian           # defaults to debian:12

Available: alpine, arch, debian, ubuntu, fedora. Run holos images to see all tags.

Use a Dockerfile to provision a VM. RUN, COPY, ENV, and WORKDIR instructions are converted into a shell script that runs via cloud-init:

services:
  api:
    dockerfile: ./Dockerfile
    ports:
      - "3000:3000"
FROM ubuntu:noble

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y nodejs npm
COPY server.js /opt/app/
WORKDIR /opt/app
RUN npm init -y && npm install express

When image is omitted, the base image is taken from the Dockerfile's FROM line. The Dockerfile's instructions run before any cloud_init.runcmd entries.

Supported: FROM, RUN, COPY, ENV, WORKDIR. Unsupported instructions (CMD, ENTRYPOINT, EXPOSE, etc.) are silently skipped. COPY sources are resolved relative to the Dockerfile's directory and must be files, not directories — use volumes for directory mounts.
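As a sketch of the conversion (the actual generated script may differ), the Dockerfile above would be flattened into roughly this cloud-init fragment, with `COPY` sources delivered into the guest by the runtime — the seed path shown is a placeholder, not a real Holos path:

```yaml
cloud_init:
  runcmd:
    - "export DEBIAN_FRONTEND=noninteractive; apt-get update && apt-get install -y nodejs npm"  # ENV + RUN
    - "install -D /path/from/seed/server.js /opt/app/server.js"   # COPY (delivery mechanism is an assumption)
    - "cd /opt/app && npm init -y && npm install express"         # WORKDIR + RUN
```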

Pass arbitrary flags straight to qemu-system-x86_64 with extra_args:

services:
  gpu:
    image: ubuntu:noble
    vm:
      vcpu: 4
      memory_mb: 8192
      extra_args:
        - "-device"
        - "virtio-gpu-pci"
        - "-display"
        - "egl-headless"

Arguments are appended after all holos-managed flags. No validation -- you own it.

Field              Default
replicas           1
vm.vcpu            1
vm.memory_mb       512
vm.machine         q35
vm.cpu_model       host
cloud_init.user    ubuntu
image_format       inferred from extension
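Given those defaults, the smallest valid service is presumably just a name and an image:

```yaml
name: minimal
services:
  box:
    image: alpine    # 1 vCPU, 512 MB, q35 machine, host CPU model, user "ubuntu"
```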
Build from source:

go build -o bin/holos ./cmd/holos

Building a guest image requires mkosi. Host requirements:

  • /dev/kvm
  • qemu-system-x86_64
  • qemu-img
  • One of cloud-localds, genisoimage, mkisofs, or xorriso
  • mkosi (only for building the base image)

This is not Kubernetes. It does not try to solve:

  • Multi-host clustering
  • Live migration
  • Service meshes
  • Overlay networks
  • Scheduler, CRDs, or control plane quorum

The goal is to make KVM workable for single-host stacks without importing the operational shape of Kubernetes.
