Use NixOS and Colmena for infrastructure management
Context
The application runs on a multi-node Hetzner cluster. Initial deployments used a custom deploy.nix script, but coordinating deploys across the cluster (ordering, error handling, partial failures) had become unwieldy.
Decision
All cluster nodes run NixOS, and Colmena handles coordinated remote deployment. Colmena builds configurations locally and pushes closures to nodes, handling the multi-node coordination that the custom script couldn't.
OpenTofu (Terraform fork) provisions Hetzner resources (nodes, networks, DNS). Node addresses are generated into nodes.json and consumed by the Colmena hive definition.
Each node gets the same base modules, with node-specific overrides where needed. Deployment is a single command run from infra/: just deploy.
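To make the data flow concrete, here is a minimal sketch of how a hive definition might turn nodes.json into Colmena nodes that share base modules. It is illustrative rather than the project's actual hive: the JSON shape, the ipv4 field, the ./modules/base.nix path, the node name node-1, and the overrides table are all assumptions, and it assumes a non-flake hive where <nixpkgs> resolves.

```nix
# hive.nix (sketch under stated assumptions, not the real hive definition)
let
  # Assumed shape of the generated file:
  #   { "node-1": { "ipv4": "203.0.113.10" }, "node-2": { ... } }
  nodes = builtins.fromJSON (builtins.readFile ./nodes.json);

  # Hypothetical node-specific overrides, keyed by node name.
  # Nodes not listed here receive only the shared base modules.
  overrides = {
    "node-1" = { services.postgresql.enable = true; };
  };

  # Build one NixOS module per node from its nodes.json entry.
  mkNode = name: node: { ... }: {
    imports = [
      ./modules/base.nix          # hypothetical shared base module
      (overrides.${name} or { })  # empty module when no override exists
    ];

    # Colmena deployment options: where and as whom to deploy.
    deployment.targetHost = node.ipv4;
    deployment.targetUser = "root";

    networking.hostName = name;
  };
in
{
  # Package set used to evaluate every node (non-flake form).
  meta.nixpkgs = import <nixpkgs> { };
}
# Every entry in nodes.json becomes a top-level node in the hive.
// builtins.mapAttrs mkNode nodes
```

With a hive along these lines, colmena apply would build and push every node, and colmena apply --on node-1 would target a single one. The just deploy recipe presumably wraps a command like this, though the recipe itself is not shown here.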
Consequences
Server configuration is reviewable in PRs. Rebuilding a node from scratch produces an identical system. Multi-node deploys are coordinated rather than ad-hoc.
The cost is NixOS's learning curve, and debugging module interactions can be opaque. In exchange, having the server defined in code eliminates the class of problems where something was changed on a node and nobody knows what.