I migrated 28 services from Docker Swarm to a Talos Linux Kubernetes cluster in two days with zero downtime, keeping the same NFS data and routing directly via BGP. Here is exactly how, command by command.
Why Move
Portainer published a technical advisory about Docker v29 breaking Swarm. Swarm is soft-abandoned. My setup: 8 VMs (3 managers + 5 workers), 32 vCPU, 93 GB RAM, 45 services on Proxmox VE with NFS on TrueNAS and HAProxy (BGP anycast) as frontend.
The Target Stack
OS: Talos Linux v1.12.6 (immutable, no SSH)
K8s: v1.35.2 (native to Talos)
CNI: Cilium 1.17.3 (eBPF, replaces kube-proxy)
LB: MetalLB 0.15.3 (BGP mode, ASN 65020)
Storage: NFS external provisioner -> TrueNAS (same data, zero migration)
Ingress: HAProxy (external, TLS termination, BGP ASN 65010)
GitOps: Flux v2.8.3 -> self-hosted GitLab
Mgmt: Portainer Business + talosctl
BGP: UDM (ASN 65000) peers with HAProxy (65010) + K8s (65020)
All MetalLB IPs routable from any VLAN
Step 1: Create VLAN 10
Dedicated VLAN for K8s isolation. Created via Unifi UDM API:
# Unifi API - create VLAN 10 network
POST /proxy/network/api/s/default/rest/networkconf
{
  "name": "K8s",
  "purpose": "corporate",
  "vlan": 10,
  "vlan_enabled": true,
  "ip_subnet": "10.10.0.1/24",
  "dhcpd_enabled": false,
  "networkgroup": "LAN"
}
All 3 Proxmox hosts already had VLAN-aware bridges (vmbr0). No change needed.
Step 2: Provision VMs on Proxmox
Download Talos ISO and create 6 VMs via the Proxmox API:
# Download ISO to shared NFS storage
POST /nodes/prox01/storage/ISOs/download-url
url: https://github.com/siderolabs/talos/releases/download/v1.12.6/metal-amd64.iso
filename: talos-v1.12.6-amd64.iso
# Create each VM (example: master01)
POST /nodes/prox01/qemu
vmid: 130
name: k8s-master01
machine: q35
bios: ovmf
efidisk0: local-lvm:1,efitype=4m,pre-enrolled-keys=0
cores: 2
memory: 4096
cpu: host
net0: virtio,bridge=vmbr0,tag=10
scsi0: local-lvm:40
scsihw: virtio-scsi-single
cdrom: ISOs:iso/talos-v1.12.6-amd64.iso
boot: order=scsi0;ide2
6 VMs total: 3 masters (2 vCPU/4GB/40GB) + 3 workers (4 vCPU/16GB/80GB). One master + one worker per Proxmox host.
Plus a k8s-admin VM (Ubuntu 24.04 cloud-init, dual-homed on VLAN 10 + LAN) for running talosctl, kubectl, and helm.

Step 3: Bootstrap Talos
Talos has no SSH. Everything is declarative YAML applied via talosctl.
Generate cluster secrets
talosctl gen secrets --output-file secrets.yaml
Create per-node network patches
Each node gets a YAML patch with its static IP (no hostname set -- hostnames caused conflicts with Talos v1.12):
# patch-master01.yaml
machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 10.10.0.10/24
        routes:
          - network: 0.0.0.0/0
            gateway: 10.10.0.1
    nameservers:
      - 10.10.0.1
      - 1.1.1.1
Generate per-node configs with shared secrets
# For each control plane node:
talosctl gen config k8s-homelab https://10.10.0.10:6443 \
--output-dir ./master01 \
--output-types controlplane \
--with-docs=false --with-examples=false \
--with-secrets secrets.yaml \
--config-patch-control-plane @patch-master01.yaml
# For each worker:
talosctl gen config k8s-homelab https://10.10.0.10:6443 \
--output-dir ./worker01 \
--output-types worker \
--with-docs=false --with-examples=false \
--with-secrets secrets.yaml \
--config-patch-worker @patch-worker01.yaml
Enable DHCP temporarily and apply configs
Talos boots in maintenance mode and needs a network config. Enable DHCP on VLAN 10 temporarily, then apply via the DHCP IPs:
# Apply controlplane config to master01 (via its temp DHCP IP)
talosctl apply-config --insecure \
--nodes 10.10.0.107 \
--file master01/controlplane.yaml
# Repeat for all 6 nodes...
# Bootstrap the cluster on master01
talosctl bootstrap \
--talosconfig ./talosconfig \
-e 10.10.0.107 -n 10.10.0.107
# Get kubeconfig
talosctl kubeconfig ./kubeconfig \
--talosconfig ./talosconfig \
-e 10.10.0.107 -n 10.10.0.107
Result: 6 nodes Ready, Kubernetes v1.35.2, all running Talos v1.12.6.
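Before moving on, it is worth confirming the cluster is actually healthy. A quick sketch of the checks, assuming the talosconfig and kubeconfig generated above:

```
# Talos-level health check (etcd, kubelet, API server)
talosctl health --talosconfig ./talosconfig \
  -e 10.10.0.107 -n 10.10.0.107
# Kubernetes-level: all 6 nodes should be listed
kubectl --kubeconfig ./kubeconfig get nodes -o wide
```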
Step 4: Install Cilium CNI
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.17.3 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=true \
--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup \
--set k8sServiceHost=localhost \
--set k8sServicePort=7445
14 Cilium pods (an agent and an envoy per node, plus the operator). kube-proxy fully replaced.
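To verify the CNI rollout, the standalone Cilium CLI gives a one-shot status summary (this assumes the CLI is installed on the k8s-admin VM, which the steps above do not cover):

```
# Waits until all Cilium components report healthy
cilium status --wait
```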

Step 5: MetalLB with BGP
MetalLB assigns stable IPs to K8s services. We use BGP mode so the nodes advertise these IPs to the UDM router, which makes them reachable from all VLANs.
helm repo add metallb https://metallb.github.io/metallb
helm install metallb metallb/metallb \
--namespace metallb-system --create-namespace
Fix PodSecurity (required for Talos)
Talos enforces strict PodSecurity by default. MetalLB speakers need privileged access:
kubectl label namespace metallb-system \
pod-security.kubernetes.io/enforce=privileged \
pod-security.kubernetes.io/audit=privileged \
pod-security.kubernetes.io/warn=privileged --overwrite
kubectl -n metallb-system rollout restart daemonset metallb-speaker
Remove control-plane exclusion
for node in $(kubectl get nodes -o name); do
kubectl label $node node.kubernetes.io/exclude-from-external-load-balancers-
done
Configure BGP peering
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: k8s-services
  namespace: metallb-system
spec:
  addresses:
    - 10.10.0.40-10.10.0.69
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: udm
  namespace: metallb-system
spec:
  myASN: 65020
  peerASN: 65000
  peerAddress: 10.10.0.1
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-default
  namespace: metallb-system
spec:
  ipAddressPools:
    - k8s-services
UDM BGP config (FRR)
router bgp 65000
neighbor 10.10.0.105 remote-as 65020
neighbor 10.10.0.105 description k8s-master03
neighbor 10.10.0.107 remote-as 65020
neighbor 10.10.0.107 description k8s-master01
neighbor 10.10.0.108 remote-as 65020
neighbor 10.10.0.108 description k8s-worker01
neighbor 10.10.0.109 remote-as 65020
neighbor 10.10.0.109 description k8s-master02
neighbor 10.10.0.116 remote-as 65020
neighbor 10.10.0.116 description k8s-worker02
neighbor 10.10.0.117 remote-as 65020
neighbor 10.10.0.117 description k8s-worker03
 address-family ipv4 unicast
  maximum-paths 6
 exit-address-family
Result: 8 BGP sessions established. All MetalLB IPs routable from any VLAN. HAProxy reaches the MetalLB service IPs directly -- no NodePort hops or extra proxies.
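The sessions can also be confirmed from the router side. The UDM runs FRR, so after SSHing in, vtysh shows the peering state (standard FRR commands, not UDM-specific):

```
vtysh -c "show ip bgp summary"
# Each K8s node should show a prefix count under State/PfxRcd,
# not "Active" or "Connect"
vtysh -c "show ip route bgp"
```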
Step 6: NFS Storage Provisioner
The data does not move. TrueNAS keeps serving NFS. We just change how it is mounted.
helm repo add nfs-subdir https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner \
nfs-subdir/nfs-subdir-external-provisioner \
--namespace kube-system \
--set nfs.server=10.0.0.30 \
--set nfs.path=/mnt/DATA/swarmnfs \
--set storageClass.name=nfs-truenas \
--set storageClass.defaultClass=true \
--set storageClass.reclaimPolicy=Retain
For existing Swarm data, static PVs point to the exact same NFS paths:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: twenty-data
spec:
  capacity:
    storage: 10Gi
  accessModes: [ReadWriteMany]
  nfs:
    server: 10.0.0.30
    path: /mnt/DATA/swarmnfs/twenty
  persistentVolumeReclaimPolicy: Retain
Important: add 10.10.0.0/24 to the NFS share allowed networks on TrueNAS.
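A static PV like this still needs a PVC to bind it. A minimal sketch (the namespace twenty is illustrative): setting volumeName binds this exact PV, and an empty storageClassName prevents the default nfs-truenas class from dynamically provisioning a fresh volume instead.

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: twenty-data
  namespace: twenty
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""      # bypass the default class; bind statically
  volumeName: twenty-data   # bind this specific pre-existing PV
  resources:
    requests:
      storage: 10Gi
```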
Step 7: Flux GitOps
curl -s https://fluxcd.io/install.sh | bash
GITLAB_TOKEN=xxx flux bootstrap gitlab \
--hostname=gitlab.example.com \
--owner=Stephane \
--repository=k8s-infrastructure \
--branch=main \
--path=flux \
--token-auth --personal
Push to GitLab = auto-deploy to the cluster. Every manifest versioned. Rollback = git revert.
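Once bootstrap finishes, the Flux CLI itself confirms the reconciliation loop is live:

```
# Verify controllers and CRDs are healthy
flux check
# The flux-system Kustomization should show Ready=True
flux get kustomizations
```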
Step 8: Migrate Services
For each service, the pattern is identical:
- Get config from Portainer API (image, env, NFS mounts)
- Create K8s manifests: Namespace (privileged label) + PV/PVC + Deployment (enableServiceLinks: false) + Service (MetalLB annotation)
- Verify pod Running and MetalLB IP responds
- HAProxy: put docker servers in maint, add k8s server
- Verify via domain name
- Stop Swarm stack via Portainer API
- Push manifests to GitLab (Flux applies)
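The per-service manifests all follow one template. A trimmed sketch for a hypothetical app (the name, image, port, and pinned IP are illustrative; the metallb.io/loadBalancerIPs annotation is how MetalLB pins a service to a pool address in recent releases):

```
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: privileged  # Talos defaults are stricter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector: { matchLabels: { app: myapp } }
  template:
    metadata: { labels: { app: myapp } }
    spec:
      enableServiceLinks: false  # avoid Docker-style service env var injection
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0  # pulled from the Portainer API dump
          ports: [ { containerPort: 8080 } ]
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
  annotations:
    metallb.io/loadBalancerIPs: 10.10.0.45  # a free IP from the k8s-services pool
spec:
  type: LoadBalancer
  selector: { app: myapp }
  ports: [ { port: 80, targetPort: 8080 } ]
```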
HAProxy runtime commands
# Put old Swarm backends in maintenance
echo 'set server BACKEND/docker01 state maint' | socat stdio TCP:127.0.0.1:9999
echo 'set server BACKEND/docker02 state maint' | socat stdio TCP:127.0.0.1:9999
echo 'set server BACKEND/docker03 state maint' | socat stdio TCP:127.0.0.1:9999
# Add K8s backend (direct MetalLB IP via BGP)
echo 'add server BACKEND/k8s METALLB_IP:PORT check' | socat stdio TCP:127.0.0.1:9999
echo 'set server BACKEND/k8s state ready' | socat stdio TCP:127.0.0.1:9999
echo 'enable health BACKEND/k8s' | socat stdio TCP:127.0.0.1:9999
Important: use 'state maint', not 'state drain'. A drained server still accepts health-check traffic, and HAProxy may keep routing to it.

Step 9: Portainer Management UI
Getting Portainer to work on Talos K8s turned into a 2-hour rabbit hole. If you're reading this, save yourself the pain and skip straight to the Helm chart.
The TLS Trap
My first attempt was the manual route: a Deployment for the server, a DaemonSet for the agent, separate Services, proper RBAC. The pods came up fine. Portainer even detected it was running on Kubernetes. But it couldn't talk to its own agent.
Here's the thing nobody tells you: the Portainer agent generates a self-signed TLS certificate that's only valid for localhost and its own pod IP (e.g. 10.244.5.151). When the server tries to reach the agent through a Service DNS name or ClusterIP, the hostname doesn't match the cert and the TLS handshake fails. Every. Single. Time.
I tried TLSSkipVerify (ignored for agent connections), --tlsskipverify CLI flag (only for Docker), AGENT_SECRET (doesn't change TLS), agent as sidecar (self-signed CA rejected), kubeconfig upload (undocumented multipart field). After 2 hours and 15+ attempts: Portainer EE 2.39 hardcodes TLS verification for agent connections with no bypass.
The Solution
helm repo add portainer https://portainer.github.io/k8s/
helm install portainer portainer/portainer -n portainer --create-namespace \
--set enterpriseEdition.enabled=true \
--set service.type=LoadBalancer
The Helm chart handles TLS trust between server and agent automatically. Within 90 seconds, Portainer detected the local Kubernetes cluster (current_platform=Kubernetes) and the environment appeared as local with full access. Authentik SSO was configured via the existing OAuth2 provider (pk=8), with redirect URI updated to https://portainer.homelab.local:9443.
Step 10: InfluxDB Migration
InfluxDB 2.8.0 was the last service on Swarm. The concern was BoltDB file locking on NFS. With Swarm otherwise empty, it was time to try.
# Stop on Swarm
docker service rm influxdb
# Deploy on K8s with the same NFS PVC
# NFS path: 10.0.0.30:/mnt/DATA/swarmnfs/influxdb
# Result: 54 shards loaded successfully
# BoltDB on NFS works fine
# MetalLB IP: 10.10.0.57
Instead of updating every client, I replaced the AdGuard DNS rewrite for swarm.homelab.local, removing the 3 old Swarm IPs (.182, .198, .233) and pointing it to the K8s MetalLB IP. Docker Swarm is now officially empty.
Step 11: Prometheus Data Recovery
During the initial K8s deployment, Prometheus used emptyDir for storage. All 55 GB of historical metrics sat on the old NFS volume while the new instance had only 18 hours of data.
# Old TSDB on NFS: 10.0.0.30:/mnt/DATA/swarmnfs/prometheus/data (55 GB)
kubectl patch deploy prometheus -n monitoring --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/volumes/1",
"value":{"name":"data","persistentVolumeClaim":{"claimName":"prometheus-data"}}}]'
# WAL replay: 2.3 seconds. All historical metrics recovered.
Prometheus warns about NFS_SUPER_MAGIC, but with only 13 scrape targets performance is fine. Also fixed: kube-state-metrics' broken ServiceAccount, and a control-plane toleration so node-exporter runs on all 6 nodes.
Step 12: HAProxy Persistence
The most critical post-migration task. All 18 K8s backend servers existed in memory only. A HAProxy restart would lose everything.
# Data Plane API v3 transactions on BOTH haproxy01 (.210) and haproxy02 (.211)
API="http://127.0.0.1:5555/v3/services/haproxy"
VERSION=$(curl -s -u admin:PASSWORD $API/configuration/version)
TXN=$(curl -s -u admin:PASSWORD -X POST "$API/transactions?version=$VERSION" | jq -r .id)
curl -s -u admin:PASSWORD -X POST \
"$API/configuration/backends/authentik_backend/servers?transaction_id=$TXN" \
-H "Content-Type: application/json" \
-d '{"name":"k8s","address":"10.10.0.41","port":9000,"check":"enabled","inter":20000}'
curl -s -u admin:PASSWORD -X PUT "$API/transactions/$TXN"
54 Docker servers removed. traefik_backend deleted (careful ordering: use_backend rule, deny rule, ACL, then the backend itself). Health checks enabled (inter 20s). A tmpfiles.d entry handles boot persistence.
Step 13: Flux Image Automation
flux install --components-extra=image-reflector-controller,image-automation-controller \
--export | kubectl apply -f -
# 21 ImageRepository (scan 6h), 11 ImagePolicy (semver)
# Telegram alerts via the notification controller
No auto-update. Flux detects a new version, sends a Telegram notification, and I approve manually. Weave GitOps (v4.0.36) provides a visual dashboard at http://flux.homelab.local:9001.
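One of the 21 repository/policy pairs might look like the sketch below (the image name, scan interval, and semver range are illustrative, matching the 6h scan and semver policies mentioned above):

```
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: n8n
  namespace: flux-system
spec:
  image: docker.n8n.io/n8nio/n8n  # illustrative image
  interval: 6h                    # scan cadence
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: n8n
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: n8n
  policy:
    semver:
      range: ">=1.0.0"            # illustrative semver constraint
```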

Step 14: Monitoring Stack
- Prometheus: 13 active targets, 55 GB TSDB on NFS, full historical data from the Swarm era
- Grafana: custom Kubernetes Cluster Overview dashboard with CPU, memory, disk, network per node, deployments table, pod restarts
- Uptime Kuma: 24 monitors in k8s-talos group: 6 node pings, K8s API, 15 HTTP checks. Created via uptime-kuma-api Python library
- kube-state-metrics + node-exporter: DaemonSet on all 6 nodes with control-plane toleration
Step 15: GitLab Repository Restructure
Refactored from bundled JSON List manifests in wave directories to 1 directory per app with proper multi-document YAML. 23 app directories. Flux reconciled without issues.
apps/
  authentik/   chromadb/   drawio/    ghost/
  homepage/    influxdb/   karakeep/  kms/
  linkstack/   monitoring/ n8n/       navidrome/
  nightscout/  open-webui/ outline/   owncloud/
  pulse/       searxng/    teslamate/
  twenty/      uptime-kuma/
Final Results
- 28 services running on Talos K8s (6 nodes)
- Docker Swarm: decommissioned, zero services remaining
- Zero downtime during the entire 2-day migration
- BGP direct routing via MetalLB (ASN 65020) to UDM (ASN 65000)
- HAProxy fully persisted. 18 backends with health checks
- GitOps pipeline: GitLab + Flux + Telegram notifications
- Full monitoring: Prometheus (55 GB) + Grafana + Uptime Kuma + Portainer
What I'd Do Differently
- Use Helm charts first for complex apps (Portainer).
- Persist HAProxy from day one. Runtime-only changes are a ticking time bomb.
- Set up Prometheus with PVC immediately. emptyDir for metrics = data loss.
- Use sealed-secrets. Plain-text credentials in GitLab are the next thing to fix.
The entire migration took 2 days. Total cluster: 3 control planes (2 vCPU, 4 GB) + 3 workers (4 vCPU, 16 GB) on Proxmox, running Talos Linux v1.12.6 with Cilium CNI and MetalLB BGP. From 8 Docker Swarm nodes to 6 Talos nodes, cleaner, more resilient, and properly managed with GitOps.