vmkit-agent Reference
vmkit-agent is a Go binary that runs as a systemd daemon on every VM provisioned by VMKit. It maintains a persistent outbound WebSocket connection to the VMKit control plane and executes RPC commands sent down that connection.
The agent is the exclusive channel between the VMKit backend and your VM. The backend never opens inbound connections to the VM’s IP.
Installation
Automatic (standard path). Cloud-init runs on first boot and handles the full installation: Docker CE, the vmkit-agent binary, and the systemd unit. No manual step is required.
Manual (advanced). See Manual Installation below.
WebSocket Protocol
| Property | Value |
|---|---|
| Endpoint | wss://gateway.vmkit.dev/api/daemon |
| Auth | Authorization: Bearer <agent-token> header on the initial upgrade request |
| Token source | Written to /etc/vmkit/agent.env by cloud-init at provision time |
| Direction | Pull model — agent calls out; backend never initiates inbound connections |
| Reconnect | Exponential backoff with jitter; indefinite retry |
| Heartbeat | Agent sends a ping frame every 30 seconds; backend expects a pong within 10 seconds |
The agent token is a workspace-scoped secret generated by the backend at provision time. It is distinct from user vmk_ API keys and cannot be used against the REST API.
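The reconnect policy in the table (exponential backoff with jitter, retried indefinitely) can be illustrated with a minimal sketch. The "full jitter" variant, the 1-second base, and the 60-second cap are assumptions chosen for illustration; the source does not state the agent's actual parameters.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return one sleep duration (in seconds) per reconnect attempt.

    Uses "full jitter": each delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)]. base and cap are assumed values,
    not documented vmkit-agent settings.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

A real client would compute the next delay lazily inside its retry loop rather than for a fixed number of attempts. The jitter keeps a fleet of agents from reconnecting in lockstep after a gateway outage.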
Daemon Handlers
These are the RPC handlers used over the WebSocket connection. The backend invokes all of them except register, which the agent sends itself on every connection.
| Handler | Trigger | Inputs | Outputs |
|---|---|---|---|
| register | Agent startup | version: string, instance_id: string, ip: string, os: string, arch: string | { "ok": true } |
| logs_fetch | Dashboard log view, MCP get_logs, GET /api/deploys/{id}/logs | container: string, lines: integer (default 100, max 5000) | { "lines": [{ "ts": "<RFC3339>", "text": "<string>" }] } |
| metrics_collect | Backend polling (every 60 s) | (none) | See Health Reporting |
| agent_upgrade | Backend-initiated upgrade | version: string, download_url: string, checksum_sha256: string | { "status": "upgrading" }; the agent restarts and the backend detects the reconnect |
Handler details
register — sent automatically by the agent on every WebSocket connection (initial and reconnects). The backend uses this to mark the instance as agent_connected: true and record the agent version.
logs_fetch — the agent calls docker logs --tail <lines> <container> against the Docker daemon on the VM and streams the output back as newline-delimited JSON. Requires the Docker socket to be accessible to the vmkit-agent process (configured during hardening).
metrics_collect — the agent reads /proc/meminfo, /proc/stat, df, and docker ps to collect the metrics payload. See Health Reporting.
agent_upgrade — the agent downloads the binary from download_url, verifies SHA-256 against checksum_sha256, atomically replaces /usr/local/bin/vmkit-agent, and issues systemctl restart vmkit-agent. The backend waits for the reconnect register call to confirm the upgrade succeeded.
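As an illustration of the logs_fetch inputs and output, here is a small sketch that applies the documented default and maximum for lines, builds the docker logs command, and shapes timestamped output into the response format. The function names are hypothetical, the lower bound of 1 is an assumption, and the sketch assumes timestamps come from `docker logs --timestamps`; the source does not say how the agent obtains them.

```python
DEFAULT_LINES = 100   # default from the handler table
MAX_LINES = 5000      # maximum from the handler table


def clamp_lines(lines=None):
    """Apply the documented default and maximum to a requested line count."""
    if lines is None:
        return DEFAULT_LINES
    # The lower bound of 1 is an assumption; the source only states the maximum.
    return max(1, min(int(lines), MAX_LINES))


def docker_logs_argv(container, lines=None):
    """Build the argv for `docker logs --tail <lines> <container>`.

    Returning a list (not a shell string) avoids shell interpretation of
    the container name.
    """
    return ["docker", "logs", "--tail", str(clamp_lines(lines)), container]


def to_payload(raw):
    """Shape timestamp-prefixed log output into the documented response.

    Assumes each line looks like "<RFC3339 timestamp> <text>".
    """
    entries = []
    for line in raw.splitlines():
        ts, _, text = line.partition(" ")
        entries.append({"ts": ts, "text": text})
    return {"lines": entries}
```

For example, `docker_logs_argv("web", 999999)` requests only 5000 lines, matching the cap in the handler table.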
Health Reporting
The agent pushes a health snapshot to the backend after every metrics_collect call. The backend stores the latest snapshot per instance and exposes it at GET /api/instances/{id}.
| Metric | Source | Units |
|---|---|---|
| cpu_pct | /proc/stat (1-second sample) | Percentage (0–100) |
| mem_pct | /proc/meminfo (MemUsed / MemTotal) | Percentage (0–100) |
| disk_pct | df / (used / total) | Percentage (0–100) |
| container_count | docker ps -q \| wc -l | Integer |
| container_health | docker inspect --format '{{.State.Health.Status}}' for each container | Map of container_name → "healthy", "unhealthy", "starting", or "none" |
A missing heartbeat (no pong response within 10 seconds) marks the instance as agent_connected: false in the backend and triggers a dashboard warning. The next successful reconnect clears the warning.
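The cpu_pct and mem_pct rows in the table can be sketched as pure functions over the text of /proc/stat and /proc/meminfo. This is an illustration under stated assumptions, not the agent's code: iowait is counted as idle time, and because /proc/meminfo has no MemUsed field, used memory is derived here as MemTotal minus MemAvailable.

```python
def parse_cpu_sample(stat_text):
    """Return (idle_ticks, total_ticks) from the aggregate "cpu" line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            ticks = [int(v) for v in fields[1:]]
            # Order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
            # guest and guest_nice are already included in user and nice,
            # so only the first eight fields are summed.
            total = sum(ticks[:8])
            return idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_pct(before, after):
    """CPU utilisation between two /proc/stat snapshots (e.g. 1 s apart)."""
    idle0, total0 = parse_cpu_sample(before)
    idle1, total1 = parse_cpu_sample(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        return 0.0
    busy = total_delta - (idle1 - idle0)
    return 100.0 * busy / total_delta


def mem_pct(meminfo_text):
    """Memory utilisation from /proc/meminfo text (values are in kB)."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    total = values["MemTotal"]
    used = total - values["MemAvailable"]  # assumed definition of "used"
    return 100.0 * used / total
```

On a VM, the two CPU snapshots would be read from /proc/stat about one second apart, matching the 1-second sample noted in the table.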
Self-Upgrade
- Backend calls agent_upgrade with version, download_url, and checksum_sha256.
- Agent downloads the binary to /tmp/vmkit-agent-<version>.
- Agent verifies SHA-256. Aborts and returns an error if the checksum does not match.
- Agent copies the new binary to /usr/local/bin/vmkit-agent (atomic rename).
- Agent calls systemctl restart vmkit-agent (systemd restarts it as a new process).
- The new process starts, connects to the WebSocket, and sends register with the new version.
- Backend records the updated agent_version on the instance.
The old process is terminated by systemd as part of the restart. There is a brief (typically < 5 s) window where the agent is not connected.
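The verify-then-replace part of the sequence above can be sketched as follows. This is a minimal illustration, not the agent's implementation: install_verified is a hypothetical name, and the sketch stages the file next to the destination (rather than under /tmp) because a rename is only atomic within a single filesystem.

```python
import hashlib
import os
import tempfile


def install_verified(payload, expected_sha256, dest):
    """Verify payload against expected_sha256, then atomically replace dest.

    Raises ValueError and leaves dest untouched on a checksum mismatch.
    """
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )

    # Stage in the destination directory so os.replace never crosses a
    # filesystem boundary (an assumption of this sketch).
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, staged = tempfile.mkstemp(dir=dest_dir, prefix=".vmkit-agent-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(staged, 0o755)
        os.replace(staged, dest)  # atomic swap on POSIX filesystems
    except BaseException:
        # Clean up the staged file if anything failed before the swap.
        if os.path.exists(staged):
            os.unlink(staged)
        raise
```

After a successful call, the remaining step from the list is `systemctl restart vmkit-agent`. Verifying before staging means a corrupted download never touches the installed binary.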
Manual Installation
Use this if you are provisioning a VM outside the normal VMKit flow, or recovering a VM whose cloud-init failed.
cloud-init snippet
#cloud-config
packages:
  - docker.io
  - curl
runcmd:
  - mkdir -p /etc/vmkit
  - echo "VMKIT_AGENT_TOKEN=<your-agent-token>" > /etc/vmkit/agent.env
  - echo "VMKIT_GATEWAY_URL=wss://gateway.vmkit.dev/api/daemon" >> /etc/vmkit/agent.env
  - curl -fsSL https://github.com/vmkit/vmkit-agent/releases/latest/download/vmkit-agent-linux-amd64 -o /usr/local/bin/vmkit-agent
  - chmod +x /usr/local/bin/vmkit-agent
  - systemctl enable vmkit-agent
  - systemctl start vmkit-agent
Replace <your-agent-token> with the token from your environment’s compute target configuration. Contact support if you need to retrieve this token for an existing environment.
This snippet does not create the systemd unit itself. The unit file described under systemd service unit must already be in place, otherwise systemctl enable fails.
systemd service unit
/etc/systemd/system/vmkit-agent.service:
[Unit]
Description=vmkit-agent daemon
After=network-online.target docker.service
Wants=network-online.target
Requires=docker.service
[Service]
Type=simple
EnvironmentFile=/etc/vmkit/agent.env
ExecStart=/usr/local/bin/vmkit-agent
Restart=always
RestartSec=5s
StandardOutput=journal
StandardError=journal
SyslogIdentifier=vmkit-agent
[Install]
WantedBy=multi-user.target
After placing this file, run:
systemctl daemon-reload
systemctl enable vmkit-agent
systemctl start vmkit-agent
Logs
Agent logs are written to the systemd journal under the vmkit-agent identifier.
# Follow live
journalctl -u vmkit-agent -f
# Last 200 lines
journalctl -u vmkit-agent -n 200
# Since last boot
journalctl -u vmkit-agent -b
Log level is INFO by default. Set VMKIT_LOG_LEVEL=debug in /etc/vmkit/agent.env and restart the service for verbose output.
Troubleshooting
Agent not connecting
Symptom: Dashboard shows agent_connected: false for the instance. GET /api/instances/{id} returns "agent_connected": false.
Checklist:
- Verify outbound port 443 is allowed. The agent only needs outbound TCP 443; no inbound ports are required.
- Check the agent token is correct: grep VMKIT_AGENT_TOKEN /etc/vmkit/agent.env.
- Check agent logs: journalctl -u vmkit-agent -n 50.
- Confirm the service is running: systemctl status vmkit-agent.
- Test DNS resolution and outbound HTTPS from the VM: curl -I https://gateway.vmkit.dev/health.
The VMKit control plane does not need to reach your VM’s IP. If you can confirm outbound 443 is open from the VM, connectivity issues are almost always a bad agent token or a misconfigured VMKIT_GATEWAY_URL.
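Since a bad token or gateway URL is the usual culprit, a quick sanity check of /etc/vmkit/agent.env can save time. The sketch below is a hypothetical helper, not part of vmkit-agent, and its checks are heuristics: it flags a missing token, the unreplaced placeholder, a value that looks like a user vmk_ API key, and a gateway URL that does not use wss://.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines in the style of a systemd EnvironmentFile."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def check_agent_env(env):
    """Return a list of likely problems; an empty list means none found."""
    problems = []
    token = env.get("VMKIT_AGENT_TOKEN", "")
    if not token:
        problems.append("VMKIT_AGENT_TOKEN is missing or empty")
    elif token.startswith("<"):
        problems.append("VMKIT_AGENT_TOKEN still holds the placeholder")
    elif token.startswith("vmk_"):
        # Heuristic: user API keys carry the vmk_ prefix and are not agent tokens.
        problems.append("VMKIT_AGENT_TOKEN looks like a user API key")
    url = env.get("VMKIT_GATEWAY_URL", "")
    if url and not url.startswith("wss://"):
        problems.append("VMKIT_GATEWAY_URL should start with wss://")
    return problems
```

On the VM, this could be run as `check_agent_env(parse_env_file(open("/etc/vmkit/agent.env").read()))`. An empty result only rules out the obvious mistakes; it cannot confirm that the token is valid.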
Agent not reporting metrics
Symptom: metrics in GET /api/instances/{id} is null or stale (> 5 minutes old).
Checklist:
- Verify the vmkit-agent process user has access to the Docker socket:
  ls -la /var/run/docker.sock
  # Should show docker group ownership; vmkit-agent must be in the docker group
- If the socket permissions are wrong, re-run hardening by calling POST /api/instances/{id}/harden from the dashboard, or calling the harden_vm MCP tool with the instance ID.
- Check agent logs for permission denied on /var/run/docker.sock.
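The socket check in the first item can also be scripted. This is a rough sketch with a hypothetical function name: it only tests group ownership and the group read/write bits, so it does not account for a process running as root or as the socket's owner.

```python
import os
import stat


def group_can_use_socket(path, group_ids):
    """True if path is group read/writable and owned by one of group_ids."""
    st = os.stat(path)
    group_rw = stat.S_IRGRP | stat.S_IWGRP
    return st.st_gid in group_ids and (st.st_mode & group_rw) == group_rw
```

Run as the agent's user, `group_can_use_socket("/var/run/docker.sock", set(os.getgroups()))` indicates whether any of that user's groups grants access.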
Agent upgrade stuck
Symptom: agent_version in GET /api/instances/{id} shows the old version after an upgrade was issued.
Checklist:
- Check agent logs around the time of the upgrade: journalctl -u vmkit-agent --since "10 minutes ago".
- Look for checksum mismatch errors; a mismatch aborts the upgrade and leaves the old binary in place.
- If systemd failed to restart: systemctl status vmkit-agent and journalctl -u vmkit-agent -n 20.
- To force-upgrade manually: download the binary, verify the checksum, replace /usr/local/bin/vmkit-agent, and restart the service.