Docker Deployment

Overview

Voicebox can run as a Docker container with a full web UI -- no desktop app required. This is ideal for headless servers, shared GPU machines, or self-hosted deployments.

Quick Start

git clone https://github.com/jamiepine/voicebox.git
cd voicebox
docker compose up

Open http://localhost:17493 in your browser. The full Voicebox UI is served directly from the backend.

The first build takes a few minutes (compiling the frontend, installing Python dependencies). Subsequent starts are fast thanks to Docker layer caching.
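
To run the container in the background and verify it from a terminal, you can poll the /health endpoint (the same endpoint the container's built-in health check uses):

# Start detached
docker compose up -d

# Confirm the server is responding
curl http://localhost:17493/health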

How It Works

The Docker image uses a 3-stage build:

  1. Frontend -- builds the React SPA with Bun and Vite
  2. Backend -- installs Python dependencies and TTS model packages
  3. Runtime -- combines both into a minimal image running the FastAPI server

The backend serves the web UI automatically when the built frontend is present. All API routes work exactly as they do in the desktop app.

Configuration

docker-compose.yml

The default docker-compose.yml binds to localhost only, mounts persistent volumes for data and model cache, and sets sensible resource limits:

services:
  voicebox:
    build: .
    container_name: voicebox
    restart: unless-stopped
    ports:
      - "127.0.0.1:17493:17493"
    volumes:
      - ./output:/app/data/generations
      - voicebox-data:/app/data
      - huggingface-cache:/home/voicebox/.cache/huggingface
    environment:
      - LOG_LEVEL=info
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
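
If you adapt this snippet rather than using the repository's docker-compose.yml, note that the named volumes also need a top-level volumes: declaration:

volumes:
  voicebox-data:
  huggingface-cache: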

Exposing to Your Network

By default the container only listens on 127.0.0.1. To allow other machines on your network to connect, change the port binding:

ports:
  - "0.0.0.0:17493:17493"

The API has no built-in authentication. Only expose it to trusted networks, or put a reverse proxy with authentication in front of it.
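
After changing the binding and restarting the container, you can check reachability from another machine on the network (substitute your server's address; this assumes the host firewall allows the port):

curl http://<server-ip>:17493/health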

Environment Variables

Variable                Default               Description
LOG_LEVEL               info                  Logging verbosity (debug, info, warning, error)
VOICEBOX_MODELS_DIR     (HuggingFace cache)   Custom path for model storage
VOICEBOX_CORS_ORIGINS   (local origins)       Additional CORS origins, comma-separated
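
These are set in the environment block of docker-compose.yml. As an illustration (the /models path and the extra bind mount below are examples, not part of the default file), redirecting model storage to a host directory might look like:

environment:
  - LOG_LEVEL=debug
  - VOICEBOX_MODELS_DIR=/models
volumes:
  - ./models:/models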

Resource Limits

The default compose file limits the container to 4 CPUs and 8GB RAM. Adjust these based on your hardware:

deploy:
  resources:
    limits:
      cpus: '8'
      memory: 16G

TTS model inference is memory-intensive. 8 GB is the minimum for running a single engine; 16 GB or more is recommended if you want multiple engines loaded simultaneously.

Volumes

Volume              Container Path                      Purpose
./output            /app/data/generations               Generated audio files (bind mount for easy access from host)
voicebox-data       /app/data                           Profiles, database, cache
huggingface-cache   /home/voicebox/.cache/huggingface   Downloaded models (persists across rebuilds)

The huggingface-cache volume is important -- without it, models would be re-downloaded every time the container is rebuilt.
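
To see how much disk space the cached models are taking up, list Docker's per-volume usage (Compose prefixes volume names with the project name, typically the directory name, e.g. voicebox_huggingface-cache):

docker system df -v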

GPU Acceleration

NVIDIA GPU (CUDA)

To use your NVIDIA GPU inside the container, install the NVIDIA Container Toolkit and add GPU access to your compose file:

services:
  voicebox:
    build: .
    # ... existing config ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
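
Once the container is up with GPU access, you can confirm the device is visible inside it; the NVIDIA Container Toolkit makes nvidia-smi available in the container, so this should list your GPU:

docker compose exec voicebox nvidia-smi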

AMD GPU (ROCm)

For AMD GPUs, use the ROCm runtime:

services:
  voicebox:
    build: .
    # ... existing config ...
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video

CPU Only

The default configuration runs on CPU. This works fine but generation will be slower. LuxTTS is the fastest engine on CPU (150x realtime).

Security

The Docker image follows security best practices:

  • Non-root user -- the server runs as voicebox, not root
  • Localhost binding -- only accessible from the host machine by default
  • Health checks -- automatic restart if the server hangs (/health endpoint polled every 30s)
  • CORS restricted -- only local origins allowed by default

Running Behind a Reverse Proxy

For production deployments, put Voicebox behind nginx or Caddy with TLS and authentication:

server {
    listen 443 ssl;
    server_name voicebox.example.com;

    ssl_certificate /etc/ssl/certs/voicebox.pem;
    ssl_certificate_key /etc/ssl/private/voicebox.key;

    auth_basic "Voicebox";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:17493;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
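
If you prefer Caddy, a roughly equivalent Caddyfile is sketched below; Caddy provisions TLS certificates automatically, and the password hash is generated with caddy hash-password. The admin username is just a placeholder:

voicebox.example.com {
    # directive is named basicauth on Caddy versions before 2.8
    basic_auth {
        admin <bcrypt-hash-from-caddy-hash-password>
    }
    reverse_proxy 127.0.0.1:17493
}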

Troubleshooting

Container starts but UI shows JSON

If you see {"message": "voicebox API", ...} instead of the web UI, the frontend build may have failed during the Docker build. Rebuild from scratch and watch the output:

docker compose build --no-cache

Look for errors in the "Build frontend" stage.

Models downloading on every restart

Make sure the huggingface-cache volume is configured. Without it, the model cache is lost when the container stops:

volumes:
  - huggingface-cache:/home/voicebox/.cache/huggingface
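
You can confirm the volume exists and is being reused across restarts:

docker volume ls | grep huggingface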

Out of memory

TTS models are large. If the container is killed by the OOM killer, increase the memory limit:

deploy:
  resources:
    limits:
      memory: 16G
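
To confirm memory is the bottleneck, watch the container's live usage while generating (the container name comes from container_name in the compose file):

docker stats voicebox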

Port already in use

# Check what's using port 17493
lsof -i :17493

# Or use a different port
ports:
  - "127.0.0.1:8080:17493"

Prebuilt Images (Coming Soon)

We plan to publish prebuilt Docker images to GitHub Container Registry so you won't need to build locally:

# Not available yet — coming in a future release
docker run -p 17493:17493 ghcr.io/jamiepine/voicebox:latest

The CPU image will be 3-4 GB (Python + PyTorch + TTS packages). A separate CUDA tag (6-8 GB) will be available for NVIDIA GPU users. This is normal for ML containers.

For now, use docker compose up to build from source as described above.

Connecting the Desktop App

You can also use the desktop app as a frontend for a Docker-hosted backend. In the desktop app, go to Settings -> Server, enable Remote Mode, and enter http://<server-ip>:17493.

See the Remote Mode guide for details.