DOCKER_DEPLOYMENT

Docker Deployment Guide

Status: In Development for v0.2.0 Requested By: Reddit community (thread)

Overview

Docker support makes Voicebox easier to deploy, especially for:

  • Consistent Environments: Same setup across dev/staging/prod
  • GPU Passthrough: Easy NVIDIA/AMD GPU access
  • Server Deployments: Run on headless Linux servers
  • Multi-User Setups: Isolate instances per user/team
  • Cloud Platforms: Deploy to AWS, GCP, Azure, DigitalOcean

Quick Start

Using Pre-Built Images (Recommended)

# CPU-only version
docker run -p 8000:8000 -v voicebox-data:/app/data \
  ghcr.io/jamiepine/voicebox:latest

# NVIDIA GPU version
docker run --gpus all -p 8000:8000 -v voicebox-data:/app/data \
  ghcr.io/jamiepine/voicebox:latest-cuda

# AMD GPU version (experimental)
docker run --device=/dev/kfd --device=/dev/dri -p 8000:8000 \
  -v voicebox-data:/app/data \
  ghcr.io/jamiepine/voicebox:latest-rocm

Then open: http://localhost:8000

Using Docker Compose (Easiest)

Create docker-compose.yml:

version: '3.8'

services:
  voicebox:
image: ghcr.io/jamiepine/voicebox:latest-cuda
ports:
  - "8000:8000"
volumes:
  - voicebox-data:/app/data
  - huggingface-cache:/root/.cache/huggingface
environment:
  - GPU_MEMORY_FRACTION=0.8  # Use 80% of GPU memory
  - TTS_MODE=local
  - WHISPER_MODE=local
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

volumes:
  voicebox-data:
  huggingface-cache:

Run:

docker compose up -d

Building From Source

Basic Dockerfile

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
build-essential \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*

# Copy application
COPY backend/ /app/backend/
COPY requirements.txt /app/

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir git+https://github.com/QwenLM/Qwen3-TTS.git

# Create data directory
RUN mkdir -p /app/data

# Expose port
EXPOSE 8000

# Run server
CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and run:

docker build -t voicebox .
docker run -p 8000:8000 -v $(pwd)/data:/app/data voicebox

Multi-Stage Build (Optimized)

Smaller image size by separating build and runtime:

# Dockerfile.optimized
# Stage 1: Build dependencies
FROM python:3.11-slim AS builder

WORKDIR /build

RUN apt-get update && apt-get install -y \
git build-essential && \
rm -rf /var/lib/apt/lists/*

COPY backend/requirements.txt .
RUN pip install --no-cache-dir --target=/build/packages \
-r requirements.txt

RUN pip install --no-cache-dir --target=/build/packages \
git+https://github.com/QwenLM/Qwen3-TTS.git

# Stage 2: Runtime
FROM python:3.11-slim

WORKDIR /app

# Install only runtime dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*

# Copy installed packages from builder
COPY --from=builder /build/packages /usr/local/lib/python3.11/site-packages/

# Copy application code
COPY backend/ /app/backend/

# Create data directory
RUN mkdir -p /app/data

EXPOSE 8000

CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build:

docker build -f Dockerfile.optimized -t voicebox:slim .

GPU Support

NVIDIA GPUs (CUDA)

Dockerfile:

FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y \
python3.11 python3-pip git ffmpeg && \
rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install PyTorch with CUDA support
COPY backend/requirements.txt .
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install other dependencies
RUN pip3 install -r requirements.txt
RUN pip3 install git+https://github.com/QwenLM/Qwen3-TTS.git

COPY backend/ /app/backend/

EXPOSE 8000
CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]

Run with GPU:

docker run --gpus all -p 8000:8000 \
  -v voicebox-data:/app/data \
  voicebox:cuda

Docker Compose with GPU:

services:
  voicebox:
image: voicebox:cuda
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

AMD GPUs (ROCm) - Experimental

Dockerfile:

FROM rocm/dev-ubuntu-22.04:6.0

# Install Python
RUN apt-get update && apt-get install -y \
python3.11 python3-pip git ffmpeg && \
rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install PyTorch with ROCm support
COPY backend/requirements.txt .
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# Install other dependencies
RUN pip3 install -r requirements.txt
RUN pip3 install git+https://github.com/QwenLM/Qwen3-TTS.git

# Set ROCm environment variables
ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
ENV ROCM_PATH=/opt/rocm

COPY backend/ /app/backend/

EXPOSE 8000
CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]

Run with AMD GPU:

docker run --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -p 8000:8000 -v voicebox-data:/app/data \
  voicebox:rocm

Note: ROCm support varies by GPU model. Works best on Linux. See AMD ROCm docs for compatibility.

Volume Mounts

Essential Volumes

docker run -v voicebox-data:/app/data \           # Profiles, generations, history
       -v huggingface-cache:/root/.cache/huggingface \  # Downloaded models
       -p 8000:8000 voicebox

Development Volume Mounts

For development with hot-reload:

docker run -v $(pwd)/backend:/app/backend \       # Live code changes
       -v voicebox-data:/app/data \
       -e RELOAD=true \
       -p 8000:8000 voicebox

Custom Model Storage

Use external model directory:

docker run -v /path/to/models:/models \
       -e MODELS_DIR=/models \
       -v voicebox-data:/app/data \
       -p 8000:8000 voicebox

Environment Variables

Configure Voicebox via environment variables:

docker run -e TTS_MODE=local \
       -e WHISPER_MODE=openai-api \
       -e OPENAI_API_KEY=sk-... \
       -e GPU_MEMORY_FRACTION=0.8 \
       -e LOG_LEVEL=info \
       -p 8000:8000 voicebox

Available Variables

Variable Default Description
TTS_MODE local TTS provider: local, remote
TTS_REMOTE_URL - URL for remote TTS server
WHISPER_MODE local Whisper provider: local, openai-api, remote
WHISPER_REMOTE_URL - URL for remote Whisper server
OPENAI_API_KEY - OpenAI API key (if using OpenAI Whisper)
GPU_MEMORY_FRACTION 0.9 Fraction of GPU memory to use (0.0-1.0)
DATA_DIR /app/data Directory for profiles/generations
MODELS_DIR /app/models Directory for local models
LOG_LEVEL info Logging level: debug, info, warning, error
RELOAD false Enable hot-reload for development

Complete Docker Compose Examples

Production Deployment

# docker-compose.prod.yml
version: '3.8'

services:
  voicebox:
image: ghcr.io/jamiepine/voicebox:latest-cuda
container_name: voicebox
restart: unless-stopped
ports:
  - "8000:8000"
volumes:
  - voicebox-data:/app/data
  - huggingface-cache:/root/.cache/huggingface
environment:
  - TTS_MODE=local
  - WHISPER_MODE=local
  - GPU_MEMORY_FRACTION=0.8
  - LOG_LEVEL=info
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s

volumes:
  voicebox-data:
driver: local
  huggingface-cache:
driver: local

Run:

docker compose -f docker-compose.prod.yml up -d

Development Setup

# docker-compose.dev.yml
version: '3.8'

services:
  voicebox:
build:
  context: .
  dockerfile: Dockerfile
ports:
  - "8000:8000"
volumes:
  - ./backend:/app/backend:ro
  - voicebox-data:/app/data
  - huggingface-cache:/root/.cache/huggingface
environment:
  - RELOAD=true
  - LOG_LEVEL=debug
  - TTS_MODE=local
command: uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

volumes:
  voicebox-data:
  huggingface-cache:

Multi-Service Stack

Full stack with reverse proxy and monitoring:

# docker-compose.stack.yml
version: '3.8'

services:
  # Main Voicebox app
  voicebox:
image: ghcr.io/jamiepine/voicebox:latest-cuda
restart: unless-stopped
volumes:
  - voicebox-data:/app/data
  - huggingface-cache:/root/.cache/huggingface
environment:
  - TTS_MODE=local
  - WHISPER_MODE=local
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

  # Nginx reverse proxy
  nginx:
image: nginx:alpine
ports:
  - "80:80"
  - "443:443"
volumes:
  - ./nginx.conf:/etc/nginx/nginx.conf:ro
  - ./ssl:/etc/nginx/ssl:ro
depends_on:
  - voicebox

  # Prometheus monitoring (optional)
  prometheus:
image: prom/prometheus
ports:
  - "9090:9090"
volumes:
  - ./prometheus.yml:/etc/prometheus/prometheus.yml
  - prometheus-data:/prometheus

volumes:
  voicebox-data:
  huggingface-cache:
  prometheus-data:

Cloud Deployment

AWS EC2

  1. Launch GPU Instance (g4dn.xlarge or p3.2xlarge)
  2. Install Docker + nvidia-docker:
   # Amazon Linux 2
   sudo yum install -y docker
   sudo systemctl start docker
   distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
 sudo tee /etc/apt/sources.list.d/nvidia-docker.list
   sudo apt-get update && sudo apt-get install -y nvidia-docker2
   sudo systemctl restart docker
  1. Deploy:
   docker run --gpus all -d -p 80:8000 \
 -v voicebox-data:/app/data \
 --restart unless-stopped \
 ghcr.io/jamiepine/voicebox:latest-cuda

DigitalOcean

Use GPU Droplet + Docker:

# Create droplet via CLI
doctl compute droplet create voicebox \
  --size gpu-h100x1-80gb \
  --image ubuntu-22-04-x64 \
  --region nyc3

# SSH and deploy
ssh root@<droplet-ip>
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
docker run --gpus all -d -p 80:8000 voicebox:cuda

Google Cloud Run (CPU-only)

# Build and push
docker build -t gcr.io/your-project/voicebox .
docker push gcr.io/your-project/voicebox

# Deploy to Cloud Run
gcloud run deploy voicebox \
  --image gcr.io/your-project/voicebox \
  --platform managed \
  --region us-central1 \
  --memory 4Gi \
  --cpu 2 \
  --port 8000

Fly.io

Create fly.toml:

app = "voicebox"

[build]
  image = "ghcr.io/jamiepine/voicebox:latest"

[[services]]
  http_checks = []
  internal_port = 8000
  protocol = "tcp"

  [[services.ports]]
port = 80
handlers = ["http"]

  [[services.ports]]
port = 443
handlers = ["tls", "http"]

[mounts]
  source = "voicebox_data"
  destination = "/app/data"

Deploy:

fly launch
fly deploy

Troubleshooting

GPU Not Detected

Check NVIDIA Docker:

docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

If this fails, reinstall nvidia-docker2.

Check AMD ROCm:

docker run --rm --device=/dev/kfd --device=/dev/dri rocm/dev-ubuntu-22.04:6.0 rocminfo

Permission Errors

Container can't write to volumes:

# Fix permissions
docker run --user $(id -u):$(id -g) -v $(pwd)/data:/app/data voicebox

Out of Memory

Reduce GPU memory usage:

docker run -e GPU_MEMORY_FRACTION=0.5 voicebox

Or use CPU-only:

docker run -e DEVICE=cpu voicebox

Model Download Fails

Ensure HuggingFace cache is writable:

docker run -v huggingface-cache:/root/.cache/huggingface voicebox

Or use host cache:

docker run -v ~/.cache/huggingface:/root/.cache/huggingface voicebox

Port Already in Use

Change host port:

docker run -p 8080:8000 voicebox  # Use port 8080 instead

Security Best Practices

1. Don't Run as Root

Create non-root user in Dockerfile:

RUN useradd -m -u 1000 voicebox
USER voicebox

2. Use Secrets for API Keys

Don't put API keys in docker-compose.yml:

# Use Docker secrets
echo "sk-your-key" | docker secret create openai_key -

docker service create \
  --secret openai_key \
  -e OPENAI_API_KEY_FILE=/run/secrets/openai_key \
  voicebox

3. Network Isolation

Use internal networks for multi-container setups:

services:
  voicebox:
networks:
  - internal
  nginx:
networks:
  - internal
  - external
ports:
  - "80:80"

networks:
  internal:
internal: true
  external:

4. Resource Limits

Prevent resource exhaustion:

services:
  voicebox:
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 8G
    reservations:
      cpus: '2'
      memory: 4G

Performance Tuning

GPU Memory Management

# Use 80% of GPU (default 90%)
docker run -e GPU_MEMORY_FRACTION=0.8 voicebox

# Allow GPU memory growth (prevents OOM)
docker run -e TF_FORCE_GPU_ALLOW_GROWTH=true voicebox

Model Caching

Pre-download models to volume:

# Download models first
docker run --rm -v huggingface-cache:/root/.cache/huggingface \
  voicebox python -c "
from transformers import WhisperProcessor, WhisperForConditionalGeneration
WhisperProcessor.from_pretrained('openai/whisper-base')
WhisperForConditionalGeneration.from_pretrained('openai/whisper-base')
"

# Then run normally
docker run -v huggingface-cache:/root/.cache/huggingface voicebox

Multi-Worker Setup

Use uvicorn workers for better throughput:

CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

Monitoring

Health Checks

Built-in health endpoint:

curl http://localhost:8000/health

Docker health check:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3

Prometheus Metrics

Add metrics exporter:

# backend/main.py
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app)

Then scrape /metrics with Prometheus.

Logs

View container logs:

docker logs -f voicebox

# Or with compose
docker compose logs -f voicebox

Next Steps

  • Publish official images to GitHub Container Registry
  • Add Kubernetes Helm charts
  • Create Docker Desktop extension
  • Add automated vulnerability scanning
  • Support ARM64 builds for Raspberry Pi / Apple Silicon

Contributing

Help improve Docker support:

  1. Test on different platforms (AMD GPU, ARM64, etc.)
  2. Submit Dockerfile optimizations
  3. Share deployment configurations
  4. Report issues: GitHub Issues

Resources