Remote Mode

Overview

Remote Mode allows you to run the Voicebox backend on a separate machine (like a GPU server) while using the desktop app on your local machine.

Use Cases

  • No local GPU - Use a cloud GPU or remote workstation
  • Faster generation - Leverage powerful remote hardware
  • Shared infrastructure - Multiple users connect to one server
  • Laptop workflows - Keep your laptop cool and battery-efficient

Architecture

In Remote Mode, the Voicebox desktop app (running on your local machine) communicates with the backend server (running on a remote machine) via HTTP. The local app provides only the user interface, while the remote server handles all the heavy processing including the TTS models, API endpoints, and audio generation.
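Once a server is running, you can confirm the HTTP link from the local machine with curl. This is a sketch only: the /docs path is FastAPI's auto-generated API page (the backend is started with uvicorn, which suggests FastAPI, but adjust the path if your build differs), and the IP is a placeholder:

```bash
# Placeholder address - substitute your server's IP
SERVER=http://192.168.1.50:17493

# -s silent, -o discard the body, -w print only the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" "$SERVER/docs"
# A 200 response means the desktop app should also be able to reach the backend
```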

Setting Up Remote Mode

On the Server

Install Dependencies

# Clone the repo
git clone https://github.com/jamiepine/voicebox.git
cd voicebox/backend

Install Python dependencies

pip install -r requirements.txt

Some engines pin transitive dependencies that conflict with the rest of the environment, so install them with --no-deps:

pip install --no-deps chatterbox-tts
pip install --no-deps hume-tada

Qwen3-TTS from source

pip install git+https://github.com/QwenLM/Qwen3-TTS.git

Alternatively, run just setup from the repo root, which performs all of the above steps.

Start the Server

# Allow external connections
uvicorn main:app --host 0.0.0.0 --port 17493

This exposes the server to your entire network. Use a firewall or VPN for security.
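The command above dies when your SSH session ends. To keep the server alive, run it under a systemd service or a terminal multiplexer; a minimal nohup variant (the log and PID file paths are arbitrary choices for this sketch) looks like:

```bash
cd ~/voicebox/backend

# Run detached from the terminal; stdout/stderr go to uvicorn.log
nohup uvicorn main:app --host 0.0.0.0 --port 17493 > uvicorn.log 2>&1 &

# Record the PID so you can stop the server later with: kill "$(cat uvicorn.pid)"
echo $! > uvicorn.pid
```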

Open Firewall

# Ubuntu/Debian
sudo ufw allow 17493

Or use your cloud provider's firewall settings
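If you know which client machines will connect, you can scope the ufw rule to a single source address instead of opening the port to the whole network (the IP below is a documentation placeholder):

```bash
# Allow only one trusted client IP to reach the backend port
sudo ufw allow from 203.0.113.10 to any port 17493 proto tcp

# Verify the active rules
sudo ufw status numbered
```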

On the Client

Open Settings

In Voicebox, go to Settings → Server

Enable Remote Mode

Toggle Use Remote Server

Enter Server URL

http://<server-ip>:17493

Replace <server-ip> with your server's IP address

Test Connection

Click Test Connection to verify
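If Test Connection fails, it helps to rule out basic networking from a terminal on the client machine before digging into the app itself (replace the placeholder address with your server's):

```bash
SERVER_IP=192.168.1.50   # placeholder - your server's IP

# Is the port reachable at all? Give up after 5 seconds.
curl -sv --max-time 5 "http://$SERVER_IP:17493/" \
  || echo "port unreachable - check the server process, firewall, or VPN"
```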

Cloud Deployment

AWS EC2

# Launch a GPU instance (e.g., g4dn.xlarge)
# Install dependencies
# Start server with --host 0.0.0.0
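The EC2 steps above can be sketched with the AWS CLI. The AMI ID, key pair, and security group below are placeholders you must substitute, and g4dn.xlarge is simply the example instance type from the comment:

```bash
# Launch a GPU instance (substitute a real Deep Learning AMI ID and your key pair)
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name my-key \
  --count 1

# Open the backend port in the instance's security group (placeholder group ID).
# Restrict --cidr to your client's IP rather than 0.0.0.0/0.
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp --port 17493 \
  --cidr 203.0.113.10/32
```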

Vast.ai

# Rent a GPU instance
# SSH in and clone repo
# Start server

RunPod

# Deploy a pod with CUDA support
# Install Voicebox backend
# Expose port 17493

Security Considerations

The API currently has no authentication. Only use on trusted networks or with a VPN.

Best Practices:

  • Use a VPN (WireGuard, Tailscale) instead of exposing to the internet
  • Run behind a reverse proxy with authentication (nginx + basic auth)
  • Use HTTPS with SSL certificates
  • Firewall rules to limit access to specific IPs
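As a concrete example of the VPN approach, Tailscale takes a few commands on each machine and avoids exposing port 17493 to the public internet at all (the install script URL is Tailscale's official one):

```bash
# On both the server and the client:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# On the server, find its private Tailscale address:
tailscale ip -4

# Then use http://<tailscale-ip>:17493 as the Server URL in Voicebox,
# and keep 17493 closed in the public firewall entirely.
```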

Performance

Expected performance on various GPUs:

GPU             Generation Speed
RTX 4090        ~2-3s per 10 words
RTX 3090        ~3-4s per 10 words
RTX 3060        ~5-7s per 10 words
CPU (12-core)   ~20-30s per 10 words

A GPU with 8GB+ VRAM is recommended for best performance.
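Before comparing against these numbers, it's worth confirming the server can actually see its GPU. A quick check (the torch import assumes PyTorch is pulled in by the backend's requirements):

```bash
# Should print your GPU's name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Confirm the Python environment sees CUDA; prints True if the GPU is usable
python -c "import torch; print(torch.cuda.is_available())"
```

If this prints False (or nvidia-smi fails), generation will fall back to CPU speeds regardless of the hardware installed.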

Troubleshooting

See the Troubleshooting Guide for common issues.