Overview
Remote Mode allows you to run the Voicebox backend on a separate machine (like a GPU server) while using the desktop app on your local machine.
Use Cases
- No local GPU: use a cloud GPU or a remote workstation
- Faster generation: take advantage of powerful remote hardware
- Shared infrastructure: multiple users connect to one server
- Laptop workflows: keep your laptop cool and save battery
Architecture
In Remote Mode, the Voicebox desktop app (running on your local machine) communicates with the backend server (running on a remote machine) via HTTP. The local app provides only the user interface, while the remote server handles all the heavy processing including the TTS models, API endpoints, and audio generation.
Setting Up Remote Mode
On the Server
Install Dependencies
```bash
# Clone the repo
git clone https://github.com/jamiepine/voicebox.git
cd voicebox/backend

# Install Python dependencies
pip install -r requirements.txt

# Engines with incompatible transitive pins: install with --no-deps
pip install --no-deps chatterbox-tts
pip install --no-deps hume-tada

# Qwen3-TTS from source
pip install git+https://github.com/QwenLM/Qwen3-TTS.git
```
Alternatively, run `just setup` from the repo root, which handles all of these steps.
Start the Server
```bash
# Allow external connections (listen on all interfaces)
uvicorn main:app --host 0.0.0.0 --port 17493
```
Open Firewall
```bash
# Ubuntu/Debian
sudo ufw allow 17493

# Or use your cloud provider's firewall settings
```
On the Client
1. Open Settings: in Voicebox, go to Settings → Server.
2. Enable Remote Mode: toggle Use Remote Server.
3. Enter Server URL: `http://<server-ip>:17493`, replacing `<server-ip>` with your server's IP address.
4. Test Connection: click Test Connection to verify.
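To run a similar check from a terminal on the client, a small curl wrapper is enough. This is a sketch: `check_voicebox_server` is a hypothetical helper, not part of Voicebox, and it only confirms that something at the URL answers HTTP at all (any status code counts). It does not call whatever endpoint the app's Test Connection button uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a Voicebox server URL answers HTTP from this machine.
# Any HTTP status code (even 404) means the server is reachable; curl's write-out
# reports "000" when no HTTP response arrived (refused, timed out, DNS failure).
check_voicebox_server() {
  if [ $# -ne 1 ]; then
    echo "usage: check_voicebox_server http://<server-ip>:17493" >&2
    return 2
  fi
  local url=$1
  local code
  # Discard the body, give up after 5 seconds, print only the status code.
  code=$(curl -sS -o /dev/null --max-time 5 -w '%{http_code}' "$url" 2>/dev/null)
  if [ -n "$code" ] && [ "$code" != "000" ]; then
    echo "reachable (HTTP $code): $url"
    return 0
  fi
  echo "unreachable: $url" >&2
  return 1
}
```

For example, `check_voicebox_server http://<server-ip>:17493`. If it reports the server as unreachable, check that the server was started with `--host 0.0.0.0` and that the firewall allows port 17493 before looking at the app itself.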
Cloud Deployment
AWS EC2
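```bash
# Launch a GPU instance (e.g., g4dn.xlarge) and allow inbound TCP 17493 in its security group
# Install dependencies (see Install Dependencies above)
# Start the server, listening on all interfaces
uvicorn main:app --host 0.0.0.0 --port 17493
```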
Vast.ai
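```bash
# Rent a GPU instance, then SSH in and clone the repo
git clone https://github.com/jamiepine/voicebox.git
cd voicebox/backend
# Install dependencies (see Install Dependencies above), then start the server
uvicorn main:app --host 0.0.0.0 --port 17493
```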
RunPod
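```bash
# Deploy a pod with CUDA support and expose port 17493 in the pod's port settings
# Install the Voicebox backend (see Install Dependencies above), then start it
uvicorn main:app --host 0.0.0.0 --port 17493
```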
Security Considerations
Best Practices:
- Use a VPN (WireGuard, Tailscale) instead of exposing the server to the internet
- Run the server behind a reverse proxy with authentication (e.g., nginx with basic auth)
- Use HTTPS with TLS certificates
- Use firewall rules to limit access to specific IPs
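For the firewall point, on an Ubuntu/Debian server using ufw, the open rule from the setup step can be swapped for per-client rules. This is a sketch that assumes ufw's default deny-incoming policy is in effect. `restrict_voicebox_port` and the `UFW` override variable are hypothetical conveniences; the `ufw delete allow <port>` and `ufw allow from <ip> to any port <port> proto tcp` forms are standard ufw syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the open "ufw allow 17493" rule with rules that
# admit only the listed client IPs. Set UFW=echo to preview instead of applying.
VOICEBOX_PORT=17493

restrict_voicebox_port() {
  # Read the override at call time; unquoted use below splits "sudo ufw" into two words.
  local ufw="${UFW:-sudo ufw}"
  if [ $# -eq 0 ]; then
    echo "usage: restrict_voicebox_port <client-ip> [<client-ip> ...]" >&2
    return 2
  fi
  # Remove the allow-from-anywhere rule added during setup.
  $ufw delete allow "$VOICEBOX_PORT"
  # Allow TCP on the backend port from each listed client address only.
  local ip
  for ip in "$@"; do
    $ufw allow from "$ip" to any port "$VOICEBOX_PORT" proto tcp
  done
}
```

Run `UFW=echo restrict_voicebox_port 203.0.113.10` first to preview the commands, then drop `UFW=echo` to apply them. The addresses shown are documentation examples; substitute your clients' IPs.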
Performance
Approximate generation times on various hardware:
| Hardware | Generation time per 10 words |
|---|---|
| RTX 4090 | ~2-3s |
| RTX 3090 | ~3-4s |
| RTX 3060 | ~5-7s |
| CPU (12-core) | ~20-30s |
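To turn the table into a rough estimate for a longer script, scale the per-10-words range by the word count. This is a sketch that assumes generation time grows linearly with word count; real runs also include fixed overhead such as model loading, so treat the result as a ballpark. `estimate_generation_time` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough generation-time range from a "seconds per 10 words"
# range in the performance table. Assumes time scales linearly with word count.
# Usage: estimate_generation_time <words> <low_s_per_10_words> <high_s_per_10_words>
estimate_generation_time() {
  if [ $# -ne 3 ]; then
    echo "usage: estimate_generation_time <words> <low> <high>" >&2
    return 2
  fi
  local words=$1 low=$2 high=$3
  # Integer math, rounding up so the estimate never understates the time.
  local min=$(( (words * low + 9) / 10 ))
  local max=$(( (words * high + 9) / 10 ))
  echo "~${min}-${max}s for ${words} words"
}
```

For example, `estimate_generation_time 150 3 4` (150 words at the RTX 3090 rate) prints `~45-60s for 150 words`.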
Troubleshooting
See the Troubleshooting Guide for common issues.