Remote Mode

Overview

Remote Mode allows you to run the Voicebox backend on a separate machine (like a GPU server) while using the desktop app on your local machine.

Use Cases

  • No local GPU - Use a cloud GPU or remote workstation
  • Faster generation - Leverage powerful remote hardware
  • Shared infrastructure - Multiple users connect to one server
  • Laptop workflows - Keep your laptop cool and battery-efficient

Architecture

In Remote Mode, the Voicebox desktop app (running on your local machine) communicates with the backend server (running on a remote machine) via HTTP. The local app provides only the user interface, while the remote server handles all the heavy processing including the TTS models, API endpoints, and audio generation.
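Once a server is running, you can confirm the HTTP link from the local machine with curl. This is a sketch only: the /docs path is FastAPI's auto-generated API page (the backend is started with uvicorn, which suggests FastAPI, but adjust the path if your build differs), and the IP is a placeholder:

```bash
# Placeholder address - substitute your server's IP
SERVER=http://192.168.1.50:17493

# -s silent, -o discard the body, -w print only the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" "$SERVER/docs"
# A 200 response means the desktop app should also be able to reach the backend
```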

Setting Up Remote Mode

On the Server

Install Dependencies

# Clone the repo
git clone https://github.com/jamiepine/voicebox.git
cd voicebox/backend

Install Python dependencies

pip install -r requirements.txt

Some engines pin transitive dependencies that conflict with the rest of the environment, so install them with --no-deps:

pip install --no-deps chatterbox-tts
pip install --no-deps hume-tada

Qwen3-TTS from source

pip install git+https://github.com/QwenLM/Qwen3-TTS.git

Alternatively, run just setup from the repo root, which performs all of the above steps.

Start the Server

# Allow external connections
uvicorn main:app --host 0.0.0.0 --port 17493

This exposes the server to your entire network. Use a firewall or VPN for security.
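The command above dies when your SSH session ends. To keep the server alive, run it under a systemd service or a terminal multiplexer; a minimal nohup variant (the log and PID file paths are arbitrary choices for this sketch) looks like:

```bash
cd ~/voicebox/backend

# Run detached from the terminal; stdout/stderr go to uvicorn.log
nohup uvicorn main:app --host 0.0.0.0 --port 17493 > uvicorn.log 2>&1 &

# Record the PID so you can stop the server later with: kill "$(cat uvicorn.pid)"
echo $! > uvicorn.pid
```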

Open Firewall

# Ubuntu/Debian
sudo ufw allow 17493

Or use your cloud provider's firewall settings
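If you know which client machines will connect, you can scope the ufw rule to a single source address instead of opening the port to the whole network (the IP below is a documentation placeholder):

```bash
# Allow only one trusted client IP to reach the backend port
sudo ufw allow from 203.0.113.10 to any port 17493 proto tcp

# Verify the active rules
sudo ufw status numbered
```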

On the Client

Open Settings

In Voicebox, go to Settings → Server

Enable Remote Mode

Toggle Use Remote Server

Enter Server URL

http://<server-ip>:17493

Replace <server-ip> with your server's IP address

Test Connection

Click Test Connection to verify
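If Test Connection fails, it helps to rule out basic networking from a terminal on the client machine before digging into the app itself (replace the placeholder address with your server's):

```bash
SERVER_IP=192.168.1.50   # placeholder - your server's IP

# Is the port reachable at all? Give up after 5 seconds.
curl -sv --max-time 5 "http://$SERVER_IP:17493/" \
  || echo "port unreachable - check the server process, firewall, or VPN"
```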

Cloud Deployment

AWS EC2

# Launch a GPU instance (e.g., g4dn.xlarge)
# Install dependencies
# Start server with --host 0.0.0.0
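The EC2 steps above can be sketched with the AWS CLI. The AMI ID, key pair, and security group below are placeholders you must substitute, and g4dn.xlarge is simply the example instance type from the comment:

```bash
# Launch a GPU instance (substitute a real Deep Learning AMI ID and your key pair)
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name my-key \
  --count 1

# Open the backend port in the instance's security group (placeholder group ID).
# Restrict --cidr to your client's IP rather than 0.0.0.0/0.
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp --port 17493 \
  --cidr 203.0.113.10/32
```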

Vast.ai

# Rent a GPU instance
# SSH in and clone repo
# Start server

RunPod

# Deploy a pod with CUDA support
# Install Voicebox backend
# Expose port 17493

Security Considerations

The API currently has no authentication. Only use on trusted networks or with a VPN.

Best Practices:

  • Use a VPN (WireGuard, Tailscale) instead of exposing to the internet
  • Run behind a reverse proxy with authentication (nginx + basic auth)
  • Use HTTPS with SSL certificates
  • Firewall rules to limit access to specific IPs
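As a concrete example of the VPN approach, Tailscale takes a few commands on each machine and avoids exposing port 17493 to the public internet at all (the install script URL is Tailscale's official one):

```bash
# On both the server and the client:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# On the server, find its private Tailscale address:
tailscale ip -4

# Then use http://<tailscale-ip>:17493 as the Server URL in Voicebox,
# and keep 17493 closed in the public firewall entirely.
```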

Performance

Expected performance on various GPUs:

GPU             Generation Speed
RTX 4090        ~2-3s per 10 words
RTX 3090        ~3-4s per 10 words
RTX 3060        ~5-7s per 10 words
CPU (12-core)   ~20-30s per 10 words

A GPU with 8GB+ VRAM is recommended for best performance.
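Before comparing against these numbers, it's worth confirming the server can actually see its GPU. A quick check (the torch import assumes PyTorch is pulled in by the backend's requirements):

```bash
# Should print your GPU's name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Confirm the Python environment sees CUDA; prints True if the GPU is usable
python -c "import torch; print(torch.cuda.is_available())"
```

If this prints False (or nvidia-smi fails), generation will fall back to CPU speeds regardless of the hardware installed.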

Troubleshooting

See the Troubleshooting Guide for common issues.