Overview
Voicebox auto-detects available accelerators on first launch and picks the fastest backend it can use. For most people this just works — open the app and you're already on the right backend.
This page is for the cases where it doesn't:
- You have a GPU but Voicebox is running on CPU
- You upgraded GPUs (especially to RTX 50-series / Blackwell) and generation broke
- You want to switch backends manually (e.g. force MLX over PyTorch on Apple Silicon)
- You see `[UNSUPPORTED - see logs]` next to your GPU in Settings
Backend Matrix
| Platform | Auto-selected backend | Notes |
|---|---|---|
| macOS Apple Silicon | MLX (Metal) | 4-5x faster than PyTorch via Apple Neural Engine |
| macOS Intel | PyTorch CPU | No GPU acceleration available; PyTorch ≥ 2.2 only |
| Windows + NVIDIA | PyTorch CUDA (cu128) | Auto-downloads the CUDA backend binary on first use |
| Windows + Intel Arc | PyTorch XPU (IPEX) | New in 0.4 — works with Arc A-series and B-series |
| Windows generic GPU | DirectML | Universal Windows GPU support; slower than CUDA |
| Linux + NVIDIA | PyTorch CUDA (cu128) | Same auto-download flow as Windows |
| Linux + AMD | PyTorch ROCm | Auto-configures HSA_OVERRIDE_GFX_VERSION |
| Linux + Intel Arc | PyTorch XPU (IPEX) | |
| Any (no GPU) | PyTorch CPU | Works everywhere; expect 5-50x slower than GPU |
The detected backend is shown in Settings → GPU. Logs at startup also print the chosen backend and the device name.
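As a rough mental model, the priority order in the matrix above can be sketched like this (illustrative Python only; the platform strings and capability flags are hypothetical inputs, not Voicebox APIs):

```python
# Simplified sketch of backend auto-selection priority, following the
# matrix above. Illustrative only: the real app probes the hardware at
# startup, and these platform/capability labels are hypothetical.
def pick_backend(platform: str, caps: set) -> str:
    if platform == "macos-arm64":
        return "MLX (Metal)"
    if "nvidia" in caps:
        return "PyTorch CUDA (cu128)"
    if "intel-arc" in caps:
        return "PyTorch XPU (IPEX)"
    if platform.startswith("linux") and "amd" in caps:
        return "PyTorch ROCm"
    if platform.startswith("windows") and "gpu" in caps:
        return "DirectML"
    return "PyTorch CPU"

print(pick_backend("windows", {"nvidia", "gpu"}))  # PyTorch CUDA (cu128)
print(pick_backend("linux", set()))                # PyTorch CPU
```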
Apple Silicon — MLX vs PyTorch
On M-series Macs, Voicebox ships an MLX-optimized backend that uses the Apple Neural Engine. It's 4-5x faster than the PyTorch (CPU/Metal) path for supported engines.
| Engine | MLX support | Notes |
|---|---|---|
| Qwen3-TTS | ✅ Native | Uses MLX exclusively when available |
| Chatterbox / Turbo | PyTorch MPS | Falls back to Metal via PyTorch |
| LuxTTS | PyTorch MPS | |
| TADA | PyTorch MPS | |
| Kokoro | PyTorch MPS | Requires `PYTORCH_ENABLE_MPS_FALLBACK=1` |
| Qwen CustomVoice | PyTorch MPS | |
| Whisper (transcribe) | ✅ Native | MLX-Whisper is the default on Apple Silicon |
The Whisper Turbo + MLX combo dropped transcription latency from ~20s to ~2-3s on M-series chips (see CHANGELOG entry for v0.1.10).
Windows / Linux + NVIDIA — The CUDA Backend Swap
Voicebox doesn't bundle CUDA into the main installer (it would balloon downloads to multi-gigabyte territory for users who don't have an NVIDIA GPU). Instead, when you first need it, the app downloads a separate CUDA backend binary that contains the PyTorch + CUDA runtime.
1. Open Settings → GPU
2. If an NVIDIA GPU is detected, you'll see "Install CUDA backend" in the GPU panel. Click Install.
3. The app downloads two archives separately:
   - Server core (~200-400 MB) — versioned with each Voicebox release
   - CUDA libs (~4 GB) — the heavy PyTorch + CUDA DLLs, versioned independently
4. Restart. Voicebox restarts to swap in the CUDA backend.
Auto-update
When a new Voicebox release ships, the GPU panel checks if the bundled server-core matches the installed CUDA version. If only the core changed (typical), it pulls the new core in the background. If the libs version changed (rare — only happens on cu126 → cu128 type bumps), you'll be prompted to confirm the larger download.
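The update decision can be sketched as follows (a simplified illustration; the version fields and return strings are hypothetical, not the app's actual API):

```python
# Sketch of the CUDA-backend update check described above. The two archives
# are versioned independently: a core mismatch alone means a small background
# download; a libs mismatch (e.g. cu126 -> cu128) prompts for confirmation.
def update_action(installed: dict, latest: dict) -> str:
    if installed["cuda_libs"] != latest["cuda_libs"]:
        return "prompt-full-download"    # rare: the ~4 GB CUDA libs changed
    if installed["server_core"] != latest["server_core"]:
        return "background-core-update"  # typical: ~200-400 MB core only
    return "up-to-date"

print(update_action(
    {"cuda_libs": "cu128", "server_core": "0.4.0"},
    {"cuda_libs": "cu128", "server_core": "0.4.1"},
))  # background-core-update
```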
RTX 50-series / Blackwell
Voicebox 0.4 added explicit RTX 50-series support:
- CUDA toolkit upgraded to cu128 (previous releases used cu126 which lacks Blackwell kernels)
- Build pinned with `TORCH_CUDA_ARCH_LIST=...12.0+PTX` for forward compatibility
If you're on an RTX 5070 / 5080 / 5090 and you see "no kernel image is available" errors:
- Make sure you're on Voicebox ≥ 0.4.0 (Settings → About)
- Reinstall the CUDA backend (Settings → GPU → Reinstall CUDA backend) — older installs may have stale cu126 libs
- If errors persist, see the GPU compatibility warnings section below
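The underlying cause of these errors is arch-list coverage. Here is a minimal sketch, assuming a space- or semicolon-separated `TORCH_CUDA_ARCH_LIST` string, of how a build does or doesn't cover a given compute capability (12.0 is Blackwell):

```python
# Illustrative sketch (not Voicebox code): a PyTorch build covers a GPU if
# the GPU's compute capability appears in the arch list exactly, or if a
# listed arch at or below it carries "+PTX", which the driver can
# JIT-compile forward to newer hardware.
def build_covers_gpu(arch_list: str, capability: float) -> bool:
    for entry in arch_list.replace(";", " ").split():
        ptx = entry.endswith("+PTX")
        arch = float(entry.removesuffix("+PTX"))
        if arch == capability or (ptx and arch <= capability):
            return True
    return False

# A cu126-era list without Blackwell kernels vs. a list with a +PTX pin:
print(build_covers_gpu("8.0 8.6 9.0", 12.0))           # False -> "no kernel image"
print(build_covers_gpu("8.0 8.6 9.0 12.0+PTX", 12.0))  # True
```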
Intel Arc (XPU)
New in 0.4. Works with both Arc A-series (Alchemist: A380, A580, A750, A770) and B-series (Battlemage).
Setup
Voicebox auto-detects Arc GPUs and routes through Intel's PyTorch XPU backend (powered by IPEX — Intel Extension for PyTorch). No extra installation step beyond the standard Voicebox install.
Verify it's working:
- Settings → GPU should show XPU followed by your Arc model name (e.g. `XPU (Intel Arc A770)`)
- Startup logs print `Backend: PYTORCH` and `GPU: XPU (Intel Arc ...)`
Engines on XPU
All PyTorch-based engines work on XPU. Performance is generally between CPU and CUDA — expect ~2-3x speedup over CPU for the larger models.
DirectML
The fallback for Windows users with non-NVIDIA, non-Intel-Arc GPUs (older AMD discrete, integrated GPUs, etc.). Slower than CUDA and XPU but provides some acceleration over CPU.
Auto-selected when no other GPU backend is available.
AMD ROCm (Linux)
ROCm provides PyTorch GPU acceleration on AMD discrete GPUs. Voicebox auto-configures `HSA_OVERRIDE_GFX_VERSION` for common cards that need the override.
Verifying
```
# In a terminal
echo $HSA_OVERRIDE_GFX_VERSION
# Should show e.g. 10.3.0 for RX 6000 series
```
If detection fails, set the variable manually before launching Voicebox:
```
export HSA_OVERRIDE_GFX_VERSION=10.3.0
voicebox
```
Common values:
- `10.3.0` — RX 6000 series (RDNA 2)
- `11.0.0` — RX 7000 series (RDNA 3)
- `9.0.0` — Older Vega cards
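The table of common values could be expressed as a lookup (a hypothetical helper for illustration; the family labels are assumptions, not the output of any real tool):

```python
from typing import Optional

# Hypothetical helper: map an AMD GPU family to the HSA override value
# listed above. Illustrative only; the card list is not exhaustive.
HSA_OVERRIDES = {
    "rdna2": "10.3.0",  # RX 6000 series
    "rdna3": "11.0.0",  # RX 7000 series
    "vega": "9.0.0",    # older Vega cards
}

def hsa_override(family: str) -> Optional[str]:
    return HSA_OVERRIDES.get(family.lower())

print(hsa_override("RDNA2"))  # 10.3.0
```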
GPU Compatibility Warnings
Voicebox 0.4 added a runtime check that compares your GPU's compute capability against the architectures the bundled PyTorch was compiled for. If they don't match, you'll see:
- A startup log line: `WARNING: GPU COMPATIBILITY: <your GPU> is not supported by this PyTorch build...`
- The GPU label in Settings shows `[UNSUPPORTED - see logs]`
- The `/health` API returns a populated `gpu_compatibility_warning` field
What to do
The most common trigger is a brand-new GPU architecture that pre-built PyTorch wheels don't yet cover natively. In order of preference:
1. Update Voicebox — newer releases ship newer PyTorch with broader arch support
2. Reinstall the CUDA backend — Settings → GPU → Reinstall CUDA backend
3. For bleeding-edge GPUs (newer than current Blackwell), install PyTorch nightly manually:

   ```
   pip install torch --index-url https://download.pytorch.org/whl/nightly/cu128 --force-reinstall
   ```

   Then point Voicebox at that environment via Remote Mode until stable PyTorch catches up.
4. Fall back to CPU temporarily — set `VOICEBOX_FORCE_CPU=1` before launching
CPU-Only Fallback
When no GPU is available (or you've forced it off), Voicebox runs the PyTorch CPU backend. Expect:
- 5-50x slower generation depending on engine and text length
- Heavy CPU usage during generation
- Some engines work better than others on CPU:
- Kokoro 82M — runs at realtime on modern CPUs
- LuxTTS — exceeds 150x realtime on CPU
- Chatterbox Turbo (350M) — usable but slow
- Larger models (Qwen 1.7B, Chatterbox Multilingual, TADA 3B) — painful
For CPU-bound use cases, prefer the smaller, lighter engines.
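To put the realtime-factor numbers above in concrete terms (illustrative arithmetic only):

```python
# Illustrative arithmetic: a realtime factor of N means N seconds of audio
# per second of compute, so wall-clock time = audio length / factor.
def generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    return audio_seconds / realtime_factor

print(generation_seconds(60, 150))  # 0.4   (~150x realtime: near-instant)
print(generation_seconds(60, 0.5))  # 120.0 (0.5x realtime: twice the audio length)
```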
Verifying Your Setup
Three places to check that the right backend is being used:
Settings → GPU
Shows the detected backend, GPU model, and VRAM (when applicable). Look for the `[UNSUPPORTED - see logs]` suffix
Settings → Logs
The "Server logs" tab shows the startup banner with `Backend: <type>` and `GPU: <name>`
Health endpoint
`curl http://localhost:17493/health` returns a JSON payload with `backend_type`, `backend_variant`, and `gpu_compatibility_warning` (when applicable)
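If you'd rather script the check, here is a minimal sketch of parsing the /health payload (the sample JSON is illustrative; the field names are the ones listed above):

```python
import json

# Sketch of summarizing a /health response. The sample payload below is
# illustrative, not captured from a real server.
def summarize_health(payload: str) -> str:
    health = json.loads(payload)
    summary = f'{health["backend_type"]} ({health["backend_variant"]})'
    warning = health.get("gpu_compatibility_warning")
    return f"{summary} [WARNING: {warning}]" if warning else summary

sample = '{"backend_type": "pytorch", "backend_variant": "cuda", "gpu_compatibility_warning": null}'
print(summarize_health(sample))  # pytorch (cuda)
```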
Troubleshooting
Settings shows CPU instead of my GPU
- On NVIDIA: install the CUDA backend (Settings → GPU)
- On Intel Arc: confirm IPEX detection in startup logs; restart the app after a driver update
- On AMD Linux: check that `HSA_OVERRIDE_GFX_VERSION` is set
'no kernel image is available' / 'CUDA error'
Almost always means the bundled PyTorch doesn't have kernels for your GPU's compute capability.
- Update to Voicebox ≥ 0.4.0 (Blackwell support added there)
- Reinstall the CUDA backend
- If still broken, install PyTorch nightly via Remote Mode
Out of memory (CUDA)
- Switch to a smaller model size (e.g. Qwen3 0.6B instead of 1.7B)
- Use Settings → Models to unload other engines you're not using
- `low_cpu_mem_usage` is already enabled on CPU; for CUDA, the engine's `device_map` handles offload automatically
- Close other GPU applications
MPS fallback errors on macOS
Some operations don't have a Metal implementation. Voicebox sets `PYTORCH_ENABLE_MPS_FALLBACK=1` for engines that need it (notably Kokoro), but if you launch from a custom env, set it manually:
```
export PYTORCH_ENABLE_MPS_FALLBACK=1
```
Generation works but is slow on my GPU
- Check Settings → GPU shows your GPU (not CPU)
- Check VRAM usage — you may be paging to system memory
- Try a smaller model
- For NVIDIA: confirm cu128 is installed (Settings → GPU → version)
Next Steps
- Run the backend on a different machine with a stronger GPU
- Unload models to free GPU memory
- General troubleshooting beyond GPU