Quick Start | Voicebox

This guide walks you through creating your first voice profile and generating speech.

Prerequisites

Make sure you have installed Voicebox and launched the app.

Step 1: Create a Voice Profile

Voice profiles are the foundation of Voicebox. Each profile contains voice samples that the AI uses to clone the voice.

Navigate to Profiles

Click the Profiles tab in the sidebar

Create New Profile

Click the + New Profile button

Fill in the details:

Name: A descriptive name (e.g., "John Smith")
Language: Select the primary language
Description: Optional notes about the voice

Add Voice Sample

You have two options:

Option A: Upload Audio

Click Upload Sample
Select an audio file (WAV, MP3, or M4A)
Ideal length: 10-30 seconds of clear speech

Option B: Record Live

Click Record Sample
Speak clearly for 10-30 seconds
Click stop when finished

Save Profile

Click Create Profile to save

For best results, use clean audio with minimal background noise and a consistent speaking tone.
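If you want to check a sample's length before uploading it, the sketch below measures a WAV file with Python's standard `wave` module and compares it against the 10-30 second range recommended above. The function names and constants are illustrative and are not part of Voicebox.

```python
import wave

MIN_SECONDS = 10.0  # lower bound of the recommended sample length
MAX_SECONDS = 30.0  # upper bound of the recommended sample length


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """True if the sample falls inside the 10-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

The `wave` module reads uncompressed WAV files only. MP3 and M4A samples would need a separate decoder, which this sketch does not cover.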

Step 2: Generate Speech

Now let's use your new voice profile to generate speech.

Go to Generation

Click the Generate tab in the sidebar

Select Voice Profile

Choose your newly created profile from the dropdown

Enter Text

Type or paste the text you want to generate:

Hello! This is my first voice generation with Voicebox.

Paralinguistic tags like `[laugh]`, `[sigh]`, and `[gasp]` only work with **Chatterbox Turbo**. Qwen3-TTS, LuxTTS, Chatterbox Multilingual, and HumeAI TADA will read those tags literally instead of turning them into expressive sounds.

To insert supported tags, select Chatterbox Turbo and type / in the text input to open the tag inserter.
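If you prepare text outside the app, a small pre-processing step can keep tags from being read aloud by engines that do not support them. The following is a minimal sketch, assuming the engine is identified by its display name and that only the three tags named above matter; the real tag set may be larger, and none of these names come from Voicebox itself.

```python
import re

# Only the three tags named in this guide; the full set may be larger.
KNOWN_TAGS = ("laugh", "sigh", "gasp")
TAG_PATTERN = re.compile(r"\[(?:%s)\]" % "|".join(KNOWN_TAGS))

# Display name used for illustration; not an identifier from Voicebox.
TAG_CAPABLE_ENGINE = "Chatterbox Turbo"


def find_tags(text: str) -> list[str]:
    """Return every known tag found in the text, in order."""
    return TAG_PATTERN.findall(text)


def prepare_text(text: str, engine: str) -> str:
    """Keep tags for the tag-capable engine, strip them for any other."""
    if engine == TAG_CAPABLE_ENGINE:
        return text
    stripped = TAG_PATTERN.sub("", text)
    # Collapse the double spaces left behind by removed tags.
    return re.sub(r" {2,}", " ", stripped).strip()
```

For example, `prepare_text("Hello! [laugh] That was fun.", "Qwen3-TTS")` returns `"Hello! That was fun."`, while the same call with `"Chatterbox Turbo"` returns the text unchanged.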

Generate

Click Generate and wait a few seconds

The first generation may take longer because the model has to initialize. Subsequent generations are faster.

Play & Download

Click Play to preview the audio
Click Download to save the audio file
The generation is also saved to your History

Step 3: Build a Story (Optional)

The Stories Editor lets you create multi-voice narratives with a timeline-based interface.

Create New Story

Navigate to Stories and click + New Story

Add Voice Tracks

Click + Add Track to create tracks for different speakers

Add Audio Clips

Drag generated audio from your History
Or generate new clips directly in the timeline
Arrange clips on the timeline

Edit & Export

Trim clips by dragging edges
Adjust timing and spacing
Click Export to render the final audio
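One way to think about the timeline described above is as a set of tracks, each holding clips that have a start position and a duration. The sketch below is a conceptual model only: the class and field names are hypothetical and do not reflect how Voicebox stores stories internally.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    start: float     # position on the timeline, in seconds
    duration: float  # clip length after trimming, in seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


@dataclass
class Track:
    speaker: str
    clips: list[Clip] = field(default_factory=list)


def timeline_length(tracks: list[Track]) -> float:
    """Latest clip end across all tracks, or 0.0 for an empty timeline."""
    return max((c.end for t in tracks for c in t.clips), default=0.0)
```

In this model, trimming a clip changes its `duration`, and adjusting timing changes its `start`. The length of the timeline is set by whichever clip ends last on any track.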

What's Next?

Voice Cloning Guide

Learn advanced techniques for high-quality voice cloning

API Integration

Integrate Voicebox into your own applications

Stories Editor

Master the multi-track timeline editor

Remote Mode

Connect to a GPU server for faster generation

Tips for Success

Getting the Best Voice Quality

Use 10-30 seconds of clear, consistent speech
Avoid background noise and echo
Multiple samples from the same speaker improve quality
Match the speaking style you want to generate

Improving Generation Speed

Use a CUDA-capable GPU for 5-10x faster generation than on CPU
Enable voice prompt caching for repeated generations
Consider running the backend on a remote GPU server
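Before looking at app settings, it can help to confirm that the machine exposes an NVIDIA GPU at all. The sketch below calls the `nvidia-smi` command-line tool, which ships with the NVIDIA driver, and lists the GPU names it reports. It shows only that the driver can see a GPU; it does not confirm that Voicebox is using it. The function name is illustrative.

```python
import shutil
import subprocess


def nvidia_gpu_names() -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # tool present but failed, e.g. no usable driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = nvidia_gpu_names()
    print(names if names else "No NVIDIA GPU detected via nvidia-smi")
```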

Troubleshooting Common Issues

Server won't start: Check whether port 17493 is free (another process may already be using it)
Poor audio quality: Try adding more voice samples
Slow generation: Verify GPU acceleration is enabled
See the full Troubleshooting Guide for more
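To check the port mentioned above without extra tools, the sketch below tries to bind to it using Python's standard `socket` module. It assumes the server listens on the local IPv4 loopback address, which this guide does not state; change `host` if your setup differs. The function name is illustrative.

```python
import socket

VOICEBOX_PORT = 17493  # port named in the troubleshooting list above


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


if __name__ == "__main__":
    state = "free" if port_is_free(VOICEBOX_PORT) else "in use"
    print(f"Port {VOICEBOX_PORT} is {state}")
```

Run this while Voicebox is closed. If the port is reported as in use, another process is holding it and may be what prevents the server from starting.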