Quick Start

This guide walks you through creating your first voice profile and generating speech.

Prerequisites

Make sure you have installed Voicebox and launched the app.

Step 1: Create a Voice Profile

Voice profiles are the foundation of Voicebox. Each profile contains voice samples that the AI uses to clone the voice.

Navigate to Profiles

Click the Profiles tab in the sidebar

Create New Profile

Click the + New Profile button

Fill in the details:

  • Name: A descriptive name (e.g., "John Smith")
  • Language: Select the primary language
  • Description: Optional notes about the voice

Add Voice Sample

You have two options:

Option A: Upload Audio

  • Click Upload Sample
  • Select an audio file (WAV, MP3, or M4A)
  • Ideal length: 10-30 seconds of clear speech

Option B: Record Live

  • Click Record Sample
  • Speak clearly for 10-30 seconds
  • Click stop when finished

Save Profile

Click Create Profile to save

For best results, use clean audio with minimal background noise and consistent speaking tone.
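If you want to check a sample's length before uploading it, the sketch below measures a WAV file and compares it with the 10-30 second range recommended above. It is a minimal example, not part of Voicebox: the helper names `wav_duration_seconds` and `is_ideal_length` are hypothetical, and it relies on Python's standard `wave` module, which reads uncompressed PCM WAV only (MP3 and M4A files would need a different library).

```python
import wave

# Recommended sample length from this guide, in seconds.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    return frame_count / float(frame_rate)


def is_ideal_length(seconds):
    """Return True if the duration falls inside the recommended range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_ideal_length(wav_duration_seconds("sample.wav"))` reports whether a hypothetical `sample.wav` falls inside the recommended range. It says nothing about noise or speaking tone, which still need a listen.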

Step 2: Generate Speech

Now let's use your new voice profile to generate speech.

Go to Generation

Click the Generate tab in the sidebar

Select Voice Profile

Choose your newly created profile from the dropdown

Enter Text

Type or paste the text you want to generate:

Hello! This is my first voice generation with Voicebox.

Paralinguistic tags like `[laugh]`, `[sigh]`, and `[gasp]` only work with **Chatterbox Turbo**. Qwen3-TTS, LuxTTS, Chatterbox Multilingual, and HumeAI TADA read those tags aloud as literal text instead of turning them into expressive sounds.

To insert supported tags, select Chatterbox Turbo and type / in the text input to open the tag inserter.
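If you reuse a script written for Chatterbox Turbo with one of the other engines, you may want to remove the tags first so they are not read aloud. The following is a rough sketch of that cleanup, not a Voicebox feature: `strip_paralinguistic_tags` is a hypothetical helper, and `KNOWN_TAGS` contains only the three tags named in this guide, so extend it if your text uses others.

```python
import re

# Assumption: only the three tags named in this guide are listed here.
KNOWN_TAGS = ("laugh", "sigh", "gasp")

# Matches a known tag in square brackets plus any whitespace before it.
_TAG_PATTERN = re.compile(
    r"\s*\[(?:" + "|".join(KNOWN_TAGS) + r")\]", re.IGNORECASE
)


def strip_paralinguistic_tags(text):
    """Remove known paralinguistic tags and tidy leftover spacing."""
    cleaned = _TAG_PATTERN.sub("", text)
    # Collapse any runs of spaces or tabs left behind by the removal.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```

Other bracketed text, such as a citation like `[1]`, is left alone because only the listed tag names are matched.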

Generate

Click Generate and wait a few seconds

The first generation may take longer because the model has to initialize. Subsequent generations are faster.

Play & Download

  • Click Play to preview the audio
  • Click Download to save the audio file
  • The generation is also saved to your History

Step 3: Build a Story (Optional)

The Stories Editor lets you create multi-voice narratives with a timeline-based interface.

Create New Story

Navigate to Stories and click + New Story

Add Voice Tracks

Click + Add Track to create tracks for different speakers

Add Audio Clips

  • Drag generated audio from your History
  • Or generate new clips directly in the timeline
  • Arrange clips on the timeline

Edit & Export

  • Trim clips by dragging edges
  • Adjust timing and spacing
  • Click Export to render the final audio

What's Next?

Remote Mode

Connect to a GPU server for faster generation

Tips for Success

Getting the Best Voice Quality

  • Use 10-30 seconds of clear, consistent speech
  • Avoid background noise and echo
  • Multiple samples from the same speaker improve quality
  • Match the speaking style you want to generate

Improving Generation Speed

  • Use a CUDA-capable GPU for 5-10x faster generation
  • Enable voice prompt caching for repeated generations
  • Consider running the backend on a remote GPU server
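
Before relying on GPU acceleration, it can help to confirm that the machine exposes an NVIDIA GPU at all. The sketch below is one rough way to check, assuming the NVIDIA driver's `nvidia-smi` tool is installed and on the `PATH`; `list_nvidia_gpus` is a hypothetical helper, not a Voicebox function. A non-empty result only shows that the driver sees a GPU, not that Voicebox is actually using it.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one description line per GPU reported by `nvidia-smi -L`.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    executable = shutil.which("nvidia-smi")
    if executable is None:
        return []
    try:
        result = subprocess.run(
            [executable, "-L"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

An empty list suggests that generation will run on the CPU, or that the driver is not installed correctly.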

Troubleshooting Common Issues

  • Server won't start: Check if port 17493 is available
  • Poor audio quality: Try adding more voice samples
  • Slow generation: Verify GPU acceleration is enabled
  • See the full Troubleshooting Guide for more
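
To check whether port 17493 is available, as suggested for the "Server won't start" issue, you can try binding to it yourself. This is a minimal sketch using Python's standard `socket` module. The `is_port_free` helper is hypothetical, and the loopback address `127.0.0.1` is an assumption about where the server listens.

```python
import socket

# Port named in this guide.
VOICEBOX_PORT = 17493


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically means another process already holds the port.
            return False
    return True
```

Calling `is_port_free(VOICEBOX_PORT)` while Voicebox is closed should return `True`. If it returns `False`, another process is probably using the port.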