This guide walks you through creating your first voice profile and generating speech.
Prerequisites
Make sure you have installed Voicebox and launched the app.
Step 1: Create a Voice Profile
Voice profiles are the foundation of Voicebox. Each profile contains voice samples that the AI uses to clone the voice.
Navigate to Profiles
Click the Profiles tab in the sidebar
Create New Profile
Click the + New Profile button
Fill in the details:
- Name: A descriptive name (e.g., "John Smith")
- Language: Select the primary language
- Description: Optional notes about the voice
Add Voice Sample
You have two options:
Option A: Upload Audio
- Click Upload Sample
- Select an audio file (WAV, MP3, or M4A)
- Ideal length: 10-30 seconds of clear speech
Option B: Record Live
- Click Record Sample
- Speak clearly for 10-30 seconds
- Click Stop when finished
Save Profile
Click Create Profile to save
Step 2: Generate Speech
Now let's use your new voice profile to generate speech.
Go to Generation
Click the Generate tab in the sidebar
Select Voice Profile
Choose your newly created profile from the dropdown
Enter Text
Type or paste the text you want to generate:
Hello! This is my first voice generation with Voicebox.
To insert supported tags, select Chatterbox Turbo and type / in the text input to open the tag inserter.
Generate
Click Generate and wait for the audio to finish generating, which usually takes a few seconds
Play & Download
- Click Play to preview the audio
- Click Download to save the audio file
- The generation is also saved to your History
Step 3: Build a Story (Optional)
The Stories Editor lets you create multi-voice narratives with a timeline-based interface.
Create New Story
Navigate to Stories and click + New Story
Add Voice Tracks
Click + Add Track to create tracks for different speakers
Add Audio Clips
- Drag generated audio from your History
- Or generate new clips directly in the timeline
- Arrange clips on the timeline
Edit & Export
- Trim clips by dragging edges
- Adjust timing and spacing
- Click Export to render the final audio
What's Next?
- Learn advanced techniques for high-quality voice cloning
- Integrate Voicebox into your own applications
- Master the multi-track timeline editor
- Connect to a GPU server for faster generation
Tips for Success
Getting the Best Voice Quality
- Use 10-30 seconds of clear, consistent speech
- Avoid background noise and echo
- Multiple samples from the same speaker improve quality
- Match the speaking style you want to generate
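The 10-30 second guideline above can be checked before you upload a sample. The following is a minimal sketch, assuming the sample is an uncompressed PCM WAV file. It uses only Python's standard `wave` module, and the names `wav_duration_seconds` and `is_ideal_length` are illustrative, not part of Voicebox. MP3 and M4A files would need a separate decoder.

```python
import wave

# Length range recommended in this guide for a voice sample.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    return frame_count / float(frame_rate)


def is_ideal_length(seconds: float) -> bool:
    """Return True if the duration falls inside the 10-30 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_ideal_length(wav_duration_seconds("sample.wav"))` tells you whether a hypothetical `sample.wav` is within the recommended range. It says nothing about noise or echo, which still need a listening check.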
Improving Generation Speed
- Use a CUDA-capable GPU for 5-10x faster generation than on CPU
- Enable voice prompt caching for repeated generations
- Consider running the backend on a remote GPU server
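A CUDA-capable GPU is only usable if an NVIDIA driver is installed on the machine. As a rough first check, the sketch below looks for the `nvidia-smi` tool that ships with NVIDIA drivers. This is an assumption-laden shortcut, not how Voicebox itself detects GPUs, and the function name `nvidia_driver_present` is hypothetical.

```python
import shutil


def nvidia_driver_present() -> bool:
    """Return True if the nvidia-smi tool can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    if nvidia_driver_present():
        print("nvidia-smi found: an NVIDIA driver appears to be installed")
    else:
        print("nvidia-smi not found on PATH: no NVIDIA driver detected")
```

A positive result only shows that the driver tooling is present. Whether the app actually uses the GPU still has to be confirmed in Voicebox itself.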
Troubleshooting Common Issues
- Server won't start: Check that port 17493 is not already in use by another process
- Poor audio quality: Try adding more voice samples
- Slow generation: Verify GPU acceleration is enabled
- See the full Troubleshooting Guide for more
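For the port check mentioned above, here is a minimal sketch using Python's standard `socket` module. It assumes the server listens on the local machine (`127.0.0.1`), which this guide does not state. The helper name `port_is_free` is illustrative.

```python
import socket

VOICEBOX_PORT = 17493  # port named in the troubleshooting tip


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepts the connection,
        # and a nonzero error code otherwise.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    state = "available" if port_is_free(VOICEBOX_PORT) else "in use"
    print(f"Port {VOICEBOX_PORT} is {state}")
```

If the port is reported as in use while Voicebox is not running, another process is probably holding it. Closing that process should free the port.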