Overview
A voice profile is a saved voice you can reuse across generations, stories, and the API. As of 0.4, Voicebox profiles come in two flavors that map to two different ways of getting a voice:
| Profile type | What it stores | Use when… |
|---|---|---|
| Cloned | One or more reference audio samples + a voice embedding | You want to replicate a specific person's voice |
| Preset | A reference to a pre-built voice in a specific engine | You want a curated, production-ready voice with no audio prep |
Both types live in the same Profiles tab and behave the same way at generation time — pick the type that matches your goal and follow the workflow below.
Workflow A — Cloned Profiles
Use this when you want to replicate a specific person's voice from a recording.
Prepare Audio
10-30 seconds of clear speech, minimal background noise. See Voice Cloning for the engine catalog.
Create Profile
Profiles → + New Profile → choose a cloning engine (Qwen3-TTS, Chatterbox Multilingual, Chatterbox Turbo, LuxTTS, or TADA).
Upload or Record Sample
Drag in an audio file, or record directly with the in-app recorder.
Generate to Test
Use the profile to generate a test phrase. If quality is poor, add more samples.
Audio Requirements (Cloning Only)
- 10-30 seconds: shorter clips give poor quality; longer clips are unnecessary
- Clear speech: no background noise, no music or overlapping voices
- High fidelity: 44.1 kHz or 48 kHz sample rate, minimal compression
- Natural speech: conversational tone, complete sentences
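Some of these requirements (length, sample rate, bit depth) can be checked before you upload. Below is a minimal pre-flight sketch, not a Voicebox feature: `check_params` and `check_wav` are hypothetical helpers, and the thresholds are simply the numbers from this page. It uses Python's standard-library `wave` module, which only reads uncompressed PCM WAV files, so convert other formats first. It cannot judge noise or speaking style; those still need your ears.

```python
import wave

# Thresholds copied from the requirements on this page (assumed, not read from Voicebox).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0
PREFERRED_RATES = {44100, 48000}
PREFERRED_WIDTHS = {2, 3}  # bytes per sample: 16-bit or 24-bit


def check_params(n_frames: int, rate: int, sampwidth: int) -> list[str]:
    """Return human-readable problems for the given WAV parameters (empty list if none)."""
    problems = []
    seconds = n_frames / rate
    if seconds < MIN_SECONDS:
        problems.append(f"too short: {seconds:.1f} s (aim for {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s)")
    elif seconds > MAX_SECONDS:
        problems.append(f"longer than needed: {seconds:.1f} s")
    if rate not in PREFERRED_RATES:
        problems.append(f"sample rate {rate} Hz (prefer 44.1 kHz or 48 kHz)")
    if sampwidth not in PREFERRED_WIDTHS:
        problems.append(f"{sampwidth * 8}-bit samples (prefer 16- or 24-bit)")
    return problems


def check_wav(path: str) -> list[str]:
    """Read the WAV header with the standard-library wave module and check it."""
    with wave.open(path, "rb") as wav:
        return check_params(wav.getnframes(), wav.getframerate(), wav.getsampwidth())


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        issues = check_wav(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

Run it as `python check_sample.py sample1.wav sample2.wav` to get one line per file.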
File Formats
Supported formats:
- WAV (recommended) — Lossless quality
- MP3 — Acceptable, minimal compression
- M4A — Acceptable
- FLAC — Lossless alternative
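A file extension does not guarantee the contents (a renamed MP3 still decodes as MP3). If an upload behaves oddly, checking the file's leading "magic" bytes shows what it really is. This is a hypothetical helper, not how Voicebox validates uploads; it only recognizes the four containers listed above, and the MP4-family `ftyp` box can also belong to a video file.

```python
def sniff_format(header: bytes) -> str:
    """Guess an audio container from a file's first bytes (pass at least 12)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[4:8] == b"ftyp":
        # ISO base media (MP4 family) file; M4A audio uses this container.
        return "m4a"
    if header[:3] == b"ID3":
        # MP3 that starts with an ID3v2 tag.
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        # Bare MPEG audio frame sync; raw AAC (ADTS) streams also match this pattern.
        return "mp3"
    return "unknown"


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, sniff_format(f.read(12)))
```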
Recording Tips
Quiet Space
- Record in a quiet room
- Turn off fans, AC, appliances
- Close windows to reduce outside noise
- Use soft furnishings to reduce echo
Microphone Placement
- 6-12 inches from mouth
- Slight angle to reduce plosives (p, b, t)
- Use a pop filter if available
- Maintain consistent distance
Recording Settings
- 44.1 kHz or 48 kHz sample rate
- 16-bit or 24-bit depth
- Mono is fine (stereo will be converted)
- Avoid automatic gain control
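Mono is fine because a stereo file is typically reduced to one channel anyway. This page does not say how Voicebox converts stereo; the sketch below shows the common approach (averaging left and right) on interleaved integer PCM samples, as an assumption for illustration. `stereo_to_mono` is a hypothetical name.

```python
def stereo_to_mono(interleaved: list[int]) -> list[int]:
    """Average each left/right pair of interleaved stereo samples into one mono sample."""
    if len(interleaved) % 2:
        raise ValueError("interleaved stereo data must have an even number of samples")
    left = interleaved[0::2]
    right = interleaved[1::2]
    # Floor-dividing the sum keeps results within the input's integer range.
    return [(l + r) // 2 for l, r in zip(left, right)]
```

One consequence of averaging: if your voice was recorded on only one channel (common with a single mic on a two-channel interface), the other channel's silence halves the voice level, roughly 6 dB quieter. Recording in mono, or exporting just the voice channel, avoids that.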
Speaking Style
- Natural pace — Don't rush or speak too slowly
- Clear articulation — Pronounce words clearly
- Consistent volume — Maintain steady loudness
- Normal tone — Speak as you normally would
- Complete sentences — Avoid fragments or "ums"
Multiple Samples
Adding multiple samples can significantly improve quality:
- The model learns a more complete representation of the voice
- Different speaking styles are handled better
- Artifacts are reduced and naturalness improves
- Output is more reliable across different texts
Consider adding samples with:
- Different tones — casual, formal, excited, calm
- Different content — narratives, questions, statements
- Different recording conditions — studio quality, room acoustics
Processing Existing Audio
If you have existing audio (podcasts, videos, etc.):
Find Clean Speech
Look for segments with just the target speaker, no background music, and minimal noise.
Use Audio Editor
In tools like Audacity or Adobe Audition, cut clean 10-30 second segments, remove silence at the start and end, and normalize the volume.
Export as WAV
Save as a high-quality WAV file.
For light background noise, use Audacity's noise reduction (gentle settings — over-processing introduces artifacts).
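The "remove silence at the start and end, normalize volume" step can be sketched in a few lines. This is an illustration of the idea on floating-point samples in the range -1.0 to 1.0, not what Audacity or Voicebox does internally: `trim_silence` uses a simple per-sample threshold (real editors look at short windows), and `normalize_peak` is peak normalization, the same idea as Audacity's Normalize effect. Both function names and the default values are hypothetical.

```python
def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples whose magnitude stays below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    return samples[loud[0]:loud[-1] + 1]


def normalize_peak(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale samples so the loudest one reaches target_peak (1.0 is full scale)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Trim first, then normalize, so a stray click outside the speech does not set the peak. A target of 0.9 sits just under -1 dBFS, which leaves a little headroom.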
Testing & Iteration
After creating a cloned profile:
Generate Test
Try a simple phrase: "Hello, this is a test of my voice profile."
Evaluate Quality
Listen for natural tone, clear pronunciation, proper prosody, and an absence of artifacts.
Iterate
If quality is poor, add more samples, try different source audio, or check sample quality.
Common Issues
Robotic Voice
Cause: Samples are low quality or too short
Fix: Use longer, higher-quality samples
Wrong Tone
Cause: Sample tone doesn't match desired output
Fix: Record samples in the style you want to generate
Artifacts/Glitches
Cause: Background noise or audio issues in samples
Fix: Clean up samples or re-record in a quieter environment
Workflow B — Preset Profiles
Use this when you want a ready-made voice without recording anything. Available engines: Kokoro 82M (50 voices) and Qwen CustomVoice (9 voices). See Preset Voices for the full catalog.
Create Profile
Profiles → + New Profile → choose Kokoro or Qwen CustomVoice as the engine.
Pick a Voice
The engine's voice catalog appears. Click any voice to preview it.
Name and Save
Give the profile a name. No audio sample is required.
Generate
The profile is ready immediately. Use it in the floating generate box or on the Generate page.
Qwen CustomVoice + Instruct
Preset voices in Qwen CustomVoice support delivery instructions — natural-language style control over tone, pace, and emotion. The floating generate box shows a slider icon next to the generate button when a Qwen CustomVoice profile is selected; click it to reveal the instruct textarea.
See Preset Voices → Using Instruct Mode for examples.
Advanced Tips
Celebrity / Character Voices (Cloning)
For cloning public figures or characters:
- Legal considerations — Ensure you have rights or it's clearly fair use
- Source quality — Find high-quality interview audio or clean clips
- Consistency — Use clips where they speak similarly
- Multiple samples — Very important for recognizable voices
Accent & Dialect (Cloning)
Cloning models preserve accent and dialect:
- British English samples generate British English output
- Southern accent samples produce Southern accent output
- Regional pronunciations are maintained
Emotion Transfer (Cloning)
The emotional tone of samples affects generation:
- Energetic samples → energetic output
- Calm samples → calm output
- Mix samples for a more versatile profile
For Qwen CustomVoice presets, use the instruct field instead of relying on sample emotion — that's exactly what it controls.
Managing Profiles
Organization
- Descriptive names — "John Smith - Professional Narrator"
- Add descriptions — Note recording conditions, use cases, or which preset voice
- Language tags — Mark the primary language
- Archive unused — Keep profile list manageable
Export / Import
- Export profiles to share or backup
- Import from colleagues or teammates
- Cloned profiles export with their voice embeddings (not the original audio)
- Preset profiles export as engine + voice ID metadata only — the importer must have that engine's model installed
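Because a preset export carries only engine + voice ID metadata, a teammate importing several presets needs every referenced engine installed. The sketch below models that rule. Voicebox's actual export file format is not described on this page, so the `PresetProfileExport` fields, the engine strings, and `missing_engines` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PresetProfileExport:
    """Hypothetical shape of an exported preset profile (metadata only, no audio)."""
    name: str
    engine: str    # engine identifier, e.g. for Kokoro or Qwen CustomVoice (hypothetical values)
    voice_id: str  # the engine's own identifier for the chosen voice


def missing_engines(
    profiles: Iterable[PresetProfileExport], installed_engines: Iterable[str]
) -> list[str]:
    """Return, sorted, the engines these presets need that are not installed yet."""
    needed = {p.engine for p in profiles}
    return sorted(needed - set(installed_engines))
```

Cloned profiles avoid this dependency on a specific preset catalog, since they carry their own voice embedding.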
Next Steps
- Voice Cloning: engine catalog and best practices for cloning
- Preset Voices: full catalog of Kokoro and Qwen CustomVoice voices
- Use your profile to generate speech
- Create multi-voice narratives