Overview
Some Voicebox engines ship with a curated set of pre-built voices. Instead of cloning from your own audio sample, you pick a voice from a fixed catalog and the model speaks in that voice. No recording, no upload, no per-voice training required.
Two engines in Voicebox 0.4 ship preset voices:
| Engine | Voices | Languages | Strengths |
|---|---|---|---|
| Kokoro 82M | 50 | 9 | Tiny model, CPU-friendly, lowest VRAM of any engine |
| Qwen CustomVoice | 9 (premium curated) | 4 | Natural-language style control over tone, emotion, pace |
When to Use Preset Voices
- You don't have (or don't want to provide) a recording of the target voice.
- You want predictable results: curated voices deliver consistent quality across any text input.
- You want to skip the audio cleanup, sample preparation, and quality-iteration loop.
- You need a lightweight setup: Kokoro runs in realtime on CPU with ~150 MB on disk, no GPU needed.
Creating a Preset-Voice Profile
1. Open Profiles → New Profile. This is the same entry point as for cloning profiles.
2. Choose the engine: select Kokoro or Qwen CustomVoice from the engine dropdown.
3. Pick a preset voice: the voice catalog for the chosen engine appears; click a voice to preview it.
4. Name and save: give the profile a name and save it. No audio sample is needed.
5. Generate: use the profile like any other, in the floating generate box or on the Generate page.
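Conceptually, a preset-voice profile only has to record which engine and which catalog voice it uses; there is no audio to store. The sketch below models that as a small Python dataclass that rejects voices outside the chosen engine's catalog. The `PresetVoiceProfile` class, the engine keys `"kokoro"` and `"qwen-customvoice"`, and the trimmed catalog are hypothetical illustrations, not Voicebox's actual schema.

```python
from dataclasses import dataclass

# Hypothetical, trimmed catalog: engine key -> preset voice names.
# The full catalogs are listed later on this page.
PRESET_CATALOG: dict[str, frozenset[str]] = {
    "kokoro": frozenset({"Heart", "Bella", "Adam", "Emma", "George"}),
    "qwen-customvoice": frozenset({"Vivian", "Serena", "Ryan", "Aiden", "Sohee"}),
}


@dataclass(frozen=True)
class PresetVoiceProfile:
    """Hypothetical profile that points at a catalog voice instead of an audio sample."""
    name: str
    engine: str
    voice: str

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("profile name must not be empty")
        voices = PRESET_CATALOG.get(self.engine)
        if voices is None:
            raise ValueError(f"engine {self.engine!r} has no preset voices")
        if self.voice not in voices:
            raise ValueError(f"{self.voice!r} is not a preset voice of {self.engine!r}")


# "Name and save": no audio sample is involved anywhere.
narrator = PresetVoiceProfile(name="Narrator", engine="kokoro", voice="Heart")
```

Validating at construction time mirrors the UI flow: you can only pick from the catalog the chosen engine exposes.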
Kokoro 82M — 50 Voices Across 9 Languages
Kokoro is the smallest engine in Voicebox at 82M parameters. It runs in realtime on CPU with negligible VRAM, making it the best option for lightweight local inference. Voices are pre-built style vectors trained into the model; there is no cloning step.
Repository: hexgrad/Kokoro-82M · Apache 2.0 licensed
American English
| Female | Male |
|---|---|
| Alloy | Adam |
| Aoede | Echo |
| Bella | Eric |
| Heart | Fenrir |
| Jessica | Liam |
| Kore | Michael |
| Nicole | Onyx |
| Nova | Puck |
| River | Santa |
| Sarah | |
| Sky | |
British English
| Female | Male |
|---|---|
| Alice | Daniel |
| Emma | Fable |
| Isabella | George |
| Lily | Lewis |
Other Languages
| Language | Voices |
|---|---|
| Spanish (es) | Dora (f), Alex (m), Santa (m) |
| French (fr) | Siwis (f) |
| Hindi (hi) | Alpha (f), Beta (f), Omega (m), Psi (m) |
| Italian (it) | Sara (f), Nicola (m) |
| Japanese (ja) | Alpha (f), Gongitsune (f), Nezumi (f), Tebukuro (f), Kumo (m) |
| Portuguese (pt) | Dora (f), Alex (m), Santa (m) |
| Chinese (zh) | Xiaobei (f), Xiaoni (f), Xiaoxiao (f), Xiaoyi (f) |
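Display names repeat across languages in these tables (Dora, Alex, and Santa appear under both Spanish and Portuguese, Santa also under American English, and Alpha under both Hindi and Japanese), so a name alone does not identify a Kokoro voice. Upstream, hexgrad/Kokoro-82M names its voice files with a two-letter prefix: the first letter encodes the language (for example `a` for American English, `b` for British English, `j` for Japanese) and the second the gender (`f` or `m`), as in `af_heart` or `bm_george`. The sketch below decodes such an ID; whether Voicebox stores these exact IDs inside a profile is an assumption.

```python
# Prefix letters used by the upstream hexgrad/Kokoro-82M voice files.
LANGUAGE_BY_LETTER = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Chinese",
}
GENDER_BY_LETTER = {"f": "female", "m": "male"}


def describe_kokoro_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID like 'af_heart' into (language, gender, display name)."""
    prefix, sep, name = voice_id.partition("_")
    if (
        not sep
        or not name
        or len(prefix) != 2
        or prefix[0] not in LANGUAGE_BY_LETTER
        or prefix[1] not in GENDER_BY_LETTER
    ):
        raise ValueError(f"not a Kokoro-style voice ID: {voice_id!r}")
    return LANGUAGE_BY_LETTER[prefix[0]], GENDER_BY_LETTER[prefix[1]], name.capitalize()


# Same display name, two different voices:
print(describe_kokoro_voice("em_santa"))  # ('Spanish', 'male', 'Santa')
print(describe_kokoro_voice("pm_santa"))  # ('Portuguese', 'male', 'Santa')
```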
Kokoro at a Glance
| Property | Value |
|---|---|
| Parameters | 82M |
| Sample rate | 24 kHz |
| VRAM | ~150 MB (negligible on CPU) |
| Speed | Realtime on CPU, faster on GPU |
| Instruct | Not supported (preset voice carries the style) |
| License | Apache 2.0 |
Qwen CustomVoice — 9 Premium Voices with Instruct Control
Qwen CustomVoice ships with 9 curated speakers and supports natural-language style control — you tell the model how to deliver the line ("speak slowly with warmth", "authoritative and clear") and it adapts tone, emotion, and pace.
Two model sizes:
- 1.7B — full quality, recommended default
- 0.6B — lighter, faster, lower-end hardware
Repository: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (and 0.6B variant) · by Alibaba
Voice Catalog
| Speaker | Gender | Language | Description |
|---|---|---|---|
| Vivian | female | Chinese | Bright, slightly edgy young female voice |
| Serena | female | Chinese | Warm, gentle young female voice |
| Uncle Fu | male | Chinese | Seasoned male voice with a low, mellow timbre |
| Dylan | male | Chinese | Youthful Beijing male voice with a clear, natural timbre |
| Eric | male | Chinese | Lively Chengdu male voice with a slightly husky brightness |
| Ryan | male | English | Dynamic male voice with strong rhythmic drive (default) |
| Aiden | male | English | Sunny American male voice with a clear midrange |
| Ono Anna | female | Japanese | Playful Japanese female voice with a light, nimble timbre |
| Sohee | female | Korean | Warm Korean female voice with rich emotion |
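If you script profile creation, the catalog above is small enough to keep as a lookup table. The sketch below lists the speakers whose native language matches a requested language and falls back to Ryan, the catalog's default, when none match. The fallback policy and the `speakers_for` helper are hypothetical; the sketch also assumes, without confirmation from the source, that a speaker sounds most natural in its native language.

```python
# Speaker -> native language, transcribed from the catalog table above.
QWEN_SPEAKERS = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle Fu": "Chinese",
    "Dylan": "Chinese",
    "Eric": "Chinese",
    "Ryan": "English",
    "Aiden": "English",
    "Ono Anna": "Japanese",
    "Sohee": "Korean",
}
DEFAULT_SPEAKER = "Ryan"  # marked "(default)" in the catalog


def speakers_for(language: str) -> list[str]:
    """Native speakers of `language`, or the default speaker if there are none."""
    matches = sorted(s for s, lang in QWEN_SPEAKERS.items() if lang == language)
    return matches or [DEFAULT_SPEAKER]


print(speakers_for("English"))  # ['Aiden', 'Ryan']
print(speakers_for("Korean"))   # ['Sohee']
```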
Using Instruct Mode
In the floating generate box, switch to a Qwen CustomVoice profile and click the delivery instructions toggle (slider icon, left of the generate button). A second textarea appears below the main text:
- Main text → what you want the voice to say
- Instruct text → how you want it delivered
Examples of effective instruct prompts:
- "Speak slowly with emphasis, like reading bedtime stories"
- "Warm and friendly, conversational tone"
- "Professional and authoritative, broadcast quality"
- "Whisper, intimate and close"
- "Excited and energetic, like sports commentary"
The full Generate page also surfaces the instruct field as a separate input.
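The split between main text and instruct text maps naturally onto a request with two separate fields, where the instruct field is only valid for engines that support it (Qwen CustomVoice does, Kokoro does not, per the "at a Glance" tables on this page). The sketch below is a hypothetical request shape under that assumption; the `GenerationRequest` class, its field names, and the engine keys are illustrative and not Voicebox's API.

```python
from dataclasses import dataclass
from typing import Optional

# Instruct support per engine, per the "at a Glance" tables on this page.
# The engine keys are hypothetical identifiers.
SUPPORTS_INSTRUCT = {"kokoro": False, "qwen-customvoice": True}


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape mirroring the two textareas."""
    engine: str
    voice: str
    text: str                       # main text: what the voice says
    instruct: Optional[str] = None  # instruct text: how it is delivered

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("main text must not be empty")
        # A blank instruct field is treated like no instruction at all.
        if self.instruct and not SUPPORTS_INSTRUCT.get(self.engine, False):
            raise ValueError(f"{self.engine!r} does not accept delivery instructions")


req = GenerationRequest(
    engine="qwen-customvoice",
    voice="Ryan",
    text="And he scores in the final minute!",
    instruct="Excited and energetic, like sports commentary",
)
```

Keeping the delivery instruction out of the main text means the model never reads the instruction aloud, and the same text can be re-rendered with a different delivery by changing one field.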
Qwen CustomVoice at a Glance
| Property | Value |
|---|---|
| Parameters | 1.7B / 0.6B |
| Languages | Preset speakers: Chinese, English, Japanese, Korean (the model supports 10 languages overall) |
| Voices | 9 curated preset speakers |
| VRAM | ~3.5 GB (1.7B), ~1.2 GB (0.6B) |
| Instruct | Yes — natural-language style control |
| Cloning | No — paired Base Qwen3-TTS engine handles cloning |
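The VRAM figures in these tables line up with a common back-of-envelope estimate: parameter count times 2 bytes per parameter, which is what 16-bit (fp16/bf16) weights alone occupy. The sketch below does that arithmetic; the 16-bit assumption is ours, not a statement of how Voicebox loads the models, and real usage adds activations and framework overhead on top.

```python
def weight_footprint_gb(params: float, bytes_per_param: int = 2) -> float:
    """Weights-only size in decimal GB; activations and runtime overhead come on top."""
    return params * bytes_per_param / 1e9


models = [
    ("Kokoro 82M", 82e6),
    ("Qwen CustomVoice 1.7B", 1.7e9),
    ("Qwen CustomVoice 0.6B", 0.6e9),
]
for name, params in models:
    print(f"{name}: ~{weight_footprint_gb(params):.2f} GB of 16-bit weights")
```

This gives about 3.40 GB for the 1.7B model and 1.20 GB for the 0.6B model, close to the ~3.5 GB and ~1.2 GB listed, and about 0.16 GB (roughly 156 MiB) for Kokoro, close to its ~150 MB figure. Small differences may come from MB vs MiB units and rounding.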
Cloning vs Preset — Quick Decision
| You want… | Use |
|---|---|
| To replicate a specific person's voice | Voice Cloning |
| Production-ready voices with no audio prep | Kokoro or Qwen CustomVoice |
| The smallest possible footprint (CPU-only) | Kokoro |
| Fine control over delivery (tone, pace, emotion) | Qwen CustomVoice |
| The broadest language coverage | Voice Cloning via Chatterbox Multilingual (23 languages) |
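The decision table reads as a short priority list. The sketch below encodes it as a function; the argument names and the order used to resolve conflicting requirements (cloning first, then language coverage, then delivery control, then footprint) are our assumptions, not rules Voicebox enforces. Delivery control is checked before footprint because Kokoro does not support instruct, so needing it rules Kokoro out.

```python
def pick_engine(
    replicate_specific_voice: bool = False,
    broadest_languages: bool = False,
    needs_delivery_control: bool = False,
    smallest_footprint: bool = False,
) -> str:
    """Hypothetical encoding of the decision table, checked in priority order."""
    if replicate_specific_voice:
        return "Voice Cloning"
    if broadest_languages:
        return "Voice Cloning via Chatterbox Multilingual"
    if needs_delivery_control:
        # Kokoro has no instruct support, so this requirement excludes it.
        return "Qwen CustomVoice"
    if smallest_footprint:
        return "Kokoro"
    # No special requirement: either preset engine gives voices with no audio prep.
    return "Kokoro or Qwen CustomVoice"


print(pick_engine(smallest_footprint=True))  # Kokoro
```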
Limitations
- Preset voices can't be exported as audio for use in other Voicebox installations. A profile exports only as metadata pointing to the same engine and voice ID, so the receiving installation needs that engine.
- The Kokoro voice catalog is set by the upstream model — new voices appear only when hexgrad publishes new model releases
- Qwen CustomVoice's 9 speakers are part of the model checkpoint — same constraint
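Because a preset voice lives in the engine's weights, an exported profile is only a pointer. A minimal sketch, assuming a JSON payload with hypothetical `name`, `engine`, and `voice` fields (not Voicebox's actual export format): importing succeeds only where the same engine is installed. The engine has to travel with the voice because names are not unique across engines; Eric, for example, appears in both the Kokoro and Qwen CustomVoice catalogs.

```python
import json


def export_profile(name: str, engine: str, voice: str) -> str:
    """Serialize a preset profile as metadata only; no audio is embedded."""
    return json.dumps({"name": name, "engine": engine, "voice": voice}, sort_keys=True)


def import_profile(payload: str, installed_engines: set[str]) -> dict[str, str]:
    """Load exported metadata, refusing profiles whose engine is not installed."""
    profile = json.loads(payload)
    missing = {"name", "engine", "voice"} - profile.keys()
    if missing:
        raise ValueError(f"profile is missing fields: {sorted(missing)}")
    if profile["engine"] not in installed_engines:
        raise ValueError(f"engine {profile['engine']!r} is not installed here")
    return profile


payload = export_profile("Announcer", "qwen-customvoice", "Eric")
```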
Next Steps
- Clone a specific voice from your own audio
- Use a profile to generate audio
- Compose multi-voice narratives