TTS Engines

For humans: This doc is optimized for AI agents to implement new TTS engines autonomously. It's structured as a phased workflow with explicit gates and a checklist so an agent can do the full integration — dependency research, backend, frontend, bundling — and hand you a draft release or prod build to test locally. It's also a useful reference if you're doing it yourself.

Adding an engine touches ~10 files across 4 layers. The backend protocol work is straightforward — the real time sink is dependency hell, upstream library bugs, and PyInstaller bundling.

Do not start writing code until you complete Phase 0. Reaching v0.2.3 took three patch releases of PyInstaller fixes because dependency research was skipped. Every issue — inspect.getsource() failures, missing native data files, metadata lookups, dtype mismatches — was discoverable by reading the model library's source code before integration began.

Architecture Overview

The backend is split into layers:

| Layer | Purpose | Files Touched |
| --- | --- | --- |
| routes/ | Thin HTTP handlers | None (auto-dispatch) |
| services/ | Business logic | None (auto-dispatch) |
| backends/ | Engine implementations | your_engine_backend.py |
| utils/ | Shared utilities | As needed |

New engines only need to touch backends/ and models.py on the backend side — the route and service layers use a model config registry that handles dispatch automatically.

Phase 0: Dependency Research

This phase is mandatory. Clone the model library and its key dependencies into a temporary directory and inspect them before writing any integration code. The goal is to produce a dependency audit that identifies every PyInstaller-incompatible pattern, every native data file, and every upstream bug you'll need to work around.

0.1 Clone and Inspect the Model Library

# Create a throwaway workspace
mkdir /tmp/engine-research && cd /tmp/engine-research

# Clone the model library
git clone https://github.com/org/model-library.git
cd model-library

Read these files first, in order:

  1. setup.py / setup.cfg / pyproject.toml — Check pinned dependency versions. If the library pins torch==2.6.0 or numpy<1.26, you'll need --no-deps installation and manual sub-dependency listing (this is what happened with chatterbox-tts).

  2. __init__.py and the main model class — Trace the import chain. Look for:

    • from_pretrained() — does it call huggingface_hub internally? Does it pass token=True (which crashes without a stored HF token)?
    • from_local() — does it exist? You may need manual snapshot_download() + from_local() to bypass download bugs.
    • Device handling — does it default to CUDA? Does it support MPS? Many libraries crash on MPS with unsupported operators.
  3. All import statements — Recursively trace what the library imports (a tracing-script sketch follows this list). You're looking for:

    • inspect.getsource() anywhere in the chain (search all .py files)
    • typeguard / @typechecked decorators (these call inspect.getsource() at import time)
    • importlib.metadata.version() or pkg_resources.get_distribution() (need --copy-metadata)
    • lazy_loader (needs --collect-all to bundle .pyi stubs)
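
A minimal sketch of such a tracing script, assuming you run it from the cloned library's root; it only follows static import statements, so keep the grep passes below for dynamic imports.

import ast
from pathlib import Path

def list_imports(root: str) -> set[str]:
    """Collect every top-level package imported by any .py file under root."""
    found: set[str] = set()
    for py in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(py.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip vendored templates or non-UTF-8 files
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                found.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found

if __name__ == "__main__":
    for name in sorted(list_imports(".")):
        print(name)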

0.2 Scan for PyInstaller-Incompatible Patterns

Run these searches against the cloned library and its transitive dependencies:

# inspect.getsource — will crash in frozen binary without --collect-all
grep -r "inspect.getsource\|getsource(" .

# typeguard / @typechecked — calls inspect.getsource at import time
grep -r "@typechecked\|from typeguard" .

# importlib.metadata — needs --copy-metadata
grep -r "importlib.metadata\|pkg_resources.get_distribution\|pkg_resources.require" .

# Data files loaded at runtime — need --collect-all or --collect-data
grep -r "Path(__file__).parent\|os.path.dirname(__file__)\|resources_path\|pkg_resources.resource_filename" .

# Native library paths — may need env var override in frozen builds
grep -r "/usr/share\|/usr/lib\|/usr/local\|espeak\|phonemize" .

# torch.load without map_location — will crash on CPU-only builds
grep -r "torch.load(" . | grep -v "map_location"

# HuggingFace token bugs
grep -r 'token=True\|token=os.getenv' .

# Float64/Float32 assumptions — librosa returns float64, many models assume float32
grep -r "torch.from_numpy\|\.double()\|float64" .

# @torch.jit.script — calls inspect.getsource(), crashes in frozen builds
grep -r "@torch.jit.script\|torch.jit.script" .

# torchaudio.load — requires torchcodec in torchaudio 2.10+, use soundfile.read() instead
grep -r "torchaudio.load\|torchaudio.save" .

# Gated HuggingFace repos — models that hardcode gated repos as tokenizer/config sources
grep -r "from_pretrained\|tokenizer_name\|AutoTokenizer" . | grep -i "llama\|meta-llama\|gated"

0.3 Install and Trace in a Throwaway Venv

# Create isolated venv
python -m venv /tmp/engine-venv
source /tmp/engine-venv/bin/activate

# Install the package (try normally first)
pip install model-package

# Check if it conflicts with our stack
pip install model-package torch==2.10 transformers==4.57.3 "numpy>=1.26"
# If this fails, you need --no-deps:
pip install --no-deps model-package

# Get the full dependency tree
pip show model-package  # Check Requires: field
pip show -f model-package  # List all installed files (look for data files)

# Check for non-PyPI dependencies
pip install model-package 2>&1 | grep -i "no matching distribution"

0.4 Test Model Loading on CPU

Before writing any integration code, verify the model works on CPU in a plain Python script:

import numpy as np

from model_library import ModelClass  # placeholder: import your engine's actual model class

# Force CPU to catch map_location bugs early
model = ModelClass.from_pretrained("org/model", device="cpu")

# Test with a float32 audio array (not float64)
audio = np.random.randn(16000).astype(np.float32)
output = model.generate("Hello world", audio)
print(f"Output shape: {output.shape}, dtype: {output.dtype}, sample rate: {model.sample_rate}")

If this crashes, you've found a bug you'll need to monkey-patch. Common ones:

  • RuntimeError: expected scalar type Float but found Double → needs float32 cast
  • RuntimeError about deserializing a CUDA checkpoint on a CPU-only machine → needs torch.load map_location patch
  • RuntimeError: Unsupported operator aten::... → needs MPS skip

0.5 Produce a Dependency Audit

Before proceeding to Phase 1, write down:

  1. PyPI vs non-PyPI deps — which packages need --find-links, git+https://, or --no-deps?
  2. PyInstaller directives needed — which packages need --collect-all, --copy-metadata, --hidden-import?
  3. Runtime data files — which packages ship data files (YAML, pretrained weights, phoneme tables, shader libraries) that must be bundled?
  4. Native library paths — which packages look for data at system paths that won't exist in a frozen binary?
  5. Monkey-patches needed — torch.load map_location, float64→float32 casts, MPS skip, HF token bypass, etc.
  6. Sample rate — what does the engine output? (24kHz, 44.1kHz, 48kHz)
  7. Model download method — from_pretrained() with library-managed download, or manual snapshot_download() + from_local()?

This audit becomes your implementation plan for Phases 1, 4, and 5.
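
For scale, a finished audit for a hypothetical engine can be as short as:

  • Non-PyPI deps: none, but install with --no-deps because setup.py pins torch==2.6.0
  • PyInstaller directives: --collect-all for the model package (typeguard calls inspect.getsource()), --copy-metadata huggingface-hub
  • Runtime data files: phoneme table YAML shipped inside the package
  • Native library paths: none
  • Monkey-patches: torch.load map_location, float64→float32 cast on reference audio
  • Sample rate: 24kHz
  • Model download: manual snapshot_download() + from_local()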

Phase 1: Backend Implementation

1.1 Create the Backend File

Create backend/backends/<engine>_backend.py (~200-300 lines) implementing the TTSBackend protocol:

class YourBackend:
    """Must satisfy the TTSBackend protocol."""

    async def load_model(self, model_size: str = "default") -> None: ...
    async def create_voice_prompt(self, audio_path: str, reference_text: str, use_cache: bool = True) -> tuple[dict, bool]: ...
    async def combine_voice_prompts(self, audio_paths: list[str], ref_texts: list[str]) -> tuple[np.ndarray, str]: ...
    async def generate(self, text: str, voice_prompt: dict, language: str = "en", seed: int | None = None, instruct: str | None = None) -> tuple[np.ndarray, int]: ...
    def unload_model(self) -> None: ...
    def is_loaded(self) -> bool: ...
    def _get_model_path(self, model_size: str) -> str: ...

Key decisions per engine:

| Decision | Options | Examples |
| --- | --- | --- |
| Voice prompt storage | Pre-computed tensors vs deferred file paths | Qwen stores tensor dicts; Chatterbox stores paths |
| Caching | Use voice prompt cache or skip it | LuxTTS caches with prefix; Chatterbox skips caching |
| Device selection | CUDA / MPS / CPU | Chatterbox forces CPU on macOS (MPS bugs) |
| Model download | Library handles it vs manual snapshot_download | Turbo uses manual download to bypass token=True bug |
| Sample rate | Engine-specific | LuxTTS outputs 48kHz, everything else is 24kHz |

1.2 Voice Prompt Patterns

Pattern A: Pre-computed tensors (Qwen, LuxTTS)

encoded = model.encode_prompt(audio_path)
return encoded, False  # (prompt_dict, was_cached)

Pattern B: Deferred file paths (Chatterbox, MLX)

return {"ref_audio": audio_path, "ref_text": reference_text}, False

Pattern C: Hybrid (possible for new engines)

embedding = model.extract_speaker(audio_path)
return {"embedding": embedding, "ref_audio": audio_path}, False

If caching, prefix your cache keys:

cache_key = "yourengine_" + get_cache_key(audio_path, reference_text)
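
A sketch of a cached create_voice_prompt() under Pattern A; _prompt_cache and self.model.encode_prompt() are illustrative stand-ins, and only the get_cache_key() helper is the one referenced above.

_prompt_cache: dict[str, dict] = {}  # illustrative in-memory stand-in for the prompt cache

async def create_voice_prompt(self, audio_path: str, reference_text: str,
                              use_cache: bool = True) -> tuple[dict, bool]:
    cache_key = "yourengine_" + get_cache_key(audio_path, reference_text)
    if use_cache and cache_key in _prompt_cache:
        return _prompt_cache[cache_key], True       # (prompt_dict, was_cached=True)
    encoded = self.model.encode_prompt(audio_path)  # Pattern A: pre-computed tensors
    if use_cache:
        _prompt_cache[cache_key] = encoded
    return encoded, False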

1.3 Register the Engine

In backend/backends/__init__.py:

Add a ModelConfig entry:

ModelConfig(
    model_name="your-engine",
    display_name="Your Engine",
    engine="your_engine",
    hf_repo_id="org/model-repo",
    size_mb=3200,
    needs_trim=False,  # set True if output needs trim_tts_output()
    languages=["en", "fr", "de"],
),

Add to TTS_ENGINES dict:

TTS_ENGINES = {
    ...
    "your_engine": "Your Engine",
}

Add factory branch:

elif engine == "your_engine":
    from .your_engine_backend import YourBackend
    backend = YourBackend()

1.4 Update Request Models

In backend/models.py:

  • Add engine name to GenerationRequest.engine regex pattern
  • Add any new language codes to the language regex
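
As an illustration, the change in backend/models.py usually amounts to extending two patterns; this sketch assumes Pydantic v2-style Field(pattern=...) and uses made-up engine and language lists, not the real ones.

from pydantic import BaseModel, Field

class GenerationRequest(BaseModel):
    # Append the new engine to the alternation (engine list here is illustrative)
    engine: str = Field("qwen", pattern=r"^(qwen|chatterbox|luxtts|your_engine)$")
    # Extend the language pattern if the engine adds new codes
    language: str = Field("en", pattern=r"^(en|fr|de|es|it)$")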

Phase 2: Route and Service Integration

With the model config registry, route and service layers have zero per-engine dispatch points. All endpoints use registry helpers like get_model_config(), load_engine_model(), engine_needs_trim(), check_model_loaded(), etc.

You don't need to touch any route or service files unless your engine needs custom behavior in the generate pipeline.
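
For context, the registry-driven flow looks roughly like the sketch below; the helper names are the ones listed above, but their exact signatures and the request fields are assumptions.

# Illustrative shape only: real code lives in the generation service.
async def generate_speech(req):
    config = get_model_config(req.model_name)      # registry lookup, no per-engine branching
    backend = await load_engine_model(config)      # returns the loaded backend for that engine
    audio, sample_rate = await backend.generate(
        req.text, req.voice_prompt, language=req.language
    )
    if engine_needs_trim(config.engine):           # driven by ModelConfig.needs_trim
        audio = trim_tts_output(audio, sample_rate)
    return audio, sample_rate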

Post-Processing

If your model produces trailing silence, set needs_trim=True on your ModelConfig. The generation service applies trim_tts_output() automatically.
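
As a rough idea of what trailing-silence trimming does (a sketch, not the actual trim_tts_output() implementation):

import numpy as np

def trim_trailing_silence(audio: np.ndarray, sample_rate: int,
                          threshold: float = 1e-3, keep_ms: int = 100) -> np.ndarray:
    """Drop everything after the last sample above threshold, keeping a short tail."""
    loud = np.flatnonzero(np.abs(audio) > threshold)
    if loud.size == 0:
        return audio
    end = min(len(audio), loud[-1] + 1 + int(sample_rate * keep_ms / 1000))
    return audio[:end]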

Phase 3: Frontend Integration

3.1 TypeScript Types

In app/src/lib/api/types.ts:

  • Add to the engine union type on GenerationRequest

3.2 Language Maps

In app/src/lib/constants/languages.ts:

  • Add entry to ENGINE_LANGUAGES record
  • Add any new language codes to ALL_LANGUAGES if needed

3.3 Engine/Model Selector

In app/src/components/Generation/EngineModelSelector.tsx:

  • Add entry to ENGINE_OPTIONS and ENGINE_DESCRIPTIONS
  • Add to ENGLISH_ONLY_ENGINES if applicable

3.4 Form Hook

In app/src/lib/hooks/useGenerationForm.ts:

  • Add to Zod schema enum for engine
  • Add engine-to-model-name mapping
  • Update payload construction for engine-specific fields

Watch out for model naming inconsistencies. The HuggingFace repo name, the model size label, and the API model name don't always follow predictable patterns. For example, TADA's 3B model is named tada-3b-ml (not tada-3b), because it's a multilingual variant. Always check the actual repo names and build the frontend model name mapping from those, not from assumptions like {engine}-{size}.

3.5 Model Management

In app/src/components/ServerSettings/ModelManagement.tsx:

  • Add description to MODEL_DESCRIPTIONS record
  • Add model name to voiceModels filter condition

3.6 Non-Cloning Engines (Preset Voices)

If your engine uses pre-built voices instead of zero-shot cloning from reference audio (e.g. Kokoro), additional integration is needed:

Backend (a minimal sketch follows this list):

  • In kokoro_backend.py (or your engine), define a VOICES list of (voice_id, display_name, gender, language) tuples
  • create_voice_prompt() should return {"voice_type": "preset", "preset_engine": "<engine>", "preset_voice_id": "<id>"}
  • generate() should read voice_prompt.get("preset_voice_id") to select the voice
  • Add a seed_preset_profiles("<engine>") call in backend/routes/models.py after model download completes
  • The seed_preset_profiles() function in backend/services/profiles.py creates DB profiles with voice_type="preset"
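
A minimal sketch of the backend half for a hypothetical preset engine; the voice list, class name, and synthesize() call are illustrative, while the returned dict shape matches the bullets above.

# Illustrative preset-voice backend pieces; voice ids and synthesize() are stand-ins.
VOICES = [
    ("voice_f1", "Aria", "female", "en"),
    ("voice_m1", "Baker", "male", "en"),
]

class PresetEngineBackend:
    sample_rate = 24000

    async def create_voice_prompt(self, audio_path: str, reference_text: str,
                                  use_cache: bool = True) -> tuple[dict, bool]:
        # Preset engines ignore reference audio; the selected voice comes from the profile
        return {"voice_type": "preset",
                "preset_engine": "your_engine",
                "preset_voice_id": VOICES[0][0]}, False

    async def generate(self, text, voice_prompt, language="en", seed=None, instruct=None):
        voice_id = voice_prompt.get("preset_voice_id", VOICES[0][0])
        audio = self.model.synthesize(text, voice=voice_id)  # stand-in for the engine's real API
        return audio, self.sample_rate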

Frontend:

  • The EngineModelSelector filters options based on selectedProfile.voice_type:
    • "cloned" profiles → only cloning engines shown (Kokoro hidden)
    • "preset" profiles → only the preset's engine shown
  • Profile cards show the engine name as a badge for preset profiles
  • When a preset profile is selected, the engine auto-switches

Profile schema fields for presets:

  • voice_type: "preset" (vs "cloned" for traditional profiles)
  • preset_engine: "<engine>" — which engine owns this voice
  • preset_voice_id: "<id>" — the engine-specific voice identifier

For future "designed" voices (text description instead of audio, e.g. Qwen CustomVoice):

  • Use voice_type: "designed" with design_prompt field
  • create_voice_prompt_for_profile() already returns the design prompt for this type

Phase 4: Dependencies

Use the dependency audit from Phase 0 to drive this phase. You should already know what packages are needed, which conflict, and which require special installation.

4.1 Python Dependencies

Add to backend/requirements.txt. There are three installation patterns, depending on what Phase 0 revealed:

Normal PyPI packages:

some-model-package>=1.0.0

Pinned dependency conflicts (--no-deps) — If the model package pins old versions of torch/numpy/transformers, install with --no-deps and list sub-dependencies manually. This is the pattern used for chatterbox-tts:

# In justfile / CI setup:
pip install --no-deps chatterbox-tts

# In requirements.txt — list each actual sub-dependency:
conformer>=0.3.2
diffusers>=0.31.0
omegaconf>=2.3.0
resemble-perth>=0.0.2
s3tokenizer>=0.1.6

To identify sub-deps: run pip show chatterbox-tts and read the Requires: field, then cross-reference against existing requirements.txt to avoid duplicates.

Non-PyPI packages — Some libraries only exist on GitHub or require custom indexes:

# Git-only packages (no PyPI release)
linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git

# Custom package indexes (C extensions with platform-specific wheels)
--find-links https://k2-fsa.github.io/icefall/piper_phonemize.html
piper-phonemize>=1.2.0

4.2 Dependency Conflict Resolution

Check for conflicts with the existing stack before adding anything:

# Our current stack pins (approximate):
# Python 3.12+, torch>=2.10, transformers>=4.57, numpy>=1.26

# Test compatibility
pip install model-package torch==2.10 transformers==4.57.3 "numpy>=1.26"

# If it fails, check what the package pins:
pip show model-package | grep Requires
# Look at setup.py/pyproject.toml for version constraints

Known incompatible patterns in the wild:

  • torch==2.6.0 — many older packages pin this
  • numpy<1.26 — conflicts with Python 3.12+
  • transformers==4.46.3 — many packages pin old transformers
  • onnxruntime pinned versions — often conflict with torch

4.3 Update Installation Scripts

Dependencies must be added in multiple places:

| File | What to add |
| --- | --- |
| backend/requirements.txt | Package and version constraint |
| justfile | --no-deps install line if needed (in setup-python and setup-python-release targets) |
| .github/workflows/release.yml | Same --no-deps line in CI build steps |
| Dockerfile | Same install commands for Docker builds |

Phase 5: PyInstaller Bundling (build_binary.py)

This is where most of the pain lives. The v0.2.3 release was entirely dedicated to fixing bundling issues — every new engine that shipped in v0.2.1 (LuxTTS, Chatterbox, Chatterbox Turbo) worked in dev but failed in production builds. Don't skip this phase.

5.1 Register Your Engine in build_binary.py

Every new engine needs entries in backend/build_binary.py. This file drives PyInstaller and is the single most common source of "works in dev, breaks in prod" bugs. You need to decide which PyInstaller directives your engine's dependencies require:

| Directive | What It Does | When You Need It |
| --- | --- | --- |
| --hidden-import <module> | Includes a module PyInstaller can't detect via static analysis | Dynamic imports, lazy imports, plugin architectures |
| --collect-all <package> | Bundles source .py files, data files, AND native libraries | Packages that call inspect.getsource() at import time (e.g. inflect via typeguard's @typechecked), or that ship pretrained model files (e.g. perth ships .pth.tar + hparams.yaml) |
| --collect-data <package> | Bundles only data files (not source or native libs) | Packages with YAML configs, vocab files, etc. |
| --collect-submodules <package> | Bundles all submodules | Packages with deep module trees that PyInstaller misses |
| --copy-metadata <package> | Copies importlib.metadata info | Packages that call importlib.metadata.version() or pkg_resources.get_distribution() at runtime. Already required for: requests, transformers, huggingface-hub, tokenizers, safetensors, tqdm |

Example: adding hidden imports and collect-all for a new engine:

# In build_binary.py, inside the args list:
"--hidden-import",
"backend.backends.your_engine_backend",
"--hidden-import",
"your_engine_package",
"--hidden-import",
"your_engine_package.inference",
"--collect-all",
"some_dependency_that_uses_inspect_getsource",
"--copy-metadata",
"some_dependency_that_checks_its_own_version",

5.2 Lessons from v0.2.3 — Real Failures and Their Fixes

These are actual production failures from shipping new engines. Every one of these passed python -m uvicorn in dev:

| Engine | Failure | Root Cause | Fix |
| --- | --- | --- | --- |
| LuxTTS | "could not get source code" on import | inflect uses typeguard's @typechecked which calls inspect.getsource() — needs .py source files, not just bytecode | --collect-all inflect |
| LuxTTS | espeak-ng-data not found | piper_phonemize C library looks for data at /usr/share/espeak-ng-data/ which doesn't exist in the bundle | --collect-all piper_phonemize + set ESPEAK_DATA_PATH env var at runtime (see 5.3) |
| LuxTTS | inspect.getsource error in Vocos codec | linacodec and zipvoice use source introspection | --collect-all linacodec + --collect-all zipvoice |
| Chatterbox | FileNotFoundError for watermark model | perth ships pretrained model files (hparams.yaml, .pth.tar) that PyInstaller doesn't bundle by default | --collect-all perth |
| All engines | importlib.metadata failures | Frozen binary doesn't include package metadata for huggingface-hub, transformers, etc. | --copy-metadata for each affected package |
| All engines | Download progress bars stuck at 0% | huggingface_hub silently disables tqdm progress bars based on logger level in frozen builds — our progress tracker never receives byte updates | Force-enable tqdm's internal counter in HFProgressTracker |
| TADA | inspect.getsource error in DAC's Snake1d | @torch.jit.script calls inspect.getsource() which fails without .py source files | Wrote a lightweight shim (dac_shim.py) reimplementing Snake1d without @torch.jit.script, registered fake dac.* modules in sys.modules |
| All engines | NameError: name 'obj' is not defined on macOS | Python 3.12.0 has a CPython bug that corrupts bytecode when PyInstaller rewrites code objects | Upgrade to Python 3.12.13+ |
| All engines | resource_tracker subprocess crash | multiprocessing in frozen binaries needs freeze_support() called before anything else | Added to server.py entry point |

5.3 Runtime Frozen-Build Handling (server.py)

Some fixes can't live in build_binary.py — they need runtime detection. The entry point backend/server.py handles these before any heavy imports:

import multiprocessing
import os
import sys

# 1. freeze_support() — MUST be called before any multiprocessing use
multiprocessing.freeze_support()

# 2. Native data paths — redirect C libraries to bundled data
if getattr(sys, 'frozen', False):
    _meipass = getattr(sys, '_MEIPASS', os.path.dirname(sys.executable))
    _espeak_data = os.path.join(_meipass, 'piper_phonemize', 'espeak-ng-data')
    if os.path.isdir(_espeak_data):
        os.environ.setdefault('ESPEAK_DATA_PATH', _espeak_data)

# 3. stdout/stderr safety — PyInstaller --noconsole on Windows sets these to None
if not _is_writable(sys.stdout):
    sys.stdout = open(os.devnull, 'w')

If your engine's dependencies include native libraries that look for data at system paths (like espeak-ng does), you'll need to add a similar os.environ.setdefault() block here.

5.4 CUDA vs CPU Build Branching

build_binary.py produces two different binaries:

  • voicebox-server (CPU) — excludes all nvidia.* packages to avoid bundling ~3 GB of CUDA DLLs
  • voicebox-server-cuda — includes torch.cuda and torch.backends.cudnn

On Windows, if the build environment has CUDA torch installed but you're building the CPU binary, the script temporarily swaps to CPU-only torch and restores CUDA torch afterward. This prevents PyInstaller from accidentally bundling CUDA libraries into the CPU build.

New engine imports go in the common section (not the CUDA or MLX conditional blocks) unless your engine has platform-specific dependencies.
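
The structure is roughly the following sketch; the exact flags and grouping in the real build_binary.py differ, and the nvidia exclusion shown here is an assumption.

# Structure sketch, not the real build_binary.py.
args = list(common_args)                        # new engine imports go in this common section
args += ["--hidden-import", "backend.backends.your_engine_backend"]

if cuda:
    args += ["--hidden-import", "torch.cuda",
             "--hidden-import", "torch.backends.cudnn"]
else:
    args += ["--exclude-module", "nvidia"]      # keep CUDA DLLs out of the CPU binary

if is_apple_silicon() and not cuda:
    args += ["--collect-all", "mlx", "--collect-all", "mlx_audio"]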

5.5 MLX Conditional Inclusion

Apple Silicon builds conditionally include MLX hidden imports and --collect-all mlx / --collect-all mlx_audio. If your engine has an MLX-specific backend variant, add its imports inside the if is_apple_silicon() and not cuda: block.

5.6 Testing Frozen Builds

You can't skip this. Models that work in python -m uvicorn will break in the PyInstaller binary. It took three patch releases (v0.2.1 → v0.2.2 → v0.2.3) to get all engines working in production.

  1. Build: just build
  2. Launch the binary directly (not via python -m)
  3. Test the full chain: download → load → generate → progress tracking
  4. Check stderr for the actual error (logs go to stderr for Tauri sidecar capture)
  5. Fix, rebuild, repeat

Common gotcha: testing only generation with a pre-cached model from your dev install. Always test with a clean model cache to verify downloads work too.

Phase 6: Common Upstream Workarounds

torch.load device mismatch

import torch

_original_torch_load = torch.load

def _patched_torch_load(*args, **kwargs):
    kwargs.setdefault("map_location", "cpu")
    return _original_torch_load(*args, **kwargs)

torch.load = _patched_torch_load

Float64/Float32 dtype mismatch

original_fn = SomeClass.some_method

def patched_fn(self, *args, **kwargs):
    result = original_fn(self, *args, **kwargs)
    return result.float()

SomeClass.some_method = patched_fn

HuggingFace token bug

from huggingface_hub import snapshot_download
local_path = snapshot_download(repo_id=REPO, token=None)
model = ModelClass.from_local(local_path, device=device)

MPS tensor issues

Skip MPS entirely if operators aren't supported:

def _get_device(self):
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"  # Skip MPS

Gated HuggingFace repos as hardcoded config sources

Some models hardcode a gated HuggingFace repo as their tokenizer or config source (e.g., TADA hardcodes "meta-llama/Llama-3.2-1B" in both its AlignerConfig and TadaConfig). This silently fails without HF authentication.

Fix: Download from an ungated mirror and patch the config objects directly:

from huggingface_hub import snapshot_download

# Download tokenizer from ungated mirror
UNGATED_TOKENIZER = "unsloth/Llama-3.2-1B"
tokenizer_path = snapshot_download(UNGATED_TOKENIZER, token=None)

# Patch the model config to use the local path instead of the gated repo
config = ModelConfig.from_pretrained(model_path)
config.tokenizer_name = tokenizer_path
model = ModelClass.from_pretrained(model_path, config=config)

Do NOT monkey-patch AutoTokenizer.from_pretrained — it's a classmethod, and replacing it corrupts the descriptor, which breaks other engines that use different tokenizers (e.g., Qwen uses a Qwen tokenizer via AutoTokenizer). Always patch at the config level, not the class method level.

torchaudio.load() requires torchcodec in 2.10+

As of torchaudio>=2.10, torchaudio.load() requires the torchcodec package for audio I/O. If your engine or backend code uses torchaudio.load(), replace it with soundfile:

# Before (breaks without torchcodec):
import torchaudio
waveform, sr = torchaudio.load("audio.wav")

# After:
import soundfile as sf
import torch
data, sr = sf.read("audio.wav", dtype="float32")
waveform = torch.from_numpy(data).unsqueeze(0)

Note: torchaudio.functional.resample() and other pure-PyTorch math functions work fine without torchcodec — only the I/O functions are affected.
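
For example, resampling the soundfile-loaded tensor from the block above still works without torchcodec:

import torchaudio.functional as F

waveform_24k = F.resample(waveform, orig_freq=sr, new_freq=24000)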

@torch.jit.script breaks in frozen builds

torch.jit.script calls inspect.getsource() to parse the decorated function's source code. In a PyInstaller binary, .py source files aren't available, so this crashes at import time.

Fix: Remove or avoid @torch.jit.script decorators. If the decorated function comes from an upstream dependency, write a shim that reimplements the function without the decorator (see "Toxic dependency chains" below).

Toxic dependency chains — the shim pattern

Sometimes a model library depends on a package with a massive, hostile transitive dependency tree, but only uses a tiny piece of it. When the dependency chain is unbuildable or would pull in dozens of unwanted packages, the right move is to write a lightweight shim.

Example: TADA depends on descript-audio-codec (DAC), which pulls in descript-audiotools → onnx, tensorboard, protobuf, matplotlib, pystoi, etc. The onnx package fails to build from source on macOS. But TADA only uses Snake1d from DAC — a 7-line PyTorch module.

Solution: Create a shim at backend/utils/dac_shim.py that registers fake modules in sys.modules:

import sys
import types
import torch
from torch import nn

def snake(x, alpha):
    """Snake activation — reimplemented without @torch.jit.script."""
    return x + (1.0 / (alpha + 1e-9)) * torch.sin(alpha * x).pow(2)

class Snake1d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, channels, 1))

    def forward(self, x):
        return snake(x, self.alpha)

# Register fake dac.* modules so "from dac.nn.layers import Snake1d" works
_nn = types.ModuleType("dac.nn")
_layers = types.ModuleType("dac.nn.layers")
_layers.Snake1d = Snake1d
_nn.layers = _layers

for name, mod in [("dac", types.ModuleType("dac")),
                  ("dac.nn", _nn), ("dac.nn.layers", _layers)]:
    sys.modules[name] = mod

Key rules for shims:

  • Import the shim before importing the model library (so it finds the fake modules first; see the usage sketch below)
  • Do NOT use @torch.jit.script in the shim (see above)
  • Only reimplement what the model actually uses — check the import chain carefully
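
Usage, per the first rule above; the shim module path follows this doc's layout, and the model import line is illustrative.

import backend.utils.dac_shim  # noqa: F401  (registers the fake dac.* modules as a side effect)
from tada import TadaModel     # illustrative: this import now resolves Snake1d from the shim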

Candidate Engines

The docs/PROJECT_STATUS.md file is the canonical, living list of candidates under evaluation — including why some have been backlogged (e.g. VoxCPM, which is effectively CUDA-only upstream).

At a glance, current top candidates:

| Model | Tier | Size | Cross-platform? | Key Features |
| --- | --- | --- | --- | --- |
| MOSS-TTS-Nano | 1 | 0.1 B | Yes (CPU realtime) | 48 kHz stereo, Apache 2.0, released 2026-04-13 |
| Voxtral TTS | 2 | 4 B | Likely | mistralai/Voxtral-4B-TTS-2603 — presets + cloning |
| VibeVoice | 2 | ~500 M | Yes | Podcast-style multi-speaker dialogue |
| Dia2 | 3 | TBD | TBD | Successor to the original Dia |
| Fish Audio S2 Pro | 3 | Medium | Yes | Word-level control via inline text |

Backlogged:

  • VoxCPM (2B, Apache 2.0) — CUDA ≥12 required upstream; MPS broken in issues #232/#248; CPU path rejected by maintainers (#256). Keep watching for a PR that relaxes the device requirement.

Update PROJECT_STATUS.md when you pick one up or mark one as shipped/backlogged.

Implementation Checklist

Use this as a gate between phases. Do not proceed to the next phase until every item in the current phase is checked.

Phase 0: Dependency Research

  • Cloned model library source into a temp directory
  • Read setup.py / pyproject.toml — noted pinned dependency versions
  • Traced all imports from the model class through to leaf dependencies
  • Searched for inspect.getsource, @typechecked, typeguard in the full dependency tree
  • Searched for importlib.metadata, pkg_resources.get_distribution in the dependency tree
  • Searched for Path(__file__).parent, os.path.dirname(__file__), hardcoded system paths
  • Searched for torch.load calls missing map_location
  • Searched for torch.from_numpy without .float() cast
  • Searched for token=True or token=os.getenv("HF_TOKEN") in HuggingFace calls
  • Searched for @torch.jit.script / torch.jit.script (crashes in frozen builds)
  • Searched for torchaudio.load / torchaudio.save (requires torchcodec in 2.10+)
  • Searched for hardcoded gated HuggingFace repo names (e.g., meta-llama/*)
  • Evaluated whether any dependency is used minimally enough to shim instead of install
  • Tested model loading and generation on CPU in a throwaway venv
  • Tested with a clean HuggingFace cache (no pre-downloaded models)
  • Produced a written dependency audit documenting all findings

Phase 1: Backend Implementation

  • Created backend/backends/<engine>_backend.py implementing TTSBackend protocol
  • Chose voice prompt pattern (pre-computed tensors vs deferred file paths)
  • Implemented all monkey-patches identified in Phase 0
  • Used get_torch_device() from backends/base.py for device selection
  • Used model_load_progress() from backends/base.py for download/load tracking
  • Tested: model downloads correctly
  • Tested: model loads on CPU
  • Tested: generation produces valid audio
  • Tested: voice cloning from reference audio works
  • Registered ModelConfig in backends/__init__.py
  • Added to TTS_ENGINES dict
  • Added factory branch in get_tts_backend_for_engine()
  • Updated engine regex in backend/models.py

Phase 2–3: Route, Service, and Frontend

  • Confirmed zero changes needed in routes/services (or documented why custom behavior is needed)
  • Added engine to TypeScript union type in app/src/lib/api/types.ts
  • Added language map entry in app/src/lib/constants/languages.ts
  • Added to ENGINE_OPTIONS and ENGINE_DESCRIPTIONS in EngineModelSelector.tsx
  • Added to Zod schema and model-name mapping in useGenerationForm.ts
  • Added description in ModelManagement.tsx

Phase 4: Dependencies

  • Added packages to backend/requirements.txt
  • If --no-deps needed: listed sub-dependencies explicitly
  • If git-only packages: added @ git+https://... entries
  • If custom index needed: added --find-links line
  • Updated justfile setup targets
  • Updated .github/workflows/release.yml build steps
  • Updated Dockerfile if applicable
  • Verified pip install succeeds in a clean venv with existing requirements

Phase 5: PyInstaller Bundling

  • Added --hidden-import entries in build_binary.py for:
    • backend.backends.<engine>_backend
    • The model package and its key submodules
  • Added --collect-all for any packages that:
    • Use inspect.getsource() / @typechecked
    • Ship pretrained model data files (.pth.tar, .yaml, etc.)
    • Ship native data files (phoneme tables, shader libraries, etc.)
  • Added --copy-metadata for any packages that use importlib.metadata
  • If engine has native data paths: added os.environ.setdefault() in server.py
  • Built frozen binary with just build
  • Tested in frozen binary with clean model cache (not pre-cached from dev):
    • Model download works with real-time progress
    • Model loading works
    • Generation produces valid audio
    • No errors in stderr logs

Phase 6: Final Verification

  • Engine works in dev mode (just dev)
  • Engine works in frozen binary (just build → run binary directly)
  • Tested on target platform (macOS for MLX, Windows/Linux for CUDA)
  • No regressions in existing engines