Stories & Timeline

Overview

Stories allow users to arrange multiple voice generations on a timeline to create multi-voice narratives. The system supports tracks, trimming, splitting, and audio mixing.

Architecture

Story: A container that holds story items with metadata.

Story Item: Links a generation to a story with timeline position, track, and trim data.

Export: Combines all items into a single mixed audio file.

Data Model

Story Table

class Story(Base):
    __tablename__ = "stories"

    id = Column(String, primary_key=True)
    name = Column(String, nullable=False)
    description = Column(Text)
    created_at = Column(DateTime)
    updated_at = Column(DateTime)

StoryItem Table

class StoryItem(Base):
    __tablename__ = "story_items"

    id = Column(String, primary_key=True)
    story_id = Column(String, ForeignKey("stories.id"))
    generation_id = Column(String, ForeignKey("generations.id"))
    start_time_ms = Column(Integer, default=0)  # Timeline position
    track = Column(Integer, default=0)          # Track number
    trim_start_ms = Column(Integer, default=0)  # Trim from start
    trim_end_ms = Column(Integer, default=0)    # Trim from end
    created_at = Column(DateTime)

Timeline Concepts

Start Time

start_time_ms is the absolute position on the timeline where an item begins playing. Items on the same track cannot overlap; items on different tracks can.
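The same-track rule amounts to an interval test. The following is a minimal sketch, assuming each item has been reduced to a hypothetical `Span` holding its track, start time, and effective (post-trim) duration; neither `Span` nor `overlaps` comes from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical timeline span: where an item sits and how long it plays."""
    track: int
    start_time_ms: int
    duration_ms: int  # effective (post-trim) duration

    @property
    def end_time_ms(self) -> int:
        return self.start_time_ms + self.duration_ms


def overlaps(a: Span, b: Span) -> bool:
    """Return True if two spans share a track and intersect in time."""
    if a.track != b.track:
        return False  # items on different tracks may overlap freely
    # Half-open intervals [start, end): touching end-to-start is not a collision.
    return a.start_time_ms < b.end_time_ms and b.start_time_ms < a.end_time_ms


first = Span(track=0, start_time_ms=0, duration_ms=2000)
second = Span(track=0, start_time_ms=1500, duration_ms=1000)
layered = Span(track=1, start_time_ms=1500, duration_ms=1000)
print(overlaps(first, second))   # True
print(overlaps(first, layered))  # False
```

Treating the intervals as half-open lets one item start exactly where the previous one ends, which is a design choice of this sketch rather than documented behavior.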

Tracks

A track is an integer (0-indexed) that identifies the horizontal row an item sits on. Audio on separate tracks plays concurrently, so tracks are the primary way to layer multiple voices or sound effects.

Trimming

trim_start_ms and trim_end_ms hide the leading and trailing portions of the source generation without modifying the underlying audio file. Because generation.duration is stored in seconds, the effective playback length in milliseconds is generation.duration * 1000 - trim_start_ms - trim_end_ms. Trimming is non-destructive: the same generation can be trimmed differently in different stories.
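As a worked example of that formula, the sketch below computes the effective length for a 2.5 second generation under two different trims. The helper name `effective_duration_ms` is an assumption made for illustration.

```python
def effective_duration_ms(duration_s: float, trim_start_ms: int = 0, trim_end_ms: int = 0) -> int:
    """Length of the audible portion of an item, in milliseconds."""
    return int(duration_s * 1000) - trim_start_ms - trim_end_ms


# One 2.5 s generation, trimmed differently in two stories.
print(effective_duration_ms(2.5, trim_start_ms=200, trim_end_ms=100))  # 2200
print(effective_duration_ms(2.5, trim_start_ms=0, trim_end_ms=1000))   # 1500
```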

Core Operations

Adding Items

When a generation is added without an explicit start time, it is placed 200 ms after the end of the last existing item:

async def add_item_to_story(
    story_id: str,
    data: StoryItemCreate,
    db: Session,
) -> StoryItemDetail:
    # Use the caller's position when given; otherwise append to the end
    start_time_ms = data.start_time_ms
    if start_time_ms is None:
        # Find the end of all existing items
        existing_items = get_items_with_durations(story_id, db)
        max_end_time_ms = max(
            (
                item.start_time_ms + int(gen.duration * 1000)
                for item, gen in existing_items
            ),
            default=None,  # max() would raise on an empty story
        )
        if max_end_time_ms is None:
            start_time_ms = 0  # Assumed: the first item starts at 0
        else:
            start_time_ms = max_end_time_ms + 200  # 200ms gap

    # Create the item
    item = DBStoryItem(
        id=str(uuid.uuid4()),
        story_id=story_id,
        generation_id=data.generation_id,
        start_time_ms=start_time_ms,
        track=data.track or 0,
    )
    db.add(item)
    db.commit()
    # Building the StoryItemDetail response is omitted from this excerpt
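The placement rule can be isolated as a pure function, which makes it easy to test without a database. This is a sketch only: `next_start_time_ms` is a hypothetical name, items are reduced to `(start_time_ms, duration_ms)` pairs, and returning 0 for an empty story is an assumption.

```python
from typing import Iterable, Tuple

GAP_MS = 200  # gap between the last item and a newly appended one


def next_start_time_ms(items: Iterable[Tuple[int, int]], gap_ms: int = GAP_MS) -> int:
    """Return where a new item would be appended.

    items: (start_time_ms, duration_ms) pairs for the existing items.
    """
    ends = [start + duration for start, duration in items]
    if not ends:
        return 0  # assumed behavior for an empty story
    return max(ends) + gap_ms


print(next_start_time_ms([]))                         # 0
print(next_start_time_ms([(0, 2500), (1000, 4000)]))  # 5200
```

The latest end time, not the latest start time, decides the position, so a long item that starts early can still push the new item further out.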

Moving Items

Update position and/or track:

async def move_story_item(
    story_id: str,
    item_id: str,
    data: StoryItemMove,
    db: Session,
) -> StoryItemDetail:
    item = get_item(story_id, item_id, db)

    item.start_time_ms = data.start_time_ms
    item.track = data.track

    db.commit()

Trimming Items

Trimming updates only the item's trim offsets. A trim that would remove the entire clip is rejected:

async def trim_story_item(
    story_id: str,
    item_id: str,
    data: StoryItemTrim,
    db: Session,
) -> Optional[StoryItemDetail]:
    item = get_item(story_id, item_id, db)
    generation = get_generation(item.generation_id, db)

    # Validate trim doesn't exceed duration
    max_duration_ms = int(generation.duration * 1000)
    if data.trim_start_ms + data.trim_end_ms >= max_duration_ms:
        return None  # Invalid trim: nothing would be left to play

    item.trim_start_ms = data.trim_start_ms
    item.trim_end_ms = data.trim_end_ms

    db.commit()
    # Building the StoryItemDetail response is omitted from this excerpt

Splitting Items

Split one item into two at a specific time:

async def split_story_item(
    story_id: str,
    item_id: str,
    data: StoryItemSplit,
    db: Session,
) -> List[StoryItemDetail]:
    item = get_item(story_id, item_id, db)
    generation = get_generation(item.generation_id, db)

    # Calculate split point. split_time_ms is measured from the item's
    # visible start, so add the existing trim to get a position in the source.
    current_trim_start = item.trim_start_ms
    current_trim_end = item.trim_end_ms
    original_duration_ms = int(generation.duration * 1000)
    absolute_split_ms = current_trim_start + data.split_time_ms

    # Update original: trim from end
    item.trim_end_ms = original_duration_ms - absolute_split_ms

    # Create new item: trim from start
    new_item = DBStoryItem(
        id=str(uuid.uuid4()),
        story_id=story_id,
        generation_id=item.generation_id,  # Same generation
        start_time_ms=item.start_time_ms + data.split_time_ms,
        track=item.track,
        trim_start_ms=absolute_split_ms,
        trim_end_ms=current_trim_end,
    )

    db.add(new_item)
    db.commit()

    return [item, new_item]
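The split arithmetic is easier to follow with concrete numbers. The sketch below reproduces it on a hypothetical `Clip` tuple and adds a bounds check on the split point. The check is an assumption of this example and is not shown in the function above.

```python
from typing import NamedTuple, Tuple


class Clip(NamedTuple):
    """Hypothetical stand-in for the timeline fields of a story item."""
    start_time_ms: int
    trim_start_ms: int
    trim_end_ms: int


def split_clip(clip: Clip, source_duration_ms: int, split_time_ms: int) -> Tuple[Clip, Clip]:
    """Split a clip at split_time_ms, measured from the clip's visible start."""
    visible_ms = source_duration_ms - clip.trim_start_ms - clip.trim_end_ms
    if not 0 < split_time_ms < visible_ms:
        raise ValueError("split point must fall strictly inside the visible clip")

    # Position of the cut inside the untrimmed source audio.
    absolute_split_ms = clip.trim_start_ms + split_time_ms

    # Left half keeps its start and hides everything after the cut.
    left = clip._replace(trim_end_ms=source_duration_ms - absolute_split_ms)
    # Right half starts at the cut and keeps the original end trim.
    right = Clip(
        start_time_ms=clip.start_time_ms + split_time_ms,
        trim_start_ms=absolute_split_ms,
        trim_end_ms=clip.trim_end_ms,
    )
    return left, right


left, right = split_clip(Clip(1500, 200, 100), source_duration_ms=2500, split_time_ms=1000)
print(left)   # Clip(start_time_ms=1500, trim_start_ms=200, trim_end_ms=1300)
print(right)  # Clip(start_time_ms=2500, trim_start_ms=1200, trim_end_ms=100)
```

The two halves play for 1000 ms and 1200 ms, which adds up to the 2200 ms the original clip played, so the split leaves the audible result unchanged.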

Duplicating Items

Create a copy with the same generation, track, and trim values, placed 200 ms after the original:

async def duplicate_story_item(
    story_id: str,
    item_id: str,
    db: Session,
) -> StoryItemDetail:
    original = get_item(story_id, item_id, db)
    generation = get_generation(original.generation_id, db)

    # Calculate effective duration for positioning
    effective_duration_ms = (
        int(generation.duration * 1000)
        - original.trim_start_ms
        - original.trim_end_ms
    )

    # Place copy after original with 200ms gap
    new_item = DBStoryItem(
        id=str(uuid.uuid4()),
        story_id=story_id,
        generation_id=original.generation_id,
        start_time_ms=original.start_time_ms + effective_duration_ms + 200,
        track=original.track,
        trim_start_ms=original.trim_start_ms,
        trim_end_ms=original.trim_end_ms,
    )

    db.add(new_item)
    db.commit()
    # Building the StoryItemDetail response is omitted from this excerpt

Audio Export

Mixing Algorithm

The export function mixes all items into a single audio file:

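async def export_story_audio(story_id: str, db: Session) -> bytes:
    items = get_all_items_with_generations(story_id, db)

    # Audio loading is omitted from this excerpt. It is assumed to yield
    # sample_rate (shared by every generation) and audio_data: one dict per
    # item with 'audio', 'start_time_ms', 'duration_ms', 'trim_start_ms',
    # and 'trim_end_ms'. The story is assumed to contain at least one item.

    # Calculate total duration
    max_end_time_ms = max(
        data['start_time_ms'] + data['duration_ms']
        for data in audio_data
    )

    # Create output buffer
    total_samples = int((max_end_time_ms / 1000.0) * sample_rate)
    final_audio = np.zeros(total_samples, dtype=np.float32)

    # Mix each item at its position
    for data in audio_data:
        audio = data['audio']
        start_sample = int((data['start_time_ms'] / 1000.0) * sample_rate)

        # Apply trim (convert milliseconds to sample counts first)
        trim_start_sample = int((data['trim_start_ms'] / 1000.0) * sample_rate)
        trim_end_sample = int((data['trim_end_ms'] / 1000.0) * sample_rate)
        trimmed_audio = audio[trim_start_sample:len(audio) - trim_end_sample]

        # Add to buffer (overlapping items sum together). Clamp to the buffer
        # so a one-sample rounding difference cannot overrun it.
        end_sample = min(start_sample + len(trimmed_audio), total_samples)
        final_audio[start_sample:end_sample] += trimmed_audio[:end_sample - start_sample]

    # Normalize to prevent clipping
    max_val = np.abs(final_audio).max()
    if max_val > 1.0:
        final_audio = final_audio / max_val

    return audio_to_bytes(final_audio, sample_rate)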
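To make the sum-then-normalize step concrete, here is a minimal sketch of the same idea on plain Python lists, so it runs without NumPy. The `mix` function and its `(start_sample, samples)` input shape are assumptions made for this example, not the export function's actual interface.

```python
from typing import List, Tuple


def mix(clips: List[Tuple[int, List[float]]], total_samples: int) -> List[float]:
    """Sum clips into one buffer at their start offsets, then peak-normalize."""
    out = [0.0] * total_samples
    for start, samples in clips:
        for i, value in enumerate(samples):
            if start + i < total_samples:  # ignore anything past the buffer end
                out[start + i] += value    # overlapping samples add together

    # Scale the whole mix down only if the summed peak would clip.
    peak = max((abs(v) for v in out), default=0.0)
    if peak > 1.0:
        out = [v / peak for v in out]
    return out


# Two clips that overlap on samples 1 and 2.
mixed = mix([(0, [0.5, 1.0, 1.0]), (1, [1.0, 1.0])], total_samples=4)
print(mixed)  # [0.25, 1.0, 1.0, 0.0]
```

Note that peak normalization scales every sample by the same factor, so the non-overlapping sample at index 0 drops from 0.5 to 0.25 even though it was never at risk of clipping.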

API Endpoints

Method  Endpoint                                   Description
GET     /stories                                   List all stories
POST    /stories                                   Create a story
GET     /stories/{id}                              Get story with items
PUT     /stories/{id}                              Update story metadata
DELETE  /stories/{id}                              Delete story
POST    /stories/{id}/items                        Add item to story
DELETE  /stories/{id}/items/{item_id}              Remove item
PUT     /stories/{id}/items/{item_id}/move         Move item
PUT     /stories/{id}/items/{item_id}/trim         Trim item
POST    /stories/{id}/items/{item_id}/split        Split item
POST    /stories/{id}/items/{item_id}/duplicate    Duplicate item
PUT     /stories/{id}/items/times                  Batch update times
PUT     /stories/{id}/items/reorder                Reorder items
GET     /stories/{id}/export-audio                 Export mixed audio
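The sketch below shows how a client might build requests for a few of these endpoints using only the standard library. The base URL is a placeholder, and the JSON body fields are inferred from the StoryItemCreate and StoryItemMove usage earlier on this page, so treat both as assumptions. Nothing is sent over the network.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def build_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) a request, JSON-encoding the body if present."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


add_item = build_request(
    "POST", "/stories/story_uuid/items",
    {"generation_id": "generation_uuid", "track": 0},
)
move_item = build_request(
    "PUT", "/stories/story_uuid/items/item_uuid/move",
    {"start_time_ms": 1500, "track": 1},
)
export = build_request("GET", "/stories/story_uuid/export-audio")

for request in (add_item, move_item, export):
    print(request.get_method(), request.full_url)
# To send one of these: urllib.request.urlopen(add_item)
```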

Response Schemas

StoryItemDetail

{
  "id": "item_uuid",
  "story_id": "story_uuid",
  "generation_id": "generation_uuid",
  "start_time_ms": 1500,
  "track": 0,
  "trim_start_ms": 200,
  "trim_end_ms": 100,
  "profile_id": "profile_uuid",
  "profile_name": "Narrator",
  "text": "Hello world",
  "audio_path": "/path/to/audio.wav",
  "duration": 2.5,
  "created_at": "2024-01-15T10:30:00Z"
}
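A client can mirror this schema with a small dataclass. The sketch below parses the sample payload and derives the item's end position on the timeline. The class is a client-side illustration, not the server's Pydantic model, and the `end_time_ms` property is an addition of this example.

```python
import json
from dataclasses import dataclass


@dataclass
class StoryItemDetail:
    """Client-side mirror of the response shown above (illustrative only)."""
    id: str
    story_id: str
    generation_id: str
    start_time_ms: int
    track: int
    trim_start_ms: int
    trim_end_ms: int
    profile_id: str
    profile_name: str
    text: str
    audio_path: str
    duration: float  # seconds, untrimmed
    created_at: str

    @property
    def end_time_ms(self) -> int:
        """Timeline position where the trimmed item stops playing."""
        visible_ms = int(self.duration * 1000) - self.trim_start_ms - self.trim_end_ms
        return self.start_time_ms + visible_ms


payload = """{
  "id": "item_uuid", "story_id": "story_uuid", "generation_id": "generation_uuid",
  "start_time_ms": 1500, "track": 0, "trim_start_ms": 200, "trim_end_ms": 100,
  "profile_id": "profile_uuid", "profile_name": "Narrator", "text": "Hello world",
  "audio_path": "/path/to/audio.wav", "duration": 2.5,
  "created_at": "2024-01-15T10:30:00Z"
}"""

item = StoryItemDetail(**json.loads(payload))
print(item.profile_name, item.end_time_ms)  # Narrator 3700
```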

Frontend Integration

The timeline UI needs to:

  1. Fetch story with all items
  2. Render waveforms for each item
  3. Handle drag/drop to move items
  4. Handle edge drag for trimming
  5. Sync playhead across all tracks
  6. Export when user clicks download
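Playhead sync (step 5) comes down to asking, for a given timeline position, which items are audible and where in each source file playback should be. The sketch below is written in Python to match the rest of this page; a real frontend would implement the same logic in its own language. `TimelineItem` and `items_at_playhead` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimelineItem:
    """Hypothetical subset of StoryItemDetail needed for playback."""
    id: str
    start_time_ms: int
    track: int
    trim_start_ms: int
    trim_end_ms: int
    duration: float  # seconds, untrimmed


def items_at_playhead(items: List[TimelineItem], playhead_ms: int) -> List[Tuple[str, int]]:
    """Return (item id, seek position in the source audio in ms) for audible items."""
    active = []
    for item in items:
        visible_ms = int(item.duration * 1000) - item.trim_start_ms - item.trim_end_ms
        offset_ms = playhead_ms - item.start_time_ms
        if 0 <= offset_ms < visible_ms:
            # Skip the trimmed lead-in when seeking into the source file.
            active.append((item.id, item.trim_start_ms + offset_ms))
    return active


items = [
    TimelineItem("a", start_time_ms=0, track=0, trim_start_ms=0, trim_end_ms=0, duration=2.0),
    TimelineItem("b", start_time_ms=1500, track=1, trim_start_ms=200, trim_end_ms=100, duration=2.5),
]
print(items_at_playhead(items, 1600))  # [('a', 1600), ('b', 300)]
print(items_at_playhead(items, 2100))  # [('b', 800)]
```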