# Model catalog

This catalog covers every model you can call through Framehood: what it does, how to
call it, its full parameter schema, limits, and cost. Pages are grouped by type; pick a
category below for the detailed per-model reference.

**How to call.** Most models are reached through an MCP tool action (e.g. `image(create)`,
`video(swap)`, `audio(speak)`) or the CLI. The raw form is
`POST /v1/jobs/<model>` with the model's `inputs`. Each model's exact, live input schema
is also available at `GET /v1/models/<model>` (and `…/prompt-guide` where present).
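
As a rough illustration of the raw form, the sketch below builds (but does not send) the two requests using only the Python standard library. The base URL, the bearer-token header, the `{"inputs": ...}` body shape, and the `prompt` field are all assumptions made for the example; none of them are specified on this page, so check the live schema for the real field names.

```python
import json
import urllib.request

# Hypothetical placeholders: neither the host nor the auth scheme is documented here.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_job_request(model: str, inputs: dict) -> urllib.request.Request:
    """Build a POST /v1/jobs/<model> request without sending it."""
    # Assumption: the model's inputs travel as JSON under an "inputs" key.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/jobs/{model}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
    )


def build_schema_request(model: str) -> urllib.request.Request:
    """Build a GET /v1/models/<model> request for the live input schema."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/models/{model}",
        method="GET",
        headers={"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
    )


# "prompt" is an illustrative input name, not a confirmed schema field.
req = build_job_request("flux_schnell", {"prompt": "a lighthouse at dusk"})
schema_req = build_schema_request("flux_schnell")
print(req.get_method(), req.full_url)
print(schema_req.get_method(), schema_req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen(req)` once the placeholders are replaced with real values.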

**Cost.** Figures are prices in **credits** (`cr` in the tables), per call unless noted;
`≈` marks an approximate price and "per MP" means per megapixel. Local processing/QA
steps that run on our own infrastructure are **free**. See
[Credits & plans](/guide/billing).
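
For budgeting, flat per-call prices can simply be summed. The sketch below totals a hypothetical pipeline using prices copied from the tables on this page; the pipeline itself and the helper name are illustrative, and models priced per MP or per second are left out because their cost depends on the input.

```python
# Flat per-call prices in credits, copied from the tables on this page.
PRICES_CR = {
    "flux_schnell": 1,
    "kling_v3_std_i2v": 84,
    "elevenlabs_tts_v3": 20,
    "video_audio_mix": 0,  # listed as Free
    "qa_full": 1,
}


def estimate_total(calls: dict[str, int]) -> int:
    """Sum credits for a plan given as {model: number_of_calls}."""
    return sum(PRICES_CR[model] * count for model, count in calls.items())


# Hypothetical plan: four draft images, one clip, one voiceover, a mix, and QA.
plan = {
    "flux_schnell": 4,
    "kling_v3_std_i2v": 1,
    "elevenlabs_tts_v3": 1,
    "video_audio_mix": 1,
    "qa_full": 1,
}
total = estimate_total(plan)
print(f"{total} cr")  # 109 cr
```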

**Testing.** Every generation model also accepts `mock: true` to return a placeholder
result without running the model (no credits spent).
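
A minimal sketch of switching a call to mock mode follows. It assumes `mock` is passed alongside the model's other inputs; this page does not say exactly where the flag sits in the request, so confirm against the live schema. The `prompt` field and the helper name are illustrative.

```python
def with_mock(inputs: dict) -> dict:
    """Return a copy of the inputs with mock mode enabled."""
    mocked = dict(inputs)  # shallow copy so the real inputs stay untouched
    mocked["mock"] = True  # assumption: the flag lives next to the other inputs
    return mocked


real_inputs = {"prompt": "a lighthouse at dusk"}  # illustrative field name
test_inputs = with_mock(real_inputs)
print(test_inputs)
```

Keeping the real inputs unmodified makes it easy to run the same call once in mock mode and once for real.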

---

## Image — [full reference →](/reference/models/image)

Text-to-image, image editing, and upscaling.

| Model | What it does | Cost |
|---|---|---|
| FLUX.1 Schnell `flux_schnell` | Fast text-to-image (drafts, iteration) | 1 cr |
| Flux Pro 1.1 Ultra `flux_pro` | Text-to-image, scenes/backgrounds | 12 cr |
| Flux 2 LoRA Realism `flux_realism` | Text-to-image, photorealistic | ≈5 cr (per MP) |
| Nano Banana Pro `nano_banana` | Text-to-image, in-image text / prompt adherence | 30 cr |
| Nano Banana Pro Edit `nano_banana_edit` | Instruction-based image editing | 30 cr |
| Seedream v4.5 Edit `seedream_v45_edit` | High-resolution image editing | 8 cr |
| Seedream v5 Lite Edit `seedream_v5_edit` | Image editing / compositing | 7 cr |
| Topaz Image Upscale `topaz_upscale_image` | Upscale + detail enhancement | 16 cr |
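
For the per-MP price in the table above, a rough estimate multiplies the image area in megapixels by the listed rate. The sketch assumes 1 MP = 1,000,000 pixels and applies no rounding; the actual billing rules (rounding, minimums, which dimensions count) are not stated here, so treat the result as an approximation.

```python
def estimate_per_mp_cost(width_px: int, height_px: int, credits_per_mp: float) -> float:
    """Estimate credits for a model priced per megapixel (unrounded)."""
    # Assumption: one megapixel is 1,000,000 pixels.
    megapixels = (width_px * height_px) / 1_000_000
    return megapixels * credits_per_mp


# flux_realism is listed at about 5 cr per MP.
estimate = estimate_per_mp_cost(1024, 1024, 5.0)
print(f"{estimate:.2f} cr")  # 5.24 cr
```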

## Video — [full reference →](/reference/models/video)

Generation, image-to-video, editing, swap, and upscaling.

| Model | What it does | Cost |
|---|---|---|
| Seedance 2.0 Reference-to-Video `seedance_r2v` | Reference images/video/audio → video (up to 4K) | 303 cr |
| Kling v3 Standard I2V `kling_v3_std_i2v` | Image-to-video, standard quality | 84 cr |
| Kling v3 Pro I2V `kling_v3_pro_i2v` | Image-to-video, high quality | 112 cr |
| Kling O3 Video Edit `kling_o3_video_edit` | Video edit with reference images | 126 cr |
| PixVerse Swap `pixverse_swap` | Person/object swap in video | 30 cr |
| Wan 2.7 Video Edit `wan_27_video_edit` | Text-guided video-to-video edit | 100 cr |
| Topaz Video Upscale `topaz_upscale_video` | Upscale + enhance video | 100 cr |

## Audio — [full reference →](/reference/models/audio)

Speech, sound effects, music, and audio processing.

| Model | What it does | Cost |
|---|---|---|
| ElevenLabs TTS v3 `elevenlabs_tts_v3` | Text-to-speech, emotional control | 20 cr |
| ElevenLabs TTS (direct) `elevenlabs_tts_direct` | TTS with cloned/linked voice | 20 cr |
| ElevenLabs Sound Effects `elevenlabs_sfx` | Sound-effect / foley generation | 1 cr |
| Minimax Music v2.6 `minimax_music` | Music (instrumental or with lyrics) | 30 cr |
| Audio Concat `audio_concat` | Join audio files in sequence | Free |
| Audio-Only Mix `audio_only_mix` | Mix audio files into one (flat, or a ducked music bed) | Free |
| Audio Trim `audio_trim` | Cut audio to a start/duration window | Free |
| Audio Convert `audio_convert` | Format / sample-rate conversion | Free |
| Audio Tail Fade `tail_fade` | Silence pad + fade-out | Free |

## Video processing & assembly — [full reference →](/reference/models/processing)

Local ffmpeg pipelines (free), plus paid captioning and lipsync.

| Model | What it does | Cost |
|---|---|---|
| Auto Subtitles `captions_auto` | Karaoke-style caption burn-in | 6 cr |
| Full Video Assembly `video_assemble_full` | Clips + transitions + audio + intro/end | Free |
| Assemble Clips `assemble_clips` | Concatenate clips with transitions | Free |
| Video + Audio Mix `video_audio_mix` | Overlay VO/music/SFX onto video | Free |
| Audio Mix `audio_mix` | Layered audio mix onto video | Free |
| Structural Export `structural_export` | Final platform encode (TikTok/Reels/…) | Free |
| Highlight Rolloff `highlight_rolloff` | Surgical highlight compression | Free |
| Sync Lipsync v3 `lipsync_v3` | Lip-sync mouth to audio (expensive) | 1600 cr |

## QA checks — [full reference →](/reference/models/qa)

Quality checks for generated media. Checks that use a vision/STT model cost credits; the rest are free.

| Model | What it does | Cost |
|---|---|---|
| Full QA Pipeline `qa_full` | Run all checks on a finished video | 1 cr |
| Same Person Check `check_same_person` | Identity consistency (ref vs test) | 1 cr |
| Scene Matches Plan `check_scene_matches_plan` | Image/video matches shooting plan | 1 cr |
| Image Description Check `check_image_description` | Image matches a text description | 1 cr |
| Voice Consistency Check `check_voice_consistency` | Same speaker throughout | 1 cr |
| Transcript Check `check_transcript` | Transcribe (video or audio) with timecodes, and optionally check it matches expected voiceover | 1 cr |
| Video Description `describe_video` | Timecoded scene/speech/sounds/music breakdown | ≈1 cr per 25 s of video at 1 fps, multiplied by fps; min 1 cr |
| Audio Loudness Check `check_audio_loudness` | LUFS / true-peak vs platform target | Free |
| Audio Structural Check `check_audio_structural` | Codec/duration/sample-rate sanity | Free |
| Audio Tail Check `check_audio_tail` | Detect abrupt audio cut-off | Free |
| Motion Artifacts Check `check_motion_artifacts` | Glitches / jump cuts / artifacts | Free |
| Overexposure Check `overexposure_check` | Blown-out highlights | Free |
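
The `describe_video` price in the table above is a small formula: duration divided by 25 s, times the fps, with a floor of 1 credit. The sketch below is one reading of that rule; it does not round, because the page does not say how fractional credits are handled, and the function and parameter names are hypothetical.

```python
def estimate_describe_video_cost(duration_s: float, fps: float = 1.0) -> float:
    """Approximate describe_video credits: ~1 cr per 25 s at 1 fps, times fps, min 1."""
    raw = (duration_s / 25.0) * fps
    return max(1.0, raw)  # the table lists a 1-credit minimum


one_minute_at_2fps = estimate_describe_video_cost(60, fps=2)
short_clip = estimate_describe_video_cost(10)
print(one_minute_at_2fps)  # 4.8
print(short_clip)          # 1.0
```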

## Actors — [full reference →](/reference/models/actors)

Persistent characters: training, generation, and voice.

| Model | What it does | Cost |
|---|---|---|
| Actor LoRA Training `actor_lora_train` | Train a LoRA from 4–30 reference images | 500 cr |
| Flux LoRA Inference `actor_lora_inference` | Actor-consistent image generation | ≈7 cr (per MP) |
| Actor Voice Clone (IVC) `actor_voice_clone` | Clone a voice from samples | Free |