Model catalog
Every model you can call through Framehood — what it does, how to call it, its full parameter schema, limits, and cost. Pages are grouped by type; pick a category below for the detailed per-model reference.
How to call. Most models are reached through an MCP tool action (e.g. `image(create)`, `video(swap)`, `audio(speak)`) or the CLI. The raw form is `POST /v1/jobs/<model>` with the model's inputs. Each model's exact, live input schema is also available at `GET /v1/models/<model>` (and `…/prompt-guide` where present).
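A minimal Python sketch of the raw HTTP form, using only the standard library. The base URL, the bearer-token `Authorization` header, the JSON request body and the `prompt` input name are all assumptions for illustration; the real input names come from `GET /v1/models/<model>`. The helpers only build the requests and do not send them.

```python
import json
import urllib.request

# Hypothetical host and credential: the catalog does not state either.
BASE_URL = "https://api.framehood.example"
API_KEY = "YOUR_API_KEY"


def job_request(model: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) a raw POST /v1/jobs/<model> request."""
    body = json.dumps(inputs).encode("utf-8")  # assumption: inputs go as a JSON body
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{model}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumption: bearer-token auth
        },
    )


def schema_request(model: str) -> urllib.request.Request:
    """Build a GET /v1/models/<model> request for the model's live input schema."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/models/{model}",
        method="GET",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )


# "prompt" is a hypothetical input name used only for this example.
req = job_request("flux_schnell", {"prompt": "a lighthouse at dusk"})
print(req.get_method(), req.full_url)
```

Once the real host and credentials are filled in, either request can be sent with `urllib.request.urlopen(req)`.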
Cost. Figures are the price in credits (per call unless noted). Local processing/QA steps that run on our own infrastructure are free. See Credits & plans.
Testing. Every generation model also accepts `mock: true` to return a placeholder result without running the model (no credits spent).
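A sketch of a dry-run request body, assuming inputs are sent as JSON so that `mock: true` becomes the JSON boolean `true`. The `prompt` field is a hypothetical input name.

```python
import json


def mock_payload(inputs: dict) -> bytes:
    """Return a JSON body for POST /v1/jobs/<model> with mock: true set.

    With mock set, the job returns a placeholder and no credits are spent.
    """
    payload = dict(inputs)  # copy so the caller's dict is left untouched
    payload["mock"] = True  # serialised as JSON `true`
    return json.dumps(payload).encode("utf-8")


body = mock_payload({"prompt": "test frame"})  # "prompt" is hypothetical
print(body.decode())  # {"prompt": "test frame", "mock": true}
```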
Image — full reference →
Text-to-image, image editing, and upscaling.
| Model | What it does | Cost |
|---|---|---|
| FLUX.1 Schnell `flux_schnell` | Fast text-to-image (drafts, iteration) | 1 cr |
| Flux Pro 1.1 Ultra `flux_pro` | Text-to-image, scenes/backgrounds | 12 cr |
| Flux 2 LoRA Realism `flux_realism` | Text-to-image, photorealistic | ≈5 cr per MP |
| Nano Banana Pro `nano_banana` | Text-to-image, in-image text / prompt adherence | 30 cr |
| Nano Banana Pro Edit `nano_banana_edit` | Instruction-based image editing | 30 cr |
| Seedream v4.5 Edit `seedream_v45_edit` | High-resolution image editing | 8 cr |
| Seedream v5 Lite Edit `seedream_v5_edit` | Image editing / compositing | 7 cr |
| Topaz Image Upscale `topaz_upscale_image` | Upscale + detail enhancement | 16 cr |
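For the per-megapixel model in this table, an approximate cost can be worked out from the output size. A minimal sketch, assuming 1 MP = 1,000,000 pixels and no rounding or minimum charge (neither is specified here), so the result is only an estimate.

```python
def per_mp_cost(width: int, height: int, rate_per_mp: float = 5.0) -> float:
    """Approximate credits for a per-megapixel model such as flux_realism (≈5 cr/MP).

    Assumes 1 MP = 1,000,000 pixels and no rounding or minimum charge.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    megapixels = width * height / 1_000_000
    return megapixels * rate_per_mp


print(round(per_mp_cost(1024, 1024), 2))  # 5.24
print(round(per_mp_cost(2048, 2048), 2))  # 20.97
```

At 1024×1024 this puts `flux_realism` at roughly 5 cr, against a flat 12 cr for `flux_pro`.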
Video — full reference →
Generation, image-to-video, editing, swap, and upscaling.
| Model | What it does | Cost |
|---|---|---|
| Seedance 2.0 Reference-to-Video `seedance_r2v` | Reference images/video/audio → video (up to 4K) | 303 cr |
| Kling v3 Standard I2V `kling_v3_std_i2v` | Image-to-video, standard quality | 84 cr |
| Kling v3 Pro I2V `kling_v3_pro_i2v` | Image-to-video, high quality | 112 cr |
| Kling O3 Video Edit `kling_o3_video_edit` | Video edit with reference images | 126 cr |
| PixVerse Swap `pixverse_swap` | Person/object swap in video | 30 cr |
| Wan 2.7 Video Edit `wan_27_video_edit` | Text-guided video-to-video edit | 100 cr |
| Topaz Video Upscale `topaz_upscale_video` | Upscale + enhance video | 100 cr |
Audio — full reference →
Speech, sound effects, music, and audio processing.
| Model | What it does | Cost |
|---|---|---|
| ElevenLabs TTS v3 `elevenlabs_tts_v3` | Text-to-speech, emotional control | 20 cr |
| ElevenLabs TTS (direct) `elevenlabs_tts_direct` | TTS with cloned/linked voice | 20 cr |
| ElevenLabs Sound Effects `elevenlabs_sfx` | Sound-effect / foley generation | 1 cr |
| Minimax Music v2.6 `minimax_music` | Music (instrumental or with lyrics) | 30 cr |
| Audio Concat `audio_concat` | Join audio files in sequence | Free |
| Audio-Only Mix `audio_only_mix` | Mix audio files into one (flat, or a ducked music bed) | Free |
| Audio Trim `audio_trim` | Cut audio to a start/duration window | Free |
| Audio Convert `audio_convert` | Format / sample-rate conversion | Free |
| Audio Tail Fade `tail_fade` | Silence pad + fade-out | Free |
Video processing & assembly — full reference →
Local ffmpeg pipelines (free), plus paid captioning and lipsync.
| Model | What it does | Cost |
|---|---|---|
| Auto Subtitles `captions_auto` | Karaoke-style caption burn-in | 6 cr |
| Full Video Assembly `video_assemble_full` | Clips + transitions + audio + intro/end | Free |
| Assemble Clips `assemble_clips` | Concatenate clips with transitions | Free |
| Video + Audio Mix `video_audio_mix` | Overlay VO/music/SFX onto video | Free |
| Audio Mix `audio_mix` | Layered audio mix onto video | Free |
| Structural Export `structural_export` | Final platform encode (TikTok/Reels/…) | Free |
| Highlight Rolloff `highlight_rolloff` | Surgical highlight compression | Free |
| Sync Lipsync v3 `lipsync_v3` | Lip-sync mouth to audio (expensive) | 1600 cr |
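Because most prices are per call, a job's budget is the sum of its calls. The sketch below copies flat prices from the tables above into a dictionary and totals a hypothetical plan (one image, one Kling Pro clip, voiceover, music, mix, captions, export). List a model once for every call it makes; per-MP and length-based prices are left out.

```python
# Per-call credit costs copied from the catalog tables (flat-priced models only).
COSTS = {
    "flux_pro": 12,
    "kling_v3_std_i2v": 84,
    "kling_v3_pro_i2v": 112,
    "elevenlabs_tts_v3": 20,
    "elevenlabs_sfx": 1,
    "minimax_music": 30,
    "video_audio_mix": 0,
    "captions_auto": 6,
    "structural_export": 0,
    "lipsync_v3": 1600,
}


def plan_cost(steps: list[str]) -> int:
    """Sum the credits for a sequence of calls, one list entry per call."""
    unknown = [s for s in steps if s not in COSTS]
    if unknown:
        raise KeyError(f"no flat price listed for: {unknown}")
    return sum(COSTS[s] for s in steps)


# Hypothetical plan, for illustration only.
plan = ["flux_pro", "kling_v3_pro_i2v", "elevenlabs_tts_v3", "minimax_music",
        "video_audio_mix", "captions_auto", "structural_export"]
print(plan_cost(plan))                   # 180
print(plan_cost(plan + ["lipsync_v3"]))  # 1780
```

Adding a single `lipsync_v3` call takes the same plan from 180 cr to 1780 cr, which is why it is flagged as expensive.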
QA checks — full reference →
Quality checks for generated media. The checks backed by a vision or speech-to-text model cost credits; the rest are free.
| Model | What it does | Cost |
|---|---|---|
| Full QA Pipeline `qa_full` | Run all checks on a finished video | 1 cr |
| Same Person Check `check_same_person` | Identity consistency (ref vs test) | 1 cr |
| Scene Matches Plan `check_scene_matches_plan` | Image/video matches shooting plan | 1 cr |
| Image Description Check `check_image_description` | Image matches a text description | 1 cr |
| Voice Consistency Check `check_voice_consistency` | Same speaker throughout | 1 cr |
| Transcript Check `check_transcript` | Timecoded transcript of video or audio; optionally checks it against the expected voiceover | 1 cr |
| Video Description `describe_video` | Timecoded scene/speech/sounds/music breakdown | ≈1 cr per 25 s of video at `fps` = 1, scaled linearly with `fps`; minimum 1 cr |
| Audio Loudness Check `check_audio_loudness` | LUFS / true-peak vs platform target | Free |
| Audio Structural Check `check_audio_structural` | Codec/duration/sample-rate sanity | Free |
| Audio Tail Check `check_audio_tail` | Detect abrupt audio cut-off | Free |
| Motion Artifacts Check `check_motion_artifacts` | Glitches / jump cuts / artifacts | Free |
| Overexposure Check `overexposure_check` | Blown-out highlights | Free |
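The `describe_video` price works out to roughly `max(1, duration_s / 25 × fps)` credits. A sketch of that estimate; how fractional credits are rounded is not documented, so it returns the unrounded figure (assumption).

```python
def describe_video_cost(duration_s: float, fps: float) -> float:
    """Estimate describe_video credits: ≈1 cr per 25 s of video at fps 1,
    scaled linearly by fps, with a minimum of 1 cr.

    Rounding of fractional credits is not documented; this returns the raw estimate.
    """
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration_s must be >= 0 and fps must be > 0")
    return max(1.0, duration_s / 25.0 * fps)


print(describe_video_cost(10, fps=1))  # 1.0 (minimum applies)
print(describe_video_cost(60, fps=1))  # 2.4
print(describe_video_cost(60, fps=2))  # 4.8
```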
Actors — full reference →
Persistent characters: training, generation, and voice.
| Model | What it does | Cost |
|---|---|---|
| Actor LoRA Training `actor_lora_train` | Train a LoRA from 4–30 reference images | 500 cr |
| Flux LoRA Inference `actor_lora_inference` | Actor-consistent image generation | ≈7 cr per MP |
| Actor Voice Clone (IVC) `actor_voice_clone` | Clone a voice from samples | Free |
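Since a training run costs 500 cr, it can be worth checking the reference set locally before submitting. A minimal pre-flight sketch that enforces the 4–30 image range from the table; the set of accepted file extensions is an assumption, as the catalog does not list formats.

```python
from pathlib import Path

# Image-count range and price stated for actor_lora_train.
MIN_IMAGES, MAX_IMAGES = 4, 30
TRAIN_COST = 500  # credits per training run

# Assumption: accepted formats are not listed in the catalog.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_training_set(paths: list[str]) -> list[str]:
    """Keep image files and confirm the count is within 4-30 before submitting."""
    images = [p for p in paths if Path(p).suffix.lower() in IMAGE_SUFFIXES]
    if not MIN_IMAGES <= len(images) <= MAX_IMAGES:
        raise ValueError(
            f"actor_lora_train needs {MIN_IMAGES}-{MAX_IMAGES} reference images, "
            f"got {len(images)}"
        )
    return images


refs = [f"actor/ref_{i:02d}.jpg" for i in range(12)] + ["actor/notes.txt"]
print(len(check_training_set(refs)), "images,", TRAIN_COST, "cr")  # 12 images, 500 cr
```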