Model catalog

Every model you can call through Framehood — what it does, how to call it, its full parameter schema, limits, and cost. Pages are grouped by type; pick a category below for the detailed per-model reference.

How to call. Most models are reached through an MCP tool action (e.g. image(create), video(swap), audio(speak)) or the CLI. The raw form is POST /v1/jobs/<model> with the model's inputs. Each model's exact, live input schema is also available at GET /v1/models/<model> (and …/prompt-guide where present).
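The raw form can be sketched in Python with only the standard library. The base URL `https://api.example.invalid`, the bearer-token header, the JSON request body, and the `prompt` input name are assumptions for illustration; none of them is confirmed by this page, so check the live schema at GET /v1/models/<model> for the real inputs. The functions only build the requests; sending them is left to the caller.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the real Framehood API base URL.
BASE_URL = "https://api.example.invalid"


def build_job_request(model: str, inputs: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/jobs/<model> request.

    Assumptions: inputs travel as a JSON body and the key is passed as a
    bearer token. Neither detail is stated in the catalog.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/jobs/{model}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_schema_request(model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models/<model> request for the live schema."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/models/{model}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

A caller would pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_job_request("flux_schnell", {"prompt": "a red bicycle"}, api_key))`.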

Cost. Figures are the price in credits (per call unless noted). Local processing/QA steps that run on our own infrastructure are free. See Credits & plans.
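Because most prices are flat per call, a rough budget is a sum over the calls you plan to make. The sketch below copies a few flat prices from the tables on this page; the plan itself is a made-up example, and per-MP or per-duration models would need their own formula.

```python
# Flat per-call prices in credits, copied from the catalog tables on this page.
PRICES = {
    "flux_schnell": 1,
    "kling_v3_std_i2v": 84,
    "elevenlabs_tts_v3": 20,
    "video_audio_mix": 0,  # local processing, free
    "qa_full": 1,
}


def estimate_credits(plan: dict[str, int]) -> int:
    """Sum credits for a plan mapping model id -> number of calls."""
    unknown = [model for model in plan if model not in PRICES]
    if unknown:
        raise KeyError(f"no flat price listed for: {unknown}")
    return sum(PRICES[model] * calls for model, calls in plan.items())


# Illustrative plan: four draft images, two clips, one voiceover, one mix, one QA run.
plan = {
    "flux_schnell": 4,
    "kling_v3_std_i2v": 2,
    "elevenlabs_tts_v3": 1,
    "video_audio_mix": 1,
    "qa_full": 1,
}
total = estimate_credits(plan)  # 4 + 168 + 20 + 0 + 1 = 193
```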

Testing. Every generation model also accepts mock: true to return a placeholder result without running the model (no credits spent).
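A short sketch of a test-mode payload follows. It assumes `mock` is a top-level field alongside the model's other inputs, which this page implies but does not spell out; `prompt` is a hypothetical input name.

```python
def with_mock(inputs: dict, mock: bool = True) -> dict:
    """Return a copy of the inputs with the mock flag set.

    Assumption: `mock` sits at the top level of the inputs object.
    The original dict is left unchanged so the same inputs can be
    reused for the real (billed) call afterwards.
    """
    payload = dict(inputs)
    payload["mock"] = mock
    return payload


draft_inputs = {"prompt": "a lighthouse at dusk"}  # hypothetical input name
test_payload = with_mock(draft_inputs)
```

Keeping the flag out of the base inputs makes it harder to ship a placeholder result by accident.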

Image — full reference →

Text-to-image, image editing, and upscaling.

Model	What it does	Cost
FLUX.1 Schnell `flux_schnell`	Fast text-to-image (drafts, iteration)	1 cr
Flux Pro 1.1 Ultra `flux_pro`	Text-to-image, scenes/backgrounds	12 cr
Flux 2 LoRA Realism `flux_realism`	Text-to-image, photorealistic	≈5 cr (per MP)
Nano Banana Pro `nano_banana`	Text-to-image, in-image text / prompt adherence	30 cr
Nano Banana Pro Edit `nano_banana_edit`	Instruction-based image editing	30 cr
Seedream v4.5 Edit `seedream_v45_edit`	High-resolution image editing	8 cr
Seedream v5 Lite Edit `seedream_v5_edit`	Image editing / compositing	7 cr
Topaz Image Upscale `topaz_upscale_image`	Upscale + detail enhancement	16 cr
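For models priced per megapixel (MP), such as `flux_realism` at ≈5 cr per MP, the cost scales with output size. The sketch below assumes 1 MP means 1,000,000 pixels and applies no rounding; how the platform actually rounds or bills fractional megapixels is not stated here.

```python
def per_megapixel_cost(width: int, height: int, credits_per_mp: float) -> float:
    """Approximate credits for an image billed per megapixel.

    Assumptions: 1 MP = 1,000,000 pixels, and no rounding is applied.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    megapixels = (width * height) / 1_000_000
    return credits_per_mp * megapixels


# A 1024 x 1024 image is about 1.05 MP, so roughly 5.2 cr at 5 cr per MP.
approx_cost = per_megapixel_cost(1024, 1024, credits_per_mp=5)
```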

Video — full reference →

Generation, image-to-video, editing, swap, and upscaling.

Model	What it does	Cost
Seedance 2.0 Reference-to-Video `seedance_r2v`	Reference images/video/audio → video (up to 4K)	303 cr
Kling v3 Standard I2V `kling_v3_std_i2v`	Image-to-video, standard quality	84 cr
Kling v3 Pro I2V `kling_v3_pro_i2v`	Image-to-video, high quality	112 cr
Kling O3 Video Edit `kling_o3_video_edit`	Video edit with reference images	126 cr
PixVerse Swap `pixverse_swap`	Person/object swap in video	30 cr
Wan 2.7 Video Edit `wan_27_video_edit`	Text-guided video-to-video edit	100 cr
Topaz Video Upscale `topaz_upscale_video`	Upscale + enhance video	100 cr

Audio — full reference →

Speech, sound effects, music, and audio processing.

Model	What it does	Cost
ElevenLabs TTS v3 `elevenlabs_tts_v3`	Text-to-speech, emotional control	20 cr
ElevenLabs TTS (direct) `elevenlabs_tts_direct`	TTS with cloned/linked voice	20 cr
ElevenLabs Sound Effects `elevenlabs_sfx`	Sound-effect / foley generation	1 cr
Minimax Music v2.6 `minimax_music`	Music (instrumental or with lyrics)	30 cr
Audio Concat `audio_concat`	Join audio files in sequence	Free
Audio-Only Mix `audio_only_mix`	Mix audio files into one (flat, or a ducked music bed)	Free
Audio Trim `audio_trim`	Cut audio to a start/duration window	Free
Audio Convert `audio_convert`	Format / sample-rate conversion	Free
Audio Tail Fade `tail_fade`	Silence pad + fade-out	Free

Video processing & assembly — full reference →

Local ffmpeg pipelines (free), plus auto subtitles and lipsync, which cost credits.

Model	What it does	Cost
Auto Subtitles `captions_auto`	Karaoke-style caption burn-in	6 cr
Full Video Assembly `video_assemble_full`	Clips + transitions + audio + intro/end	Free
Assemble Clips `assemble_clips`	Concatenate clips with transitions	Free
Video + Audio Mix `video_audio_mix`	Overlay VO/music/SFX onto video	Free
Audio Mix `audio_mix`	Layered audio mix onto video	Free
Structural Export `structural_export`	Final platform encode (TikTok/Reels/…)	Free
Highlight Rolloff `highlight_rolloff`	Surgical highlight compression	Free
Sync Lipsync v3 `lipsync_v3`	Lip-sync mouth to audio (expensive)	1600 cr

QA checks — full reference →

Quality checks for generated media. Five are free; the rest cost credits, and a few use a vision/STT model.

Model	What it does	Cost
Full QA Pipeline `qa_full`	Run all checks on a finished video	1 cr
Same Person Check `check_same_person`	Identity consistency (ref vs test)	1 cr
Scene Matches Plan `check_scene_matches_plan`	Image/video matches shooting plan	1 cr
Image Description Check `check_image_description`	Image matches a text description	1 cr
Voice Consistency Check `check_voice_consistency`	Same speaker throughout	1 cr
Transcript Check `check_transcript`	Transcribe (video or audio) with timecodes, and optionally check it matches expected voiceover	1 cr
Video Description `describe_video`	Timecoded scene/speech/sounds/music breakdown	≈1 cr per 25 s of video at fps 1, multiplied by fps; minimum 1 cr
Audio Loudness Check `check_audio_loudness`	LUFS / true-peak vs platform target	Free
Audio Structural Check `check_audio_structural`	Codec/duration/sample-rate sanity	Free
Audio Tail Check `check_audio_tail`	Detect abrupt audio cut-off	Free
Motion Artifacts Check `check_motion_artifacts`	Glitches / jump cuts / artifacts	Free
Overexposure Check `overexposure_check`	Blown-out highlights	Free
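The `describe_video` price in the table above depends on duration and sampling rate rather than being flat. A rough estimator is sketched below; rounding up to a whole credit is an assumption, since the table only gives an approximate rate and a minimum of 1.

```python
import math


def describe_video_cost(duration_s: float, fps: float = 1.0) -> int:
    """Estimate credits for describe_video.

    Rule from the catalog: about 1 cr per 25 s at fps 1, multiplied by fps,
    with a minimum of 1. Rounding up is an assumption, not a documented rule.
    """
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration_s must be >= 0 and fps must be > 0")
    raw = (duration_s / 25.0) * fps
    return max(1, math.ceil(raw))


short_clip = describe_video_cost(10)        # below 25 s, so the minimum applies
dense_pass = describe_video_cost(50, fps=2)  # 2 units of 25 s, doubled by fps
```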

Actors — full reference →

Persistent characters: training, generation, and voice.

Model	What it does	Cost
Actor LoRA Training `actor_lora_train`	Train a LoRA from 4–30 reference images	500 cr
Flux LoRA Inference `actor_lora_inference`	Actor-consistent image generation	≈7 cr (per MP)
Actor Voice Clone (IVC) `actor_voice_clone`	Clone a voice from samples	Free
