Model catalog

Every model you can call through Framehood — what it does, how to call it, its full parameter schema, limits, and cost. Pages are grouped by type; pick a category below for the detailed per-model reference.

How to call. Most models are reached through an MCP tool action (e.g. image(create), video(swap), audio(speak)) or the CLI. The raw form is POST /v1/jobs/<model> with the model's inputs. Each model's exact, live input schema is also available at GET /v1/models/<model> (and …/prompt-guide where present).
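
As a rough illustration of the raw form, the sketch below builds (but does not send) a job request and a schema URL using only Python's standard library. The base URL, the bearer-token header, the flat JSON body, and the prompt input are all assumptions made for the example; the live schema at GET /v1/models/<model> is the authority on each model's real inputs.

```python
import json
from urllib.request import Request

# Hypothetical base URL: the catalog does not state the API host.
BASE_URL = "https://api.example.com"


def schema_url(model: str) -> str:
    """URL of the live input schema for a model: GET /v1/models/<model>."""
    return f"{BASE_URL}/v1/models/{model}"


def build_job_request(model: str, inputs: dict, api_key: str) -> Request:
    """Build (without sending) a POST /v1/jobs/<model> request.

    Assumptions: the inputs are sent as a flat JSON body and the key is
    passed as a bearer token. Neither is confirmed by the catalog.
    """
    return Request(
        url=f"{BASE_URL}/v1/jobs/{model}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "prompt" is an illustrative input name, not taken from the real schema.
req = build_job_request("flux_schnell", {"prompt": "a red bicycle"}, "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it; the sketch stops short of that so it can be run without credentials.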

Cost. Figures are the price in credits (per call unless noted). Local processing/QA steps that run on our own infrastructure are free. See Credits & plans.
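
To get a feel for how per-call prices add up, here is a minimal budgeting sketch. The prices are copied from this catalog; the plan itself (which models, how many calls) is a made-up example, and the helper name is hypothetical.

```python
# Per-call prices in credits, copied from the catalog tables on this page.
PRICES = {
    "flux_schnell": 1,
    "kling_v3_std_i2v": 84,
    "elevenlabs_tts_v3": 20,
    "video_audio_mix": 0,  # local processing step, free
    "qa_full": 1,
}


def estimate_credits(plan: dict[str, int]) -> int:
    """Sum price x call count for every model in the plan."""
    return sum(PRICES[model] * calls for model, calls in plan.items())


# Hypothetical plan: four draft images, one clip, one voiceover, a mix, a QA pass.
plan = {
    "flux_schnell": 4,
    "kling_v3_std_i2v": 1,
    "elevenlabs_tts_v3": 1,
    "video_audio_mix": 1,
    "qa_full": 1,
}
print(estimate_credits(plan))  # 109
```

Models priced per megapixel or per second of video do not fit this flat table and need their own estimate.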

Testing. Every generation model also accepts mock: true to return a placeholder result without running the model (no credits spent).
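
A small sketch of toggling mock mode on a set of inputs before serializing them. It assumes that mock sits at the top level of the request body next to the model's other inputs, which the catalog does not spell out; the prompt field is likewise only illustrative.

```python
import json


def as_mock(inputs: dict) -> dict:
    """Return a copy of the inputs with mock enabled; the original is not modified."""
    return {**inputs, "mock": True}


# "prompt" is an illustrative input name, not taken from the real schema.
real_inputs = {"prompt": "a red bicycle"}
body = json.dumps(as_mock(real_inputs))
print(body)  # {"prompt": "a red bicycle", "mock": true}
```

Keeping the real inputs untouched makes it easy to run the same payload once as a mock and once for real.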


Image — full reference →

Text-to-image, image editing, and upscaling.

| Model | ID | What it does | Cost |
| --- | --- | --- | --- |
| FLUX.1 Schnell | flux_schnell | Fast text-to-image (drafts, iteration) | 1 cr |
| Flux Pro 1.1 Ultra | flux_pro | Text-to-image, scenes/backgrounds | 12 cr |
| Flux 2 LoRA Realism | flux_realism | Text-to-image, photorealistic | ≈5 cr (per MP) |
| Nano Banana Pro | nano_banana | Text-to-image, in-image text / prompt adherence | 30 cr |
| Nano Banana Pro Edit | nano_banana_edit | Instruction-based image editing | 30 cr |
| Seedream v4.5 Edit | seedream_v45_edit | High-resolution image editing | 8 cr |
| Seedream v5 Lite Edit | seedream_v5_edit | Image editing / compositing | 7 cr |
| Topaz Image Upscale | topaz_upscale_image | Upscale + detail enhancement | 16 cr |
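
For models priced per megapixel (MP), a back-of-the-envelope estimate can be sketched as follows. This assumes 1 MP means 1,000,000 output pixels and that the price scales linearly with no rounding; the catalog only gives the approximate per-MP rate, so treat the result as a rough guide. The function name is hypothetical.

```python
def per_megapixel_cost(width: int, height: int, credits_per_mp: float) -> float:
    """Approximate cost of one image, assuming 1 MP = 1,000,000 pixels."""
    return width * height / 1_000_000 * credits_per_mp


# flux_realism is listed at about 5 cr per MP.
print(round(per_megapixel_cost(1024, 1024, 5), 2))  # 5.24
```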

Video — full reference →

Generation, image-to-video, editing, swap, and upscaling.

| Model | ID | What it does | Cost |
| --- | --- | --- | --- |
| Seedance 2.0 Reference-to-Video | seedance_r2v | Reference images/video/audio → video (up to 4K) | 303 cr |
| Kling v3 Standard I2V | kling_v3_std_i2v | Image-to-video, standard quality | 84 cr |
| Kling v3 Pro I2V | kling_v3_pro_i2v | Image-to-video, high quality | 112 cr |
| Kling O3 Video Edit | kling_o3_video_edit | Video edit with reference images | 126 cr |
| PixVerse Swap | pixverse_swap | Person/object swap in video | 30 cr |
| Wan 2.7 Video Edit | wan_27_video_edit | Text-guided video-to-video edit | 100 cr |
| Topaz Video Upscale | topaz_upscale_video | Upscale + enhance video | 100 cr |

Audio — full reference →

Speech, sound effects, music, and audio processing.

| Model | ID | What it does | Cost |
| --- | --- | --- | --- |
| ElevenLabs TTS v3 | elevenlabs_tts_v3 | Text-to-speech, emotional control | 20 cr |
| ElevenLabs TTS (direct) | elevenlabs_tts_direct | TTS with cloned/linked voice | 20 cr |
| ElevenLabs Sound Effects | elevenlabs_sfx | Sound-effect / foley generation | 1 cr |
| Minimax Music v2.6 | minimax_music | Music (instrumental or with lyrics) | 30 cr |
| Audio Concat | audio_concat | Join audio files in sequence | Free |
| Audio-Only Mix | audio_only_mix | Mix audio files into one (flat, or a ducked music bed) | Free |
| Audio Trim | audio_trim | Cut audio to a start/duration window | Free |
| Audio Convert | audio_convert | Format / sample-rate conversion | Free |
| Audio Tail Fade | tail_fade | Silence pad + fade-out | Free |

Video processing & assembly — full reference →

Local ffmpeg pipelines (free), plus paid captioning and lipsync.

| Model | ID | What it does | Cost |
| --- | --- | --- | --- |
| Auto Subtitles | captions_auto | Karaoke-style caption burn-in | 6 cr |
| Full Video Assembly | video_assemble_full | Clips + transitions + audio + intro/end | Free |
| Assemble Clips | assemble_clips | Concatenate clips with transitions | Free |
| Video + Audio Mix | video_audio_mix | Overlay VO/music/SFX onto video | Free |
| Audio Mix | audio_mix | Layered audio mix onto video | Free |
| Structural Export | structural_export | Final platform encode (TikTok/Reels/…) | Free |
| Highlight Rolloff | highlight_rolloff | Surgical highlight compression | Free |
| Sync Lipsync v3 | lipsync_v3 | Lip-sync mouth to audio (expensive) | 1600 cr |

QA checks — full reference →

Quality checks for generated media. Checks that use a vision/STT model cost credits; the others are free.

| Model | ID | What it does | Cost |
| --- | --- | --- | --- |
| Full QA Pipeline | qa_full | Run all checks on a finished video | 1 cr |
| Same Person Check | check_same_person | Identity consistency (ref vs test) | 1 cr |
| Scene Matches Plan | check_scene_matches_plan | Image/video matches shooting plan | 1 cr |
| Image Description Check | check_image_description | Image matches a text description | 1 cr |
| Voice Consistency Check | check_voice_consistency | Same speaker throughout | 1 cr |
| Transcript Check | check_transcript | Transcribe (video or audio) with timecodes, and optionally check it matches expected voiceover | 1 cr |
| Video Description | describe_video | Timecoded scene/speech/sounds/music breakdown | ≈1 cr per 25 s at fps 1, × fps; min 1 |
| Audio Loudness Check | check_audio_loudness | LUFS / true-peak vs platform target | Free |
| Audio Structural Check | check_audio_structural | Codec/duration/sample-rate sanity | Free |
| Audio Tail Check | check_audio_tail | Detect abrupt audio cut-off | Free |
| Motion Artifacts Check | check_motion_artifacts | Glitches / jump cuts / artifacts | Free |
| Overexposure Check | overexposure_check | Blown-out highlights | Free |
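
The describe_video price is the only one in this table that varies with the input, so a short worked sketch may help. It reads the listed rate as roughly duration / 25 × fps credits with a floor of 1 credit; how the service rounds fractional credits is not stated, so the estimate is left unrounded. The function name is hypothetical.

```python
def describe_video_cost(duration_s: float, fps: float = 1.0) -> float:
    """Approximate describe_video cost: 1 cr per 25 s at fps 1, scaled by fps, floor of 1 cr."""
    return max(1.0, duration_s / 25 * fps)


print(describe_video_cost(10))          # 1.0 (minimum applies)
print(describe_video_cost(50))          # 2.0
print(describe_video_cost(100, fps=2))  # 8.0
```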

Actors — full reference →

Persistent characters: training, generation, and voice.

| Model | ID | What it does | Cost |
| --- | --- | --- | --- |
| Actor LoRA Training | actor_lora_train | Train a LoRA from 4–30 reference images | 500 cr |
| Flux LoRA Inference | actor_lora_inference | Actor-consistent image generation | ≈7 cr (per MP) |
| Actor Voice Clone (IVC) | actor_voice_clone | Clone a voice from samples | Free |