[{"id":"guide/billing","section":"guide","title":"Credits & plans","url":"https://docs.framehood.ai/guide/billing","text":"# Credits & plans\n\nEvery generation costs **credits**. Credits come from your plan; usage is\nmetered per job by the underlying model.\n\n## Free tier\n\nNew accounts start free. You can:\n\n- use the standard tools,\n- generate within your included credits.\n\n## Paid plans\n\nPaid plans are **subscriptions** with stepped tiers — more credits per cycle as\nyou move up. Manage everything from the console:\n\n[**framehood.ai/app/billing**](https://framehood.ai/app/billing)\n\n- **Subscribe / upgrade:** takes effect immediately. Upgrading mid-cycle charges\n  the prorated difference and grants the new credits right away.\n- **Downgrade:** takes effect at the start of your next cycle.\n- **Rollover:** unused credits partially carry into the next cycle, up to a cap.\n- **Payment & invoices:** managed through the Stripe customer portal (the\n  *Manage subscription* button).\n\n## Extra usage\n\nNeed to keep going after the included credits run out? Owners can turn on **Extra\nusage** so the shared pool tops itself up off-session when it runs low — no manual\ncheckout, no interruption.\n\nExtra usage is billed at a **premium rate of €0.0125 per credit** (80 credits per €)\n— **higher than your plan's rate**. It's meant as a safety margin, not a substitute\nfor the right plan: moving to a larger package lowers your per-credit cost. The\nconfig returns a `rate_note` reminding you of this.\n\nYou set:\n\n- a **trigger** — top up when the balance drops below N credits,\n- a **top-up amount** in euros — **€5 minimum, in €5 steps** (€5 buys 400 credits at\n  the extra-usage rate), and\n- a **per-cycle cap** in euros — the most Extra usage can charge in a single billing\n  cycle, as a safety ceiling.\n\nExtra usage **reuses the card already on your subscription** — a subscriber is never\nasked to add one. 
The cap resets at the start of each billing cycle.\n\n## From MCP and the CLI\n\nYou can check and manage billing without leaving your tools:\n\n```\nbilling(balance)                         # current credits\nbilling(plan)                            # your current plan\nbilling(plans)                           # available subscription steps\nbilling(subscribe, step=\"studio_1\")      # subscribe to a plan\nbilling(topup, amount_eur=20)            # one-off credit purchase (min €20)\nbilling(extra_usage)                     # view Extra-usage config (owner)\nbilling(set_extra_usage, enabled=true, trigger_below=200, amount_eur=5, extra_usage_cap_eur=40)  # owner\nbilling(manage)                          # link to the customer portal\n```\n\n`billing(subscribe, …)` returns a Stripe Checkout link to open in a browser — the\ntools can't redirect you, and credits are added once the payment completes. Extra\nusage charges the subscription card off-session, so there's no browser step.\n\nCLI: `framehood balance`.\n\n## Organizations\n\nAn organization has an **owner** who pays and whose plan funds a shared credit\npool, plus **members** who draw from it. Members don't have payment access;\ninstead they call `billing(request_upgrade)` (or use the console) to email the\nowner a request. Owners also get org-management tools (`org` / the Team screen)\nto invite and remove members and see spend.\n"},{"id":"guide/claude-ai","section":"guide","title":"Add Framehood to Claude.ai","url":"https://docs.framehood.ai/guide/claude-ai","text":"# Add Framehood to Claude.ai\n\nFramehood plugs straight into Claude — the `image`, `video`, `audio`, `qa` and\n`files` tools appear right inside your chats, so you can ask Claude to make a\nposter, cut a clip, or add a voiceover and it calls Framehood for you. 
There are\ntwo ways to add it: the **plugin** (recommended) or a plain **custom connector**.\n\nEither way there's **no API key to copy** — sign-in is through your browser\n(OAuth).\n\n## Video walkthrough\n\nA short screen recording that installs Framehood and gets it working in Claude:\n\n<VideoPlayer\n  src=\"https://cdn.framehood.ai/docs/framehood-install-tutorial-v5.mp4\"\n  poster=\"/tutorial-poster.jpg\"\n  caption=\"Installing the Framehood plugin in Claude (2:15)\"\n/>\n\n## Install the plugin (recommended)\n\nThe best way to use Framehood with Claude is the **Framehood plugin** — it works\neverywhere Claude does: the web, the desktop app, and your phone. The plugin\nbundles the connection plus **skills** that teach Claude the whole toolset, the\nper-model prompting best-practices, and video-montage craft, so generation works\nnatively with nothing to wire up.\n\nInstall it **as shown in the video above** — the marketplace address to add is:\n\n```\nFramehood/framehood-plugin\n```\n\nIt's completely safe: the plugin is [**open source**](https://github.com/Framehood/framehood-plugin),\nso you can read exactly what it ships. Prefer the terminal? The two install\ncommands for Claude Code are on the [Claude Code](/guide/plugin) page.\n\n**Don't forget to authorize after installing.** Connect Framehood and sign in\nwith your account when Claude prompts (in Claude Code, run `/mcp` → **framehood**\n→ **Authenticate**). Until you do, the tools stay locked.\n\nPrefer the lightest setup? You can add\n**[just the connector](#add-the-connector-pro-max-free)** below instead — it\npulls in no extra resources, though we still recommend the plugin.\n\n## Tips for working with Framehood in Claude\n\nFramehood is a **tool** — and you're teaching your Claude to use it well. 
Your\nClaude is what holds the knowledge and your preferences, so a little guidance\ngoes a long way.\n\n- **Pick a specific model.** By default Framehood chooses a model for each job.\n  Want a particular one? Ask Claude to **list the available models**, then tell\n  it explicitly which to use — it can pass any parameters that model supports.\n- **Teach Claude your style.** Framehood already knows the basics of editing and\n  montage — cuts, pacing, continuity. But you can teach your Claude the\n  approaches that suit *you*: just ask it to **remember how you like to work**,\n  and it keeps that for next time.\n- **Ask to see the intermediate steps.** Claude aims to hand you a finished clip\n  in one go. If you'd rather review each stage, ask it to **show every\n  intermediate result**.\n- **Keep it current.** Framehood is under active development, so the resources in\n  the plugin — and the models themselves — keep improving. Ask Claude to **check\n  for updates** to the plugin and the connector from time to time.\n- **Send feedback.** Any errors surface right in your chat. Want to flag a bug or\n  suggest something to the team? Just ask Claude to **send feedback** — it\n  delivers your message to the developers without you leaving the chat.\n\n::: tip Turn on memory\nFor Claude to remember your preferences and reference earlier chats, enable\n**memory** and **search across chats** in Claude's settings.\n:::\n\n## Add the connector (Pro, Max, Free)\n\nFor an individual account:\n\n1. Open **Settings → Customize → Connectors**.\n2. Click **+**, then **Add custom connector**.\n3. In **Remote MCP server URL**, paste `https://mcp.framehood.ai/mcp`.\n4. Leave **Advanced settings** (OAuth Client ID / Secret) **empty** — Framehood\n   registers your client automatically. Click **Add**.\n5. The Framehood connector appears. 
Click **Connect** and approve the sign-in in\n   the browser tab that opens.\n\n::: tip Plans\nCustom connectors work on **Free, Pro, Max, Team, and Enterprise**. Free accounts\nare limited to one custom connector.\n:::\n\n## Add the connector (Team / Enterprise)\n\nAn **owner** adds it once for the organization, then each member connects:\n\n**Owner — configure it:**\n1. Go to **Settings → Organization settings → Connectors**.\n2. Click **Add**, hover **Custom**, and choose **Web**.\n3. Paste `https://mcp.framehood.ai/mcp` (leave the Advanced OAuth fields empty).\n4. Click **Add**.\n\n**Each member — connect it:**\n1. Open **Settings → Customize → Connectors**.\n2. Find **Framehood** (marked *Custom*) and click **Connect**.\n3. Approve the browser sign-in.\n\n## After connecting\n\n- Every call is billed to the **same account and credits** as the CLI, REST API\n  and the [Console](https://framehood.ai/app). See [Credits & plans](/guide/billing).\n- Not seeing the tools? Make sure the connector shows **Connected** and is\n  toggled on for that chat.\n\n## Troubleshooting\n\n**The tools don't appear in a chat.**\nThe connector isn't on for that conversation. Click **+** next to the message\nbox → **Connectors** and toggle **Framehood** on; make sure it shows\n**Connected** in **Settings → Connectors**.\n\n**It keeps asking you to sign in, or a sign-in loop.**\nRe-connect: **Settings → Connectors → Framehood → Connect**, and finish the\nbrowser sign-in fully. There's no API key — it's all OAuth.\n\n**The sign-in tab never opens.**\nYour browser blocked the pop-up. Allow pop-ups for claude.ai and click\n**Connect** again.\n\n**\"You can only add one custom connector.\"**\nFree accounts allow a single custom connector — remove another one or upgrade\nyour plan.\n\n**\"Insufficient credits\" or a job is refused.**\nYou're out of credits. 
Top up or check your plan in the\n[Console](https://framehood.ai/app) — see [Credits & plans](/guide/billing).\n\n**A video job looks stuck.**\nLong jobs take minutes and Claude polls them — just ask \"check the status\" for\nprogress.\n\n**Installing the plugin instead?** See the Claude Code\n[troubleshooting](/guide/plugin#troubleshooting) for `/mcp` and `/plugin` fixes.\n\n## Notes\n\n- **No API key.** Authentication is OAuth 2.1 (PKCE + Dynamic Client\n  Registration); the token is scoped to `mcp.framehood.ai`.\n- **Public reachability.** Claude connects from Anthropic's cloud, not your\n  device — Framehood is on the public internet, so it just works.\n- Prefer the terminal or an editor instead? See the [MCP server](/guide/mcp)\n  guide for Claude Code, Cursor, and generic clients, or the\n  [Claude Code](/guide/plugin) plugin which bundles this server.\n\nNext: browse the [tools reference](/reference/tools) and the\n[model catalog](/reference/models/).\n"},{"id":"guide/cli","section":"guide","title":"CLI","url":"https://docs.framehood.ai/guide/cli","text":"# CLI\n\n::: warning Experimental\nThe CLI is **experimental and may be unstable** — expect rough edges and\nbreaking changes. For the smoothest experience we recommend the\n[Claude Code plugin](/guide/plugin) or the [Claude.ai connector](/guide/claude-ai);\nfor programmatic use, the [REST API](/reference/api).\n:::\n\nThe `framehood` CLI is a first-class client for the same toolset you get over\n[MCP](/guide/mcp) and the [REST API](/reference/api). 
It works two ways:\n\n- **One-shot subcommands** — scriptable, automation-friendly commands that run,\n  print, and exit (`framehood generate …`, `framehood jobs`, `framehood\n  balance`).\n- **The interactive studio** — a full-screen terminal app you open by running\n  `framehood` with no subcommand.\n\nBoth modes share one account, one credit balance, and one set of tools.\n\n## Install\n\n::: code-group\n\n```sh [Homebrew]\nbrew install framehood/tap/framehood\n```\n\n```sh [npm]\nnpm install -g framehood\n```\n\n```sh [go]\ngo install github.com/Framehood/framehood-cli@latest\n```\n\n```sh [source]\ngit clone https://github.com/Framehood/framehood-cli\ncd framehood-cli && go build -o framehood .\n```\n\n:::\n\nOr grab a prebuilt binary from the\n[releases page](https://github.com/Framehood/framehood-cli/releases/latest).\n\n### Keep it up to date\n\n```sh\nframehood upgrade        # alias: framehood update\n```\n\n`upgrade` self-replaces the binary with the latest GitHub release. If you\ninstalled through a package manager (Homebrew, npm), it detects that and prints\nthe right command to run instead of overwriting a managed install.\n\n## Sign in\n\n```sh\nframehood login     # opens your browser (OAuth 2.1 + PKCE, loopback redirect)\nframehood logout    # remove stored credentials\nframehood whoami    # email, org role, balance and plan\n```\n\n`login` opens your browser to sign in. The token is stored at\n`~/.framehood/credentials.json` (`0600`) and refreshed automatically, so you\nrarely need to log in again. `whoami` aggregates your account view — email, org\nrole, credit balance, and current plan — in one block.\n\n::: tip\nThe studio can sign you in too: open it signed-out and run `/login` from the\ncommand palette. See [Studio](#studio-interactive).\n:::\n\n## Command reference\n\nEvery subcommand prints human-readable output and exits non-zero on error, so\nthey compose cleanly in scripts. 
Run any command with `--help` for its flags.\n\n### Generate\n\nOne-shot generation: submit a prompt, poll until the job finishes, print the\noutput URL.\n\n```sh\nframehood generate \"a red fox in the snow\"\nframehood generate --type audio --voice Rachel \"welcome to Framehood\"\nframehood generate --type video \"a drone shot over a coastline\"\n```\n\n| Flag | Default | Notes |\n|------|---------|-------|\n| `--type`, `-t` | `image` | `image` · `video` · `audio` |\n| `--out`, `-o` | by type | output filename (e.g. `image.jpg`, `video.mp4`, `audio.mp3`) |\n| `--action` | by type | override the tool action (`create`, `speak`, `scene`, …) |\n| `--tier` | — | image quality tier: `draft` · `fine` · `photo` |\n| `--format` | — | size preset, e.g. `landscape_16_9`, `square` |\n| `--actor` | — | route through an actor (`act_…`); implies a `scene` for video — *not yet enabled (experimental)* |\n| `--voice` | — | voice name for `--type audio` speech |\n\nDefaults per type: image → `image.create`, audio → `audio.speak`, video →\n`video.create`. Actor routing (`--actor`, which defaults video to `video.scene`)\nis experimental and not yet enabled.\n\n### Jobs\n\nYour generation-history feed, plus per-job cancel.\n\n```sh\nframehood jobs                                   # recent jobs (default view)\nframehood jobs list --kind flux_schnell --status running,succeeded\nframehood jobs cancel <job-id>\n```\n\n| Command | Purpose |\n|---------|---------|\n| `jobs` / `jobs list` | List recent jobs — the generation-history feed |\n| `jobs cancel <job-id>` | Cancel a non-terminal job (errors if it already finished) |\n\nList flags: `--kind` (filter by job kind), `--status` (comma-separated, e.g.\n`running,succeeded`), `--limit`/`-n` (1–100).\n\n### Billing\n\nCredits, plan, and subscription for your organization. 
Read views are open to\neveryone; subscription changes are owner-only.\n\n```sh\nframehood balance                    # top-level shortcut (back-compat)\nframehood billing                    # = billing plan\nframehood billing balance\nframehood billing plan\nframehood billing plans\nframehood billing transactions -n 20\nframehood billing preview <package>\nframehood billing change <package>\nframehood billing cancel             # cancel at period end\nframehood billing cancel --reactivate\n```\n\n| Command | Purpose |\n|---------|---------|\n| `billing balance` | Your current credit balance |\n| `billing plan` | Your current plan |\n| `billing plans` | The available packages |\n| `billing transactions` | Recent credit ledger, newest first (`--limit`/`-n`, 1–50) |\n| `billing preview <package>` | Prorated cost of switching to a package (owner only) |\n| `billing change <package>` | Switch the subscription, prorated (owner only) |\n| `billing cancel` | Cancel at period end; `--reactivate` resumes it (owner only) |\n\n`<package>` is a package id from `billing plans`. 
See\n[Credits & plans](/guide/billing) for how subscriptions and the shared org\ncredit pool work.\n\n### Library\n\nSearch your generated assets and manage the trash.\n\n```sh\nframehood library \"red fox\"              # search by prompt\nframehood library --type video -n 50     # filter by media type\nframehood library trashed                # list trashed assets\nframehood library trash <asset-id>\nframehood library restore <asset-id>\n```\n\n| Command | Purpose |\n|---------|---------|\n| `library [query]` | Search your generated assets |\n| `library trashed` | List trashed assets |\n| `library trash <asset-id>` | Move an asset to trash (recoverable for 10 days) |\n| `library restore <asset-id>` | Restore an asset from trash |\n\nList flags: `--type`/`-t` (`image` · `video` · `audio`), `--project` (filter to\na project id), `--limit`/`-n` (default 24).\n\n### Projects\n\nGroup your generations into personal or shared projects, and set the active\nproject so new generations land there automatically.\n\n```sh\nframehood project                                 # list your + shared projects\nframehood project create \"Campaign\" --shared --desc \"Q3 launch\"\nframehood project update <project-id> --name \"Q4 launch\"\nframehood project assign <asset-id> <project-id>  # omit id to unassign\nframehood project use <project-id>                # set active/default\nframehood project current                         # show the active project\nframehood project delete <project-id>\n```\n\n| Command | Purpose |\n|---------|---------|\n| `project` | List your personal + shared projects |\n| `project create <name>` | Create a project (`--shared`, `--desc`) |\n| `project update <project-id>` | Change name / visibility / description (owner only) |\n| `project assign <asset-id> [project-id]` | Put an asset in a project; omit the id to unassign |\n| `project use [project-id]` | Set the active/default project; omit the id to clear it |\n| `project current` | Show the 
active/default project |\n| `project delete <project-id>` | Delete a project (its assets stay in the library) |\n\nFlags: `create` takes `--shared` (org-wide; default is personal) and `--desc`.\n`update` takes `--name`, `--visibility` (`personal` · `shared`) and\n`--description` — only the fields you pass are changed.\n\n### Team / Org\n\nYour organization: members, spend, and management. Some actions are owner- or\nadmin-only.\n\n```sh\nframehood team                                  # list members\nframehood team spend                            # per-member credit spend\nframehood team trend --days 30                  # daily org spend (7–90)\nframehood team role alice@studio.com admin\nframehood team suspend alice@studio.com\nframehood team enable alice@studio.com\nframehood team invite bob@studio.com\nframehood team accept-invite <token>\nframehood team remove bob@studio.com\n```\n\n| Command | Purpose |\n|---------|---------|\n| `team` | List organization members |\n| `team spend` | Per-member credit spend |\n| `team trend` | Daily org credit spend (`--days`, 7–90; default 30) |\n| `team role <email> <member\\|admin>` | Change a member's role (owner only) |\n| `team suspend <email>` | Suspend a member (owner or admin) |\n| `team enable <email>` | Re-enable a suspended member (owner or admin) |\n| `team invite <email>` | Invite a member by email (owner only) |\n| `team accept-invite <token>` | Join an org with an invite token |\n| `team remove <email>` | Remove a member (owner only) |\n\n### Files\n\nManage your storage: list, upload, delete, publish/unpublish, and download.\n\n```sh\nframehood files                                     # list (default view)\nframehood files --prefix clips/\nframehood files upload https://example.com/clip.mp4 clip.mp4\nframehood files publish clip.mp4\nframehood files unpublish clip.mp4\nframehood files download clip.mp4 -o ./clip.mp4\nframehood files delete clip.mp4\n```\n\n| Command | Purpose |\n|---------|---------|\n| 
`files` / `files list` | List your files (`--prefix` to filter by key prefix) |\n| `files upload <url> <key>` | Upload a file from a URL |\n| `files delete <key>` | Delete a file |\n| `files publish <key>` | Make a file public |\n| `files unpublish <key>` | Make a published file private again |\n| `files download <key>` | Print a usable URL; with `-o <path>` write the file to disk |\n\n`download` resolves a URL for the file. Without `-o` it prints the URL; with\n`-o` it fetches the bytes (authenticated for private files) and saves them.\n\n### API keys\n\nProgrammatic API keys for the [REST API](/reference/api) and the CLI.\n\n```sh\nframehood keys                          # list (prefix + metadata)\nframehood keys create --name ci\nframehood keys delete <prefix-or-key>\n```\n\n| Command | Purpose |\n|---------|---------|\n| `keys` / `keys list` | List your keys (prefix + metadata) |\n| `keys create` | Mint a new key (`--name` is an optional label) |\n| `keys delete <prefix-or-key>` | Revoke a key by its prefix or full value |\n\n::: warning\nA newly created key's secret is **shown once** and can't be retrieved later.\nCopy it immediately.\n:::\n\n### Models, skills & workflows\n\nBrowse the model catalog, a single model's schema, its prompt guide, and the\nmulti-step workflows.\n\n```sh\nframehood models                        # list available models\nframehood models flux_schnell           # one model's full schema\nframehood skill flux_schnell            # parameters, tips, prompt guide\nframehood workflows                     # list multi-step workflows\nframehood workflows video_production    # one workflow's skill\n```\n\n| Command | Purpose |\n|---------|---------|\n| `models [kind]` | List models, or show one model's schema |\n| `skill <kind>` | Show a model's skill — parameters, tips, prompt guide |\n| `workflows [name]` | List workflows, or show one workflow's skill |\n\n### Config\n\nView or change CLI settings. 
The output directory is where the studio saves\nresults.\n\n```sh\nframehood config get\nframehood config set output-dir ~/Downloads\nframehood config set output-dir \"\"      # clear → current working directory\n```\n\n| Command | Purpose |\n|---------|---------|\n| `config get` | Print the resolved settings (output dir, config dir, MCP base) |\n| `config set <key> <value>` | Set a setting (currently `output-dir`) |\n\n`config set output-dir` expands a leading `~`, creates the directory if needed,\nand stores the absolute path. Clearing it reverts to the current working\ndirectory.\n\n## Studio (interactive)\n\nRun `framehood` with no subcommand to open the interactive studio — a\nfull-screen terminal app. It opens even when you're signed out (it shows a\n\"not signed in\" state; `/login` signs you in from inside).\n\n```sh\nframehood\n```\n\nThe compose box at the bottom is your main surface: type a prompt and press\n`enter` to generate. The hint line reads\n*type a prompt · / for commands · ⇥ to change action*.\n\n### The command palette\n\nPress `/` (from an empty input) to open the **command palette** — a filterable\ngrid of every action across all tools, plus built-in slash commands. Type to\nfilter; navigate with `← → ↑ ↓`; press `enter` to run the highlighted command;\n`esc` closes it.\n\nYou can also type a full command inline. 
`/image create a red fox` runs the\naction with that prompt straight away; `/balance` and `/billing balance` both\nresolve to the same action.\n\nBuilt-in slash commands:\n\n| Command | What it does |\n|---------|--------------|\n| `/help` | Toggle the full key help |\n| `/new` | Clear the current result and start fresh |\n| `/open` | Open the selected result in your browser |\n| `/copy` | Copy the selected result's URL to the clipboard |\n| `/save` | Save the selected result to your output directory |\n| `/history` | Jump to the generation-history view |\n| `/setdir` | Set the output directory for saved results |\n| `/login` · `/logout` · `/whoami` | Manage your session from inside the studio |\n| `/upgrade` | Self-update to the latest release |\n| `/quit` | Quit the studio |\n\nBeyond these, every tool action (`image create`, `billing\nbalance`, `files list`, `library list`, `project create`, `org members`,\n`api_keys create`, `jobs list`, …) is in the palette (actor actions such as\n`video scene` are experimental and not yet enabled). Read-only actions run\nimmediately; actions that need parameters open a small form (their label ends\nwith `›`); prompt-only actions close the palette and wait for you to type a\nprompt and press `enter`.\n\n### The work-action ring\n\nWith the palette closed and the input focused, cycle the **work action** — the\ngeneration action that `enter` submits:\n\n- `⇥` (Tab) — next action\n- `⇧⇥` (Shift+Tab) — previous action\n\nThis rings through the generation actions (image / video / audio and their\nvariants) so you can switch what you're creating without opening the palette.\n\n### Input history\n\nIn the compose box, recall previously submitted prompts with `↑` / `↓`. Pressing\n`↓` past the newest entry restores whatever draft you were typing.\n\n### Results & generation history\n\nWhile a job runs you see a live status indicator. 
Finished generations are kept\nin a persistent, paginated **history** (stored locally at\n`~/.framehood/history.json` — type, prompt, URL, and timestamp only; no tokens).\n\nIn the history pane:\n\n| Key | Action |\n|-----|--------|\n| `↑` / `↓` (or `k` / `j`) | Move the selection |\n| `⇞` / `⇟` (PgUp / PgDn) | Newer / older page |\n| `o` | Open the selected result in your browser |\n| `c` | Copy the result's URL to the clipboard |\n| `s` | Save the result to your output directory |\n| `u` | Use the result as input for the next action |\n\n### Output directory\n\nSaved results (via `s` or `/save`) go to your configured output directory.\nSet it from inside the studio with `/setdir` (or `/setdir ~/Downloads` inline),\nor from a shell with `framehood config set output-dir <path>`. Unset, it\ndefaults to the current working directory.\n\n### Quitting\n\nPress `ctrl+c` **twice** to quit — the first press arms the quit and shows\n*press ctrl+c again to quit*; any other key disarms it, so a stray `ctrl+c`\nnever exits on its own. `/quit` exits immediately.\n\n## Configuration\n\nThe CLI defaults to the production Framehood deployment. Override endpoints with\nenvironment variables for local development.\n\n| Env var | Default | Purpose |\n|---------|---------|---------|\n| `FRAMEHOOD_MCP_BASE` | `https://mcp.framehood.ai` | MCP + OAuth origin |\n| `FRAMEHOOD_API_BASE` | same as `FRAMEHOOD_MCP_BASE` | REST `/v1/…` origin |\n| `FRAMEHOOD_CONFIG_DIR` | `~/.framehood` | credentials and CLI state directory |\n\nPoint `FRAMEHOOD_MCP_BASE` at a local `wrangler dev` worker to develop against\nit. 
Credentials, the local studio history, and CLI settings all live under the\nconfig directory.\n"},{"id":"guide/mcp","section":"guide","title":"MCP server","url":"https://docs.framehood.ai/guide/mcp","text":"# MCP server\n\nFramehood's primary interface is the [Model Context Protocol](https://modelcontextprotocol.io).\nAny MCP-aware client can connect and use the [tools](/reference/tools) directly.\n\n## Endpoint\n\n```\nhttps://mcp.framehood.ai/mcp\n```\n\n- **Transport:** Streamable HTTP\n- **Auth:** OAuth 2.1 with PKCE and Dynamic Client Registration — your client\n  opens a browser for sign-in on first connect; no API key to copy.\n\n## Connect a client\n\n### Claude.ai (web & desktop app)\n\nAdd Framehood as a custom connector in [claude.ai](https://claude.ai): **Settings\n→ Customize → Connectors → + → Add custom connector**, paste\n`https://mcp.framehood.ai/mcp`, then **Connect** and approve the browser sign-in.\nSee the illustrated, step-by-step [Add to Claude.ai](/guide/claude-ai) guide.\n\n### Claude Code\n\n```sh\nclaude mcp add --transport http framehood https://mcp.framehood.ai/mcp\n```\n\nThen run `/mcp` inside Claude Code and complete the browser sign-in. Or install\nthe [Claude Code](/guide/plugin) plugin, which bundles this server.\n\n### Claude Desktop\n\nSettings → **Connectors** → **Add custom connector**, then enter:\n\n- **Name:** Framehood\n- **URL:** `https://mcp.framehood.ai/mcp`\n\nApprove the browser sign-in when prompted.\n\n### Cursor\n\nAdd to `~/.cursor/mcp.json` (or the project's `.cursor/mcp.json`):\n\n```json\n{\n  \"mcpServers\": {\n    \"framehood\": {\n      \"url\": \"https://mcp.framehood.ai/mcp\"\n    }\n  }\n}\n```\n\n### Generic MCP client\n\nAny client that supports remote (HTTP) MCP servers with OAuth works. Configure:\n\n```json\n{\n  \"mcpServers\": {\n    \"framehood\": { \"type\": \"http\", \"url\": \"https://mcp.framehood.ai/mcp\" }\n  }\n}\n```\n\n## First steps after connecting\n\n1. Call a tool, e.g. 
`image` with `{ \"action\": \"create\", \"prompt\": \"…\", \"out\": \"hero.jpg\" }`.\n2. If the response is a queued job, poll `get_status(job_id=…)` until it\n   succeeds; the output URL is in `outputs`.\n3. To transcribe an audio or video URL into timecoded segments, call\n   `qa(action=\"transcript\", video=…)` (or `audio=…` for a pure audio file).\n\nSee the full [tools reference](/reference/tools) for the complete toolset.\n\n## Notes\n\n- The OAuth issuer is the connection host (`mcp.framehood.ai`), so tokens are\n  scoped to this server.\n- Inside an organization, members without payment access can call\n  `billing(request_upgrade)` to email the owner; owners get payment and\n  org-management tools.\n"},{"id":"guide/plugin","section":"guide","title":"Framehood in Claude Code","url":"https://docs.framehood.ai/guide/plugin","text":"# Framehood in Claude Code\n\nThe **Framehood plugin** brings Framehood into **Claude Code** — the terminal\nCLI, the Claude desktop app's **Code**, and the IDE extensions. It bundles the\nMCP connection, a `/framehood:create` command, and skills, so Claude can generate\nmedia out of the box.\n\n## Install\n\nRun these two commands in Claude Code — the first adds the marketplace, the\nsecond installs the plugin:\n\n```\n/plugin marketplace add Framehood/framehood-plugin\n/plugin install framehood@framehood\n```\n\nYou'll need a [Framehood account](https://framehood.ai).\n\n## Authorize\n\nAfter installing, connect and sign in to the MCP server — otherwise the tools\nstay locked:\n\n1. Run **`/mcp`**.\n2. Select **framehood** in the list.\n3. Choose **Authenticate** and complete the sign-in in the browser tab that\n   opens.\n\nThat's it — no API key to paste. The token is scoped to `mcp.framehood.ai`.\n\n## What it adds\n\n- **MCP server** `framehood` → `https://mcp.framehood.ai/mcp`. 
Claude Code runs\n  the OAuth sign-in on connect.\n- **`/framehood:create`** — a command for one-line requests:\n\n  ```\n  /framehood:create a cinematic portrait of a lighthouse keeper as keeper.jpg\n  ```\n\n  Plugin commands are **namespaced by the plugin**, so it's `/framehood:create`,\n  not `/create` — type `/framehood` to find it in the slash-command menu.\n\n- **`framehood` skill** — teaches Claude the `image`, `video`, `audio`, `qa` and\n  `files` tools, uploading local files, the job→poll workflow, choosing a model,\n  prompt improvement, and credits, so natural requests (\"make a 5-second clip of\n  waves at sunset\") route correctly.\n- **`video-montage` skill** — assembling finished videos from parts (reels, ads,\n  mini-dramas, mini-docs), with built-in cinematography and editing know-how.\n\n## Use it naturally\n\nOnce installed and authorized you don't need the command — just ask:\n\n> Make me a hero image of a neon-lit alley, 16:9, and upscale it.\n\nClaude picks `image(create)` then `image(upscale)`, polls each job, and returns\nthe URLs.\n\n## Troubleshooting\n\n**The tools don't show up.**\nThe MCP server isn't connected. Run `/mcp`, check **framehood** is listed and\nshows **connected**, and authenticate it if it doesn't.\n\n**The `/framehood:create` command doesn't appear.**\nPlugin commands are namespaced — it's `/framehood:create`, not `/create`. Type\n`/framehood` to find it. If it's still missing right after installing, run\n`/reload-plugins` and check `/plugin` shows **framehood** enabled. (You rarely\nneed the command anyway — just ask in plain language.)\n\n**\"Not authenticated\", a 401, or a sign-in loop.**\nThe OAuth session expired or was never completed. 
Run `/mcp` → **framehood** →\n**Authenticate**, and finish the browser sign-in fully before returning.\n\n**`/plugin install` says the plugin or marketplace isn't found.**\nAdd the marketplace first, then install — in this order:\n`/plugin marketplace add Framehood/framehood-plugin`, then\n`/plugin install framehood@framehood`. Check the repo name is exact.\n\n**The sign-in tab never opens.**\nYour browser blocked the pop-up. Allow pop-ups and run `/mcp` → **Authenticate**\nagain.\n\n**\"Insufficient credits\" or a job is refused.**\nYou're out of credits. Check your balance and plan in the\n[Console](https://framehood.ai/app) — see [Credits & plans](/guide/billing).\n\n**A video or training job looks stuck.**\nLong jobs run for minutes and Claude polls them — just ask \"check the status\".\nAn unknown job id returns *no such job* rather than a fake status.\n\n**Claude forgets your preferences between sessions.**\nTeach it once and ask it to remember; make sure **memory** and **search across\nchats** are enabled in Claude's settings.\n\n**Newer models or skills aren't available.**\nFramehood ships updates often. Update the plugin\n(`/plugin` → **framehood** → update, or re-run `/plugin install framehood@framehood`)\nand reconnect with `/mcp`.\n\n## See also\n\nNot in Claude Code? Add Framehood to [Claude.ai](/guide/claude-ai) (web, desktop,\nmobile) as a connector, use the [CLI](/guide/cli), or connect any\n[MCP client](/guide/mcp).\n"},{"id":"guide/quickstart","section":"guide","title":"Quickstart","url":"https://docs.framehood.ai/guide/quickstart","text":"# Quickstart\n\nFramehood gives you image, video, and audio generation through one functional\ntoolset. Pick the interface that fits how you work — all three share the same\naccount and credit balance.\n\n::: tip ▶ Prefer to watch?\nSee the [3-minute setup video](/guide/claude-ai#video-walkthrough) — installing\nFramehood in Claude and running a first generation.\n:::\n\n## 1. 
Create an account\n\nSign up at [framehood.ai](https://framehood.ai). New accounts start on the free\ntier and generate within your included credits.\nSee [Credits & plans](/guide/billing).\n\n## 2. Choose an interface\n\n::: tip Which should I use?\n- Working inside **Claude or an MCP editor**? → [MCP server](/guide/mcp) (or the [Claude Code](/guide/plugin) plugin).\n- Working in the **terminal**? → [CLI](/guide/cli).\n- Building an **app**? → [REST API](/reference/api).\n:::\n\n### Fastest path: MCP\n\nPoint your MCP client at:\n\n```\nhttps://mcp.framehood.ai/mcp\n```\n\nYour client runs a browser sign-in (OAuth) on first connect. After that the\n`image`, `video`, `audio`, and other [tools](/reference/tools) are available.\n\n### Fastest path: CLI\n\n```sh\nframehood login\nframehood generate \"a red fox in the snow\"\n```\n\nSee the [CLI guide](/guide/cli) for install and the interactive studio.\n\n### Fastest path: REST\n\n```sh\ncurl -X POST https://api.framehood.ai/v1/jobs/flux_pro \\\n  -H \"Authorization: Bearer $TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"inputs\":{\"prompt\":\"a red fox in the snow\",\"out\":\"fox.jpg\"}}'\n```\n\nSee the [REST API reference](/reference/api).\n\n## 3. Generate\n\nEvery job costs credits proportional to the underlying model. 
Long jobs return a\n`job_id` you poll until it succeeds; the result is a URL to your output file.\nThe exact flow per interface is in each guide.\n"},{"id":"index","section":"root","title":"Framehood","url":"https://docs.framehood.ai/","text":"---\nlayout: home\n\nhero:\n  name: Framehood\n  text: Creative superpowers for AI Agents\n  tagline: Give your agent image, video, and audio tools — make a picture for your site, an infographic, a short clip, or a whole content factory, all in plain language without leaving your chat.\n  actions:\n    - theme: brand\n      text: Quickstart\n      link: /guide/quickstart\n    - theme: alt\n      text: Connect to Claude\n      link: /guide/claude-ai\n    - theme: alt\n      text: Models\n      link: /reference/models/\n\nfeatures:\n  - title: Right inside your chat\n    details: Ask your agent in plain language and Framehood makes the image, video, or audio — an ad, an infographic, a product shot, a short clip — without leaving your usual interface.\n  - title: The newest models, chosen for you\n    details: Think in outcomes; the server routes each request to the best current model (updated constantly) and packages editing and montage know-how so results come out polished.\n  - title: Everywhere you work\n    details: Reach it over MCP inside your agent, the Claude plugin, the REST API, or the CLI — one account and one credit balance across all of them.\n---\n\n::: info In active development\nFramehood is evolving fast — new models and skills land often, and you may hit\nthe occasional rough edge. Found a bug or have an idea? Just ask Claude to send\nfeedback and it reaches the team.\n:::\n\n## What is Framehood?\n\nFramehood gives your agent a set of **creative tools** — `image`, `video`,\n`audio`, `qa`, `files`. 
Ask in plain language and it makes or edits the media for\nyou, routing each request to the newest, best model for the job (updated\nconstantly) and drawing on built-in editing and montage know-how.\n\nYou can reach it three ways, all backed by the same account and credits:\n\n| Interface | Endpoint | Best for |\n|-----------|----------|----------|\n| **MCP** | `https://mcp.framehood.ai/mcp` | agents & editors (Claude, Cursor, …) |\n| **CLI** | `framehood` | scripts, pipelines, the terminal |\n| **REST** | `https://api.framehood.ai` | apps & custom integrations |\n\nStart with the [Quickstart](/guide/quickstart).\n"},{"id":"llm","section":"root","title":"For AI agents","url":"https://docs.framehood.ai/llm","text":"# For AI agents\n\nThis documentation is built to be read by agents as much as by people. If you are an\nLLM or operate one, everything here is available in machine-friendly form — no HTML\nparsing needed.\n\n## The one-request orientation\n\nFetch [`/llms.txt`](https://docs.framehood.ai/llms.txt) — an annotated map of every\npage with links to raw-markdown versions.\n\n## Raw markdown for any page\n\nAppend `.md` to any page URL:\n\n- `https://docs.framehood.ai/guide/quickstart` → [`/guide/quickstart.md`](https://docs.framehood.ai/guide/quickstart.md)\n- the landing page itself → [`/index.md`](https://docs.framehood.ai/index.md)\n\nEvery HTML page also carries a `<link rel=\"alternate\" type=\"text/markdown\">` head tag\npointing at its twin.\n\n## The whole docs in one request\n\n- [`/llms-full.txt`](https://docs.framehood.ai/llms-full.txt) — all pages concatenated\n  (small enough for any context window).\n- [`/agent-corpus.json`](https://docs.framehood.ai/agent-corpus.json) — structured JSON:\n  `[{ id, section, title, url, text }]` per page, regenerated on every deploy.\n\n## Give your agent Framehood itself\n\nThe docs describe the product; the product is also agent-native. 
Connect Framehood to\nClaude or any MCP client and generate image, video and audio from inside the agent:\nsee the [MCP guide](/guide/mcp) and the [quickstart](/guide/quickstart).\n"},{"id":"reference/api","section":"reference","title":"REST API","url":"https://docs.framehood.ai/reference/api","text":"# REST API\n\nThe REST API is for apps and custom integrations. Most users should prefer\n[MCP](/guide/mcp) or the [CLI](/guide/cli), which wrap this.\n\n## Base URL\n\n```\nhttps://api.framehood.ai\n```\n\n`api.framehood.ai` is the primary REST host. `worker.framehood.ai` is an\nequivalent legacy alias kept during the `api.*` transition — it serves the exact\nsame backend and handlers, so every path below works identically on either host.\nPrefer `api.framehood.ai` for new integrations.\n\n## Authentication\n\nSend a bearer token on every request:\n\n```\nAuthorization: Bearer <token>\n```\n\nA token is either:\n\n- a **session token** (what the web console uses), or\n- an **API key** — create one in the console (Settings → API keys) or via\n  `POST /api-keys`.\n\n## Job lifecycle\n\nGeneration is asynchronous. Submit a job, then poll it until it reaches a\nterminal status (`succeeded` / `failed` / `cancelled`).\n\n### Submit a job\n\n```\nPOST /v1/jobs/{kind}\n```\n\n`{kind}` is a model kind (list them with `GET /v1/models`). 
The body carries the\nmodel inputs:\n\n```sh\ncurl -X POST https://api.framehood.ai/v1/jobs/flux_pro \\\n  -H \"Authorization: Bearer $TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"inputs\":{\"prompt\":\"a red fox in the snow\",\"out\":\"fox.jpg\"}}'\n```\n\nAdd `?wait=25` to block up to 25s for fast jobs and return the finished result\ninline instead of a `job_id`.\n\nA queued response:\n\n```json\n{ \"job_id\": \"job_…\", \"kind\": \"flux_pro\", \"status\": \"queued\",\n  \"status_url\": \"/v1/jobs/job_…\" }\n```\n\n::: info REST vs MCP: `next_step`\nOn the REST API, job responses keep the legacy prose-string `next_step`\npolling hint. The MCP surface replaced it with a machine-readable object\n`{tool, action?, args?, why}` (server 2.12.0). REST convergence is planned.\n:::\n\n### Poll a job\n\n```\nGET /v1/jobs/{job_id}\n```\n\nWhen `status` is `succeeded`, the result URLs are in `outputs`\n(`image_url` / `video_url` / `audio_url`). When `failed`, see `error`.\n\n### Poll many jobs at once\n\n```\nGET /v1/jobs/batch?ids=job_a,job_b,…\n```\n\nUp to 50 comma-separated ids per call — the parallel-submit pattern: fire all\nyour submits, collect the ids, then batch-poll every 30–60s until\n`summary.running + summary.queued` is 0.\n\n```json\n{\n  \"summary\": { \"total\": 3, \"queued\": 0, \"running\": 1, \"succeeded\": 1,\n               \"failed\": 0, \"cancelled\": 0, \"not_found\": 1 },\n  \"jobs\": [\n    { \"job_id\": \"job_a\", \"status\": \"succeeded\", \"done\": true,\n      \"outputs\": { \"image_url\": \"…\" }, \"credits\": 12, \"…\": \"…\" },\n    { \"job_id\": \"job_b\", \"status\": \"running\", \"done\": false, \"…\": \"…\" },\n    { \"job_id\": \"job_c\", \"status\": \"not_found\" }\n  ]\n}\n```\n\nEach entry mirrors the single-job compact shape (`done` is true once the job is\nterminal; `failed` entries carry the structured `error {code, message}`). 
An id\nthat doesn't exist — or that belongs to another account — comes back as\n`status: \"not_found\"`; it never fails the whole request. More than 50 ids is a\n`400`.\n\n### Other job endpoints\n\n| Method & path | Purpose |\n|---------------|---------|\n| `GET /v1/jobs` | list your recent jobs (filters: `kind`, `status`, `since` — RFC3339 or unix seconds, `limit`, `cursor`) |\n| `POST /v1/jobs/{job_id}/cancel` | cancel a queued/running job |\n\n## Models & guidance\n\n| Method & path | Purpose |\n|---------------|---------|\n| `GET /v1/models` | list available model kinds |\n| `GET /v1/models/{kind}` | input schema for a kind |\n| `GET /v1/models/{kind}/prompt-guide` | prompting guide for a kind |\n| `GET /v1/models/{kind}/skill` | agent skill (usage guidance) for a kind |\n| `GET /v1/skill` | entry-point skill (how to drive the API) |\n| `GET /v1/workflows` | list available multi-step workflows |\n| `GET /v1/workflows/{name}/skill` | skill for a named workflow |\n| `GET /v1/health` | liveness (`{ \"ok\": true }`) — no auth |\n\n## Files\n\n| Method & path | Purpose |\n|---------------|---------|\n| `GET /files` | list your stored outputs |\n| `GET /files/{key}` | download a file |\n| `DELETE /files/{key}` | delete a file |\n| `POST /files/{key}/publish` | make a file public |\n| `POST /files/{key}/unpublish` | make a published file private again |\n| `GET /files/public/{userId}/{key}` | public file (no auth) |\n\n### Uploads\n\nBring your own input (a reference image, audio, etc.) 
into your private storage,\nthen reference the returned `url` in a job's inputs.\n\n| Method & path | Purpose |\n|---------------|---------|\n| `PUT /upload?key={key}` | stream a file body directly to storage |\n| `POST /upload-from-url` | fetch a remote file into storage (`{ \"source_url\": \"…\", \"key\": \"…\" }`) |\n\n::: warning Breaking change: `/upload-from-url` error responses and accepted sources\n`POST /upload-from-url` now enforces the same rules as the `files(import_remote)`\nMCP action (see the [tools reference](/reference/tools)):\n\n- Only `https://` source URLs are accepted (`http://` is rejected).\n- Only image, video, and audio files are accepted — a PDF, SVG, or other\n  non-media file that was previously accepted is now rejected.\n- **Error responses changed shape.** They used to be a free-text\n  `{ \"error\": \"<sentence>\" }`. They are now a structured code:\n  `{ \"error\": \"<code>\", \"message\": \"<detail>\" }`, where `<code>` is one of\n  `url_not_allowed` (400), `unsupported_media_type` (415), `too_large` (413),\n  `external_file_expired` (502), `fetch_failed` (502). If your integration\n  matched on the old error text, update it to match on `error`.\n\nThe success response shape (`{ ok, key, r2_key, url, content_type, size }`)\nand the request body are unchanged.\n:::\n\n## Billing\n\nRead endpoints are available to any org member. 
The endpoints that change a\nsubscription, card, or plan are **owner-only** (enforced server-side; a member\ngets `403 forbidden`).\n\n| Method & path | Purpose |\n|---------------|---------|\n| `GET /billing/balance` | current credit balance |\n| `GET /billing/transactions` | credit ledger |\n| `GET /billing/subscription` | current subscription (status, allowance, balance, role) |\n| `GET /billing/plans` | available credit packages |\n| `GET /billing/manage` | owner: card on file, cancel state, recent invoices (in-app billing) |\n| `POST /billing/checkout` | owner: start a Stripe Checkout for a package (`{ \"package\": \"…\" }`) → `{ url }` |\n| `POST /billing/change` | owner: switch the active subscription to another package, prorated (`{ \"package\": \"…\" }`) |\n| `POST /billing/preview` | owner: preview the prorated cost + credits of a switch (`{ \"package\": \"…\" }`) |\n| `POST /billing/cancel` | owner: cancel at period end, or `{ \"reactivate\": true }` to resume |\n| `POST /billing/topup` | buy a one-off batch of extra credits now (`{ \"amount_eur\": 20 }`, min €20; credits at the extra-usage rate) → a hosted invoice `{ url }` (an owner with a saved card is charged automatically) |\n| `GET / PUT /billing/extra-usage` | owner: view / configure automatic overflow top-ups (Extra usage) |\n| `GET /billing/extra-usage/charges` | Extra-usage charge history (date, amount, credits, receipt link) |\n| `POST /billing/card` | owner: open the Stripe card-entry page → `{ url }` |\n| `POST /billing/card-finalize` | owner: set the just-added card as default (`{ \"session\": \"…\" }`) |\n| `POST /billing/portal` | owner: open the Stripe customer portal → `{ url }` |\n\n## Library & projects\n\n| Method & path | Purpose |\n|---------------|---------|\n| `GET /library` | search assets (`q`, `type`, `project`, `limit`, `offset`) |\n| `GET /library/trash` | list trashed assets (auto-purged after 10 days) |\n| `POST /library/{id}/trash`, `POST /library/{id}/restore` | trash / 
restore an asset |\n| `POST /library/{id}/project` | assign to a project (`{ \"project_id\": \"…\\|null\" }`) |\n| `GET / POST /projects` | list / create projects |\n| `GET /projects/active` | get your active (default) project |\n| `PUT / POST /projects/active` | set your active (default) project (`{ \"project_id\": \"…\\|null\" }`) |\n| `PATCH / DELETE /projects/{id}` | rename or change visibility / delete (owner) |\n\n## Keys, actors, orgs\n\n::: warning Actors temporarily disabled\nActor endpoints are part of the actor feature, which is **temporarily disabled**\nwhile we rework it — see [Actor models](/reference/models/actors).\n:::\n\n| Method & path | Purpose |\n|---------------|---------|\n| `GET / POST /api-keys`, `DELETE /api-keys/{key}` | manage API keys |\n| `GET /actors` | list your actors |\n| `GET /orgs`, `/orgs/members`, `/orgs/spend`, `/orgs/spend/trend` | organization info + daily spend |\n| `POST /orgs/invites` | owner: invite by email; returns the join link (`{ \"email\": \"…\", \"role\": \"member\\|owner\" }`) |\n| `POST /orgs/invites/accept` | accept an invite (`{ \"token\": \"…\" }`) |\n| `PATCH /orgs/members/{userId}` | change role / suspend (`{ \"role\": \"admin\" }` or `{ \"suspended\": true }`) |\n| `DELETE /orgs/members/{userId}` | remove a member (owner) |\n\n::: tip\nThe same operations are available over [MCP](/reference/tools) without managing\ntokens — your client handles auth.\n:::\n"},{"id":"reference/models/actors","section":"reference","title":"Actor models","url":"https://docs.framehood.ai/reference/models/actors","text":"# Actor models\n\n::: warning Temporarily disabled\nActor models — LoRA training, actor-consistent generation, and voice cloning —\nare **temporarily disabled** while we rework them. The `actor` tool and the\nactor-dependent actions (`image(actor_sheet)`, `video(scene)`, `actor_id`\nrouting) are not currently available. 
This page is kept for reference.\n:::\n\nPersistent characters: LoRA training, actor-consistent generation, and voice cloning.\n\n> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.\n\n### Actor LoRA Training `actor_lora_train`\n\nFine-tunes a Flux LoRA on a ZIP of 4–30 reference images, producing a downloadable LoRA `.safetensors` URL for a persistent actor identity.\n\n**Call it via** — MCP tool `actor` action `create` (the `create` path submits training to `actor_lora_train` after registering the actor) · raw: `POST /v1/jobs/actor_lora_train`\n\n| | |\n|---|---|\n| **Cost** | 500 cr per call |\n| **Mode / timeout** | webhook / 20m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `images_data_url` | string | ✓ | — | publicly-accessible URL or base64 data URI | URL to a ZIP archive of training images. Use at least 4 (more is better). The archive may also contain per-image caption `.txt` files and `*_mask.jpg` mask files sharing the image's name. |\n| `trigger_word` | string |  | — (none) | — | Trigger word used in captions. If omitted, no trigger word is used; if no captions are supplied, the trigger word is used in place of captions. |\n| `create_masks` | boolean |  | `true` | true / false | If true, segmentation masks weight the training loss (a face mask is used for people when possible). |\n| `steps` | integer |  | — (unspecified) | — | Number of training steps for the LoRA. |\n| `is_style` | boolean |  | `false` | true / false | If true, trains a style LoRA: deactivates segmentation and auto-captioning and uses the trigger word to specify the style. |\n| `is_input_format_already_preprocessed` | boolean |  | `false` | true / false | If false, expects raw input (image + matching caption file by name). 
Set true if the data is already in the preprocessed format. |\n| `data_archive_format` | string |  | — (inferred from URL) | e.g. `zip` | Archive format. If unspecified, inferred from the URL. |\n\nOur wrapper params (not part of the model schema): `out` (required — output filename for the resulting LoRA file) and `mock` (optional — returns a test placeholder instead of running real training). No `format`/size mapping applies to this model (it has no size field; our YAML `format_field` is empty).\n\n**Limits** — the model states a practical minimum of ~4 images (more recommended); our wrapper documents a 4–30 reference-image range. No max resolution, file-size, or hard image-count cap is published for this endpoint, so none is asserted here.\n\n### Flux LoRA Inference `actor_lora_inference`\n\nFLUX.1 [dev] text-to-image generation with one or more custom LoRA adaptations — used internally to render an actor with its trained likeness LoRA.\n\n**Call it via** — `image(create, actor_id=…, prompt=…)` (also used internally by `image(actor_sheet)`, `image(animate)` first-frame, `actor(batch)`, and `video(scene)`; the Worker injects the actor's LoRA path + scale and prepends the trigger_word) · raw: `POST /v1/jobs/actor_lora_inference`\n\n| | |\n|---|---|\n| **Cost** | Billed per megapixel — ≈7 cr per image at the ~1 MP presets |\n| **Mode / timeout** | sync / 60s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | The prompt to generate an image from. |\n| `image_size` | ImageSize \\| enum | | `portrait_16_9` (our default; the model's own is `landscape_4_3`) | `square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9` — or an object `{width, height}` for custom size | The size of the generated image. |\n| `num_inference_steps` | integer | | `28` | — | The number of inference steps to perform. 
|\n| `seed` | integer | | — | — | Same seed + same prompt + same model version yields the same image. |\n| `loras` | list&lt;LoraWeight&gt; | | — | each item: `{path` (string, required)`, scale` (float, default `1`)`}` | The LoRAs to use; any number may be supplied and are merged. |\n| `guidance_scale` | float | | `3.5` | — | CFG scale — how closely the model sticks to the prompt. |\n| `sync_mode` | boolean | | — | — | If true, media is returned as a data URI and not stored in request history. |\n| `num_images` | integer | | `1` | — | Number of images to generate (always 1 for streaming output). |\n| `enable_safety_checker` | boolean | | `true` | — | Enables the safety checker. |\n| `output_format` | enum | | `jpeg` | `jpeg, png` | The format of the generated image. |\n| `acceleration` | enum | | `none` | `none, regular` | Acceleration level; `regular` balances speed and quality. |\n\nOur wrapper params (not part of the model schema): `out` (required — workdir-relative output filename), `mock` (optional — test placeholder). The `image`/`video` MCP tools accept the same friendly `format` names on the actor path as on the plain path and normalize them to this model's `image_size` enum before submitting: `portrait`/`vertical`/`shorts`/`reels`/`9:16` → `portrait_16_9` (default), `landscape`/`horizontal`/`wide`/`16:9` → `landscape_16_9`, `square`/`1:1` → `square_hd`, `3:4` → `portrait_4_3`, `4:3` → `landscape_4_3`. Matching is case-insensitive (the value is lowercased before lookup). (Raw `image_size` enum values pass through unchanged; an unrecognised value falls back to the default rather than erroring.)\n\n**Limits** — `image_size` enum is fixed to the six named values above; custom sizes are passed as a `{width, height}` object (model default `512×512`). `num_images` defaults to 1 and is forced to 1 for streaming output. 
No max-resolution / file-size / character limit is published for the model beyond these.\n\n### Actor Voice Clone (IVC) `actor_voice_clone`\n\nInstant Voice Cloning from one or more audio samples; returns an ElevenLabs `voice_id` that can be stored on an actor and reused for text-to-speech.\n\n**Call it via** — `actor` tool, `create` or `update` action with `voice_sample_url` set (the worker submits the clone job automatically and stores the returned `voice_id` on the actor) · raw: `POST /v1/jobs/actor_voice_clone`\n\n| | |\n|---|---|\n| **Cost** | 1 cr per call |\n| **Mode / timeout** | sync / 120s |\n\n**Parameters** — the model's input schema (`POST /v1/voices/add`, multipart form):\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `name` | string | ✓ | — | — | The name that identifies this voice (shown in the voice dropdown). |\n| `files` | file[] (multipart) | ✓ | — | audio recordings | A list of audio recordings intended for voice cloning. |\n| `remove_background_noise` | boolean | — | `false` | `true` / `false` | If set, removes background noise from samples via the audio-isolation model. If samples have no background noise, this can reduce quality. |\n| `description` | string \\| null | — | `null` | — | A description of the voice. |\n| `labels` | map&lt;string,string> \\| string \\| null | — | `null` | keys: language, accent, gender, age | Labels for the voice (free-form metadata). |\n\n**Our wrapper params** (not part of the model schema): `out` (required — output filename for the job result), `mock` (optional — test placeholder, skips real generation). Our YAML exposes the model's `files` field as `sample_urls` (an array of public audio URLs); the Go adapter downloads each URL and submits it as a multipart `files` entry. This model has no `format`→size mapping (`format_field` is empty).\n\n**Limits** — Request is a multipart form accepting multiple audio files. 
No hard max-file-count or per-file size is published in this endpoint's reference; documented guidance is to provide clean samples totaling ~1–3 minutes. `labels` keys are restricted to language, accent, gender, or age. Other hard limits (exact max file count / file size / supported codecs) are not stated in the endpoint reference and are omitted.\n"},{"id":"reference/models/audio","section":"reference","title":"Audio models","url":"https://docs.framehood.ai/reference/models/audio","text":"# Audio models\n\nSpeech, sound effects, and music (model input schemas), plus local audio processing (our ffmpeg implementation, free).\n\n> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.\n\n### ElevenLabs TTS v3 `elevenlabs_tts_v3`\n\nExpressive text-to-speech with inline audio-tag emotional control and 70+ language support, powered by ElevenLabs' Eleven v3 model.\n\n**Call it via** — `audio(action: \"speak\")` (MCP `audio` tool) · raw: `POST /v1/jobs/elevenlabs_tts_v3`\n\n| | |\n|---|---|\n| **Cost** | 20 cr per 1,000 characters |\n| **Mode / timeout** | sync / 60s |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `text` | string | ✓ | — | — | Text to convert to speech. Supports inline audio tags like `[laughs]`, `[whispers]`, `[excited]`. |\n| `voice` | string |  | `Rachel` | e.g. Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill (or a voice ID) | Voice name or ID. |\n| `stability` | float |  | `0.5` | 0–1 | Voice stability. Lower = more expressive variation; higher = more consistent delivery. |\n| `similarity_boost` | float |  | `0.75` | 0–1 | How closely the output matches the reference voice. |\n| `speed` | float |  | `1` | — | Playback speed multiplier. 
|\n| `language_code` | string |  | — | ISO 639-1 (e.g. en, ru, es, fr, de, ja, ko, zh) | Forces a specific output language. |\n| `apply_text_normalization` | enum |  | `auto` | `auto`, `on`, `off` | Controls spelling-out of numbers, abbreviations, etc. |\n| `seed` | int |  | — | — | Random seed for reproducibility. |\n| `timestamps` | bool |  | `false` | — | When true, returns per-word timestamps in the response. |\n| `output_format` | enum |  | `mp3_44100_128` | mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192, pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_44100, pcm_48000, ulaw_8000, alaw_8000, opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192 | Output codec, sample rate, and bitrate. |\n\nOur wrapper params (not part of the model schema): `out` (required — workdir-relative output path, `.mp3`) and `mock` (optional — test placeholder, no real generation). This model does not use the `format`→size mapping (`format_field` is empty).\n\n**Limits** — Pricing is 20 cr per 1,000 characters (a 500-char paragraph = 10 cr; a 10,000-char story = 200 cr). Supported output formats: MP3 (22.05/44.1 kHz, 32–192 kbps), PCM (8–48 kHz), µ-law/A-law 8 kHz, Opus 48 kHz (32–192 kbps). 70+ languages supported. No hard maximum character count is published.\n\n### ElevenLabs TTS (direct) `elevenlabs_tts_direct`\n\nConverts text into speech using a chosen ElevenLabs `voice_id` (cloned, linked, or library voice) and returns an audio file.\n\n**Call it via** — `audio(speak, actor_id=…)` (routes a configured actor's voice through this model; plain `audio(speak)` without `actor_id` uses `elevenlabs_tts_v3` instead). Also used internally by `video(scene)` for per-line narration. 
· raw: `POST /v1/jobs/elevenlabs_tts_direct`\n\n| | |\n|---|---|\n| **Cost** | 20 cr per call |\n| **Mode / timeout** | sync / 60s |\n\n**Parameters** — the model's input schema (`voice_id` is a path parameter; the rest are request-body fields):\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `voice_id` | string | ✓ | — | — | Path param. ID of the voice to use (from Get Voices). |\n| `text` | string | ✓ | — | — | The text that will be converted into speech. |\n| `model_id` | string |  | `eleven_multilingual_v2` | any TTS-capable model id | Model identifier; must support text-to-speech. |\n| `language_code` | string \\| null |  | null | ISO 639-1 | Enforces a language for the model and text normalization. |\n| `voice_settings` | object \\| null |  | null | see sub-properties | Per-request overrides of the voice's stored settings. |\n| `voice_settings.stability` | number |  | 0.5 | 0.0–1.0 | How stable the voice is / randomness between generations. |\n| `voice_settings.similarity_boost` | number |  | 0.75 | 0.0–1.0 | How closely the AI adheres to the original voice. |\n| `voice_settings.style` | number |  | 0 | 0.0–1.0 | Style exaggeration of the voice. |\n| `voice_settings.use_speaker_boost` | boolean |  | true | true/false | Boosts similarity to the original speaker. |\n| `voice_settings.speed` | number |  | 1.0 | ~0.7–1.2 | Playback speed; &lt;1 slows, >1 speeds up. |\n| `seed` | integer \\| null |  | null | 0–4294967295 | Best-effort deterministic sampling. |\n| `previous_text` | string \\| null |  | null | — | Text preceding this request, for continuity. |\n| `next_text` | string \\| null |  | null | — | Text following this request, for continuity. |\n| `previous_request_ids` | string[] \\| null |  | null | max 3 | Request ids of prior samples, for continuity. |\n| `next_request_ids` | string[] \\| null |  | null | max 3 | Request ids of later samples, for continuity. 
|\n| `pronunciation_dictionary_locators` | object[] \\| null |  | null | max 3 | Pronunciation dictionary locators (id, version_id). |\n| `apply_text_normalization` | enum |  | `auto` | `auto`, `on`, `off` | Controls number/date spell-out normalization. |\n| `apply_language_text_normalization` | boolean |  | false | true/false | Language-specific normalization (Japanese only; raises latency). |\n| `output_format` | enum (query) |  | `mp3_44100_128` | `mp3_22050_32`, `mp3_44100_32/64/96/128/192`, `pcm_8000/16000/22050/24000/44100`, `ulaw_8000`, `alaw_8000`, `opus_48000_*`, etc. (28 values) | Query param. `codec_samplerate_bitrate`; mp3_192 needs Creator+, pcm/wav 44.1kHz needs Pro+. |\n| `enable_logging` | boolean (query) |  | true | true/false | Query param. false = zero-retention mode (enterprise only). |\n\nOur wrapper params (not part of the model schema): `out` (required — output audio filename, mp3) and `mock` (optional — test placeholder). This model has no `format`→size mapping (`format_field` is empty in our YAML).\n\n**Limits** — model limits: `seed` 0–4294967295; up to 3 `pronunciation_dictionary_locators`, 3 `previous_request_ids`, 3 `next_request_ids` per request; output formats limited to the 28 `output_format` enum values (mp3 192kbps requires Creator tier or above; PCM/WAV at 44.1kHz requires Pro tier or above). 
No hard maximum text length is published for this endpoint, so no character cap is asserted here (our YAML's \"keep under 5000 characters\" is guidance, not a confirmed limit).\n\n### ElevenLabs Sound Effects `elevenlabs_sfx`\n\nGenerate sound effects (foley, ambience, UI, impacts) from a text description using ElevenLabs' Sound Effects V2 model.\n\n**Call it via** — `audio(sfx)` (the `audio` MCP tool with `action: \"sfx\"`; pass your description in `prompt`, which the worker maps to the model's `text` field) · raw: `POST /v1/jobs/elevenlabs_sfx`\n\n| | |\n|---|---|\n| **Cost** | Billed per second of audio |\n| **Mode / timeout** | sync / 60s |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `text` | string | ✓ | — | max 450 characters | The text describing the sound effect to generate. |\n| `duration_seconds` | number | | none (model decides) | `0.5`–`22` (nullable) | Duration in seconds. If omitted/null, optimal duration is determined from the prompt. |\n| `prompt_influence` | number | | `0.3` | `0`–`1` | How closely to follow the prompt. Higher values mean less variation. |\n| `output_format` | string (enum) | | `mp3_44100_128` | `mp3_22050_32`, `mp3_44100_32`, `mp3_44100_64`, `mp3_44100_96`, `mp3_44100_128`, `mp3_44100_192`, `pcm_8000`, `pcm_16000`, `pcm_22050`, `pcm_24000`, `pcm_44100`, `pcm_48000`, `ulaw_8000`, `alaw_8000`, `opus_48000_32`, `opus_48000_64`, `opus_48000_96`, `opus_48000_128`, `opus_48000_192` | Output audio format, as `codec_sampleRate_bitrate`. |\n| `loop` | boolean | | `false` | `true` / `false` | Whether to create a sound effect that loops smoothly. |\n\nOur wrapper params (not part of the model schema): `out` (required — workdir-relative output path, e.g. `.mp3`) and `mock` (optional — test placeholder). 
No `format` mapping applies to this model (`format_field` is empty).\n\n**Limits** — model limits:\n- `text`: max 450 characters.\n- `duration_seconds`: 0.5–22 seconds.\n- `prompt_influence`: 0–1.\n- Output codecs: MP3 (22.05/44.1 kHz, 32–192 kbps), PCM (8–48 kHz), μ-law/A-law 8 kHz, Opus 48 kHz (32–192 kbps).\n\n### Minimax Music v2.6 `minimax_music`\n\nMiniMax Music 2.6 creates complete tracks with singing, backing music, and detailed arrangements from a style description and optional lyrics.\n\n**Call it via** — `audio(music)` MCP tool · raw: `POST /v1/jobs/minimax_music`\n\n| | |\n|---|---|\n| **Cost** | 30 cr per call |\n| **Mode / timeout** | webhook / 8m (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | 10–2000 chars | Description of the music style, mood, genre, and scenario. |\n| `lyrics` | string | | `\"\"` | max 3500 chars | Song lyrics. Use `\\n` to separate lines. Supports structure tags: `[Intro]`, `[Verse]`, `[Pre Chorus]`, `[Chorus]`, `[Post Chorus]`, `[Hook]`, `[Bridge]`, `[Interlude]`, `[Transition]`, `[Build Up]`, `[Break]`, `[Inst]`, `[Solo]`, `[Outro]`. Required when `is_instrumental` is false. |\n| `lyrics_optimizer` | boolean | | `false` | true / false | When true and `lyrics` is empty, auto-generates lyrics from the prompt. |\n| `is_instrumental` | boolean | | `false` | true / false | When true, generates vocal-free instrumental music. |\n| `audio_setting` | object | | — | see below | Audio configuration settings (object). |\n| `audio_setting.sample_rate` | integer | | `44100` | 16000, 24000, 32000, 44100 | Sample rate of generated audio (Hz). |\n| `audio_setting.bitrate` | integer | | `256000` | 32000, 64000, 128000, 256000 | Bitrate of generated audio (bps). |\n| `audio_setting.format` | string | | `mp3` | mp3, wav, pcm | Output audio format. 
|\n\nOur wrapper params (not part of the model schema): `out` (required — workdir-relative output path, e.g. `.mp3`), `mock` (optional — test placeholder). This model has no `format_field`, so our `format` wrapper is not used here.\n\n**Limits** — model limits: `prompt` 10–2000 characters; `lyrics` max 3500 characters; output formats mp3 / wav / pcm; sample rate up to 44100 Hz; bitrate up to 256000 bps. Lyrics are required when `is_instrumental` is false.\n\n### Audio Concat `audio_concat`\n\n| Field | Value |\n|---|---|\n| Category | audio_process |\n| Mode | sync |\n| Timeout | 30s |\n| Cost | Free (cost_per_unit: 0) |\n| Handler | `execAudioConcat` → `AudioConcat` (`internal/ffmpeg/audio_concat.go`) |\n| MCP route | `audio(action: \"concat\")` — maps the tool's `tracks[]` arg to the model's `files` field |\n\n**Description:** Concatenate multiple audio files in order. Accepts a mix of input formats — every input is decoded and re-encoded to the target output format, then joined with ffmpeg's concat demuxer (`-c copy`, no second re-encode).\n\n**Parameters** (from YAML `input_schema`, cross-checked against handler):\n\n| Param | Type | Required | Default | Notes |\n|---|---|---|---|---|\n| `files` | array of string | yes | — | Ordered list of audio paths (any mix of mp3/wav/aac/flac/ogg). Handler errors if empty; non-string entries rejected. |\n| `out` | string | yes | — | Output audio path. |\n| `silence_between` | number | no | 0 | Seconds of silence inserted between files (not after the last). Implemented via generated `anullsrc` mono 44.1 kHz segments. |\n| `output_format` | string | no | inferred from `out` ext, else mp3 | enum: mp3, aac, wav, flac, ogg. Read by handler ✓. |\n| `sample_rate` | integer | no | source rate | Target Hz; applied via `-ar`. Read by handler ✓. 
|\n\n**Behaviour notes:**\n- **Single-file fast path:** with one file and `silence_between <= 0`, if input/output extensions match and no `sample_rate` is given, it byte-copies the file (acts as a pass-through). Otherwise it delegates to `AudioConvert` — i.e. a single file makes this a format converter.\n- Codec mapping (via `outputCodecArgs`): wav→pcm_s16le, flac→flac, ogg→libvorbis 192k, aac→aac 192k, default→libmp3lame 192k.\n- Concat-list injection is guarded: a file path containing a quote or newline is rejected.\n- Returns `outputs.audio` / `outputs.local_path` plus metrics (`num_files`, `total_duration_sec`, `silence_between`).\n\n---\n\n### Audio-Only Mix `audio_only_mix`\n\n| Field | Value |\n|---|---|\n| Category | audio_process |\n| Mode | sync |\n| Timeout | 2m |\n| Cost | Free (cost_per_unit: 0) |\n| Handler | `execAudioOnlyMix` → `AudioOnlyMix` (`internal/ffmpeg/audio_only_mix.go`) |\n| MCP route | `audio(action: \"mix\")` — passes `tracks[]` (and the optional `music` / `music_level`) through |\n\n**Description:** Mix audio files into a single audio file. Two modes: a **flat mix** of 2+ tracks with ffmpeg's `amix` filter, or — when the optional `music` bed is set — a **music-under-voice** mix where `tracks` are the primary program (1+ allowed) and the bed is auto-fit to their length and ducked under them. Unlike `video_audio_mix` (which overlays audio onto a video), this produces a pure audio file with no video track.\n\n**Parameters:**\n\n| Param | Type | Required | Default | Notes |\n|---|---|---|---|---|\n| `tracks` | array of string | yes | — | Audio paths. Flat mix: min 2, all at equal level. With `music`: the primary program (e.g. voiceover), min 1. |\n| `music` | string | no | — | Optional background music bed. When set, the bed is auto-fit to the tracks' length (trimmed if longer, looped if shorter) and ducked under them. |\n| `music_level` | number | no | `-18` | Music bed level in dB relative to the voice (used only with `music`). 
|\n| `out` | string | yes | — | Output audio path. |\n\n**Behaviour notes (code-only, not exposed as params):**\n- Flat mix: all tracks are mixed at **equal levels**; output is normalized (`amix=...:normalize=1`) to prevent clipping; output duration equals the **longest** input.\n- Music-under-voice: the bed never runs past the voice and never drowns it (ducked at `music_level` dB).\n- Output is forced to **stereo** (`-ac 2`).\n- For per-layer volume / timing offsets onto a video, use `video_audio_mix` instead.\n\n---\n\n### Audio Trim `audio_trim`\n\n| Field | Value |\n|---|---|\n| Category | audio_process |\n| Mode | sync |\n| Timeout | 1m |\n| Cost | Free (cost_per_unit: 0) |\n| MCP route | `audio(action: \"trim\")` — maps the tool's `audio` arg to the model's `in` field |\n\n**Description:** Cut an audio file to a start time and optional duration — e.g. shorten a long music bed before mixing, or drop a lead-in/lead-out. Output timestamps are rebased to 0, so the result is a clean seekable clip.\n\n**Parameters:**\n\n| Param | Type | Required | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Input audio path (the MCP `trim` action's `audio` argument). |\n| `out` | string | yes | — | Output audio path. |\n| `start_sec` | number | no | 0 | Where the kept window starts, in seconds (≥ 0). |\n| `duration_sec` | number | no | — | Length of the kept window. Omit (or ≤ 0) to keep everything from `start_sec` to the end. |\n\n---\n\n### Audio Convert `audio_convert`\n\n| Field | Value |\n|---|---|\n| Category | audio_process |\n| Mode | sync |\n| Timeout | 30s |\n| Cost | Free (cost_per_unit: 0) |\n| Handler | `execAudioConvert` → `AudioConvert` (`internal/ffmpeg/audio_convert.go`) |\n| MCP route | **None** — internal-only (REST `POST /v1/jobs/audio_convert` or pipeline step). No `audio(...)` action routes here. |\n\n**Description:** Convert an audio file between formats, change sample rate, and/or adjust bitrate. 
Input format is auto-detected; output is chosen by the format key (see mismatch below) or inferred from the `out` extension.\n\n**Parameters** (from YAML — see mismatch flag):\n\n| Param | Type | Required | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Input audio path. |\n| `out` | string | yes | — | Output audio path; format inferred from extension if no format key is set. |\n| `output_format` | string | no | inferred from `out` ext | enum: mp3, mp3_128, mp3_320, aac, aac_256, wav, wav_48k, flac, ogg, opus. **⚠ See mismatch.** |\n| `sample_rate` | integer | no | original | Target Hz (e.g. 44100, 48000); applied via `-ar`. Read by handler ✓. |\n\n**⚠ YAML ↔ handler mismatch (important):** The YAML declares the format selector as **`output_format`**, but `execAudioConvert` reads **`inputs[\"format\"]`** (executor.go:250), not `output_format`. Consequences:\n- A caller passing `output_format` exactly as the YAML documents will have it **silently ignored**; the handler falls back to inferring the format from the `out` file extension.\n- The extended enum values that have no matching extension — `mp3_128`, `mp3_320`, `aac_256`, `wav_48k`, `opus` — are only reachable by passing the **undocumented** key `format` (e.g. `format: \"mp3_320\"`). Format/bitrate table (handler `audioCodecs`): mp3=192k, mp3_128=128k, mp3_320=320k, aac=192k, aac_256=256k, wav/wav_48k=pcm_s16le (wav_48k forces `-ar 48000`), flac=lossless, ogg=libvorbis 192k, opus=libopus 128k.\n- Recommendation: rename the YAML field to `format`, update the handler to also read `output_format` (as `audio_concat` does), or have the MCP/handler alias the two keys.\n\n**Behaviour notes:** Unknown format → error listing valid keys. 
Returns `outputs.audio` / `outputs.local_path` plus metrics (`input_duration_sec`, `output_duration_sec`, `format`, `codec`).\n\n---\n\n### Audio Tail Fade `tail_fade`\n\n| Field | Value |\n|---|---|\n| Category | audio_process |\n| Mode | sync |\n| Timeout | 30s |\n| Cost | Free (cost_per_unit: 0) |\n| Handler | `execTailFade` → `TailFade` (`internal/ffmpeg/tail_fade.go`) |\n| MCP route | **None** — internal-only (REST `POST /v1/jobs/tail_fade` or pipeline step). No `audio(...)` action routes here. |\n\n**Description:** Add a silence pad and a fade-out at the end of an audio file to prevent an abrupt ending (the \"audio cuts off\" bug). Intended to run after voiceover generation, before assembly. Purely parameter-driven — no prompt.\n\n**Parameters:**\n\n| Param | Type | Required | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Input audio path (workdir-relative). |\n| `out` | string | yes | — | Output audio path. |\n| `pad_sec` | number | no | 0.8 | Seconds of trailing silence added (ffmpeg `apad=pad_dur`). |\n| `fade_sec` | number | no | 0.6 | Fade-out duration (ffmpeg `afade=t=out`). |\n\n**Behaviour notes:**\n- Defaults are applied when the value is `<= 0`, so passing `0` yields the default (0.8 / 0.6), not a true zero. The pad and fade therefore cannot be disabled by passing `0`.\n- The fade start point is computed internally as `input_duration + 0.1s` — it is not a parameter.\n- Output is encoded with `-q:a 2` (VBR ~190 kbps mp3-class quality, format from `out` ext).\n- Returns `outputs.audio` / `outputs.local_path` plus metrics (`input_duration_sec`, `output_duration_sec`, `pad_sec`, `fade_sec`, `fade_start_sec`).\n"},{"id":"reference/models/image","section":"reference","title":"Image models","url":"https://docs.framehood.ai/reference/models/image","text":"# Image models\n\nText-to-image generation, image editing, and upscaling. 
Parameter tables are each model’s **input schema**; our wrapper params (`out`, `mock`, `format`) are noted per model.\n\n> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.\n\n### FLUX.1 Schnell `flux_schnell`\n\nTurbo-mode (1-4 step) text-to-image generation from a 12B-parameter FLUX flow transformer — fast enough for prototyping, prompt iteration, and bulk draft runs.\n\n**Call it via** — `image` tool, `action: \"create\"`, `tier: \"draft\"` (the default tier) · raw: `POST /v1/jobs/flux_schnell`\n\n| | |\n|---|---|\n| **Cost** | 1 cr per call |\n| **Mode / timeout** | sync / 30s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | The prompt to generate an image from. |\n| `image_size` | string \\| object | | `landscape_4_3` | enum: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9` — or `{width, height}` object (each 1–14142) | The size of the generated image. |\n| `num_inference_steps` | integer | | `4` | 1–12 | The number of inference steps to perform. |\n| `num_images` | integer | | `1` | 1–4 | The number of images to generate. |\n| `guidance_scale` | number | | `3.5` | 1–20 | CFG scale — how closely the model sticks to the prompt. |\n| `seed` | integer \\| null | | `null` | — | Same seed + same prompt + same model version → same image. |\n| `output_format` | string | | `jpeg` | enum: `jpeg`, `png` | The format of the generated image. |\n| `enable_safety_checker` | boolean | | `true` | — | If true, the safety checker is enabled. |\n| `acceleration` | string | | `none` | enum: `none`, `regular`, `high` | Generation speed — higher is faster. |\n| `sync_mode` | boolean | | `false` | — | If true, media returns as a data URI and isn't stored in request history. 
|\n\nOur wrapper params (not part of the model input schema): `out` (required — output filename/workdir-relative path), `mock` (optional — test placeholder), and `format` (optional — our size preset `shorts`/`reels`/`horizontal`, mapped to the model's `image_size` field: shorts/reels → `portrait_16_9`, horizontal → `landscape_16_9`, default → `portrait_16_9`).\n\n**Limits** — billed at 1 cr per megapixel, rounded up to the nearest megapixel. Custom `image_size` max 14142 × 14142 px. Up to 4 images per call; 1–12 inference steps. (No prompt character limit, duration, frame count, or file-size limit is published for this model.)\n\n### FLUX 1.1 [pro] ultra `flux_pro`\n\nText-to-image generation at up to 2K resolution (4 megapixels) with enhanced photorealism, with optional reference-image conditioning.\n\n**Call it via** — `image` tool, `action: \"create\"`, `tier: \"fine\"` (MCP) · raw: `POST /v1/jobs/flux_pro`\n\n| | |\n|---|---|\n| **Cost** | 12 cr per call |\n| **Mode / timeout** | sync / 30s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | The prompt to generate an image from. |\n| `seed` | integer | | null | — | Same seed + same prompt + same model version → same image. |\n| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and not stored in request history. |\n| `num_images` | integer | | `1` | 1–4 | Number of images to generate. |\n| `output_format` | string | | `jpeg` | `jpeg`, `png` | Format of the generated image. |\n| `safety_tolerance` | string | | `\"2\"` | `\"1\"`–`\"6\"` | Content-filter level; 1 = most strict, 6 = most permissive. |\n| `enhance_prompt` | boolean | | `false` | — | Whether to enhance the prompt for better results. |\n| `image_url` | string | | null | — | Reference image URL to condition generation on. 
|\n| `image_prompt_strength` | number | | `0.1` | 0–1 | Strength of the image prompt (reference-image influence). |\n| `aspect_ratio` | string | | `9:16` | `21:9`, `16:9`, `4:3`, `3:2`, `1:1`, `2:3`, `3:4`, `9:16`, `9:21` (free-form string also accepted) | Aspect ratio of the generated image. |\n| `raw` | boolean | | `false` | — | Generate less processed, more natural-looking images. |\n\nOur wrapper params (not part of the model input schema): `out` (required — output filename/path), `mock` (optional — test placeholder), and `format` (optional — size preset mapped to the model's `aspect_ratio` field: `shorts`/`reels`→`9:16`, `horizontal`→`16:9`, default `9:16`).\n\n**Limits** — model limits:\n- Max resolution: 4 megapixels (up to 2048×2048). Billing rounds up to the nearest megapixel.\n- Max images per call: 4 (`num_images`).\n- `image_prompt_strength` range: 0–1.\n- Output formats: JPEG, PNG.\n\n### Flux 2 LoRA Realism `flux_realism`\n\nText-to-image photorealism — FLUX.2 with a realism LoRA tuned for natural lighting, skin texture, and documentary-style detail; ideal for character portraits, people, products, and lifestyle scenes.\n\n**Call it via** — `image(action: \"create\", tier: \"photo\")` · raw: `POST /v1/jobs/flux_realism`\n\n| | |\n|---|---|\n| **Cost** | Billed per megapixel — ≈4–5 cr per image at the ~1 MP presets |\n| **Mode / timeout** | sync / 60s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | The prompt to generate a realistic image with natural lighting and authentic details. |\n| `image_size` | enum \\| object | | `landscape_4_3` | `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9` — or an object `{width, height}` (each int, >0, max 14142) | The size of the generated image. |\n| `guidance_scale` | number | | `2.5` | `0`–`20` | CFG scale. 
How closely the model follows the prompt. |\n| `num_inference_steps` | integer | | `40` | `4`–`50` | Number of inference steps; higher enhances realism. |\n| `acceleration` | enum | | `regular` | `none`, `regular` | Acceleration level; `regular` balances speed and quality. |\n| `seed` | integer \\| null | | none | — | Random seed for reproducibility; same seed + prompt → same result. |\n| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and not saved in history. |\n| `enable_safety_checker` | boolean | | `true` | — | Whether to enable the safety checker for the generated image. |\n| `output_format` | enum | | `png` | `png`, `jpeg`, `webp` | The format of the output image. |\n| `num_images` | integer | | `1` | `1`–`4` | Number of images to generate per call. |\n| `lora_scale` | number | | `1` | `0`–`2` | Strength of the realism effect. |\n\nOur wrapper params (not part of the model input schema): `out` (required — output filename), `mock` (optional — test placeholder), and `format` (optional — our friendly aspect preset, e.g. 
`shorts`/`reels`/`horizontal`, which we map to the model's `image_size` field via `format_mapping`: shorts/reels → `portrait_16_9`, horizontal → `landscape_16_9`).\n\n**Limits** — max **4 images** per call (`num_images` 1–4); inference steps **4–50**; custom `image_size` object dimensions up to **14142 px** per side (max ~4 MP recommended); output formats **PNG / JPEG / WebP**; text prompt only (no image input).\n\n### Nano Banana Pro `nano_banana`\n\nText-to-image on Google's Nano Banana Pro (Gemini 3 Pro Image): strong prompt adherence and best-in-class text rendering inside the image — posters, labels, UI mockups, and scenes that must follow the brief closely.\n\n**Call it via** — `image` tool, `action: \"create\"`, `model: \"nano_banana\"` (explicit model — the tier presets map to the FLUX family) · raw: `POST /v1/jobs/nano_banana`\n\n| | |\n|---|---|\n| **Cost** | 30 cr per call; 4K outputs charged at 2x |\n| **Mode / timeout** | sync / 2m (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | What to generate. |\n| `num_images` | integer | | `1` | 1–4 | Number of images to generate. |\n| `seed` | integer | | — | any int | Seed for the RNG. |\n| `aspect_ratio` | string (enum) | | `1:1` | `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` | Aspect ratio of the output. |\n| `output_format` | string (enum) | | `png` | `jpeg`, `png`, `webp` | Format of the generated image. |\n| `resolution` | string (enum) | | `1K` | `1K`, `2K`, `4K` | Output resolution (4K costs 2x). 
|\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder), and `format` (optional — friendly size preset `shorts`/`reels`/`horizontal`, mapped to the model's `aspect_ratio` via `format_mapping`: shorts/reels → `9:16`, horizontal → `16:9`, default `1:1`).\n\n**Limits** — text prompt only (no image input; for instruction-based editing use `nano_banana_edit`); all outputs carry SynthID watermarking.\n\n### Nano Banana Pro Edit `nano_banana_edit`\n\nInstruction-based image editing built on Google's Gemini 3 Pro Image (Nano Banana Pro): modify, restyle, inpaint, or compose images via natural-language instructions with no masks.\n\n**Call it via** — the `image(edit)` MCP action routes to our default editor (`seedream_v5_edit`); `nano_banana_edit` is a registered editor reached by calling the model directly · raw: `POST /v1/jobs/nano_banana_edit`\n\n| | |\n|---|---|\n| **Cost** | 30 cr per call; 4K outputs charged at 2x; web search adds 3 cr |\n| **Mode / timeout** | sync / 60s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | length 3–50000 chars | The prompt / editing instruction. |\n| `image_urls` | array[string] | ✓ | — | up to 14 images | URLs of the images to edit / compose. |\n| `num_images` | integer | | `1` | 1–4 | Number of images to generate. |\n| `seed` | integer | | — | any int (nullable) | Seed for the RNG. |\n| `aspect_ratio` | string (enum) | | `auto` | `auto`, `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` | Aspect ratio of the output (`auto` preserves source proportions). |\n| `output_format` | string (enum) | | `png` | `jpeg`, `png`, `webp` | Format of the generated image. |\n| `safety_tolerance` | string (enum) | | `4` | `1`–`6` | Content-moderation tolerance (1 strictest, 6 least strict). 
|\n| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and is not kept in request history. |\n| `system_prompt` | string | | `\"\"` | length ≤ 50000 chars | Optional system instruction steering persona/output style. |\n| `resolution` | string (enum) | | `1K` | `1K`, `2K`, `4K` | Output resolution (4K costs 2x). |\n| `limit_generations` | boolean | | `false` | — | Experimental: cap each prompting round to 1 image, ignoring count hints in the prompt. |\n| `enable_web_search` | boolean | | `false` | — | Allow the model to use live web data (adds 3 cr). |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder), and `format` (optional — friendly size preset `shorts`/`reels`/`horizontal`, which our config maps to the model's `aspect_ratio` field via `format_field: aspect_ratio` → `shorts`/`reels`=`9:16`, `horizontal`=`16:9`; with no explicit `format` the default is `auto` — the edit preserves the source image's aspect ratio).\n\n**Limits** — prompt 3–50000 chars; `system_prompt` ≤ 50000 chars; `num_images` 1–4; up to 14 input images per composition; character consistency for up to 5 people; resolutions 1K (1024px) / 2K (2048px) / 4K; input images capped at ~89,478,485 pixels (oversized inputs rejected with 422 `image_too_large`); output formats PNG / JPEG / WebP; all outputs carry SynthID watermarking.\n\n### Seedream v4.5 Edit `seedream_v45_edit`\n\nEdit and compose images at high resolution from natural-language instructions, referencing up to 10 source images in one unified generation/editing architecture.\n\n**Call it via** — the `image` MCP tool with `action: \"edit\"` is the user-facing edit route, but that action currently maps to `seedream_v5_edit`; this v4.5 variant is reached by calling the model directly 
· raw: `POST /v1/jobs/seedream_v45_edit`\n\n| | |\n|---|---|\n| **Cost** | 8 cr per call |\n| **Mode / timeout** | sync / 60s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | Text prompt used to edit the image. |\n| `image_urls` | array&lt;string&gt; | ✓ | — | up to 10 URLs | Input images for editing. If more than 10 are sent, only the last 10 are used. |\n| `image_size` | object `{width,height}` **or** enum string | | `{width: 2048, height: 2048}` | enum: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`, `auto_2K`, `auto_4K`; or object with width/height each 1920–4096 | Output size. Width and height must each be 1920–4096, and total pixels between 2560×1440 and 4096×4096. |\n| `num_images` | integer | | `1` | 1–6 | Number of separate model generations to run with the prompt. |\n| `max_images` | integer | | `1` | 1–6 | If &gt;1, enables multi-image output: up to `max_images` per generation, `num_images` generations total. Total images (inputs + outputs) must not exceed 15. |\n| `seed` | integer (nullable) | | null | — | Random seed to control stochasticity. |\n| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and is not stored in request history. |\n| `enable_safety_checker` | boolean | | `true` | — | Enables the safety checker. 
|\n\nOur wrapper params (not part of the model input schema): `out` (required — output filename / workdir-relative path), `mock` (optional — test placeholder), and `format` (optional — our preset that we map to the model's `image_size` field via `format_mapping`: `shorts`/`reels` → 1080×1920, `horizontal` → 1920×1080).\n\n**Limits** — up to 10 input reference images (last 10 used if more provided); max total images (inputs + outputs) = 15; output resolution 1920–4096 px per axis, total pixels 2560×1440 to 4096×4096 (max 4 MP / 2048×2048 typical); output format PNG via URL or data URI; ~60s inference.\n\n### Seedream v5 Lite Edit `seedream_v5_edit`\n\nFast, intelligent image editing from Seedream 5.0 Lite — modify existing images, add/remove elements, composite characters into scenes, and apply style/color transfer, with up to 10 reference images per call.\n\n**Call it via** — `image(action: \"edit\", image_url, prompt)` (MCP tool `image`, action `edit`) · raw: `POST /v1/jobs/seedream_v5_edit`\n\n| | |\n|---|---|\n| **Cost** | 7 cr per call |\n| **Mode / timeout** | sync / 60s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | Text prompt describing the edit to apply. |\n| `image_urls` | string[] | ✓ | — | up to 10 images | URLs of input images to edit. If more than 10 are sent, only the last 10 are used. |\n| `image_size` | ImageSize object \\| enum string | — | `auto_2K` | enum: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`, `auto_2K`, `auto_3K`, `auto_4K`; or `{width, height}` (each 1–14142). Total pixels must be 2560×1440…4096×4096, else scaled. | Output image size, as a preset enum or explicit width/height. |\n| `num_images` | integer | — | `1` | 1–6 | Number of separate generations to run with the prompt. 
|\n| `max_images` | integer | — | `1` | 1–6 | If >1, enables multi-image generation: up to `max_images` images per generation, so total output is between `num_images` and `max_images×num_images`. |\n| `sync_mode` | boolean | — | `false` | true / false | If true, media is returned as a data URI and output isn't stored in request history. |\n| `enable_safety_checker` | boolean | — | `true` | true / false | If true, the content safety checker is enabled. |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output filename), `mock` (optional — test placeholder, no real generation). Our `format` (optional — `shorts`/`reels`/`horizontal`) is a wrapper we map to the model's `image_size` field as an explicit `{width, height}` object (shorts/reels → 1080×1920, horizontal → 1920×1080).\n\n**Limits** — model limits:\n- Max reference images: **10** (last 10 used if more are sent).\n- Max resolution: **3072×3072** (9 MP); total pixel count supported between 2560×1440 (≈3.7 MP) and 4096×4096 (≈9.43 MP, scaled to fit).\n- Batch: **1–6** generations per call (`num_images`), up to **6** images each (`max_images`).\n- Output format: **PNG** delivered via HTTPS URL (or data URI when `sync_mode=true`).\n\n### Topaz Image Upscale `topaz_upscale_image`\n\nTopaz image enhancer — upscale and enhance images (add detail, face enhancement, sharpening, denoising, compression-artifact removal, and generative detail).\n\n**Call it via** — `image` tool, `action: \"upscale\"` (MCP) · raw: `POST /v1/jobs/topaz_upscale_image`\n\n| | |\n|---|---|\n| **Cost** | 16 cr per call |\n| **Mode / timeout** | sync / 120s (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `image_url` | string | ✓ | — | non-empty URL | URL of the image to be upscaled. 
|\n| `model` | string (enum) | | `Standard V2` | `Low Resolution V2`, `Standard V2`, `CGI`, `High Fidelity V2`, `Text Refine`, `Recovery`, `Redefine`, `Recovery V2`, `Standard MAX`, `Wonder`, `Wonder 3` | Model to use for image enhancement. |\n| `upscale_factor` | number | | `2` | `1`–`4` | Factor to upscale the image by (2.0 doubles width and height). |\n| `crop_to_fill` | boolean | | `false` | true / false | Crop the output to fill the target size. |\n| `output_format` | string (enum) | | `jpeg` | `jpeg`, `png` | Output format of the upscaled image. |\n| `subject_detection` | string (enum) | | `All` | `All`, `Foreground`, `Background` | Subject detection mode. Applies to standard enhance and Recovery V2 models. |\n| `face_enhancement` | boolean | | `true` | true / false | Apply face enhancement. Applies to standard enhance and Recovery V2 models. |\n| `face_enhancement_creativity` | number | | `0` | `0`–`1` | Creativity for face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled. |\n| `face_enhancement_strength` | number | | `0.8` | `0`–`1` | Strength of face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled. |\n| `sharpen` | number | | — | `0`–`1` | Sharpening level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine. |\n| `denoise` | number | | — | `0`–`1` | Denoising level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine. |\n| `fix_compression` | number | | — | `0`–`1` | Compression-artifact removal. Applies to Standard V2, Low Resolution V2, High Fidelity V2, Text Refine. |\n| `strength` | number | | — | `0.01`–`1` | Enhancement strength. Applies to Text Refine model only. |\n| `creativity` | integer | | — | `1`–`6` | Generative creativity (higher = more hallucinated detail). Applies to Redefine model only. |\n| `texture` | integer | | — | `1`–`5` | Texture detail level for generative upscaling. Applies to Redefine model only. 
|\n| `prompt` | string | | — | max 1024 chars | Text prompt to guide generative upscaling. Applies to Redefine model only. |\n| `autoprompt` | boolean | | — | true / false | Auto-generate the prompt for generative upscaling. Applies to Redefine model only. |\n| `detail` | number | | — | `0`–`1` | Detail recovery level. Applies to Recovery V2 model only. |\n| `enhancement_strength` | string (enum) | | — | `low`, `medium`, `high` | Enhancement strength for generative upscaling. Applies to Wonder 3 model only; auto-configured when omitted. |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output filename), `mock` (optional — test placeholder). This model has no `format` mapping (`format_field` is empty), so no model size field is derived from `format`.\n\n**Limits** — model limits: `upscale_factor` `1`–`4`; `prompt` ≤ 1024 chars; accepted input formats jpg, jpeg, png, webp, gif, avif. Catalog cost is a flat 16 cr per call (covers outputs up to ~24 MP).\n"},{"id":"reference/models/index","section":"reference","title":"Model catalog","url":"https://docs.framehood.ai/reference/models/","text":"# Model catalog\n\nEvery model you can call through Framehood — what it does, how to call it, its full\nparameter schema, limits, and cost. Pages are grouped by type; pick a category below\nfor the detailed per-model reference.\n\n**How to call.** Most models are reached through an MCP tool action (e.g. `image(create)`,\n`video(swap)`, `audio(speak)`) or the CLI. The raw form is\n`POST /v1/jobs/<model>` with the model's `inputs`. Each model's exact, live input schema\nis also available at `GET /v1/models/<model>` (and `…/prompt-guide` where present).\n\n**Cost.** Figures are the price in **credits** (per call unless noted). Local\nprocessing/QA steps that run on our own infrastructure are **free**. 
See\n[Credits & plans](/guide/billing).\n\n**Testing.** Every generation model also accepts `mock: true` to return a placeholder\nresult without running the model (no credits spent).\n\n---\n\n## Image — [full reference →](/reference/models/image)\n\nText-to-image, image editing, and upscaling.\n\n| Model | What it does | Cost |\n|---|---|---|\n| FLUX.1 Schnell `flux_schnell` | Fast text-to-image (drafts, iteration) | 1 cr |\n| Flux Pro 1.1 Ultra `flux_pro` | Text-to-image, scenes/backgrounds | 12 cr |\n| Flux 2 LoRA Realism `flux_realism` | Text-to-image, photorealistic | ≈5 cr (per MP) |\n| Nano Banana Pro `nano_banana` | Text-to-image, in-image text / prompt adherence | 30 cr |\n| Nano Banana Pro Edit `nano_banana_edit` | Instruction-based image editing | 30 cr |\n| Seedream v4.5 Edit `seedream_v45_edit` | High-resolution image editing | 8 cr |\n| Seedream v5 Lite Edit `seedream_v5_edit` | Image editing / compositing | 7 cr |\n| Topaz Image Upscale `topaz_upscale_image` | Upscale + detail enhancement | 16 cr |\n\n## Video — [full reference →](/reference/models/video)\n\nGeneration, image-to-video, editing, swap, and upscaling.\n\n| Model | What it does | Cost |\n|---|---|---|\n| Seedance 2.0 Reference-to-Video `seedance_r2v` | Reference images/video/audio → video (up to 4K) | 303 cr |\n| Kling v3 Standard I2V `kling_v3_std_i2v` | Image-to-video, standard quality | 84 cr |\n| Kling v3 Pro I2V `kling_v3_pro_i2v` | Image-to-video, high quality | 112 cr |\n| Kling O3 Video Edit `kling_o3_video_edit` | Video edit with reference images | 126 cr |\n| PixVerse Swap `pixverse_swap` | Person/object swap in video | 30 cr |\n| Wan 2.7 Video Edit `wan_27_video_edit` | Text-guided video-to-video edit | 100 cr |\n| Topaz Video Upscale `topaz_upscale_video` | Upscale + enhance video | 100 cr |\n\n## Audio — [full reference →](/reference/models/audio)\n\nSpeech, sound effects, music, and audio processing.\n\n| Model | What it does | Cost |\n|---|---|---|\n| ElevenLabs TTS v3 
`elevenlabs_tts_v3` | Text-to-speech, emotional control | 20 cr |\n| ElevenLabs TTS (direct) `elevenlabs_tts_direct` | TTS with cloned/linked voice | 20 cr |\n| ElevenLabs Sound Effects `elevenlabs_sfx` | Sound-effect / foley generation | 1 cr |\n| Minimax Music v2.6 `minimax_music` | Music (instrumental or with lyrics) | 30 cr |\n| Audio Concat `audio_concat` | Join audio files in sequence | Free |\n| Audio-Only Mix `audio_only_mix` | Mix audio files into one (flat, or a ducked music bed) | Free |\n| Audio Trim `audio_trim` | Cut audio to a start/duration window | Free |\n| Audio Convert `audio_convert` | Format / sample-rate conversion | Free |\n| Audio Tail Fade `tail_fade` | Silence pad + fade-out | Free |\n\n## Video processing & assembly — [full reference →](/reference/models/processing)\n\nLocal ffmpeg pipelines (free) plus auto-captions and lipsync, which are billed per minute.\n\n| Model | What it does | Cost |\n|---|---|---|\n| Auto Subtitles `captions_auto` | Karaoke-style caption burn-in | 6 cr per minute of video |\n| Full Video Assembly `video_assemble_full` | Clips + transitions + audio + intro/end | Free |\n| Assemble Clips `assemble_clips` | Concatenate clips with transitions | Free |\n| Video + Audio Mix `video_audio_mix` | Overlay VO/music/SFX onto video | Free |\n| Audio Mix `audio_mix` | Deprecated alias of `video_audio_mix` (layered audio mix onto video) | Free |\n| Structural Export `structural_export` | Final platform encode (TikTok/Reels/…) | Free |\n| Highlight Rolloff `highlight_rolloff` | Surgical highlight compression | Free |\n| Sync Lipsync v3 `lipsync_v3` | Lip-sync mouth to audio (expensive) | 1600 cr per minute of output |\n\n## QA checks — [full reference →](/reference/models/qa)\n\nQuality checks for generated media: the local ffmpeg checks are free; checks that call a vision/STT model cost credits.\n\n| Model | What it does | Cost |\n|---|---|---|\n| Full QA Pipeline `qa_full` | Run all checks on a finished video | 1 cr |\n| Same Person Check `check_same_person` | Identity consistency (ref vs test) | 1 cr |\n| Scene Matches Plan `check_scene_matches_plan` | Image/video matches shooting plan | 
1 cr |\n| Image Description Check `check_image_description` | Image matches a text description | 1 cr |\n| Voice Consistency Check `check_voice_consistency` | Same speaker throughout | 1 cr |\n| Transcript Check `check_transcript` | Transcribe (video or audio) with timecodes, and optionally check it matches expected voiceover | 1 cr |\n| Video Description `describe_video` | Timecoded scene/speech/sounds/music breakdown | ≈1 cr per 25 s at fps 1, × fps; min 1 |\n| Audio Loudness Check `check_audio_loudness` | LUFS / true-peak vs platform target | Free |\n| Audio Structural Check `check_audio_structural` | Codec/duration/sample-rate sanity | Free |\n| Audio Tail Check `check_audio_tail` | Detect abrupt audio cut-off | Free |\n| Motion Artifacts Check `check_motion_artifacts` | Glitches / jump cuts / artifacts | Free |\n| Overexposure Check `overexposure_check` | Blown-out highlights | Free |\n\n## Actors — [full reference →](/reference/models/actors)\n\nPersistent characters: training, generation, and voice.\n\n| Model | What it does | Cost |\n|---|---|---|\n| Actor LoRA Training `actor_lora_train` | Train a LoRA from 4–30 reference images | 500 cr |\n| Flux LoRA Inference `actor_lora_inference` | Actor-consistent image generation | ≈7 cr (per MP) |\n| Actor Voice Clone (IVC) `actor_voice_clone` | Clone a voice from samples | Free |\n"},{"id":"reference/models/processing","section":"reference","title":"Video processing & assembly","url":"https://docs.framehood.ai/reference/models/processing","text":"# Video processing & assembly\n\nAuto-captions and lipsync plus local ffmpeg pipelines (free, our implementation).\n\n> Generations are charged in credits (see [Credits & plans](/guide/billing)). 
Every generation model also accepts `mock: true` for a free placeholder result.\n\n### Auto Subtitles `captions_auto`\n\nAutomatically transcribe a video's audio and burn in karaoke-style subtitles with word-level highlighting, customizable Google Fonts, colors, and animation.\n\n**Call it via** — `video` tool, `action: \"captions\"` (MCP) · raw: `POST /v1/jobs/captions_auto`\n\n| | |\n|---|---|\n| **Cost** | 6 cr per minute of video |\n| **Mode / timeout** | webhook / 10m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `video_url` | string | ✓ | — | — | URL of the video file to add automatic subtitles to (max 100 MB). |\n| `language` | string | | `en` | 2-letter code (`en`, `es`, `fr`, `de`, `it`, `pt`, `nl`, `ja`, `zh`, `ko`, …) or 3-letter ISO code (`eng`, `spa`, `fra`, …) | Language code for transcription. |\n| `font_name` | string | | `Montserrat` | any Google Font name (e.g. `Poppins`, `Bebas Neue`, `Oswald`, `Inter`, `Roboto`) | Font from fonts.google.com. |\n| `font_size` | integer | | `100` | 20–150 | Font size in pixels (TikTok style uses larger text). |\n| `font_weight` | string | | `bold` | `normal`, `bold`, `black` | Font weight. |\n| `font_color` | string | | `white` | `white`, `black`, `red`, `green`, `blue`, `yellow`, `orange`, `purple`, `pink`, `brown`, `gray`, `cyan`, `magenta` | Subtitle text color for non-active words. |\n| `highlight_color` | string | | `purple` | same 13 colors as `font_color` | Color for the currently speaking word (karaoke-style highlight). |\n| `stroke_width` | integer | | `3` | 0–10 | Text stroke/outline width in pixels (0 = no stroke). |\n| `stroke_color` | string | | `black` | same 13 colors as `font_color` | Text stroke/outline color. |\n| `background_color` | string | | `none` | the 13 colors above plus `none`, `transparent` | Background color behind text. 
|\n| `background_opacity` | number | | `0` | 0.0–1.0 | Background opacity (0 = transparent, 1 = opaque). |\n| `position` | string | | `bottom` | `top`, `center`, `bottom` | Vertical position of subtitles. |\n| `y_offset` | integer | | `75` | -200–200 | Vertical offset in pixels (positive = down, negative = up). |\n| `words_per_subtitle` | integer | | `3` | 1–12 | Max words per subtitle segment (1 = single word, 8–12 = full sentences). |\n| `enable_animation` | boolean | | `true` | true / false | Bounce-style entrance animation for subtitles. |\n\nOur wrapper params (not part of the model schema): `out` (required — workdir-relative output path) and `mock` (optional — test placeholder, no real generation). This model has no `format`/size mapping (`format_field` is empty).\n\n**Limits** — `video_url` max file size 100 MB. Accepted input formats: mp4, mov, webm, m4v, gif. Cost is metered at 6 cr per minute of video. Transcription is via ElevenLabs speech-to-text.\n\n### Full Video Assembly `video_assemble_full`\n\n| | |\n|---|---|\n| **Category** | video_process |\n| **Mode** | sync |\n| **Timeout** | 10m |\n| **Cost** | free (`cost_per_unit: 0`) |\n| **MCP action** | `video(assemble)` (worker `video.ts` → kind `video_assemble_full`) |\n\nOne-call complete assembly: concatenates clips with visual transitions (xfade), mixes audio layers (VO / music / ambient SFX / transition SFX / intro SFX / end SFX), and applies intro fade + ending preset. Replaces `assemble_clips` + `audio_mix` in a single job. Implemented by `VideoAssembleFull` (`video_assemble_full.go`), dispatched by `execVideoAssembleFull`. Pre-validates that VO fits inside the assembled duration (hard error if VO is >0.5s longer). 
When the VO and video durations diverge by more than 3s, the job result gains a `warnings` array flagging the mismatch.\n\n**Parameters** (from `input_schema`, cross-checked against `executor.go`/`video_assemble_full.go`):\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `clips` | array&lt;object> | yes | — | Ordered. Each `{path, transition, transition_sfx}`. |\n| `clips[].path` | string | yes | — | Clip path. |\n| `clips[].transition` | string | no | `cut` | Visual transition INTO this clip. YAML enum: cut, dissolve, fadeblack, fadewhite, wipeleft, wiperight, smoothleft, blur, flash, distance, circlecrop. **Caveat:** the underlying `AssembleClips` only implements `cut`→concat, `dissolve`→xfade fade, `wipe`→wipeleft; every other value falls through to a plain `fade` xfade. So fadeblack/blur/flash/etc. currently render as a crossfade, not their named effect. |\n| `clips[].transition_sfx` | string | no | — | SFX path played centered on this cut (`-0.15s` lead, volume 0.7). |\n| `out` | string | yes | — | Output video path. |\n| `xfade_duration` | number | no | `0.2` | Visual transition duration (s). |\n| `intro` | object | no | — | `{fade_in, fade_in_duration, sfx}`. |\n| `intro.fade_in` | bool | no | `false` | Hard start unless true. |\n| `intro.fade_in_duration` | number | no | `0.3` | |\n| `intro.sfx` | string | no | — | Intro whoosh (volume 0.7). |\n| `vo` | string | no | — | Voiceover path (0 dB by default). |\n| `vo_level` | number | no | `0` | VO volume (dB). |\n| `vo_offset_sec` | number | no | `0` (min 0) | Delay before VO starts — align speech with a later clip. Negative is rejected. |\n| `music` | string | no | — | Music bed path. |\n| `music_level` | number | no | `-24` | Music volume (dB); handler defaults to −24 if 0. |\n| `sfx_ambient` | string | no | — | Ambient SFX path. |\n| `sfx_level` | number | no | `-18` | Handler defaults to −18 if 0. 
|\n| `ending` | object | no | — | `{type, end_sfx, video_fade, music_fade_start, end_sfx_start, black_tail}`. |\n| `ending.type` | string | no | `social` | Preset enum: social / cinematic / loop. social: fade 0.3s, music fade −0.5s, end_sfx −0.3s. cinematic: fade 1.0s, music −2.0s, sfx −1.0s, 0.5s black tail. loop: no fades/tail. Per-field overrides win over the preset. |\n\n> **Undocumented input:** the handler also reads a **top-level `ending_type` string** (`executor.go:358`) before merging `ending.type`. Not declared in the YAML; nested `ending.type` overrides it. Prefer the documented nested form.\n\n**Output:** `{ ok, outputs:{video, local_path}, metrics:{num_clips, video_duration, output_duration, ending_type, video_fade, music_fade_start, black_tail, xfade_duration, audio_layers}, warnings[] }`. The `warnings` array is present when the VO/video durations diverge by more than 3s.\n\n---\n\n### Assemble Clips `assemble_clips`\n\n| | |\n|---|---|\n| **Category** | video_process |\n| **Mode** | sync |\n| **Timeout** | 5m |\n| **Cost** | free (`cost_per_unit: 0`) |\n| **MCP action** | **none — internal/REST only.** No MCP action maps here; `video(assemble)` routes to `video_assemble_full`. Reachable only via direct `POST /v1/jobs/assemble_clips` or as a building block of `video_assemble_full`. (proxy.ts maps it to `video/assemble` for error-hint purposes only.) |\n\nConcatenate clips in array order. If all transitions are cut/hold/match-cut, uses the concat demuxer with **stream copy** (fast, no re-encode); if any dissolve/wipe is present, re-encodes via the `xfade` filter (libx264, CRF 19). Clips lacking an audio track get a silent track injected first (`ensureAudioTrack`). 
Implemented by `AssembleClips` (`assemble_clips.go`), dispatched by `execAssembleClips`.\n\n**Parameters** (from `input_schema`, cross-checked against `assemble_clips.go`):\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `clips` | array&lt;object> | yes | — | Ordered `{path, trans_in, duration}`. |\n| `clips[].path` | string | yes | — | Clip path. Rejected if it contains `'`, newline, or CR (concat-list injection guard). |\n| `clips[].trans_in` | string | no | `cut` | Transition INTO this clip (first clip's is ignored). YAML enum: cut, dissolve, wipe, match-cut, j-cut, l-cut, hold. Handler: cut/hold/match-cut → stream-copy concat; dissolve → xfade fade; wipe → xfade wipeleft; **any other value (incl. j-cut/l-cut) → default `fade` xfade** (plain crossfade, no audio lead/lag). |\n| `clips[].duration` | number | no | — | Clip duration override in seconds (0 = full clip). Handler reads `m[\"duration\"]`. |\n| `out` | string | yes | — | Output video path. |\n| `xfade_duration` | number | no | `0.1` | Dissolve/wipe duration (s); handler clamps ≤0 to 0.1. |\n\n> **Duration caveat (documented in YAML):** each dissolve/wipe shortens total output by `xfade_duration`. Plan VO length against the *assembled* duration, not the raw clip sum.\n\n**Output:** `{ ok, outputs:{video, local_path}, metrics:{num_clips, total_duration_sec, transitions_applied, method:\"concat_demuxer\"|\"xfade_filter\", ...} }`.\n\n---\n\n### Video + Audio Mix `video_audio_mix`\n\n| | |\n|---|---|\n| **Category** | video_process |\n| **Mode** | sync |\n| **Timeout** | 5m |\n| **Cost** | free (`cost_per_unit: 0`) |\n| **MCP action** | `video(mix_audio)` (worker `video.ts` → kind `video_audio_mix`). **MCP exposes only `tracks: string[]`**, which the worker expands into `layers`: the FIRST track becomes the VO (`level: 0`, `label: \"vo\"`), the rest are mixed at `-24 dB` (`label: \"track2\"…`), all with `start_sec: 0`. 
Custom per-layer `level`/`start_sec`/`label` and `keep_original_audio` are reachable via direct REST `/v1/jobs/video_audio_mix`. |\n\nOverlay audio layers (VO, music, SFX) onto a video with per-layer dB level and start offset, then `amix` them. Video stream is copied (`-c:v copy`); audio re-encoded AAC 192k; output trimmed to the video length. Implemented by `AudioMix` (`audio_mix.go`), dispatched by `execAudioMix`.\n\n**Parameters** (from `input_schema`, cross-checked against `audio_mix.go`):\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `video` | string | yes | — | Input video. (MCP `mix_audio` maps `video_url` → `video`.) |\n| `out` | string | yes | — | Output video path. |\n| `layers` | array&lt;object> | yes | — | Each `{path, level, start_sec, label}`. |\n| `layers[].path` | string | yes | — | Audio path. |\n| `layers[].level` | number | no | `0` | dB (0 = original, −24 = background). Converted to linear via exact `10^(dB/20)`. |\n| `layers[].start_sec` | number | no | `0` | Offset from video start; >0 adds `adelay`. |\n| `layers[].label` | string | yes | — | Reporting label. **Semantically special:** `label:\"vo\"` triggers a hard error if VO is longer than video (+0.5s) and a tight-timing warning within 0.5s; `label:\"music\"` only warns when it exceeds video. |\n| `keep_original_audio` | bool | no | `false` | If true, mixes the video's existing `[0:a]` in too. |\n\n**Output:** `{ ok, outputs:{video, local_path}, metrics:{video_duration_sec, output_duration_sec, layers[], keep_original_audio, warnings[]} }`.\n\n---\n\n### Audio Mix `audio_mix`\n\n| | |\n|---|---|\n| **Category** | video_process |\n| **Mode** | sync |\n| **Timeout** | 5m |\n| **Cost** | free (`cost_per_unit: 0`) |\n| **MCP action** | **none — deprecated alias.** Registered in `executor.go` as `\"audio_mix\": e.execAudioMix` with the comment *\"deprecated name, alias for video_audio_mix\"*. Identical YAML and identical handler to `video_audio_mix`. 
Not present in any worker action map; reachable only via direct `POST /v1/jobs/audio_mix`. Prefer **video_audio_mix**. |\n\nFunctionally identical to **video_audio_mix** above — same `AudioMix` (`audio_mix.go`) handler, same parameters (`video`, `out`, `layers[]{path,level,start_sec,label}`, `keep_original_audio`), same output. Kept for backward compatibility of the old name only. See video_audio_mix for the full parameter table and the `label:\"vo\"`/`\"music\"` validation behaviour.\n\n> **Doc note:** two YAML files (`audio_mix.yaml`, `video_audio_mix.yaml`) document a single implementation. Despite the name, this operates on a **video** input (requires `video` + `layers`), not audio-only mixing — audio-only mixing is the separate `audio_only_mix` model.\n\n---\n\n### Structural Export `structural_export`\n\n| | |\n|---|---|\n| **Category** | video_process |\n| **Mode** | sync |\n| **Timeout** | 5m |\n| **Cost** | free (`cost_per_unit: 0`) |\n| **MCP action** | **none — internal/pipeline only.** No worker action maps here; reachable via direct `POST /v1/jobs/structural_export` or as a final encode step in the pipeline. |\n\nFinal platform-specific structural encode — scale + letterbox-pad to target resolution and re-encode (libx264 `-preset slow`, `+faststart`). **No creative/color filters.** Apply after upscale and caption burn-in. Implemented by `StructuralExport` (`structural_export.go`), dispatched by `execStructuralExport`.\n\n**Parameters** (from `input_schema`, cross-checked against `structural_export.go`):\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Input video path. Handler reads `inputs[\"in\"]`. |\n| `out` | string | yes | — | Output video path. |\n| `platform` | string | yes (handler errors if empty) | YAML default `shorts` | Preset enum. tiktok/reels/shorts → 1080×1920, 30fps, CRF 19, AAC 192k. youtube-long → 1920×1080, 24fps, CRF 18, AAC 192k. ads → 1080×1920, 30fps, CRF 17, AAC 256k. 
Unknown value → error listing valid platforms. |\n\n**Output:** `{ ok, outputs:{video, local_path}, metrics:{platform, resolution, fps, crf, total_duration_sec} }`.\n\n---\n\n### Highlight Rolloff `highlight_rolloff`\n\n| | |\n|---|---|\n| **Category** | video_process |\n| **Mode** | sync |\n| **Timeout** | 5m |\n| **Cost** | free (`cost_per_unit: 0`) |\n| **MCP action** | **none — internal/QA-pipeline only.** No worker action maps here; reachable via direct `POST /v1/jobs/highlight_rolloff` or the QA/fix pipeline. Intended to run only when `overexposure_check` fails. |\n\nSurgical overexposure fix: compresses highlights via a fixed `curves` filter (`all='0/0 0.85/0.85 1/0.92'` — values above 85% rolled off to max 92%), audio stream-copied. After encoding it **automatically re-runs the overexposure check** (3% clipped threshold, 2 fps sampling) and returns the post-fix verdict. This is the only sanctioned creative color operation in the pipeline. Implemented by `HighlightRolloff` (`highlight_rolloff.go`), dispatched by `execHighlightRolloff`.\n\n**Parameters** (from `input_schema`, cross-checked against `highlight_rolloff.go`):\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Input video path. Handler reads `inputs[\"in\"]`. |\n| `out` | string | yes | — | Output video path. |\n\nNo tunable parameters — the curve and the post-check thresholds are hardcoded.\n\n**Output:** `{ ok, outputs:{video, local_path}, metrics:{filter, total_duration_sec, post_check, post_verdict} }`. 
Per the YAML guidance, if the source still exceeds 3% clipping after rolloff the source clips are bad and the pipeline should block to Visual Prompting — this routing is pipeline policy, the handler itself only surfaces `post_verdict`.\n\n### Sync Lipsync v3 `lipsync_v3`\n\nsync-3, Sync.so's most powerful lipsync model, syncs mouth movement to an audio track on a talking-head video using native visual intelligence.\n\n**Call it via** — `video` tool, `action: \"lipsync\"` (MCP) · raw: `POST /v1/jobs/lipsync_v3`\n\n| | |\n|---|---|\n| **Cost** | 1600 cr per minute of output |\n| **Mode / timeout** | webhook / 15m (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `video_url` | string | ✓ | — | — | URL of the input video (face visible) |\n| `audio_url` | string | ✓ | — | — | URL of the input audio |\n| `sync_mode` | string (enum) | | `cut_off` (model); our `video(lipsync)` sends `loop` unless you pass one | `cut_off`, `loop`, `bounce`, `silence`, `remap` | How to handle audio/video duration mismatch. `cut_off` trims to the shorter input (drops the tail of longer audio); `loop`/`bounce` repeat the video (never drops speech); `silence` pads with silence; `remap` speed-adjusts |\n| `options` | object | | — | nested `Sync3GenerationOptions` | Additional Sync.so generation options (advanced). Fields: `sync_mode` (overrides top-level), `model_mode` (`lips`/`face`/`head`/`lipsync`/`emotion`/`talking_head`), `prompt` (emotion: `happy`/`sad`/`angry`/`disgusted`/`surprised`/`neutral`), `temperature` (0–1, ignored by sync-3), `active_speaker_detection` (object, for multi-person videos), `occlusion_detection_enabled` (bool, ignored by sync-3) |\n\nOur wrapper params (not part of the model schema): `out` (required — workdir-relative output path) and `mock` (optional — test placeholder). 
No `format` mapping applies (our `format_field` is empty; sync-3 has no size/resolution field).\n\n**Limits**:\n- Accepted video formats: `mp4`, `mov`, `webm`, `m4v`, `gif`\n- Accepted audio formats: `mp3`, `ogg`, `wav`, `m4a`, `aac`\n- Billing is per minute of output video at 1600 cr/min (no published hard cap on duration/resolution/file size).\n"},{"id":"reference/models/qa","section":"reference","title":"QA checks","url":"https://docs.framehood.ai/reference/models/qa","text":"# QA checks\n\nQuality checks for generated media — our own implementations (local ffmpeg, with a vision/STT call for some).\n\n> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.\n\n### Full QA Pipeline `qa_full`\n\n- **Provider:** local (ffmpeg + vision: google/gemini-2.5-flash, optional speech-to-text)\n- **Endpoint:** none (in-process pipeline, `execQAFull` → `QAPipeline` in `qa_pipeline.go`)\n- **MCP action:** `qa` tool, `action: \"full\"` → routes to `qa_full` (`QA_MODELS.full`)\n- **Cost:** 1 credit per run (one vision call + optional transcription). Upper bound — drops to free if no vision client is configured.\n- **Timeout:** `5m`\n\nRuns all QA checks on a finished video in one pass. Probes the video, extracts 5 frames (10/30/50/70/90%) once, extracts audio once, then runs ffmpeg checks (overexposure, motion artifacts, audio structural/loudness/tail) plus a single multi-frame Gemini call (person consistency, visual quality, and — when a plan is given — scene-matches-plan). If STT+vision are configured and `plan.vo_text` is present, also runs an in-pipeline transcript word-overlap check. Returns per-check `PASS/FAIL/SKIP/ERROR` and an overall verdict (`FAIL` if any check fails).\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `video` | string | yes | — | Video file path. Handler reads `inputs[\"video\"]`. 
|\n| `plan` | object | no | — | Shooting plan: `SET, LIGHT, SHOT_TYPE, ACTORS_ACTION, vo_text`. Presence of `SET` enables the scene-matches-plan sub-check; `vo_text` enables the transcript sub-check. |\n| `expected_characters` | integer | no | 1 | **Declared in YAML but NOT read by the handler** — person-consistency always runs across all frames regardless. Inert. |\n\n**Mismatch notes:** vision model is hard-coded to `google/gemini-2.5-flash` (no override field). The transcript sub-check uses `simpleTranscriptCompare` (word overlap, no second LLM call), unlike standalone `check_transcript`. Audio checks emit `SKIP` if the video has no audio track.\n\n---\n\n### Same Person Check `check_same_person`\n\n- **Provider:** local (vision: google/gemini-2.5-flash)\n- **Endpoint:** none (`execCheckSamePerson` → `CheckSamePerson` in `check_vision.go`)\n- **MCP action:** `qa` tool, `action: \"person\"` → routes to `check_same_person`. The MCP layer maps `image1`→`ref` and `image2`→`test`.\n- **Cost:** 1 credit (one vision call)\n- **Timeout:** `30s`\n\nCompares facial features between a reference image and a test image (or video — mid-frame auto-extracted via `extractMidFrame`). Sends both to Gemini with `VisionCheckMulti`. Returns `same_person` (bool), `confidence` (0–100), `differences` (list), and `verdict`. PASS requires `same_person == true` AND `confidence >= min_confidence`.\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `ref` | string | yes | — | Reference image URL (persona_ref). Passed to the API as-is (no base64 conversion). |\n| `test` | string | yes | — | Test image path/URL, or a video (mid-frame extracted, ext in `.mp4/.mov/.avi/.mkv/.webm`). |\n| `min_confidence` | integer | no | 85 | Min confidence (0–100) for PASS. Handler re-clamps to 85 if `<= 0`. |\n| `model` | string | no | `google/gemini-2.5-flash` | Vision model override. |\n\n**Mismatch notes:** YAML/handler fields match exactly. 
Errors if the vision client is not configured on the server, or if `ref`/`test` is empty.\n\n---\n\n### Scene Matches Plan Check `check_scene_matches_plan`\n\n- **Provider:** local (vision: google/gemini-2.5-flash)\n- **Endpoint:** none (`execCheckSceneMatchesPlan` → `CheckSceneMatchesPlan` in `check_vision.go`)\n- **MCP action:** `qa` tool, `action: \"scene\"` → routes to `check_scene_matches_plan`. MCP maps `video`→`in` and passes `plan` through. Both `video` and `plan` are required at the MCP layer.\n- **Cost:** 1 credit\n- **Timeout:** `30s`\n\nChecks each shooting-plan field (`SET / LIGHT / SHOT_TYPE / ACTORS_ACTION`) against the image. For video input, extracts the mid-frame. Sends the plan as JSON + the image to Gemini (`VisionCheck`). Returns per-field `{verdict, reason}` under `fields`, plus overall `verdict` (`FAIL` if any field fails; the model is instructed to only judge fields present in the plan).\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Image or video path to check. Handler reads `inputs[\"in\"]`. |\n| `plan` | object | yes | — | Plan object with `SET, LIGHT, SHOT_TYPE, ACTORS_ACTION`. Handler errors if nil. |\n| `model` | string | no | `google/gemini-2.5-flash` | Vision model override. |\n\n**Mismatch notes:** YAML/handler fields match. Note the field name is `in` (not `video`/`image`); the MCP `scene` action takes `video` and remaps it.\n\n---\n\n### Image Description Check `check_image_description`\n\n- **Provider:** local (vision: google/gemini-2.5-flash)\n- **Endpoint:** none (`execCheckImageDescription` → `CheckImageDescription` in `check_vision.go`)\n- **MCP action:** `qa` tool, `action: \"image\"` → routes to `check_image_description`. MCP maps `image_url`→`in` and passes `description` through.\n- **Cost:** 1 credit\n- **Timeout:** `30s`\n\nSends an image + expected description to Gemini; the model judges whether the image matches. 
Local files are read and base64-encoded as a `data:image/png` URI; `http`-prefixed inputs are passed as-is. Uses structured output (`VisionCheckStructured` with a `verdict/match/reason/details` schema) and falls back to unstructured `VisionCheck` on error. Returns `verdict (PASS/FAIL)`, `match` (bool), `reason`, and `details` (found/missing elements).\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Image path (local) or URL. |\n| `description` | string | yes | — | Expected description text. |\n| `model` | string | no | `google/gemini-2.5-flash` | Vision model override. |\n\n**Mismatch notes:** YAML/handler fields match. Caveat: non-http paths are always encoded as `image/png` regardless of real extension — a `.jpg` is still sent with a PNG MIME label (works with Gemini, but technically mislabeled).\n\n---\n\n### Voice Consistency Check `check_voice_consistency`\n\n- **Provider:** local (vision/audio: google/gemini-2.5-flash)\n- **Endpoint:** none (`execCheckVoiceConsistency` → `CheckVoiceConsistency` in `check_audio.go`)\n- **MCP action:** `qa` tool, `action: \"voice\"` → routes to `check_voice_consistency`. MCP maps `audio`→`in`.\n- **Cost:** 1 credit\n- **Timeout:** `30s`\n\nExtracts N short (~3s) audio segments evenly across the file with ffmpeg, base64-encodes them as `data:audio/mpeg` URIs, and sends all segments to Gemini in one structured call to judge whether the same speaker (pitch, timbre, accent, style, gender, age impression) is present throughout. Returns `verdict (PASS/FAIL)`, `same_speaker` (bool), `issues` (list).\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Audio file (mp3/wav/aac). |\n| `segments` | integer | no | 3 | Number of segments to compare. Handler overrides only when `> 0`; internally re-clamps `<= 1` to 3. |\n| `model` | string | no | `google/gemini-2.5-flash` | Model override. 
|\n\n**Mismatch notes:** Undocumented short-circuit — audio under 2.0s returns PASS immediately with `note: \"audio too short to compare segments\"` (no API call). Needs ≥2 extractable segments or it errors.\n\n---\n\n### Transcript Check `check_transcript`\n\n- **Provider:** local (vision: google/gemini-2.5-flash + speech-to-text)\n- **Endpoint:** none (`execCheckTranscript` → `CheckTranscriptMatchesPlan` in `check_vision.go`)\n- **MCP action:** `qa` tool, `action: \"transcript\"` → accepts a video OR a pure audio URL plus an optional ISO-639-1 `language` hint and an optional `expected_text`. Omit `expected_text` for transcription-only mode (no compare, verdict `PASS`).\n- **Cost:** 1 credit (transcription + comparison)\n- **Timeout:** `2m`\n\nPipeline: extract audio (ffmpeg → mp3; skipped when the input is already audio) → transcribe via our STT step → compare to expected text via an LLM call. If `expected_text` is omitted, it runs in transcription-only mode: no comparison, verdict `PASS`. Returns `actual_transcript`, `duration_sec`, `segments` (`[{start_s, end_s, text}]`), `segment_count`, and — when comparing — `similarity_pct`, `missing_words`, `extra_words`, `verdict`. PASS at `similarity >= 80%`. On any LLM/parse error it falls back to `simpleTranscriptCompare` (word overlap).\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `video` | string | yes | — | Video **or pure audio** file path/URL; audio extracted automatically when a video is given. |\n| `expected_text` | string | no | — | Expected voiceover text. **Optional** — omit → transcription-only mode (no compare, verdict `PASS`). |\n| `language` | string | no | — | ISO-639-1 code passed to the STT step (improves accuracy). |\n| `vision_model` | string | no | `google/gemini-2.5-flash` | LLM for semantic comparison. 
|\n\n**Mismatch notes:** The standalone check does a real LLM comparison (`vision.VisionCheck`), whereas the same check inside `qa_full` uses word-overlap only — the two paths differ.\n\n---\n\n### Video Description `describe_video`\n\n- **Provider:** local (multimodal analysis model)\n- **Endpoint:** none (`execDescribeVideo` in `describe_video.go`)\n- **MCP action:** `qa` tool, `action: \"describe\"` → routes to `describe_video`. MCP maps `video`→`in` and passes `fps`/`focus` through.\n- **Cost:** ≈1 credit per 25 s of video at `fps: 1`, scales with `fps`; minimum 1 credit.\n- **Timeout:** `5m`\n\nWatches the whole video and returns a timecoded, scene-by-scene breakdown. The segments partition the video at scene changes (cuts, location changes, clear changes of action); each segment reports `start_s`/`end_s`, `scene` (what visually happens), `speech` (transcribed words, `\"\"` if none), `sounds` (notable SFX/ambient), and `music` (`\"\"` if none). Async: the call returns a `job_id` — poll `get_status` (~every 15 s; a typical run takes 1–3 minutes), then read `segments` and `segment_count` from the result.\n\n**Parameters**\n\n| Name | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `video` | string | yes | — | Video URL. Sent to the model as `in`. Max duration 1 hour, and duration × fps must not exceed 3600 (fps 1 → up to 60 min, fps 5 → up to 12 min); larger inputs are rejected. |\n| `fps` | integer | no | 1 | Frames sampled per second (1–5). Raise for fast-cut footage; cost scales with `fps`. |\n| `focus` | string | no | — | Extra instruction (≤2000 chars), e.g. \"focus on product shots\" or an expected-shot list to check against. |\n\n**Notes:** the analysis model is pinned server-side (no caller override). 
Segment text fields are length-capped and the segment list is bounded, so very long or unusual videos return a trimmed but well-formed result.\n\n### `check_audio_loudness`\n\n- **Provider:** local (ffmpeg `loudnorm`)\n- **Display name:** Audio Loudness Check\n- **Category / mode:** qa_check / sync\n- **Cost:** free (`cost_per_unit: 0`)\n- **Timeout:** 30s\n- **MCP action:** none (internal-only; REST `POST /v1/jobs/check_audio_loudness` or via `qa_full`)\n- **Handler:** `execCheckAudioLoudness` → `CheckAudioLoudness` (`check_audio.go`)\n\nMeasures integrated loudness and true peak with a single ffmpeg `loudnorm=print_format=json` analysis pass, parses the JSON from ffmpeg stderr (`input_i`, `input_tp`, `input_lra`).\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Audio file path/URL to check (materialized & SSRF-checked by the executor). |\n| `target_lufs` | number | no | -14 | Target integrated LUFS. **Cannot be set to literal 0** — handler treats 0 as \"unset\" and substitutes -14. |\n| `tolerance` | number | no | 3 | Allowed deviation in LU. 0 → coerced to 3. |\n| `max_true_peak` | number | no | -1 | Max true peak in dBTP. 0 → coerced to -1. |\n\n**Verdict:** PASS if `|integrated - target| <= tolerance` AND `true_peak <= max_true_peak`, else FAIL.\n**Metrics:** `lufs_integrated`, `true_peak_db`, `lra`, plus echoed `target_lufs`/`tolerance`/`max_true_peak`.\n\n> Note: handler coerces any 0-valued numeric param to its default (see code-vs-YAML mismatches). 
If the loudnorm JSON block is missing from stderr the call errors instead of returning a verdict.\n\n---\n\n### `check_audio_structural`\n\n- **Provider:** local (ffprobe)\n- **Display name:** Audio Structural Check\n- **Category / mode:** qa_check / sync\n- **Cost:** free (`cost_per_unit: 0`)\n- **Timeout:** 30s\n- **MCP action:** none (internal-only; REST `POST /v1/jobs/check_audio_structural` or via `qa_full`)\n- **Handler:** `execCheckAudioStructural` → `CheckAudioStructural` (`check_audio.go`), via `Probe` (ffprobe)\n\nProbes the file, finds the first audio stream, and checks duration and codec.\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Audio file path/URL to check. |\n\n**Verdict:** FAIL if no streams / no audio stream, OR duration &lt; 1.0s, OR codec not in `{mp3, aac, pcm_s16le, flac, vorbis, opus}`; else PASS.\n**Metrics:** `duration_sec`, `sample_rate`, `channels`, `codec`, `bitrate_kbps`. Failing reasons listed in `issues`.\n\n> Note: YAML/prompt_guide name the metrics `duration` and `bitrate`; handler emits `duration_sec` and `bitrate_kbps` (= ffprobe `bit_rate` / 1000). Sample-rate and channel values are reported but never cause a FAIL.\n\n---\n\n### `check_audio_tail`\n\n- **Provider:** local (ffmpeg `volumedetect`)\n- **Display name:** Audio Tail Check\n- **Category / mode:** qa_check / sync\n- **Cost:** free (`cost_per_unit: 0`)\n- **Timeout:** 30s\n- **MCP action:** none (internal-only; REST `POST /v1/jobs/check_audio_tail` or via `qa_full`)\n- **Handler:** `execCheckAudioTail` → `CheckAudioTail` (`check_audio.go`)\n\nDetects an abrupt cut-off at the end of audio (the \"v1 VO bug\"). Splits the trailing `tail_sec` window in two and compares per-half RMS measured with ffmpeg `volumedetect` (`mean_volume`).\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Audio file path/URL to check. |\n| `tail_sec` | number | no | 1.0 | Seconds of tail to analyze. 
`<= 0` → coerced to 1.0; clamped down to total duration if shorter. |\n| `silence_db` | number | no | -40 | RMS dB threshold below which the tail counts as silent. **Cannot be set to literal 0** — 0 → coerced to -40. |\n\n**Verdict:** PASS if `rms_second_half <= silence_db` (silent) OR `rms_second_half < rms_first_half * 0.7` (fading); else FAIL (\"tail not fading\").\n**Metrics:** `tail_sec`, `silence_db`, `rms_first_half`, `rms_second_half`, `is_silent`, `is_fading`, `total_duration`.\n\n> Note: YAML prose says PASS when the second half is merely \"quieter\"; the handler is stricter and requires a ≥30% RMS drop (`* 0.7`). An unmeasurable half returns -100 dB (treated as silent → PASS).\n\n---\n\n### `check_motion_artifacts`\n\n- **Provider:** local (ffmpeg `signalstats` YDIF)\n- **Display name:** Motion Artifacts Check\n- **Category / mode:** qa_check / sync\n- **Cost:** free (`cost_per_unit: 0`)\n- **Timeout:** 2m\n- **MCP action:** none (internal-only; REST `POST /v1/jobs/check_motion_artifacts` or via `qa_full`)\n- **Handler:** `execCheckMotionArtifacts` → `CheckMotionArtifacts` (`check_video.go`)\n\nScans for frame-to-frame luminance-difference spikes that indicate glitches or unintended jump cuts. Parses YDIF from an ffmpeg `signalstats=stat=tout` pass, computes mean, and flags frames where `diff > mean * spike_factor`.\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Video file path/URL to check. |\n| `spike_factor` | number | no | 4 | A frame whose diff exceeds `mean * spike_factor` is a spike. `<= 0` → coerced to 4. Lower = stricter. |\n\n**Verdict:** PASS if `spikes_count <= 1` (a single spike can be a legitimate transition); FAIL if `> 1`.\n**Metrics:** `frames_checked`, `mean_diff`, `max_diff`, `stddev`, `spike_factor`, `spikes_count`, `spike_frames`.\n\n> Note: the handler runs an extra `mestimate`+`metadata=print` pass whose output is discarded — only the `signalstats` YDIF pass is used. 
If no YDIF lines parse, it returns PASS with a `could not extract frame differences` warning. `spike_frames` are indices into the parsed YDIF list, not absolute video frame numbers.\n\n---\n\n### `overexposure_check`\n\n- **Provider:** local (ffmpeg `signalstats` BRNG)\n- **Display name:** Overexposure Check\n- **Category / mode:** qa_check / sync\n- **Cost:** free (`cost_per_unit: 0`)\n- **Timeout:** 2m\n- **MCP action:** none (internal-only; REST `POST /v1/jobs/overexposure_check` or via `qa_full`)\n- **Handler:** `execOverexposureCheck` → `CheckOverexposure` (`overexposure.go`)\n\nDetects blown-out highlights in an image or video. Samples frames at `sample_fps` and reads `signalstats` BRNG (percent of pixels outside broadcast range) as the clipped-pixel proxy, taking the worst sampled frame.\n\n| Param | Type | Req | Default | Notes |\n|---|---|---|---|---|\n| `in` | string | yes | — | Image or video path/URL to check. |\n| `max_clipped_pct` | number | no | 3.0 | Max % of clipped pixels before FAIL. `<= 0` → coerced to 3.0. |\n| `sample_fps` | number | no | 2 | Frames per second to sample (video). `<= 0` → coerced to 2. Read as an int. |\n\n**Verdict:** PASS if `worst_frame_pct <= max_clipped_pct`; else FAIL (suggested fix: apply `highlight_rolloff`, then re-check).\n**Metrics:** `worst_frame_pct`, `max_clipped_pct`, `frames_checked`, `max_brng`.\n\n> Note: the YAML describes \"clipped pixels at max luminance\", but the handler measures BRNG (broadcast-range %), not a true white-clip count — `worst_frame_pct` is a proxy. A discarded `histogram` pass runs first. 
If `signalstats` returns no BRNG frames, the handler returns PASS with a `signalstats not available` warning (`frames_checked: 0`), which can mask genuine overexposure.\n"},{"id":"reference/models/video","section":"reference","title":"Video models","url":"https://docs.framehood.ai/reference/models/video","text":"# Video models\n\nVideo generation, image-to-video, editing, swap, and upscaling — model input schemas.\n\n> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.\n\n### Seedance 2.0 Reference-to-Video `seedance_r2v`\n\nByteDance's reference-to-video model that generates a clip from a text prompt plus up to 9 reference images, 3 videos, and 3 audio clips for identity, motion, and voice consistency. Output up to native 4K.\n\n**Call it via** — `video` tool, `action: \"create\"` (text→video; optional `reference_images`, `video_urls`, `audio_urls`) · raw: `POST /v1/jobs/seedance_r2v`\n\n| | |\n|---|---|\n| **Cost** | 303 cr per call (5 s at the default 720p). Scales with resolution: 480p ≈ 135 cr, 1080p 681 cr, 4K 1555 cr per 5 s |\n| **Mode / timeout** | webhook / 15m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | — | Text prompt used to generate the video. Refer to references as @Image1, @Video1, @Audio1. |\n| `image_urls` | list&lt;string&gt; | | — | up to 9; JPEG/PNG/WebP; ≤30 MB each | Reference images. Refer to them as @Image1, @Image2… Total files across all modalities ≤ 12. |\n| `video_urls` | list&lt;string&gt; | | — | up to 3; MP4/MOV; combined 2–15 s; total &lt;50 MB; each ~480p (640×640) to ~720p (834×1112) | Reference videos. Refer to them as @Video1, @Video2… |\n| `audio_urls` | list&lt;string&gt; | | — | up to 3; MP3/WAV; combined ≤15 s; ≤15 MB each | Reference audio. 
Refer to them as @Audio1… If audio is provided, at least one reference image or video is required. |\n| `resolution` | enum | | `720p` | `480p`, `720p`, `1080p`, `4k` | 480p for cheap drafts (~0.45× credits), 720p default, 1080p for final delivery (2.25×), 4k for hero shots (~5.1×). |\n| `duration` | enum | | `auto` | `auto`, `4`–`15` | Duration in seconds, or auto to let the model decide. |\n| `aspect_ratio` | enum | | `auto` | `auto`, `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16` | Aspect ratio of the generated video. When omitted, our wrapper applies its vertical preset (`9:16`) — pass `auto` explicitly to follow the reference images' geometry. |\n| `generate_audio` | boolean | | `true` | — | Generate synchronized audio (SFX, ambient, lip-synced speech). Cost is the same either way. |\n| `bitrate_mode` | enum | | `standard` | `standard`, `high` | Output bitrate mode; `high` requests a higher-quality, larger-file encode. |\n| `end_user_id` | string | | — | — | Unique ID of the end user. |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder), and `format` (optional — size preset `shorts`/`reels`/`horizontal`, mapped by our `format_field`/`format_mapping` to the model's `aspect_ratio`: shorts/reels→`9:16`, horizontal→`16:9`, default `9:16`).\n\n**Limits** — prompt: text only. image_urls: max 9 images, JPEG/PNG/WebP, ≤30 MB each. video_urls: max 3 videos, MP4/MOV, combined 2–15 s, total &lt;50 MB, each between ~480p (640×640) and ~720p (834×1112). audio_urls: max 3 files, MP3/WAV, combined ≤15 s, ≤15 MB each; requires at least one image or video reference. Total reference files across all modalities ≤ 12. Output resolution up to native 4K; duration 4–15 s (or auto). 
No seed input — every render is a new take.\n\n### Kling v3 Standard Image-to-Video `kling_v3_std_i2v`\n\nImage-to-video at standard quality with cinematic visuals, fluid motion, native audio generation, and custom element support — use for quick drafts and iterations before pro renders.\n\n**Call it via** — `image` tool, `action: \"animate\"`, `tier: \"standard\"` (the default animate tier) · raw: `POST /v1/jobs/kling_v3_std_i2v`\n\nThe `image(animate)` tool exposes the multi-shot timeline directly: pass `multi_prompt` (an array of `{prompt, duration}` shots) and optional `shot_type` instead of a single `prompt`. The tool validates Kling's caps before submitting — **at most 6 shots and a combined duration ≤ 15 s** (each shot 1–15 s, default 5) — and rejects `prompt` + `multi_prompt` together.\n\n| | |\n|---|---|\n| **Cost** | 84 cr per call |\n| **Mode / timeout** | webhook / 15m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `start_image_url` | string | ✓ | — | — | URL of the image used as the starting frame of the video. |\n| `prompt` | string |  | — | maxLength 2500 | Text prompt for video generation. Either `prompt` or `multi_prompt` must be provided, but not both. |\n| `multi_prompt` | array&lt;object> |  | — | items: `{ prompt: string (required), duration: string default \"5\", enum \"1\"–\"15\" }` | List of prompts for multi-shot generation; divides the video into multiple shots. |\n| `duration` | string |  | `\"5\"` | `\"3\"`,`\"4\"`,`\"5\"`,`\"6\"`,`\"7\"`,`\"8\"`,`\"9\"`,`\"10\"`,`\"11\"`,`\"12\"`,`\"13\"`,`\"14\"`,`\"15\"` | Duration of the generated video in seconds. |\n| `generate_audio` | boolean |  | `true` | — | Generate native audio for the video. Supports Chinese/English; other languages auto-translated to English. |\n| `end_image_url` | string |  | — | — | URL of the image used as the end frame of the video. 
|\n| `elements` | array&lt;object> |  | — | items: `{ frontal_image_url, reference_image_urls (1–3, ≥1 required), video_url, voice_id }` | Characters/objects to inject. Each entry is either an image set (frontal + reference images) or a video. Reference in prompt as `@Element1`, `@Element2`, etc. Only one element may carry a video. |\n| `shot_type` | string |  | `\"customize\"` | `customize`, `intelligent` | Multi-shot generation type; `intelligent` lets the model auto-determine shot structure. |\n| `negative_prompt` | string |  | `\"blur, distort, and low quality\"` | maxLength 2500 | What to steer away from. |\n| `cfg_scale` | number |  | `0.5` | 0–1 | Classifier-Free Guidance scale — how strictly the model follows the prompt. |\n\nOur wrapper params (not part of the model input schema): `out` (required — output filename) and `mock` (optional — test placeholder). `format` is accepted by our `image` MCP tool but is NOT forwarded to this model (the model has no size/aspect field; YAML `format_field` is empty), so it has no effect here.\n\n**Limits** (model limits):\n- Prompt / negative_prompt: max 2500 characters each.\n- Duration: 3–15 s (top-level); multi-shot element duration 1–15 s.\n- `start_image_url` / `end_image_url` / element images: max file size 10 MB, min 300×300 px, aspect ratio 0.40–2.50; accepted formats jpg, jpeg, png, webp, gif, avif.\n- Element `video_url`: max 200 MB, 720–2160 px per side, 3–10.05 s, 24–60 FPS; accepted formats mp4, mov, webm, m4v, gif.\n- Element `reference_image_urls`: 1–3 images, at least one required.\n\n### Kling v3 Pro Image-to-Video `kling_v3_pro_i2v`\n\nTop-tier image-to-video with cinematic visuals, fluid motion, native audio generation, and custom element (character/object) injection.\n\n**Call it via** — MCP tool `image`, action `animate` with `tier: \"pro\"` (routes `animate_pro` → `kling_v3_pro_i2v`) · raw: `POST /v1/jobs/kling_v3_pro_i2v`\n\nThe `image(animate)` tool exposes the multi-shot timeline directly: pass 
`multi_prompt` (an array of `{prompt, duration}` shots) and optional `shot_type` instead of a single `prompt`. The tool validates Kling's caps before submitting — **at most 6 shots and a combined duration ≤ 15 s** (each shot 1–15 s, default 5) — and rejects `prompt` + `multi_prompt` together. Billed per second (no per-shot surcharge).\n\n| | |\n|---|---|\n| **Cost** | 112 cr per call |\n| **Mode / timeout** | webhook / 15m (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `start_image_url` | string | ✓ | — | Max 10MB; min 300×300px; aspect ratio 0.40–2.50 | URL of the start frame image. Aspect ratio of the output is inferred from this image. |\n| `prompt` | string | — | — | maxLength 2500 | Text prompt. Either `prompt` or `multi_prompt` must be provided, but not both. |\n| `multi_prompt` | `KlingV3MultiPromptElement[]` | — | — | array of `{prompt (req), duration}` | Multi-shot prompt list; divides the video into shots. Overrides `prompt`. Each shot `duration` enum `\"1\"`–`\"15\"`, default `\"5\"`. |\n| `duration` | string (enum) | — | `\"5\"` | `\"3\"`,`\"4\"`,`\"5\"`,`\"6\"`,`\"7\"`,`\"8\"`,`\"9\"`,`\"10\"`,`\"11\"`,`\"12\"`,`\"13\"`,`\"14\"`,`\"15\"` | Total video length in seconds. |\n| `generate_audio` | boolean | — | `true` | — | Generate native audio (Chinese/English native; other languages auto-translated to English). |\n| `end_image_url` | string \\| null | — | — | Max 10MB; min 300×300px; aspect ratio 0.40–2.50 | Optional end frame image URL (start-to-end interpolation). |\n| `elements` | `KlingV3ComboElementInput[]` \\| null | — | — | array | Reference characters/objects to inject. Each item is an image set (`frontal_image_url` + `reference_image_urls`) or a video (`video_url`), with optional `voice_id`. Reference in prompt as `@Element1`, `@Element2`. 
|\n| `shot_type` | string (enum) | — | `\"customize\"` | `customize`, `intelligent` | Multi-shot generation type; `intelligent` lets the model auto-plan shot structure. |\n| `negative_prompt` | string | — | `\"blur, distort, and low quality\"` | maxLength 2500 | Things to avoid. |\n| `cfg_scale` | number | — | `0.5` | 0–1 | Classifier-free guidance scale; higher = stricter prompt adherence. |\n\n`elements[]` sub-fields: `frontal_image_url` (string, main view), `reference_image_urls` (string[], 1–3 images from different angles, at least one required when using image elements), `video_url` (string, max one video element per request), `voice_id` (string; voice binding supported only for video elements, not image elements).\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder). We do not map a `format` field — there is no model size/aspect_ratio parameter; aspect ratio is inferred from `start_image_url` (`format_field: \"\"`).\n\n**Limits** — model limits:\n- Video duration: 3–15 seconds (single-prompt `duration`); per-shot `multi_prompt` duration 1–15s; shot durations sum to total length.\n- `prompt` / `negative_prompt`: max 2500 characters each.\n- `start_image_url` / `end_image_url` / element images: max 10 MB; min 300×300 px; aspect ratio 0.40–2.50; formats jpg, jpeg, png, webp, gif, avif.\n- Element `reference_image_urls`: 1–3 images.\n- Element `video_url`: max 200 MB; 720–2160 px; 3.0–10.05 s; 24–60 fps; formats mp4, mov, webm, m4v, gif; max one video element per request.\n- Audio: native Chinese and English; other languages auto-translated to English.\n- Cost: ≈22 cr/s (audio off, the catalog default), ≈34 cr/s (audio on).\n\n### Kling O3 Video Edit `kling_o3_video_edit`\n\nVideo-to-video editing with Kling O3 — restyle footage, replace characters/objects, or insert elements into a source video using reference images and structured element definitions.\n\n**Call it via** 
— `video` tool, action `edit_ref` (`video(edit_ref)` — requires `video_url`, `prompt`, `reference_images`) · raw: `POST /v1/jobs/kling_o3_video_edit`\n\n| | |\n|---|---|\n| **Cost** | 126 cr per call |\n| **Mode / timeout** | webhook / 15m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | maxLength 2500 | Text prompt for the edit. Reference the source video as `@Video1`, elements as `@Element1`–`@ElementN`, and reference images as `@Image1`–`@ImageN`. |\n| `video_url` | string | ✓ | — | .mp4/.mov only; 720–2160px; 3.0–10.05s; 24–60 FPS; ≤200MB | Reference (source) video URL to edit. |\n| `image_urls` | string[] \\| null | — | null | each image ≤10MB, ≥300×300px, aspect 0.40–2.50 | Reference images for style/appearance, cited in prompt as `@Image1`, `@Image2`, … Max 4 total (elements + reference images) when using video. |\n| `keep_audio` | boolean | — | `true` | true / false | Keep the original audio from the source video. |\n| `elements` | object[] \\| null | — | null | array of `{ frontal_image_url: string, reference_image_urls: string[] (1–3) }` | Elements (characters/objects) to inject, cited in prompt as `@Element1`, `@Element2`. Each element needs a frontal image and 1–3 reference images (per-image limits same as `image_urls`). |\n| `shot_type` | string | — | `customize` | const `customize` | Multi-shot generation type (only `customize` is accepted). |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — skip the API call and return a placeholder). This model has no `format` mapping (no model size field). 
Our `video(edit_ref)` action collects reference photos under `reference_images` and maps them to the model's `image_urls` field; the optional `elements` argument passes through to the model's `elements` input (cite as `@Element1`).\n\n**Limits** — prompt ≤2500 chars · source video .mp4/.mov, 3.0–10.05s, 720–2160px, 24–60 FPS, ≤200MB · reference/element images ≤10MB each, min 300×300px, aspect ratio 0.40–2.50 · max 4 total (elements + reference images) when using video.\n\n### PixVerse Swap `pixverse_swap`\n\nGenerate high-quality video clips by swapping a person, object, or background in source footage using a reference image — keyframe-based, prompt-free.\n\n**Call it via** — `video` tool, action `swap` (routes to `pixverse_swap`) · raw: `POST /v1/jobs/pixverse_swap`\n\n| | |\n|---|---|\n| **Cost** | 30 cr per call |\n| **Mode / timeout** | webhook / 15m (from our YAML) |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `video_url` | string | ✓ | — | URL | URL of the external video to swap. |\n| `image_url` | string | ✓ | — | URL | URL of the target image for swapping (the element to swap IN). |\n| `mode` | string | | `person` | `person`, `object`, `background` | The swap mode to use. |\n| `keyframe_id` | integer | | `1` | min `1`, max = `duration_seconds × 24` | Keyframe ID for face/object mapping. Input video is normalized to 24 FPS, so keyframe 1 = first frame, keyframe 24 = 1s in. |\n| `resolution` | string | | `720p` | `360p`, `540p`, `720p` | Output resolution (1080p not supported). |\n| `original_sound_switch` | boolean | | `true` | true / false | Whether to keep the original audio. |\n| `seed` | integer \\| null | | `null` | any integer | Random seed for generation. |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — skip the API call and return a placeholder for testing). 
This model does not use our `format`→size mapping (`format_field` is empty).\n\n**Limits**:\n- Input video formats: MP4, MOV, WebM, M4V, GIF.\n- Reference image formats: JPG, JPEG, PNG, WebP, GIF, AVIF.\n- Resolution: 360p / 540p / 720p (1080p listed but not supported).\n- Cost is per 5-second clip; videos longer than 5s cost double. Best quality on clips under ~10 seconds.\n- `keyframe_id` upper bound is `duration_seconds × 24` (24 FPS normalized).\n\n### Wan 2.7 Video Edit `wan_27_video_edit`\n\nVideo-to-video editing driven by a text instruction (and optional reference image) — restyle, transform scenes, or apply style transfer to existing footage using WAN 2.7.\n\n**Call it via** — `video` tool, `action: \"edit\"` (restyle existing footage) · raw: `POST /v1/jobs/wan_27_video_edit`\n\n| | |\n|---|---|\n| **Cost** | 100 cr per call |\n| **Mode / timeout** | webhook / 15m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `prompt` | string | ✓ | — | minLength 1 | Editing instruction or style-transfer description. |\n| `video_url` | string | ✓ | — | MP4/MOV, 2–10s, ≤100 MB | URL of the input video to edit. |\n| `reference_image_url` | string (nullable) | | null | jpg/jpeg/png/webp/gif/avif | Reference image URL for reference-based editing. |\n| `resolution` | string | | `1080p` | `720p`, `1080p` | Output video resolution tier. |\n| `aspect_ratio` | string (nullable) | | null (matches input) | `16:9`, `9:16`, `1:1`, `4:3`, `3:4` | Aspect ratio of the generated video; defaults to the input video's. |\n| `duration` | integer | | `0` | `0`, `2`–`10` | Output duration in seconds. `0` = match input; when set (2–10) truncates from the start. |\n| `audio_setting` | string | | `auto` | `auto`, `origin` | Audio handling. `auto`: model decides whether to regenerate audio. `origin`: preserve original audio. 
|\n| `seed` | integer (nullable) | | null | 0–2147483647 | Random seed for reproducibility. |\n| `enable_safety_checker` | boolean | | `true` | `true` / `false` | Enable content moderation on input and output. |\n\nWrapper params (our API, not part of the model input schema): `out` (required — workdir-relative output filename), `mock` (optional — return a test placeholder, skips the model call). This model defines `format_field: \"\"`, so there is no `format` → model-size mapping.\n\n**Limits** — Source video: MP4/MOV, duration 2–10 s, max file size 100 MB (upload timeout 30 s). Reference image formats: jpg, jpeg, png, webp, gif, avif. Output duration: 0 (match input) or 2–10 s. Output resolution: 720p or 1080p. Seed range: 0–2147483647.\n\n### Topaz Video Upscale `topaz_upscale_video`\n\nProfessional-grade video upscaling and enhancement using Topaz technology — upscale resolution, interpolate frames, and clean up noise/compression artifacts.\n\n**Call it via** — `video` tool, action `upscale` (pass `video_url`) · raw: `POST /v1/jobs/topaz_upscale_video`\n\n| | |\n|---|---|\n| **Cost** | 100 cr per call |\n| **Mode / timeout** | webhook / 15m |\n\n**Parameters** — the model's input schema:\n\n| Param | Type | Required | Default | Allowed / range | Description |\n|---|---|---|---|---|---|\n| `video_url` | string | ✓ | — | — | URL of the video to upscale. |\n| `model` | string | | `Proteus` | `Proteus`, `Artemis HQ`, `Artemis MQ`, `Artemis LQ`, `Nyx`, `Nyx Fast`, `Nyx XL`, `Nyx HF`, `Gaia HQ`, `Gaia CG`, `Gaia 2`, `Starlight Precise 1`, `Starlight Precise 2`, `Starlight Precise 2.5`, `Starlight HQ`, `Starlight Mini`, `Starlight Sharp`, `Starlight Fast 1`, `Starlight Fast 2` | Enhancement model. Proteus = most videos; Artemis = denoise+sharpen; Nyx = dedicated denoising; Gaia HQ/CG = rendered content; Gaia 2 = animation/motion graphics at 2x; Starlight = generative diffusion-based upscaling. |\n| `upscale_factor` | number | | `2` | 1–4 | Factor to upscale by (e.g. 
2.0 doubles width and height). |\n| `target_fps` | integer | | — (null) | 16–60 | Target FPS for frame interpolation. If set, interpolation is enabled. |\n| `compression` | number | | — (null, model-dependent) | 0.0–1.0 | Compression artifact removal level. |\n| `noise` | number | | — (null, model-dependent) | 0.0–1.0 | Noise reduction level. |\n| `halo` | number | | — (null, model-dependent) | 0.0–1.0 | Halo reduction level. |\n| `grain` | number | | — (null, model-dependent) | 0.0–0.1 (step 0.01) | Film grain amount. |\n| `recover_detail` | number | | — (null) | 0.0–1.0 | Recover original detail; higher preserves more original detail. |\n| `H264_output` | boolean | | `false` | true / false | Use H264 codec for output. Default (false) = H265. |\n\nOur wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder). This model has no `format` mapping (`format_field` is empty).\n\n**Limits** — accepted input formats: mp4, mov, webm, m4v, gif. Max `upscale_factor` 4x; `target_fps` capped at 60. Pricing scales with duration and resolution: 2 cr/sec up to 720p, 4 cr/sec for 720p–1080p, 16 cr/sec above 1080p; price doubles for 60fps output; Gaia 2 costs half. (No published max duration / resolution / file-size limit.)\n"},{"id":"reference/tools","section":"reference","title":"Tools","url":"https://docs.framehood.ai/reference/tools","text":"# Tools\n\nFramehood exposes a small, functional toolset over [MCP](/guide/mcp). You think\nin outcomes; the server picks the model. 
Every generation tool requires an `out`\nfilename.\n\n::: tip Discover at runtime\nUse `models(list)` to browse reachable models and `models(guide, model=…)` for a\nmodel's parameters and prompting tips (or `GET /v1/models/{kind}` for the raw\ninput schema).\n:::\n\n## What the MCP server exposes\n\nThe core generation and account tools:\n\n`image` · `video` · `audio` · `qa` · `files` · `billing` · `org` · `get_status`\n\nTwo helpers — browse the catalog and send feedback:\n\n`models` · `submit_feedback`\n\nThree more manage your workspace:\n\n`library` · `project` · `api_keys`\n\n::: warning Actors temporarily disabled\nThe `actor` tool and the actor-dependent actions (`image(actor_sheet)`,\n`video(scene)`, and `actor_id` routing on `image` / `video`) are **temporarily\ndisabled** while we rework them. The rest of the toolset is unaffected.\n:::\n\nThe [CLI](/guide/cli) and the [REST API](/reference/api) wrap these same tools\nand add a few surface-only endpoints (raw model access, uploads, and the Stripe\ncard/portal billing flows) — see [CLI / REST surface](#cli-rest-surface) below.\n\n## Result shape\n\nEvery MCP tool result carries a small, predictable envelope on top of its own\nfields: `ok` (boolean), `result_kind` (`\"job\"` for an async generation still in\nflight or its terminal outcome, `\"result\"` for an immediate data/management\nreply, or `\"error\"`), plus `tool` and `action` identifying the call. Nothing\nabout a tool's existing response changes shape or gets removed — the envelope\nis purely additive, so code written against the old response keeps working\nunchanged.\n\nJob-shaped results (`image`, `video`, `audio`, `qa`, and `get_status` for a\n**single** job — the default `status` action, or `cancel`) also carry\n`artifacts[]` — a normalized `[{url, type?}, ...]` list of the job's output\nfiles — alongside the original `outputs` map, which is unchanged. Use whichever\nis more convenient: `outputs.image_url` etc. 
for the legacy shape, or\n`artifacts[]` to iterate output files generically regardless of media type.\n\n`get_status`'s batch (`job_ids`) and `list` actions are `result_kind: \"result\"`,\nnot `\"job\"` — the envelope describes the call as a whole (summary-only, no\ntop-level `artifacts[]`); the individual entries inside `jobs[]` keep their\nplain `{job_id, status, ...}` shape (and their own `outputs`, if terminal).\n\nThree optional steering fields may also appear (each omitted entirely when\nthere is nothing to say):\n\n- `note` (string) — advice about this specific result, e.g. \"prompt was\n  truncated\" or \"URL expires in 24h\".\n- `next_step` (object) — the recommended next call, machine-readable:\n  `{tool, action?, args?, why}`. `args` carries concrete values (e.g.\n  `{\"job_id\": \"job_abc123\"}`) you can pass verbatim. Non-terminal job results\n  carry a `get_status` poll here; some errors carry a recovery call on top of\n  the prose `hints`. **Breaking (server 2.12.0):** `next_step` used to be a\n  prose string on submit/poll replies — it is now always this object on MCP.\n- `notifications[]` (`{type, text, id?, url?}`, max 3) — global account-level\n  notices, independent of the tool called. Currently always absent; the\n  channel ships ahead of its producers.\n\n## MCP tools\n\n### image\n\nCreate and edit still images (and image→video).\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `create` | `prompt`, `out`, `tier?`, `format?` | text → image |\n| `edit` | `image_url`, `prompt`, `out` | modify an image |\n| `upscale` | `image_url`, `out` | higher resolution |\n| `animate` | `image_url`, `out`, `duration?` | image → short video |\n| `actor_sheet` | `actor_id`, `out_prefix`, `variations?` | multi-angle character sheet |\n\n`tier`: `draft` < `fine` < `photo`. `actor_id` (`act_…`) routes `create`/`animate`\nthrough an actor's LoRA. Some models are reachable only by name — e.g. 
Nano Banana\nPro: `image(create, model=\"nano_banana\", …)` (it is not a tier); see the\n[`models`](#models) tool.\n\n### video\n\nGenerate, edit, and compose video.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `create` | `prompt`, `out`, `reference_images?`, `duration?`, `format?` | text → video (no actor needed) |\n| `edit` | `video_url`, `prompt`, `out`, `duration?` | restyle / edit footage |\n| `edit_ref` | `video_url`, `prompt`, `reference_images`, `out` | edit guided by reference photos (`@Image1…`) |\n| `swap` | `video_url`, `image_url`, `swap_mode`, `out` | replace a person, object, or background |\n| `lipsync` | `video_url`, `audio_url`, `out` | match lips to an audio track |\n| `captions` | `video_url`, `out` | burn in auto subtitles |\n| `upscale` | `video_url`, `out` | higher resolution |\n| `assemble` | `clips`, `out`, audio bed (`music?`, `vo?`, `vo_offset_sec?`, `sfx_ambient?` …) | combine clips + transitions and mix an audio bed into one video |\n| `mix_audio` | `video_url`, `tracks`, `out` | overlay VO / music / SFX onto a video |\n| `scene` | `actor_id`, `scene_prompt`, `out` | actor full-scene composite (image → motion → optional speech) |\n\n`swap_mode`: `person` · `object` · `background`. In `edit_ref`, the images you\npass in `reference_images` are cited in the prompt as `@Image1…@Image4`. 
`scene`\nroutes through an actor (`act_…`) and can auto-script a spoken line from the\nactor's personality.\n\n### audio\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `speak` | `text`, `out`, `voice?` | text → voice |\n| `sfx` | `prompt`, `out` | sound effects |\n| `music` | `prompt`, `out`, `lyrics?` | music / ambient |\n| `mix` | `tracks` (2+; or 1+ with `music?`/`music_level?`), `out` | blend audio files — flat mix, or a music bed ducked under the voice |\n| `trim` | `audio`, `start_sec?`, `duration_sec?`, `out` | cut an audio file to a window |\n| `concat` | `tracks`, `out` | join audio in sequence |\n\nTo overlay audio onto a video, use `video(mix_audio)`.\n\n### qa\n\nInspect media for problems. Returns a pass/fail report.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `full` | `video`, `plan?` | run nine checks at once |\n| `person` | `image1`, `image2` | same person in two images? |\n| `voice` | `audio` | consistent voice? |\n| `scene` | `video`, `plan` | does the video match the scene plan? |\n| `transcript` | `video\\|audio`, `expected_text?`, `language?` | transcribe media (video OR audio) with timecoded segments; if `expected_text` is given, also check the spoken words match |\n| `image` | `image_url`, `description` | does the image match the description? 
|\n| `describe` | `video`, `fps?`, `focus?` | timecoded scene-by-scene description (what happens, speech, sounds, music); returns a `job_id` — poll `get_status` |\n\nOmit `expected_text` on `transcript` for transcription-only mode (no compare, verdict `PASS`); the result carries `actual_transcript`, `duration_sec`, and timecoded `segments` either way.\n\n### files\n\nManage your storage.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `list` | `prefix?`, `cursor?` | list your stored files |\n| `upload` | `filename` + `url` \\| `data` | bring a file into your storage — from a hosted URL, or a LOCAL file's raw bytes via `data` (data-URI / base64, ≤25 MB) |\n| `create_upload` | `filename`, `content_type` | get a token + `upload_url` to PUT a LARGE / original-quality local file directly (no 25 MB cap) |\n| `import_remote` | `download_url` (or `url`) | pull a remote file (e.g. a file an assistant/chat app handed you as a URL) into your storage in one call — no manual download/re-upload. Returns a Framehood `url` you can pass straight into `image`/`video`/`audio`/`qa`. |\n| `delete` | `filename` | remove a file |\n| `publish` | `filename` | make a file public, returns a `public_url` |\n| `unpublish` | `filename` | make a published file private again |\n| `download` | `filename` | get a usable URL to fetch the bytes |\n\n::: tip Bringing in a file from another app\nIf another app or assistant hands you a file only as a URL (a chat\nattachment, a generated image it made elsewhere), don't pass that URL\nstraight into `image`/`video`/`audio`/`qa` — import it first:\n\n1. `files(import_remote, download_url=\"https://…\")` → get back `{ url:\n   \"https://cdn.framehood.ai/…\", content_type, size }`.\n2. Use that returned `url` as `image_url` / `video_url` / etc. 
in any\n   generation or `qa` call.\n\nOnly `https://` sources are accepted, and only image/video/audio files —\nanything else (a PDF, an SVG, an `http://` link) is rejected with a clear\nerror. Files are capped at 200 MB. The URL you passed in is never stored or\nreturned — only the resulting Framehood file.\n:::\n\n### actor\n\n::: warning Temporarily disabled\nActors are being reworked — the `actor` tool is not currently advertised over\nMCP. The reference below is kept for when it returns.\n:::\n\nCreate and manage persistent actors. An actor carries visual identity, voice,\npersonality, wardrobe, and motion across image / video / audio generations —\npass its `actor_id` (`act_…`) to those tools to reuse the same actor. Training\na custom actor is a paid feature; the free tier can use built-in actors.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `create` | `name`, `images_data_url` | train a new actor from a ZIP of reference photos; optional voice, personality, wardrobe, motion, brand |\n| `update` | `actor_id` | change voice, personality, wardrobe, motion, or brand |\n| `list` | `prefix?` | list your actors, optionally filtered by name prefix |\n| `get` | `actor_id` | fetch one actor's full profile |\n| `delete` | `actor_id` | remove an actor |\n| `batch` | `actor_id`, `prompts`, `kind?`, `duration?`, `tier?` | generate many images or videos for one actor in a single call |\n\n`images_data_url` is a public ZIP URL with 4–30 reference photos. Attach a\nvoice via `voice_id` (existing) or `voice_sample_url` (a clean 30–60s sample).\n\n### billing\n\n`billing(action)` — credits, plan, and upgrades for your organization. 
Read\nactions are open to any member, `request_upgrade` is for members, and the rest\nare owner-only.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `balance` | — | shared credit balance + your role |\n| `transactions` | `limit?` | recent credit ledger (newest first) |\n| `plan` | — | your tier, monthly allowance, manage link (owner) |\n| `plans` | — | available packages |\n| `subscribe` | `step` | owner: Stripe Checkout link for a package |\n| `preview` | `step` | owner: prorated cost of switching |\n| `change` | `step` | owner: switch package, prorated, charged now |\n| `cancel` | `reactivate?` | owner: cancel at period end, or resume |\n| `manage` | — | owner: Stripe Customer Portal link |\n| `topup` | `amount_eur` | buy a one-off batch of extra credits now (min €20, at the extra-usage rate); returns a hosted invoice link |\n| `extra_usage` | — | owner: view the auto-top-up (Extra usage) config |\n| `set_extra_usage` | `enabled?`, `trigger_below?`, `amount_eur?`, `extra_usage_cap_eur?` | owner: configure automatic overflow top-ups |\n| `request_upgrade` | `note?` | member: email the owner to top up |\n\nSee [Credits & plans](/guide/billing).\n\n### org\n\n`org(action)` — your organization. 
Members share a credit pool and actors.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `info` | — | your org + your role |\n| `members` | — | list members (role + suspended state) |\n| `spend` | — | per-member credit spend |\n| `trend` | `days?` | daily org spend (7–90 days, default 30) |\n| `invite` | `email` | owner: invite by email |\n| `accept_invite` | `token` | join an org with an invite token |\n| `remove` | `email` | owner: remove a member |\n| `set_role` | `email`, `role` | owner: `member`↔`admin` |\n| `suspend` / `enable` | `email` | owner or admin: block / unblock a member |\n\nA suspended member loses org access and can't draw on the shared credit pool.\nPlan and payment live in the `billing` tool; when the pool runs out of credits,\nmembers use `billing(request_upgrade)`.\n\n### get_status\n\n`get_status(action)` — poll or manage jobs.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `status` | `job_id` | poll one job → `queued` · `running` · `succeeded` (with `outputs`) · `failed` (with `error`) |\n| `status` (batch) | `job_ids` | check up to 50 jobs in ONE call → `{summary, jobs[]}` |\n| `list` | `kind?`, `status?`, `since?`, `limit?`, `cursor?` | the generation-history feed (`since` = RFC3339 or `\"1h\"` / `\"30m\"` shorthand) |\n| `cancel` | `job_id` | cancel a non-terminal job |\n\n`status` is the default action.\n\n**Submitting several jobs?** Fire all the submits in parallel, collect the\nreturned job ids, then poll ONCE with `job_ids=[…]` every ~30–60s until\n`summary.running + summary.queued` is 0 — don't poll jobs individually. Each\nbatch entry carries `status`, `done`, and (when terminal) `outputs`/`credits`\nor `error`; an id you don't own (or that doesn't exist) comes back as\n`status: \"not_found\"` without failing the batch.\n\n### models\n\n`models(action)` — browse the model catalog. 
Every tool already picks a sensible\ndefault model, so reach for this only when you want a specific one.\n\n| Action | Key arguments | Does |\n|--------|---------------|------|\n| `list` | `tool?` | reachable models grouped by tool (optionally filtered to `image` / `video` / `audio` / `qa`) |\n| `guide` | `model` | one model's parameters + prompting tips |\n\nTo use a specific model, pass `model` (and optional `params`) to the generation\ntool — e.g. `image(create, model=\"flux_pro\", params={…})`.\n\n### submit_feedback\n\n`submit_feedback` — send a note to the Framehood team: a bug, a feature request,\nor general feedback.\n\n| Argument | Does |\n|----------|------|\n| `message` | the feedback text |\n| `category?` | `bug` · `feature` · `other` |\n| `context?` | what you were doing / which tool or model |\n\n## Workspace tools (also over MCP)\n\nThese three are advertised over MCP alongside the generation and account tools\nand let you organize and authenticate your work.\n\n### library\n\n`library(action)` — search and manage your generated assets:\n\n- `list` (`query`, `type`, `project`, `limit`, `offset`) — search past generations\n  by prompt/name, filter by media type or project, paginate.\n- `trashed` — list the trash.\n- `trash` (`id`) / `restore` (`id`) — soft-delete or recover an asset. The trash is\n  auto-purged after 10 days. You can trash your own assets; org owners and admins\n  can trash any asset.\n\n### project\n\n`project(action)` — group generations into projects:\n\n- `list` — your personal projects + the org's shared ones.\n- `create` (`name`, `visibility` `personal`|`shared`, `description`).\n- `update` (`id`) — rename, or change the visibility, description, or owner.\n- `delete` (`id`) — owner only; the assets stay in the library.\n- `assign` (`asset_id`, `id`) — put an asset in a project; omit `id` to unassign.\n- `use` (`id`) — set your active/default project; new generations land there\n  automatically. 
Omit `id` to clear.\n- `current` — show your active project.\n\n### api_keys\n\n`api_keys(action)` — programmatic keys for the REST API and CLI:\n\n- `list` — your keys (metadata only).\n- `create` — mint a key; the secret is returned **once**.\n- `delete` (`key`) — revoke a key.\n\n## CLI / REST surface\n\nThe [CLI](/guide/cli) and the [REST API](/reference/api) drive the same tools\nlisted above — the CLI maps its commands onto them, and the REST API exposes the\nsame operations as plain HTTP endpoints. They additionally surface a few things\nthe MCP tools intentionally don't:\n\n- **Raw job submission** — `POST /v1/jobs/{kind}` to run a specific model kind\n  directly. (Over MCP you don't submit to a kind — pass `model` + `params` to\n  `image` / `video` / `audio` / `qa`, and browse the catalog with the `models`\n  tool.)\n- **Uploads** — `PUT /upload` and `POST /upload-from-url` to bring your own input\n  files into storage.\n- **Browser billing flows** — the Stripe card-entry and customer-portal endpoints\n  (`POST /billing/card`, `/billing/portal`, …), which finish in a browser.\n\nSee the [REST API reference](/reference/api) for the full endpoint list. Over\nMCP your client handles auth, so you rarely need to touch these directly.\n"}]