# Image models

Text-to-image generation, image editing, and upscaling. Parameter tables are each model’s **input schema**; our wrapper params (`out`, `mock`, `format`) are noted per model.

> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.

### FLUX.1 Schnell `flux_schnell`

Turbo-mode (1-4 step) text-to-image generation from a 12B-parameter FLUX flow transformer — fast enough for prototyping, prompt iteration, and bulk draft runs.

**Call it via** — `image` tool, `action: "create"`, `tier: "draft"` (the default tier) · raw: `POST /v1/jobs/flux_schnell`

| | |
|---|---|
| **Cost** | 1 cr per call |
| **Mode / timeout** | sync / 30s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | The prompt to generate an image from. |
| `image_size` | string \| object | | `landscape_4_3` | enum: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9` — or `{width, height}` object (each 1–14142) | The size of the generated image. |
| `num_inference_steps` | integer | | `4` | 1–12 | The number of inference steps to perform. |
| `num_images` | integer | | `1` | 1–4 | The number of images to generate. |
| `guidance_scale` | number | | `3.5` | 1–20 | CFG scale — how closely the model sticks to the prompt. |
| `seed` | integer \| null | | `null` | — | Same seed + same prompt + same model version → same image. |
| `output_format` | string | | `jpeg` | enum: `jpeg`, `png` | The format of the generated image. |
| `enable_safety_checker` | boolean | | `true` | — | If true, the safety checker is enabled. |
| `acceleration` | string | | `none` | enum: `none`, `regular`, `high` | Generation speed — higher is faster. |
| `sync_mode` | boolean | | `false` | — | If true, media returns as a data URI and isn't stored in request history. |

Our wrapper params (not part of the model input schema): `out` (required — output filename/workdir-relative path), `mock` (optional — test placeholder), and `format` (optional — our size preset `shorts`/`reels`/`horizontal`, mapped to the model's `image_size` field: shorts/reels → `portrait_16_9`, horizontal → `landscape_16_9`, default → `portrait_16_9`).
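The preset mapping and the raw route above can be sketched in Python as follows. This is a minimal illustration, not the service's implementation: `BASE_URL` is a placeholder host, the flat body layout (wrapper params next to model params) is an assumption, authentication is omitted, and rejecting unknown presets is a choice made for this sketch only.

```python
import json
import urllib.request

# Hypothetical host; replace with the real API base URL.
BASE_URL = "https://api.example.com"

# Preset mapping as described for flux_schnell.
FORMAT_TO_IMAGE_SIZE = {
    "shorts": "portrait_16_9",
    "reels": "portrait_16_9",
    "horizontal": "landscape_16_9",
}
DEFAULT_IMAGE_SIZE = "portrait_16_9"


def resolve_image_size(fmt=None):
    """Return the image_size preset that a wrapper `format` value maps to."""
    if fmt is None:
        return DEFAULT_IMAGE_SIZE
    if fmt not in FORMAT_TO_IMAGE_SIZE:
        # Assumption: this sketch rejects unknown presets.
        raise ValueError(f"unknown format preset: {fmt!r}")
    return FORMAT_TO_IMAGE_SIZE[fmt]


def build_request(prompt, out, fmt=None, mock=False):
    """Build (but do not send) a POST request for /v1/jobs/flux_schnell."""
    # Assumption: wrapper params sit at the top level of the JSON body.
    body = {"prompt": prompt, "out": out, "mock": mock}
    if fmt is not None:
        body["format"] = fmt
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs/flux_schnell",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing `mock=True` while wiring up a client keeps test calls free, since mock results are placeholders rather than real generations.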

**Limits** — billed at 1 cr per megapixel, rounded up to the nearest megapixel. Custom `image_size` max 14142 × 14142 px. Up to 4 images per call; 1–12 inference steps. (No prompt character limit, duration, frame count, or file-size limit is published for this model.)
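The per-megapixel rule in the limits note can be worked through with a small helper. It is a sketch under assumptions: one megapixel is taken as 1,000,000 pixels, rounding is applied per image, and the result is multiplied by `num_images`. The cost table lists a flat 1 cr per call, so treat this purely as an illustration of the rounding rule.

```python
CREDITS_PER_MEGAPIXEL = 1
PIXELS_PER_MEGAPIXEL = 1_000_000  # assumption: decimal megapixels


def estimate_credits(width, height, num_images=1):
    """Estimate credits for a custom image_size under the rounding rule."""
    if not (1 <= width <= 14142 and 1 <= height <= 14142):
        raise ValueError("width and height must each be 1-14142")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be 1-4")
    pixels = width * height
    # Integer ceiling division: round up to the next whole megapixel.
    megapixels = (pixels + PIXELS_PER_MEGAPIXEL - 1) // PIXELS_PER_MEGAPIXEL
    return megapixels * CREDITS_PER_MEGAPIXEL * num_images
```

For example, a 1024×768 image is 786,432 pixels and rounds up to 1 megapixel, while 1920×1080 is 2,073,600 pixels and rounds up to 3.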

### FLUX 1.1 [pro] ultra `flux_pro`

Text-to-image generation at up to 2K resolution (4 megapixels) with enhanced photorealism and optional reference-image conditioning.

**Call it via** — `image` tool, `action: "create"`, `tier: "fine"` (MCP) · raw: `POST /v1/jobs/flux_pro`

| | |
|---|---|
| **Cost** | 12 cr per call |
| **Mode / timeout** | sync / 30s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | The prompt to generate an image from. |
| `seed` | integer | | `null` | — | Same seed + same prompt + same model version → same image. |
| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and not stored in request history. |
| `num_images` | integer | | `1` | 1–4 | Number of images to generate. |
| `output_format` | string | | `jpeg` | `jpeg`, `png` | Format of the generated image. |
| `safety_tolerance` | string | | `"2"` | `"1"`–`"6"` | Content-filter level; 1 = most strict, 6 = most permissive. |
| `enhance_prompt` | boolean | | `false` | — | Whether to enhance the prompt for better results. |
| `image_url` | string | | `null` | — | Reference image URL to condition generation on. |
| `image_prompt_strength` | number | | `0.1` | 0–1 | Strength of the image prompt (reference-image influence). |
| `aspect_ratio` | string | | `9:16` | `21:9`, `16:9`, `4:3`, `3:2`, `1:1`, `2:3`, `3:4`, `9:16`, `9:21` (free-form string also accepted) | Aspect ratio of the generated image. |
| `raw` | boolean | | `false` | — | Generate less processed, more natural-looking images. |

Our wrapper params (not part of the model input schema): `out` (required — output filename/path), `mock` (optional — test placeholder), and `format` (optional — size preset mapped to the model's `aspect_ratio` field: `shorts`/`reels`→`9:16`, `horizontal`→`16:9`, default `9:16`).

**Limits** — model limits:
- Max resolution: 4 megapixels (up to 2048×2048). Billing rounds up to the nearest megapixel.
- Max images per call: 4 (`num_images`).
- `image_prompt_strength` range: 0–1.
- Output formats: JPEG, PNG.
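The ranges listed above lend themselves to a client-side pre-flight check. The following Python sketch is one possible shape for it; `validate_flux_pro_input` is a hypothetical helper, not part of the API, and it covers only the constraints stated in this section.

```python
OUTPUT_FORMATS = ("jpeg", "png")
# safety_tolerance is a string enum, so "2" is valid and the integer 2 is not.
SAFETY_LEVELS = ("1", "2", "3", "4", "5", "6")


def validate_flux_pro_input(params):
    """Return a list of problems found in a flux_pro input dict."""
    errors = []
    if not isinstance(params.get("prompt"), str):
        errors.append("prompt: required string")
    num_images = params.get("num_images", 1)
    if not isinstance(num_images, int) or not 1 <= num_images <= 4:
        errors.append("num_images: integer 1-4")
    if params.get("output_format", "jpeg") not in OUTPUT_FORMATS:
        errors.append("output_format: jpeg or png")
    if params.get("safety_tolerance", "2") not in SAFETY_LEVELS:
        errors.append('safety_tolerance: string "1"-"6"')
    strength = params.get("image_prompt_strength", 0.1)
    if not isinstance(strength, (int, float)) or not 0 <= strength <= 1:
        errors.append("image_prompt_strength: number 0-1")
    return errors
```

An empty list means the input passed these checks; the service may still apply further validation that is not documented here.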

### Flux 2 LoRA Realism `flux_realism`

Text-to-image photorealism — FLUX.2 with a realism LoRA tuned for natural lighting, skin texture, and documentary-style detail; ideal for character portraits, people, products, and lifestyle scenes.

**Call it via** — `image(action: "create", tier: "photo")` · raw: `POST /v1/jobs/flux_realism`

| | |
|---|---|
| **Cost** | Billed per megapixel — ≈4–5 cr per image at the ~1 MP presets |
| **Mode / timeout** | sync / 60s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | The prompt to generate a realistic image with natural lighting and authentic details. |
| `image_size` | enum \| object | | `landscape_4_3` | `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9` — or an object `{width, height}` (each int, >0, max 14142) | The size of the generated image. |
| `guidance_scale` | number | | `2.5` | `0`–`20` | CFG scale. How closely the model follows the prompt. |
| `num_inference_steps` | integer | | `40` | `4`–`50` | Number of inference steps; higher enhances realism. |
| `acceleration` | enum | | `regular` | `none`, `regular` | Acceleration level; `regular` balances speed and quality. |
| `seed` | integer \| null | | `null` | — | Random seed for reproducibility; same seed + prompt → same result. |
| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and not saved in history. |
| `enable_safety_checker` | boolean | | `true` | — | Whether to enable the safety checker for the generated image. |
| `output_format` | enum | | `png` | `png`, `jpeg`, `webp` | The format of the output image. |
| `num_images` | integer | | `1` | `1`–`4` | Number of images to generate per call. |
| `lora_scale` | number | | `1` | `0`–`2` | Strength of the realism effect. |

Our wrapper params (not part of the model input schema): `out` (required — output filename), `mock` (optional — test placeholder), and `format` (optional — our friendly aspect preset, e.g. `shorts`/`reels`/`horizontal`, which we map to the model's `image_size` field via `format_mapping`: shorts/reels → `portrait_16_9`, horizontal → `landscape_16_9`).

**Limits** — max **4 images** per call (`num_images` 1–4); inference steps **4–50**; custom `image_size` object dimensions up to **14142 px** per side (max ~4 MP recommended); output formats **PNG / JPEG / WebP**; text prompt only (no image input).

### Nano Banana Pro `nano_banana`

Text-to-image on Google's Nano Banana Pro (Gemini 3 Pro Image): strong prompt adherence and best-in-class text rendering inside the image — posters, labels, UI mockups, and scenes that must follow the brief closely.

**Call it via** — `image` tool, `action: "create"`, `model: "nano_banana"` (explicit model — the tier presets map to the FLUX family) · raw: `POST /v1/jobs/nano_banana`

| | |
|---|---|
| **Cost** | 30 cr per call; 4K outputs charged at 2x |
| **Mode / timeout** | sync / 120s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | What to generate. |
| `num_images` | integer | | `1` | 1–4 | Number of images to generate. |
| `seed` | integer | | — | any int | Seed for the RNG. |
| `aspect_ratio` | string (enum) | | `1:1` | `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` | Aspect ratio of the output. |
| `output_format` | string (enum) | | `png` | `jpeg`, `png`, `webp` | Format of the generated image. |
| `resolution` | string (enum) | | `1K` | `1K`, `2K`, `4K` | Output resolution (4K costs 2x). |

Our wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder), and `format` (optional — friendly size preset `shorts`/`reels`/`horizontal`, mapped to the model's `aspect_ratio` via `format_mapping`: shorts/reels → `9:16`, horizontal → `16:9`, default `1:1`).

**Limits** — text prompt only (no image input; for instruction-based editing use `nano_banana_edit`); all outputs carry SynthID watermarking.

### Nano Banana Pro Edit `nano_banana_edit`

Instruction-based image editing built on Google's Gemini 3 Pro Image (Nano Banana Pro): modify, restyle, inpaint, or compose images via natural-language instructions, with no masks required.

**Call it via** — the `image` MCP tool's `action: "edit"` routes to our default editor (`seedream_v5_edit`); `nano_banana_edit` is a registered editor that you reach by calling the model directly · raw: `POST /v1/jobs/nano_banana_edit`

| | |
|---|---|
| **Cost** | 30 cr per call; 4K outputs charged at 2x; web search adds 3 cr |
| **Mode / timeout** | sync / 60s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | length 3–50000 chars | The prompt / editing instruction. |
| `image_urls` | array[string] | ✓ | — | up to 14 images | URLs of the images to edit / compose. |
| `num_images` | integer | | `1` | 1–4 | Number of images to generate. |
| `seed` | integer | | — | any int (nullable) | Seed for the RNG. |
| `aspect_ratio` | string (enum) | | `auto` | `auto`, `21:9`, `16:9`, `3:2`, `4:3`, `5:4`, `1:1`, `4:5`, `3:4`, `2:3`, `9:16` | Aspect ratio of the output (`auto` preserves source proportions). |
| `output_format` | string (enum) | | `png` | `jpeg`, `png`, `webp` | Format of the generated image. |
| `safety_tolerance` | string (enum) | | `4` | `1`–`6` | Content-moderation tolerance (1 strictest, 6 least strict). |
| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and is not kept in request history. |
| `system_prompt` | string | | `""` | length ≤ 50000 chars | Optional system instruction steering persona/output style. |
| `resolution` | string (enum) | | `1K` | `1K`, `2K`, `4K` | Output resolution (4K costs 2x). |
| `limit_generations` | boolean | | `false` | — | Experimental: cap each prompting round to 1 image, ignoring count hints in the prompt. |
| `enable_web_search` | boolean | | `false` | — | Allow the model to use live web data (adds 3 cr). |

Our wrapper params (not part of the model input schema): `out` (required — workdir-relative output path), `mock` (optional — test placeholder), and `format` (optional — friendly size preset `shorts`/`reels`/`horizontal`, mapped to the model's `aspect_ratio` field via `format_field: aspect_ratio`: shorts/reels → `9:16`, horizontal → `16:9`; with no explicit `format`, the default is `auto`, so the edit preserves the source image's aspect ratio).

**Limits** — prompt 3–50000 chars; `system_prompt` ≤ 50000 chars; `num_images` 1–4; up to 14 input images per composition; character consistency for up to 5 people; resolutions 1K (1024px) / 2K (2048px) / 4K; input images capped at ~89,478,485 pixels (oversized inputs rejected with 422 `image_too_large`); output formats PNG / JPEG / WebP; all outputs carry SynthID watermarking.
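The cost rules for this model (30 cr per call, 4K at 2x, web search adding 3 cr) combine as in the sketch below. One assumption is baked in: the 3 cr web-search surcharge is added after the 4K multiplier rather than doubled with it. The catalog does not say which order applies.

```python
BASE_CREDITS = 30
WEB_SEARCH_CREDITS = 3
RESOLUTIONS = ("1K", "2K", "4K")


def estimate_edit_credits(resolution="1K", enable_web_search=False):
    """Estimate the credits for one nano_banana_edit call."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    credits = BASE_CREDITS
    if resolution == "4K":
        credits *= 2  # 4K outputs are charged at 2x
    if enable_web_search:
        # Assumption: flat surcharge, applied after the 4K multiplier.
        credits += WEB_SEARCH_CREDITS
    return credits
```

Under these assumptions a default call costs 30 cr and a 4K call with web search costs 63 cr.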

### Seedream v4.5 Edit `seedream_v45_edit`

Edit and compose images at high resolution from natural-language instructions, referencing up to 10 source images per call. Generation and editing share one unified architecture.

**Call it via** — direct model call only: the `image` MCP tool's `action: "edit"` is the user-facing edit route, but it currently maps to `seedream_v5_edit` · raw: `POST /v1/jobs/seedream_v45_edit`

| | |
|---|---|
| **Cost** | 8 cr per call |
| **Mode / timeout** | sync / 60s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | Text prompt used to edit the image. |
| `image_urls` | array[string] | ✓ | — | up to 10 URLs | Input images for editing. If more than 10 are sent, only the last 10 are used. |
| `image_size` | object `{width,height}` **or** enum string | | `{width: 2048, height: 2048}` | enum: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`, `auto_2K`, `auto_4K`; or object with width/height each 1920–4096 | Output size. Width and height must each be 1920–4096, and total pixels between 2560×1440 and 4096×4096. |
| `num_images` | integer | | `1` | 1–6 | Number of separate model generations to run with the prompt. |
| `max_images` | integer | | `1` | 1–6 | If >1, enables multi-image output: up to `max_images` per generation, `num_images` generations total. Total images (inputs + outputs) must not exceed 15. |
| `seed` | integer (nullable) | | `null` | — | Random seed to control stochasticity. |
| `sync_mode` | boolean | | `false` | — | If true, media is returned as a data URI and is not stored in request history. |
| `enable_safety_checker` | boolean | | `true` | — | Enables the safety checker. |

Our wrapper params (not part of the model input schema): `out` (required — output filename / workdir-relative path), `mock` (optional — test placeholder), and `format` (optional — our preset that we map to the model's `image_size` field via `format_mapping`: `shorts`/`reels` → 1080×1920, `horizontal` → 1920×1080).

**Limits** — up to 10 input reference images (last 10 used if more provided); max total images (inputs + outputs) = 15; output resolution 1920–4096 px per axis, total pixels 2560×1440 to 4096×4096 (default 2048×2048, ≈4.2 MP); output format PNG via URL or data URI; ~60s inference.
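The input and output image limits interact, so it can help to check the budget before submitting. The Python sketch below uses a hypothetical helper, `plan_edit`, and assumes that the 15-image cap is counted against the inputs that remain after truncation to the last 10 plus the worst-case output count of `num_images × max_images`.

```python
MAX_INPUT_IMAGES = 10
MAX_TOTAL_IMAGES = 15


def plan_edit(image_urls, num_images=1, max_images=1):
    """Return (inputs_used, max_outputs) or raise if the limits are exceeded."""
    if not image_urls:
        raise ValueError("image_urls is required")
    for name, value in (("num_images", num_images), ("max_images", max_images)):
        if not 1 <= value <= 6:
            raise ValueError(f"{name} must be 1-6")
    # Only the last 10 inputs are used when more are sent.
    inputs_used = list(image_urls)[-MAX_INPUT_IMAGES:]
    max_outputs = num_images * max_images
    # Assumption: the cap counts used inputs plus worst-case outputs.
    if len(inputs_used) + max_outputs > MAX_TOTAL_IMAGES:
        raise ValueError("inputs + outputs would exceed 15 images")
    return inputs_used, max_outputs
```

With 12 input URLs, `num_images=1`, and `max_images=5`, the helper keeps the last 10 inputs and allows up to 5 outputs, which lands exactly on the cap of 15.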

### Seedream v5 Lite Edit `seedream_v5_edit`

Fast, intelligent image editing from Seedream 5.0 Lite — modify existing images, add/remove elements, composite characters into scenes, and apply style/color transfer, with up to 10 reference images per call.

**Call it via** — `image(action: "edit", image_url, prompt)` (MCP tool `image`, action `edit`) · raw: `POST /v1/jobs/seedream_v5_edit`

| | |
|---|---|
| **Cost** | 7 cr per call |
| **Mode / timeout** | sync / 60s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | Text prompt describing the edit to apply. |
| `image_urls` | string[] | ✓ | — | up to 10 images | URLs of input images to edit. If more than 10 are sent, only the last 10 are used. |
| `image_size` | ImageSize object \| enum string | | `auto_2K` | enum: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`, `auto_2K`, `auto_3K`, `auto_4K`; or `{width, height}` (each 1–14142). Total pixels must be 2560×1440…4096×4096, else scaled. | Output image size, as a preset enum or explicit width/height. |
| `num_images` | integer | | `1` | 1–6 | Number of separate generations to run with the prompt. |
| `max_images` | integer | | `1` | 1–6 | If >1, enables multi-image generation: up to `max_images` images per generation, so total output is between `num_images` and `max_images×num_images`. |
| `sync_mode` | boolean | | `false` | true / false | If true, media is returned as a data URI and output isn't stored in request history. |
| `enable_safety_checker` | boolean | | `true` | true / false | If true, the content safety checker is enabled. |

Our wrapper params (not part of the model input schema): `out` (required — workdir-relative output filename), `mock` (optional — test placeholder, no real generation), and `format` (optional — `shorts`/`reels`/`horizontal`), which we map to the model's `image_size` field as an explicit `{width, height}` object (shorts/reels → 1080×1920, horizontal → 1920×1080).

**Limits** — model limits:
- Max reference images: **10** (last 10 used if more are sent).
- Max resolution: **3072×3072** (≈9.4 MP); total pixel count supported between 2560×1440 (≈3.7 MP) and 4096×4096 (≈16.8 MP, scaled to fit).
- Batch: **1–6** generations per call (`num_images`), up to **6** images each (`max_images`).
- Output format: **PNG** delivered via HTTPS URL (or data URI when `sync_mode=true`).
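The total-pixel rule for custom `image_size` objects can be checked up front. This sketch only reports whether a requested size falls outside the stated pixel range and would therefore be scaled; how the service performs that scaling is not specified here, and `needs_scaling` is a hypothetical helper.

```python
MIN_PIXELS = 2560 * 1440
MAX_PIXELS = 4096 * 4096


def needs_scaling(width, height):
    """Return True if width x height is outside the supported pixel range."""
    if not (1 <= width <= 14142 and 1 <= height <= 14142):
        raise ValueError("width and height must each be 1-14142")
    return not (MIN_PIXELS <= width * height <= MAX_PIXELS)
```

For instance, 2560×1440 sits exactly on the lower bound and is not scaled, whereas 1080×1920 (the size the `shorts`/`reels` preset maps to) falls below it.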

### Topaz Image Upscale `topaz_upscale_image`

Topaz image enhancer — upscale and enhance images (add detail, face enhancement, sharpening, denoising, compression-artifact removal, and generative detail).

**Call it via** — `image` tool, `action: "upscale"` (MCP) · raw: `POST /v1/jobs/topaz_upscale_image`

| | |
|---|---|
| **Cost** | 16 cr per call |
| **Mode / timeout** | sync / 120s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `image_url` | string | ✓ | — | non-empty URL | URL of the image to be upscaled. |
| `model` | string (enum) | | `Standard V2` | `Low Resolution V2`, `Standard V2`, `CGI`, `High Fidelity V2`, `Text Refine`, `Recovery`, `Redefine`, `Recovery V2`, `Standard MAX`, `Wonder`, `Wonder 3` | Model to use for image enhancement. |
| `upscale_factor` | number | | `2` | `1`–`4` | Factor to upscale the image by (2.0 doubles width and height). |
| `crop_to_fill` | boolean | | `false` | true / false | Crop the output to fill the target size. |
| `output_format` | string (enum) | | `jpeg` | `jpeg`, `png` | Output format of the upscaled image. |
| `subject_detection` | string (enum) | | `All` | `All`, `Foreground`, `Background` | Subject detection mode. Applies to standard enhance and Recovery V2 models. |
| `face_enhancement` | boolean | | `true` | true / false | Apply face enhancement. Applies to standard enhance and Recovery V2 models. |
| `face_enhancement_creativity` | number | | `0` | `0`–`1` | Creativity for face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled. |
| `face_enhancement_strength` | number | | `0.8` | `0`–`1` | Strength of face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled. |
| `sharpen` | number | | — | `0`–`1` | Sharpening level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine. |
| `denoise` | number | | — | `0`–`1` | Denoising level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine. |
| `fix_compression` | number | | — | `0`–`1` | Compression-artifact removal. Applies to Standard V2, Low Resolution V2, High Fidelity V2, Text Refine. |
| `strength` | number | | — | `0.01`–`1` | Enhancement strength. Applies to Text Refine model only. |
| `creativity` | integer | | — | `1`–`6` | Generative creativity (higher = more hallucinated detail). Applies to Redefine model only. |
| `texture` | integer | | — | `1`–`5` | Texture detail level for generative upscaling. Applies to Redefine model only. |
| `prompt` | string | | — | max 1024 chars | Text prompt to guide generative upscaling. Applies to Redefine model only. |
| `autoprompt` | boolean | | — | true / false | Auto-generate the prompt for generative upscaling. Applies to Redefine model only. |
| `detail` | number | | — | `0`–`1` | Detail recovery level. Applies to Recovery V2 model only. |
| `enhancement_strength` | string (enum) | | — | `low`, `medium`, `high` | Enhancement strength for generative upscaling. Applies to Wonder 3 model only; auto-configured when omitted. |

Our wrapper params (not part of the model input schema): `out` (required — workdir-relative output filename), `mock` (optional — test placeholder). This model has no `format` mapping (`format_field` is empty), so no model size field is derived from `format`.

**Limits** — model limits: `upscale_factor` `1`–`4`; `prompt` ≤ 1024 chars; accepted input formats jpg, jpeg, png, webp, gif, avif. Catalog cost is a flat 16 cr per call (covers outputs up to ~24 MP).
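Many parameters in the table apply only to specific Topaz models, which is easy to get wrong when switching `model`. The sketch below encodes the applicability notes from the table and flags parameters that do not apply to the selected model. `inapplicable_params` is a hypothetical helper; whether the service ignores or rejects such parameters is not documented here. `subject_detection` and the face-enhancement options are left out because their applicability is described less precisely.

```python
# Models that accept the general tuning knobs, per the parameter table.
_TUNABLE = {
    "Standard V2", "Low Resolution V2", "CGI",
    "High Fidelity V2", "Text Refine", "Redefine",
}

# Parameter name -> set of models it applies to.
PARAM_MODELS = {
    "sharpen": _TUNABLE,
    "denoise": _TUNABLE,
    "fix_compression": {
        "Standard V2", "Low Resolution V2", "High Fidelity V2", "Text Refine",
    },
    "strength": {"Text Refine"},
    "creativity": {"Redefine"},
    "texture": {"Redefine"},
    "prompt": {"Redefine"},
    "autoprompt": {"Redefine"},
    "detail": {"Recovery V2"},
    "enhancement_strength": {"Wonder 3"},
}


def inapplicable_params(params):
    """Return sorted names of params that do not apply to the chosen model."""
    model = params.get("model", "Standard V2")  # schema default
    return sorted(
        name for name in params
        if name in PARAM_MODELS and model not in PARAM_MODELS[name]
    )
```

Running this before a call gives a quick list of settings to drop or move to a different `model`.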
