Image models
Text-to-image generation, image editing, and upscaling. Parameter tables are each model’s input schema; our wrapper params (out, mock, format) are noted per model.
Generations are charged in credits (see Credits & plans). Every generation model also accepts mock: true for a free placeholder result.
FLUX.1 Schnell flux_schnell
Turbo-mode (1-4 step) text-to-image generation from a 12B-parameter FLUX flow transformer — fast enough for prototyping, prompt iteration, and bulk draft runs.
Call it via — image tool, action: "create", tier: "draft" (the default tier) · raw: POST /v1/jobs/flux_schnell
| Cost | 1 cr per call |
| Mode / timeout | sync / 30s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | — | The prompt to generate an image from. |
| image_size | string or object | | landscape_4_3 | enum: square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9; or a {width, height} object (each 1–14142) | The size of the generated image. |
| num_inference_steps | integer | | 4 | 1–12 | The number of inference steps to perform. |
| num_images | integer | | 1 | 1–4 | The number of images to generate. |
| guidance_scale | number | | 3.5 | 1–20 | CFG scale: how closely the model sticks to the prompt. |
| seed | integer or null | | null | — | Same seed + same prompt + same model version → same image. |
| output_format | string | | jpeg | enum: jpeg, png | The format of the generated image. |
| enable_safety_checker | boolean | | true | — | If true, the safety checker is enabled. |
| acceleration | string | | none | enum: none, regular, high | Generation speed; higher is faster. |
| sync_mode | boolean | | false | — | If true, media returns as a data URI and isn't stored in request history. |
Our wrapper params (not part of the model input schema): out (required — output filename/workdir-relative path), mock (optional — test placeholder), and format (optional — our size preset shorts/reels/horizontal, mapped to the model's image_size field: shorts/reels → portrait_16_9, horizontal → landscape_16_9, default → portrait_16_9).
Limits — billed at 1 cr per megapixel, rounded up to the nearest megapixel. Custom image_size max 14142 × 14142 px. Up to 4 images per call; 1–12 inference steps. (No prompt character limit, duration, frame count, or file-size limit is published for this model.)
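The format mapping and range limits above can be sketched as a small client-side check. This is a minimal sketch, not the wrapper's actual code: the names `resolve_image_size` and `check_flux_schnell` are hypothetical, and rejecting an unknown preset (rather than falling back to the default) is an assumption.

```python
# Preset mapping as documented for flux_schnell; all names here are illustrative.
FORMAT_TO_IMAGE_SIZE = {
    "shorts": "portrait_16_9",
    "reels": "portrait_16_9",
    "horizontal": "landscape_16_9",
}
DEFAULT_IMAGE_SIZE = "portrait_16_9"  # documented wrapper default


def resolve_image_size(fmt=None):
    """Map the wrapper's `format` preset to the model's `image_size` value."""
    if fmt is None:
        return DEFAULT_IMAGE_SIZE
    if fmt not in FORMAT_TO_IMAGE_SIZE:
        # Assumption: unknown presets are an error, not a silent fallback.
        raise ValueError(f"unknown format preset: {fmt!r}")
    return FORMAT_TO_IMAGE_SIZE[fmt]


def check_flux_schnell(num_inference_steps=4, num_images=1, width=None, height=None):
    """Return a list of limit violations (empty when every value is in range)."""
    problems = []
    if not 1 <= num_inference_steps <= 12:
        problems.append("num_inference_steps must be 1-12")
    if not 1 <= num_images <= 4:
        problems.append("num_images must be 1-4")
    for name, value in (("width", width), ("height", height)):
        if value is not None and not 1 <= value <= 14142:
            problems.append(f"{name} must be 1-14142")
    return problems
```

Running the check before submitting a job avoids spending a call on a request the model would reject.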
FLUX 1.1 [pro] ultra flux_pro
Text-to-image generation at up to 2K resolution (4 megapixels) with enhanced photorealism, with optional reference-image conditioning.
Call it via — image tool, action: "create", tier: "fine" (MCP) · raw: POST /v1/jobs/flux_pro
| Cost | 12 cr per call |
| Mode / timeout | sync / 30s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | — | The prompt to generate an image from. |
| seed | integer | | null | — | Same seed + same prompt + same model version → same image. |
| sync_mode | boolean | | false | — | If true, media is returned as a data URI and not stored in request history. |
| num_images | integer | | 1 | 1–4 | Number of images to generate. |
| output_format | string | | jpeg | jpeg, png | Format of the generated image. |
| safety_tolerance | string | | "2" | "1"–"6" | Content-filter level; 1 = most strict, 6 = most permissive. |
| enhance_prompt | boolean | | false | — | Whether to enhance the prompt for better results. |
| image_url | string | | null | — | Reference image URL to condition generation on. |
| image_prompt_strength | number | | 0.1 | 0–1 | Strength of the image prompt (reference-image influence). |
| aspect_ratio | string | | 9:16 | 21:9, 16:9, 4:3, 3:2, 1:1, 2:3, 3:4, 9:16, 9:21 (free-form string also accepted) | Aspect ratio of the generated image. |
| raw | boolean | | false | — | Generate less processed, more natural-looking images. |
Our wrapper params (not part of the model input schema): out (required — output filename/path), mock (optional — test placeholder), and format (optional — size preset mapped to the model's aspect_ratio field: shorts/reels→9:16, horizontal→16:9, default 9:16).
Limits — model limits:
- Max resolution: 4 megapixels (up to 2048×2048). Billing rounds up to the nearest megapixel.
- Max images per call: 4 (num_images).
- image_prompt_strength range: 0–1.
- Output formats: JPEG, PNG.
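One detail of the flux_pro schema is easy to miss: safety_tolerance is a string ("1" to "6"), not an integer. The sketch below checks a parameter dict against the documented ranges; `validate_flux_pro` is a hypothetical helper, not part of any published client, and aspect_ratio is left unchecked because the table says free-form strings are accepted.

```python
def validate_flux_pro(params):
    """Return a list of problems found in a flux_pro parameter dict."""
    errors = []
    if not params.get("prompt"):
        errors.append("prompt is required")
    # safety_tolerance is documented as a string, so the integer 2 is rejected.
    if params.get("safety_tolerance", "2") not in {"1", "2", "3", "4", "5", "6"}:
        errors.append('safety_tolerance must be a string from "1" to "6"')
    if not 0 <= params.get("image_prompt_strength", 0.1) <= 1:
        errors.append("image_prompt_strength must be between 0 and 1")
    if not 1 <= params.get("num_images", 1) <= 4:
        errors.append("num_images must be 1-4")
    if params.get("output_format", "jpeg") not in {"jpeg", "png"}:
        errors.append("output_format must be jpeg or png")
    return errors
```

For example, `validate_flux_pro({"prompt": "a cat", "safety_tolerance": 2})` reports the safety_tolerance problem, while the same call with `"2"` returns an empty list.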
Flux 2 LoRA Realism flux_realism
Text-to-image photorealism — FLUX.2 with a realism LoRA tuned for natural lighting, skin texture, and documentary-style detail; ideal for character portraits, people, products, and lifestyle scenes.
Call it via — image(action: "create", tier: "photo") · raw: POST /v1/jobs/flux_realism
| Cost | Billed per megapixel — ≈4–5 cr per image at the ~1 MP presets |
| Mode / timeout | sync / 60s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | — | The prompt to generate a realistic image with natural lighting and authentic details. |
| image_size | enum or object | | landscape_4_3 | square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9; or an object {width, height} (each int, >0, max 14142) | The size of the generated image. |
| guidance_scale | number | | 2.5 | 0–20 | CFG scale. How closely the model follows the prompt. |
| num_inference_steps | integer | | 40 | 4–50 | Number of inference steps; higher enhances realism. |
| acceleration | enum | | regular | none, regular | Acceleration level; regular balances speed and quality. |
| seed | integer or null | | none | — | Random seed for reproducibility; same seed + prompt → same result. |
| sync_mode | boolean | | false | — | If true, media is returned as a data URI and not saved in history. |
| enable_safety_checker | boolean | | true | — | Whether to enable the safety checker for the generated image. |
| output_format | enum | | png | png, jpeg, webp | The format of the output image. |
| num_images | integer | | 1 | 1–4 | Number of images to generate per call. |
| lora_scale | number | | 1 | 0–2 | Strength of the realism effect. |
Our wrapper params (not part of the model input schema): out (required — output filename), mock (optional — test placeholder), and format (optional — our friendly aspect preset, e.g. shorts/reels/horizontal, which we map to the model's image_size field via format_mapping: shorts/reels → portrait_16_9, horizontal → landscape_16_9).
Limits — max 4 images per call (num_images 1–4); inference steps 4–50; custom image_size object dimensions up to 14142 px per side (max ~4 MP recommended); output formats PNG / JPEG / WebP; text prompt only (no image input).
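Because this model is billed per megapixel, it can help to size a custom image_size object before sending it. The following is a rough sketch: `custom_size_report` is a hypothetical helper, and it assumes 1 MP means 1,000,000 pixels and treats the "~4 MP recommended" figure as a soft warning rather than a hard limit.

```python
MAX_SIDE = 14142           # documented per-side limit for a custom image_size
RECOMMENDED_MAX_MP = 4.0   # documented as "max ~4 MP recommended"


def custom_size_report(width, height):
    """Validate a custom size and report its megapixel count."""
    if not (isinstance(width, int) and isinstance(height, int)):
        raise TypeError("width and height must be integers")
    if not (0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE):
        raise ValueError(f"each side must be 1-{MAX_SIDE} px")
    megapixels = width * height / 1_000_000  # assumption: 1 MP = 1,000,000 px
    return {
        "image_size": {"width": width, "height": height},
        "megapixels": round(megapixels, 2),
        "over_recommended": megapixels > RECOMMENDED_MAX_MP,
    }
```

A 1024×1024 request comes out at about 1.05 MP, while 4096×4096 is about 16.78 MP and is flagged as over the recommended size.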
Nano Banana Pro nano_banana
Text-to-image on Google's Nano Banana Pro (Gemini 3 Pro Image): strong prompt adherence and best-in-class text rendering inside the image — posters, labels, UI mockups, and scenes that must follow the brief closely.
Call it via — image tool, action: "create", model: "nano_banana" (explicit model — the tier presets map to the FLUX family) · raw: POST /v1/jobs/nano_banana
| Cost | 30 cr per call; 4K outputs charged at 2x |
| Mode / timeout | sync / 2m (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | — | What to generate. |
| num_images | integer | | 1 | 1–4 | Number of images to generate. |
| seed | integer | | — | any int | Seed for the RNG. |
| aspect_ratio | string (enum) | | 1:1 | 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 | Aspect ratio of the output. |
| output_format | string (enum) | | png | jpeg, png, webp | Format of the generated image. |
| resolution | string (enum) | | 1K | 1K, 2K, 4K | Output resolution (4K costs 2x). |
Our wrapper params (not part of the model input schema): out (required — workdir-relative output path), mock (optional — test placeholder), and format (optional — friendly size preset shorts/reels/horizontal, mapped to the model's aspect_ratio via format_mapping: shorts/reels → 9:16, horizontal → 16:9, default 1:1).
Limits — text prompt only (no image input; for instruction-based editing use nano_banana_edit); all outputs carry SynthID watermarking.
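The pricing rule for this model (30 cr per call, doubled for 4K) is simple enough to estimate up front. A minimal sketch, assuming the charge is per call regardless of num_images, as the cost row states, and that 2K is priced the same as 1K since only a 4K surcharge is documented; the function names are hypothetical.

```python
BASE_CREDITS = 30                  # documented cost per call
RESOLUTIONS = ("1K", "2K", "4K")   # documented resolution enum


def nano_banana_cost(resolution="1K"):
    """Credits for one nano_banana call at the given resolution."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    # Only 4K carries the documented 2x multiplier.
    return BASE_CREDITS * 2 if resolution == "4K" else BASE_CREDITS


def total_cost(resolutions):
    """Credits for a series of calls, one resolution per call."""
    return sum(nano_banana_cost(r) for r in resolutions)
```

Under these assumptions, one call each at 1K, 2K, and 4K would total 120 cr.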
Nano Banana Pro Edit nano_banana_edit
Instruction-based image editing built on Google's Gemini 3 Pro Image (Nano Banana Pro): modify, restyle, inpaint, or compose images via natural-language instructions with no masks.
Call it via — direct model call: the image tool's action: "edit" (MCP) routes to our default editor (seedream_v5_edit), while nano_banana_edit is a registered editor reached by calling the model directly · raw: POST /v1/jobs/nano_banana_edit
| Cost | 30 cr per call; 4K outputs charged at 2x; web search adds 3 cr |
| Mode / timeout | sync / 60s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | length 3–50000 chars | The prompt / editing instruction. |
| image_urls | array[string] | ✓ | — | up to 14 images | URLs of the images to edit / compose. |
| num_images | integer | | 1 | 1–4 | Number of images to generate. |
| seed | integer | | — | any int (nullable) | Seed for the RNG. |
| aspect_ratio | string (enum) | | auto | auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 | Aspect ratio of the output (auto preserves source proportions). |
| output_format | string (enum) | | png | jpeg, png, webp | Format of the generated image. |
| safety_tolerance | string (enum) | | 4 | 1–6 | Content-moderation tolerance (1 strictest, 6 least strict). |
| sync_mode | boolean | | false | — | If true, media is returned as a data URI and is not kept in request history. |
| system_prompt | string | | "" | length ≤ 50000 chars | Optional system instruction steering persona/output style. |
| resolution | string (enum) | | 1K | 1K, 2K, 4K | Output resolution (4K costs 2x). |
| limit_generations | boolean | | false | — | Experimental: cap each prompting round to 1 image, ignoring count hints in the prompt. |
| enable_web_search | boolean | | false | — | Allow the model to use live web data (adds 3 cr). |
Our wrapper params (not part of the model input schema): out (required — workdir-relative output path), mock (optional — test placeholder), and format (optional — friendly size preset shorts/reels/horizontal, which our config maps to the model's aspect_ratio field via format_field: aspect_ratio → shorts/reels=9:16, horizontal=16:9; with no explicit format the default is auto — the edit preserves the source image's aspect ratio).
Limits — prompt 3–50000 chars; system_prompt ≤ 50000 chars; num_images 1–4; up to 14 input images per composition; character consistency for up to 5 people; resolutions 1K (1024px) / 2K (2048px) / 4K; input images capped at ~89,478,485 pixels (oversized inputs rejected with 422 image_too_large); output formats PNG / JPEG / WebP; all outputs carry SynthID watermarking.
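Since oversized inputs are rejected with 422 image_too_large, a pre-flight check on the client can save a failed call. The sketch below is illustrative only: `preflight_edit` is a hypothetical helper, the caller is assumed to already know each input's pixel dimensions, the approximate cap is compared as an exact number, and requiring at least one image is an assumption drawn from image_urls being a required field.

```python
MAX_INPUT_PIXELS = 89_478_485   # documented (approximate) input pixel cap
MAX_INPUT_IMAGES = 14           # documented maximum per composition


def preflight_edit(prompt, image_sizes):
    """Check an edit request; image_sizes holds one (width, height) per input image."""
    problems = []
    if not 3 <= len(prompt) <= 50_000:
        problems.append("prompt must be 3-50000 characters")
    # Assumption: a required image_urls array needs at least one entry.
    if not 1 <= len(image_sizes) <= MAX_INPUT_IMAGES:
        problems.append("image_urls must contain 1-14 images")
    for index, (width, height) in enumerate(image_sizes):
        if width * height > MAX_INPUT_PIXELS:
            problems.append(f"image {index} exceeds the input pixel cap")
    return problems
```

An empty list means the request passes these documented limits; it does not guarantee the model will accept it.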
Seedream v4.5 Edit seedream_v45_edit
Edit and compose images at high resolution from natural-language instructions, referencing up to 10 source images in one unified generation/editing architecture.
Call it via — the image MCP tool with action: "edit" is the user-facing edit route, but note that action currently maps to seedream_v5_edit; this v4.5 variant is reached by calling the model directly. · raw: POST /v1/jobs/seedream_v45_edit
| Cost | 8 cr per call |
| Mode / timeout | sync / 60s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | — | Text prompt used to edit the image. |
| image_urls | array<string> | ✓ | — | up to 10 URLs | Input images for editing. If more than 10 are sent, only the last 10 are used. |
| image_size | object {width,height} or enum string | | {width: 2048, height: 2048} | enum: square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9, auto_2K, auto_4K; or object with width/height each 1920–4096 | Output size. Width and height must each be 1920–4096, and total pixels between 2560×1440 and 4096×4096. |
| num_images | integer | | 1 | 1–6 | Number of separate model generations to run with the prompt. |
| max_images | integer | | 1 | 1–6 | If >1, enables multi-image output: up to max_images per generation, num_images generations total. Total images (inputs + outputs) must not exceed 15. |
| seed | integer (nullable) | | null | — | Random seed to control stochasticity. |
| sync_mode | boolean | | false | — | If true, media is returned as a data URI and is not stored in request history. |
| enable_safety_checker | boolean | | true | — | Enables the safety checker. |
Our wrapper params (not part of the model input schema): out (required — output filename / workdir-relative path), mock (optional — test placeholder), and format (optional — our preset that we map to the model's image_size field via format_mapping: shorts/reels → 1080×1920, horizontal → 1920×1080).
Limits — up to 10 input reference images (last 10 used if more provided); max total images (inputs + outputs) = 15; output resolution 1920–4096 px per axis, total pixels 2560×1440 to 4096×4096 (the default 2048×2048 is ≈4.2 MP); output format PNG via URL or data URI; ~60s inference.
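The two image-count rules above (only the last 10 inputs are used; inputs plus outputs must not exceed 15) interact, so a quick budget check is useful. This is a sketch under stated assumptions: the helper names are hypothetical, and it assumes the budget counts only the inputs actually used plus the worst-case output count of num_images × max_images.

```python
MAX_INPUTS = 10         # documented: only the last 10 input images are used
MAX_TOTAL_IMAGES = 15   # documented: inputs + outputs must not exceed 15


def effective_inputs(image_urls):
    """Return the inputs the model would actually use (the last 10)."""
    return list(image_urls)[-MAX_INPUTS:]


def fits_image_budget(image_urls, num_images=1, max_images=1):
    """True when used inputs plus worst-case outputs stay within the total."""
    if not (1 <= num_images <= 6 and 1 <= max_images <= 6):
        raise ValueError("num_images and max_images must each be 1-6")
    inputs = len(effective_inputs(image_urls))
    # Assumption: each of num_images generations may emit up to max_images.
    worst_case_outputs = num_images * max_images
    return inputs + worst_case_outputs <= MAX_TOTAL_IMAGES
```

With 10 reference images, for instance, this leaves room for at most 5 outputs.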
Seedream v5 Lite Edit seedream_v5_edit
Fast, intelligent image editing from Seedream 5.0 Lite — modify existing images, add/remove elements, composite characters into scenes, and apply style/color transfer, with up to 10 reference images per call.
Call it via — image(action: "edit", image_url, prompt) (MCP tool image, action edit) · raw: POST /v1/jobs/seedream_v5_edit
| Cost | 7 cr per call |
| Mode / timeout | sync / 60s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| prompt | string | ✓ | — | — | Text prompt describing the edit to apply. |
| image_urls | string[] | ✓ | — | up to 10 images | URLs of input images to edit. If more than 10 are sent, only the last 10 are used. |
| image_size | ImageSize object or enum string | | auto_2K | enum: square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9, auto_2K, auto_3K, auto_4K; or {width, height} (each 1–14142). Total pixels must be 2560×1440…4096×4096, else scaled. | Output image size, as a preset enum or explicit width/height. |
| num_images | integer | | 1 | 1–6 | Number of separate generations to run with the prompt. |
| max_images | integer | | 1 | 1–6 | If >1, enables multi-image generation: up to max_images images per generation, so total output is between num_images and max_images×num_images. |
| sync_mode | boolean | | false | true / false | If true, media is returned as a data URI and output isn't stored in request history. |
| enable_safety_checker | boolean | | true | true / false | If true, the content safety checker is enabled. |
Our wrapper params (not part of the model input schema): out (required — workdir-relative output filename), mock (optional — test placeholder, no real generation). Our format (optional — shorts/reels/horizontal) is a wrapper we map to the model's image_size field as an explicit {width, height} object (shorts/reels → 1080×1920, horizontal → 1920×1080).
Limits — model limits:
- Max reference images: 10 (last 10 used if more are sent).
- Max resolution: 3072×3072 (9 MP); total pixel count supported between 2560×1440 (≈3.7 MP) and 4096×4096 (≈9.43 MP, scaled to fit).
- Batch: 1–6 generations per call (num_images), up to 6 images each (max_images).
- Output format: PNG delivered via HTTPS URL (or data URI when sync_mode=true).
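The batch rule and the pixel range from the parameter table can be worked through with a little arithmetic. The sketch below uses hypothetical helper names and simply restates the documented numbers; it does not model how the service scales out-of-range sizes.

```python
MIN_PIXELS = 2560 * 1440   # documented lower bound on total pixels
MAX_PIXELS = 4096 * 4096   # documented upper bound on total pixels

# Wrapper presets as documented: explicit {width, height} objects.
FORMAT_TO_SIZE = {
    "shorts": {"width": 1080, "height": 1920},
    "reels": {"width": 1080, "height": 1920},
    "horizontal": {"width": 1920, "height": 1080},
}


def output_count_range(num_images=1, max_images=1):
    """Smallest and largest number of images one call can return."""
    return (num_images, num_images * max_images)


def in_documented_pixel_range(width, height):
    """True when width x height lies inside the documented total-pixel range."""
    return MIN_PIXELS <= width * height <= MAX_PIXELS
```

With num_images=2 and max_images=3, a call returns between 2 and 6 images. Note that the 1080×1920 preset is 2,073,600 pixels, below the 3,686,400-pixel lower bound, so going by the table's "else scaled" note the preset sizes would presumably be scaled by the model.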
Topaz Image Upscale topaz_upscale_image
Topaz image enhancer — upscale and enhance images (add detail, face enhancement, sharpening, denoising, compression-artifact removal, and generative detail).
Call it via — image tool, action: "upscale" (MCP) · raw: POST /v1/jobs/topaz_upscale_image
| Cost | 16 cr per call |
| Mode / timeout | sync / 120s (from our YAML) |
Parameters — the model's input schema:
| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| image_url | string | ✓ | — | non-empty URL | URL of the image to be upscaled. |
| model | string (enum) | | Standard V2 | Low Resolution V2, Standard V2, CGI, High Fidelity V2, Text Refine, Recovery, Redefine, Recovery V2, Standard MAX, Wonder, Wonder 3 | Model to use for image enhancement. |
| upscale_factor | number | | 2 | 1–4 | Factor to upscale the image by (2.0 doubles width and height). |
| crop_to_fill | boolean | | false | true / false | Crop the output to fill the target size. |
| output_format | string (enum) | | jpeg | jpeg, png | Output format of the upscaled image. |
| subject_detection | string (enum) | | All | All, Foreground, Background | Subject detection mode. Applies to standard enhance and Recovery V2 models. |
| face_enhancement | boolean | | true | true / false | Apply face enhancement. Applies to standard enhance and Recovery V2 models. |
| face_enhancement_creativity | number | | 0 | 0–1 | Creativity for face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled. |
| face_enhancement_strength | number | | 0.8 | 0–1 | Strength of face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled. |
| sharpen | number | | — | 0–1 | Sharpening level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine. |
| denoise | number | | — | 0–1 | Denoising level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine. |
| fix_compression | number | | — | 0–1 | Compression-artifact removal. Applies to Standard V2, Low Resolution V2, High Fidelity V2, Text Refine. |
| strength | number | | — | 0.01–1 | Enhancement strength. Applies to Text Refine model only. |
| creativity | integer | | — | 1–6 | Generative creativity (higher = more hallucinated detail). Applies to Redefine model only. |
| texture | integer | | — | 1–5 | Texture detail level for generative upscaling. Applies to Redefine model only. |
| prompt | string | | — | max 1024 chars | Text prompt to guide generative upscaling. Applies to Redefine model only. |
| autoprompt | boolean | | — | true / false | Auto-generate the prompt for generative upscaling. Applies to Redefine model only. |
| detail | number | | — | 0–1 | Detail recovery level. Applies to Recovery V2 model only. |
| enhancement_strength | string (enum) | | — | low, medium, high | Enhancement strength for generative upscaling. Applies to Wonder 3 model only; auto-configured when omitted. |
Our wrapper params (not part of the model input schema): out (required — workdir-relative output filename), mock (optional — test placeholder). This model has no format mapping (format_field is empty), so no model size field is derived from format.
Limits — model limits: upscale_factor 1–4; prompt ≤ 1024 chars; accepted input formats jpg, jpeg, png, webp, gif, avif. Catalog cost is a flat 16 cr per call (covers outputs up to ~24 MP).
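Many Topaz parameters apply only to specific enhancement models, so it is easy to send one that has no effect. The sketch below transcribes the "Applies to" notes from the parameter table into a lookup; `inapplicable_params` is a hypothetical helper, and subject_detection and face_enhancement are left out because "standard enhance" is not mapped to specific model names in the table.

```python
_SHARPEN_DENOISE = {"Standard V2", "Low Resolution V2", "CGI",
                    "High Fidelity V2", "Text Refine", "Redefine"}

# Parameter -> models it applies to, transcribed from the table above.
APPLIES_TO = {
    "sharpen": _SHARPEN_DENOISE,
    "denoise": _SHARPEN_DENOISE,
    "fix_compression": {"Standard V2", "Low Resolution V2",
                        "High Fidelity V2", "Text Refine"},
    "strength": {"Text Refine"},
    "creativity": {"Redefine"},
    "texture": {"Redefine"},
    "prompt": {"Redefine"},
    "autoprompt": {"Redefine"},
    "detail": {"Recovery V2"},
    "enhancement_strength": {"Wonder 3"},
}


def inapplicable_params(model, params):
    """Return, sorted, the params the table says do not apply to `model`."""
    return sorted(
        name for name in params
        if name in APPLIES_TO and model not in APPLIES_TO[name]
    )
```

For example, sending creativity with the default Standard V2 model would be flagged, since the table lists it for Redefine only.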