
Image models

Text-to-image generation, image editing, and upscaling. Parameter tables are each model’s input schema; our wrapper params (out, mock, format) are noted per model.

Generations are charged in credits (see Credits & plans). Every generation model also accepts mock: true for a free placeholder result.

FLUX.1 Schnell flux_schnell

Turbo-mode (1-4 step) text-to-image generation from a 12B-parameter FLUX flow transformer — fast enough for prototyping, prompt iteration, and bulk draft runs.

Call it via: image tool, action: "create", tier: "draft" (the default tier) · raw: POST /v1/jobs/flux_schnell

Cost: 1 cr per call
Mode / timeout: sync / 30s (from our YAML)
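
As a rough sketch of the raw route, the snippet below assembles the path and JSON body for a flux_schnell job with mock enabled. The helper name build_job_request is hypothetical, and putting the wrapper params (out, mock) in the same flat JSON object as the model params is an assumption; host and authentication are omitted because this page does not describe them.

```python
import json

def build_job_request(model_id, model_params, out, mock=False):
    """Return (path, json_body) for a raw job call.

    Assumption: wrapper params share one flat JSON object with the
    model's own input params; the real body shape may differ.
    """
    body = dict(model_params)
    body["out"] = out          # wrapper param: required output path
    if mock:
        body["mock"] = True    # wrapper param: free placeholder result
    return f"/v1/jobs/{model_id}", json.dumps(body, sort_keys=True)

path, body = build_job_request(
    "flux_schnell",
    {"prompt": "a lighthouse at dusk", "num_inference_steps": 4},
    out="drafts/lighthouse.jpg",
    mock=True,
)
print("POST", path)
print(body)
```

Setting mock=True while iterating on prompts keeps test calls free; dropping it turns the same body into a billed generation.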

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | | The prompt to generate an image from.
image_size | string or object | | landscape_4_3 | enum: square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9; or a {width, height} object (each 1–14142) | The size of the generated image.
num_inference_steps | integer | | 4 | 1–12 | The number of inference steps to perform.
num_images | integer | | 1 | 1–4 | The number of images to generate.
guidance_scale | number | | 3.5 | 1–20 | CFG scale: how closely the model sticks to the prompt.
seed | integer or null | | null | | Same seed + same prompt + same model version → same image.
output_format | string | | jpeg | enum: jpeg, png | The format of the generated image.
enable_safety_checker | boolean | | true | | If true, the safety checker is enabled.
acceleration | string | | none | enum: none, regular, high | Generation speed; higher is faster.
sync_mode | boolean | | false | | If true, media returns as a data URI and isn't stored in request history.

Our wrapper params (not part of the model input schema): out (required — output filename/workdir-relative path), mock (optional — test placeholder), and format (optional — our size preset shorts/reels/horizontal, mapped to the model's image_size field: shorts/reels → portrait_16_9, horizontal → landscape_16_9, default → portrait_16_9).
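
The preset mapping described above can be sketched as a small lookup. The names FORMAT_TO_IMAGE_SIZE and resolve_image_size are hypothetical, and raising on an unknown preset is an assumption; the wrapper's actual behavior for unrecognized values is not stated here.

```python
# Preset -> image_size values as listed for flux_schnell.
FORMAT_TO_IMAGE_SIZE = {
    "shorts": "portrait_16_9",
    "reels": "portrait_16_9",
    "horizontal": "landscape_16_9",
}
DEFAULT_IMAGE_SIZE = "portrait_16_9"  # used when no format is given

def resolve_image_size(fmt=None):
    """Translate a wrapper format preset into the model's image_size."""
    if fmt is None:
        return DEFAULT_IMAGE_SIZE
    if fmt not in FORMAT_TO_IMAGE_SIZE:
        # Assumption: unknown presets are treated as an error.
        raise ValueError(f"unknown format preset: {fmt!r}")
    return FORMAT_TO_IMAGE_SIZE[fmt]

print(resolve_image_size("horizontal"))  # landscape_16_9
print(resolve_image_size())              # portrait_16_9
```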

Limits — billed at 1 cr per megapixel, rounded up to the nearest megapixel. Custom image_size max 14142 × 14142 px. Up to 4 images per call; 1–12 inference steps. (No prompt character limit, duration, frame count, or file-size limit is published for this model.)
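
To make the round-up rule concrete, here is a minimal sketch of the per-megapixel arithmetic. It assumes 1 megapixel = 1,000,000 pixels and that rounding is applied per image before multiplying by the image count; neither detail is stated on this page, and the function name is hypothetical.

```python
def megapixel_credits(width, height, num_images=1, rate_per_mp=1):
    """Credits for a custom image_size under a per-megapixel rate.

    Assumptions: 1 MP = 1,000,000 px; each image is rounded up
    to a whole megapixel on its own.
    """
    pixels = width * height
    megapixels = -(-pixels // 1_000_000)  # integer ceiling division
    return megapixels * rate_per_mp * num_images

print(megapixel_credits(1024, 768))                 # 1
print(megapixel_credits(2048, 2048, num_images=2))  # 10
```

A 2048×2048 image is about 4.19 MP, so it rounds up to 5 MP per image under these assumptions.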

FLUX 1.1 [pro] ultra flux_pro

Text-to-image generation at up to 2K resolution (4 megapixels) with enhanced photorealism, with optional reference-image conditioning.

Call it via: image tool, action: "create", tier: "fine" (MCP) · raw: POST /v1/jobs/flux_pro

Cost: 12 cr per call
Mode / timeout: sync / 30s (from our YAML)

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | | The prompt to generate an image from.
seed | integer | | null | | Same seed + same prompt + same model version → same image.
sync_mode | boolean | | false | | If true, media is returned as a data URI and not stored in request history.
num_images | integer | | 1 | 1–4 | Number of images to generate.
output_format | string | | jpeg | jpeg, png | Format of the generated image.
safety_tolerance | string | | "2" | "1"–"6" | Content-filter level; 1 = most strict, 6 = most permissive.
enhance_prompt | boolean | | false | | Whether to enhance the prompt for better results.
image_url | string | | null | | Reference image URL to condition generation on.
image_prompt_strength | number | | 0.1 | 0–1 | Strength of the image prompt (reference-image influence).
aspect_ratio | string | | 9:16 | 21:9, 16:9, 4:3, 3:2, 1:1, 2:3, 3:4, 9:16, 9:21 (free-form string also accepted) | Aspect ratio of the generated image.
raw | boolean | | false | | Generate less processed, more natural-looking images.

Our wrapper params (not part of the model input schema): out (required — output filename/path), mock (optional — test placeholder), and format (optional — size preset mapped to the model's aspect_ratio field: shorts/reels → 9:16, horizontal → 16:9, default 9:16).

Limits — model limits:

  • Max resolution: 4 megapixels (up to 2048×2048). Billing rounds up to the nearest megapixel.
  • Max images per call: 4 (num_images).
  • image_prompt_strength range: 0–1.
  • Output formats: JPEG, PNG.
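
The limits above lend themselves to a client-side pre-check before spending credits. The following is only a sketch: check_flux_pro_params is a hypothetical helper, and the fallback defaults it uses are the ones listed in the parameter table.

```python
def check_flux_pro_params(params):
    """Return a list of problems found in a flux_pro parameter dict."""
    problems = []
    num_images = params.get("num_images", 1)
    if not 1 <= num_images <= 4:
        problems.append("num_images must be 1-4")
    strength = params.get("image_prompt_strength", 0.1)
    if not 0 <= strength <= 1:
        problems.append("image_prompt_strength must be 0-1")
    if params.get("output_format", "jpeg") not in ("jpeg", "png"):
        problems.append("output_format must be jpeg or png")
    # safety_tolerance is a string enum, "1" (strictest) to "6".
    if params.get("safety_tolerance", "2") not in {"1", "2", "3", "4", "5", "6"}:
        problems.append('safety_tolerance must be "1" to "6"')
    return problems

print(check_flux_pro_params({"prompt": "studio portrait", "num_images": 5}))
```

An empty list means none of the listed limits were violated; it does not guarantee the request will be accepted.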

Flux 2 LoRA Realism flux_realism

Text-to-image photorealism — FLUX.2 with a realism LoRA tuned for natural lighting, skin texture, and documentary-style detail; ideal for character portraits, people, products, and lifestyle scenes.

Call it via: image(action: "create", tier: "photo") · raw: POST /v1/jobs/flux_realism

Cost: billed per megapixel, ≈4–5 cr per image at the ~1 MP presets
Mode / timeout: sync / 60s (from our YAML)

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | | The prompt to generate a realistic image with natural lighting and authentic details.
image_size | enum or object | | landscape_4_3 | square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9; or an object {width, height} (each int, >0, max 14142) | The size of the generated image.
guidance_scale | number | | 2.5 | 0–20 | CFG scale. How closely the model follows the prompt.
num_inference_steps | integer | | 40 | 4–50 | Number of inference steps; higher enhances realism.
acceleration | enum | | regular | none, regular | Acceleration level; regular balances speed and quality.
seed | integer or null | | none | | Random seed for reproducibility; same seed + prompt → same result.
sync_mode | boolean | | false | | If true, media is returned as a data URI and not saved in history.
enable_safety_checker | boolean | | true | | Whether to enable the safety checker for the generated image.
output_format | enum | | png | png, jpeg, webp | The format of the output image.
num_images | integer | | 1 | 1–4 | Number of images to generate per call.
lora_scale | number | | 1 | 0–2 | Strength of the realism effect.

Our wrapper params (not part of the model input schema): out (required — output filename), mock (optional — test placeholder), and format (optional — our friendly aspect preset, e.g. shorts/reels/horizontal, which we map to the model's image_size field via format_mapping: shorts/reels → portrait_16_9, horizontal → landscape_16_9).

Limits — max 4 images per call (num_images 1–4); inference steps 4–50; custom image_size object dimensions up to 14142 px per side (max ~4 MP recommended); output formats PNG / JPEG / WebP; text prompt only (no image input).

Nano Banana Pro nano_banana

Text-to-image on Google's Nano Banana Pro (Gemini 3 Pro Image): strong prompt adherence and best-in-class text rendering inside the image — posters, labels, UI mockups, and scenes that must follow the brief closely.

Call it via: image tool, action: "create", model: "nano_banana" (explicit model; the tier presets map to the FLUX family) · raw: POST /v1/jobs/nano_banana

Cost: 30 cr per call; 4K outputs charged at 2x
Mode / timeout: sync / 2m (from our YAML)

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | | What to generate.
num_images | integer | | 1 | 1–4 | Number of images to generate.
seed | integer | | | any int | Seed for the RNG.
aspect_ratio | string (enum) | | 1:1 | 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 | Aspect ratio of the output.
output_format | string (enum) | | png | jpeg, png, webp | Format of the generated image.
resolution | string (enum) | | 1K | 1K, 2K, 4K | Output resolution (4K costs 2x).

Our wrapper params (not part of the model input schema): out (required — workdir-relative output path), mock (optional — test placeholder), and format (optional — friendly size preset shorts/reels/horizontal, mapped to the model's aspect_ratio via format_mapping: shorts/reels → 9:16, horizontal → 16:9, default 1:1).

Limits — text prompt only (no image input; for instruction-based editing use nano_banana_edit); all outputs carry SynthID watermarking.

Nano Banana Pro Edit nano_banana_edit

Instruction-based image editing built on Google's Gemini 3 Pro Image (Nano Banana Pro): modify, restyle, inpaint, or compose images via natural-language instructions with no masks.

Call it via: the image(edit) MCP tool/action routes to our default editor (seedream_v5_edit); nano_banana_edit is a registered editor reachable directly · raw: POST /v1/jobs/nano_banana_edit

Cost: 30 cr per call; 4K outputs charged at 2x; web search adds 3 cr
Mode / timeout: sync / 60s (from our YAML)
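
The pricing rule above can be sketched as a small estimator. This is an illustration under stated assumptions, not the billing implementation: it assumes the 2x multiplier applies to the 30 cr base only and that the 3 cr web-search surcharge is added afterwards, once per call. The function name is hypothetical.

```python
BASE_CREDITS = 30        # per call, from the Cost line
WEB_SEARCH_CREDITS = 3   # added when enable_web_search is true

def estimate_edit_credits(resolution="1K", enable_web_search=False):
    """Estimate credits for one nano_banana_edit call.

    Assumption: 4K doubles the base price, and the web-search
    surcharge is flat and not doubled.
    """
    if resolution not in ("1K", "2K", "4K"):
        raise ValueError(f"unsupported resolution: {resolution!r}")
    cost = BASE_CREDITS * (2 if resolution == "4K" else 1)
    if enable_web_search:
        cost += WEB_SEARCH_CREDITS
    return cost

print(estimate_edit_credits())                              # 30
print(estimate_edit_credits("4K", enable_web_search=True))  # 63
```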

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | length 3–50000 chars | The prompt / editing instruction.
image_urls | array[string] | | | up to 14 images | URLs of the images to edit / compose.
num_images | integer | | 1 | 1–4 | Number of images to generate.
seed | integer | | | any int (nullable) | Seed for the RNG.
aspect_ratio | string (enum) | | auto | auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 | Aspect ratio of the output (auto preserves source proportions).
output_format | string (enum) | | png | jpeg, png, webp | Format of the generated image.
safety_tolerance | string (enum) | | 4 | 1–6 | Content-moderation tolerance (1 strictest, 6 least strict).
sync_mode | boolean | | false | | If true, media is returned as a data URI and is not kept in request history.
system_prompt | string | | "" | length ≤ 50000 chars | Optional system instruction steering persona/output style.
resolution | string (enum) | | 1K | 1K, 2K, 4K | Output resolution (4K costs 2x).
limit_generations | boolean | | false | | Experimental: cap each prompting round to 1 image, ignoring count hints in the prompt.
enable_web_search | boolean | | false | | Allow the model to use live web data (adds 3 cr).

Our wrapper params (not part of the model input schema): out (required — workdir-relative output path), mock (optional — test placeholder), and format (optional — friendly size preset shorts/reels/horizontal, which our config maps to the model's aspect_ratio field via format_field: shorts/reels → 9:16, horizontal → 16:9; with no explicit format the default is auto, so the edit preserves the source image's aspect ratio).

Limits — prompt 3–50000 chars; system_prompt ≤ 50000 chars; num_images 1–4; up to 14 input images per composition; character consistency for up to 5 people; resolutions 1K (1024px) / 2K (2048px) / 4K; input images capped at ~89,478,485 pixels (oversized inputs rejected with 422 image_too_large); output formats PNG / JPEG / WebP; all outputs carry SynthID watermarking.

Seedream v4.5 Edit seedream_v45_edit

Edit and compose images at high resolution from natural-language instructions, referencing up to 10 source images in one unified generation/editing architecture.

Call it via: the image MCP tool with action: "edit" is the user-facing edit route, but that action currently maps to seedream_v5_edit; this v4.5 variant is reached by calling the model directly · raw: POST /v1/jobs/seedream_v45_edit

Cost: 8 cr per call
Mode / timeout: sync / 60s (from our YAML)

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | | Text prompt used to edit the image.
image_urls | array<string> | | | up to 10 URLs | Input images for editing. If more than 10 are sent, only the last 10 are used.
image_size | object {width, height} or enum string | | {width: 2048, height: 2048} | enum: square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9, auto_2K, auto_4K; or object with width/height each 1920–4096 | Output size. Width and height must each be 1920–4096, and total pixels between 2560×1440 and 4096×4096.
num_images | integer | | 1 | 1–6 | Number of separate model generations to run with the prompt.
max_images | integer | | 1 | 1–6 | If >1, enables multi-image output: up to max_images per generation, num_images generations total. Total images (inputs + outputs) must not exceed 15.
seed | integer (nullable) | | null | | Random seed to control stochasticity.
sync_mode | boolean | | false | | If true, media is returned as a data URI and is not stored in request history.
enable_safety_checker | boolean | | true | | Enables the safety checker.

Our wrapper params (not part of the model input schema): out (required — output filename / workdir-relative path), mock (optional — test placeholder), and format (optional — our preset that we map to the model's image_size field via format_mapping: shorts/reels → 1080×1920, horizontal → 1920×1080).

Limits — up to 10 input reference images (last 10 used if more provided); max total images (inputs + outputs) = 15; output resolution 1920–4096 px per axis, total pixels 2560×1440 to 4096×4096 (max 4 MP / 2048×2048 typical); output format PNG via URL or data URI; ~60s inference.
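
A sketch of how the size and image-count limits above might be checked before submitting a custom {width, height} request. The helper name is hypothetical. Two assumptions are baked in: the worst-case output count is num_images × max_images, and the 10-image input cap is applied before the 15-image total is counted.

```python
MIN_SIDE, MAX_SIDE = 1920, 4096
MIN_PIXELS = 2560 * 1440   # 3,686,400
MAX_PIXELS = 4096 * 4096   # 16,777,216
MAX_INPUTS = 10
MAX_TOTAL_IMAGES = 15      # inputs + outputs

def check_v45_request(width, height, n_inputs, num_images=1, max_images=1):
    """Return a list of limit violations for a seedream_v45_edit call."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side} outside {MIN_SIDE}-{MAX_SIDE}")
    if not MIN_PIXELS <= width * height <= MAX_PIXELS:
        problems.append("total pixels outside 2560x1440..4096x4096")
    # Only the last 10 inputs are used, so cap before counting.
    used_inputs = min(n_inputs, MAX_INPUTS)
    worst_case_outputs = num_images * max_images
    if used_inputs + worst_case_outputs > MAX_TOTAL_IMAGES:
        problems.append("inputs + outputs may exceed 15 images")
    return problems

print(check_v45_request(2048, 2048, n_inputs=3))
print(check_v45_request(2048, 2048, n_inputs=10, num_images=2, max_images=3))
```

The first call passes every check; the second trips the 15-image total because 10 inputs plus up to 6 outputs is 16.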

Seedream v5 Lite Edit seedream_v5_edit

Fast, intelligent image editing from Seedream 5.0 Lite — modify existing images, add/remove elements, composite characters into scenes, and apply style/color transfer, with up to 10 reference images per call.

Call it via: image(action: "edit", image_url, prompt) (MCP tool image, action edit) · raw: POST /v1/jobs/seedream_v5_edit

Cost: 7 cr per call
Mode / timeout: sync / 60s (from our YAML)

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
prompt | string | | | | Text prompt describing the edit to apply.
image_urls | string[] | | | up to 10 images | URLs of input images to edit. If more than 10 are sent, only the last 10 are used.
image_size | ImageSize object or enum string | | auto_2K | enum: square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9, auto_2K, auto_3K, auto_4K; or {width, height} (each 1–14142). Total pixels must be 2560×1440…4096×4096, else scaled. | Output image size, as a preset enum or explicit width/height.
num_images | integer | | 1 | 1–6 | Number of separate generations to run with the prompt.
max_images | integer | | 1 | 1–6 | If >1, enables multi-image generation: up to max_images images per generation, so total output is between num_images and max_images×num_images.
sync_mode | boolean | | false | true / false | If true, media is returned as a data URI and output isn't stored in request history.
enable_safety_checker | boolean | | true | true / false | If true, the content safety checker is enabled.

Our wrapper params (not part of the model input schema): out (required — workdir-relative output filename), mock (optional — test placeholder, no real generation). Our format (optional — shorts/reels/horizontal) is a wrapper we map to the model's image_size field as an explicit {width, height} object (shorts/reels → 1080×1920, horizontal → 1920×1080).

Limits — model limits:

  • Max reference images: 10 (last 10 used if more are sent).
  • Max resolution: 3072×3072 (≈9.4 MP). Requested total pixel counts are accepted between 2560×1440 (≈3.7 MP) and 4096×4096 (≈16.8 MP); sizes outside that range are scaled to fit.
  • Batch: 1–6 generations per call (num_images), up to 6 images each (max_images).
  • Output format: PNG delivered via HTTPS URL (or data URI when sync_mode=true).
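
Because num_images and max_images multiply, the number of images a call can return is a range rather than a fixed count. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def output_count_range(num_images=1, max_images=1):
    """Return (min, max) images one seedream_v5_edit call can produce.

    Each of the num_images generations yields between 1 and
    max_images images, per the max_images description.
    """
    for name, value in (("num_images", num_images), ("max_images", max_images)):
        if not 1 <= value <= 6:
            raise ValueError(f"{name} must be 1-6, got {value}")
    return num_images, num_images * max_images

print(output_count_range(2, 3))  # (2, 6)
print(output_count_range())      # (1, 1)
```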

Topaz Image Upscale topaz_upscale_image

Topaz image enhancer — upscale and enhance images (add detail, face enhancement, sharpening, denoising, compression-artifact removal, and generative detail).

Call it via: image tool, action: "upscale" (MCP) · raw: POST /v1/jobs/topaz_upscale_image

Cost: 16 cr per call
Mode / timeout: sync / 120s (from our YAML)

Parameters — the model's input schema:

Param | Type | Required | Default | Allowed / range | Description
image_url | string | | | non-empty URL | URL of the image to be upscaled.
model | string (enum) | | Standard V2 | Low Resolution V2, Standard V2, CGI, High Fidelity V2, Text Refine, Recovery, Redefine, Recovery V2, Standard MAX, Wonder, Wonder 3 | Model to use for image enhancement.
upscale_factor | number | | 2 | 1–4 | Factor to upscale the image by (2.0 doubles width and height).
crop_to_fill | boolean | | false | true / false | Crop the output to fill the target size.
output_format | string (enum) | | jpeg | jpeg, png | Output format of the upscaled image.
subject_detection | string (enum) | | All | All, Foreground, Background | Subject detection mode. Applies to standard enhance and Recovery V2 models.
face_enhancement | boolean | | true | true / false | Apply face enhancement. Applies to standard enhance and Recovery V2 models.
face_enhancement_creativity | number | | 0 | 0–1 | Creativity for face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled.
face_enhancement_strength | number | | 0.8 | 0–1 | Strength of face enhancement; 0 = none, 1 = max. Ignored if face enhancement is disabled.
sharpen | number | | | 0–1 | Sharpening level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine.
denoise | number | | | 0–1 | Denoising level. Applies to Standard V2, Low Resolution V2, CGI, High Fidelity V2, Text Refine, Redefine.
fix_compression | number | | | 0–1 | Compression-artifact removal. Applies to Standard V2, Low Resolution V2, High Fidelity V2, Text Refine.
strength | number | | | 0.01–1 | Enhancement strength. Applies to Text Refine model only.
creativity | integer | | | 1–6 | Generative creativity (higher = more hallucinated detail). Applies to Redefine model only.
texture | integer | | | 1–5 | Texture detail level for generative upscaling. Applies to Redefine model only.
prompt | string | | | max 1024 chars | Text prompt to guide generative upscaling. Applies to Redefine model only.
autoprompt | boolean | | | true / false | Auto-generate the prompt for generative upscaling. Applies to Redefine model only.
detail | number | | | 0–1 | Detail recovery level. Applies to Recovery V2 model only.
enhancement_strength | string (enum) | | | low, medium, high | Enhancement strength for generative upscaling. Applies to Wonder 3 model only; auto-configured when omitted.

Our wrapper params (not part of the model input schema): out (required — workdir-relative output filename), mock (optional — test placeholder). This model has no format mapping (format_field is empty), so no model size field is derived from format.

Limits — model limits: upscale_factor 1–4; prompt ≤ 1024 chars; accepted input formats jpg, jpeg, png, webp, gif, avif. Catalog cost is a flat 16 cr per call (covers outputs up to ~24 MP).
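
Many Topaz parameters only take effect for specific values of model. The sketch below encodes the "Applies to" notes from the parameter descriptions and flags parameters that would have no effect for the chosen model. The names are hypothetical, subject_detection and face_enhancement are left out because "standard enhance" is not enumerated here, and whether the API ignores or rejects inapplicable parameters is not stated.

```python
_SHARPEN_MODELS = {"Standard V2", "Low Resolution V2", "CGI",
                   "High Fidelity V2", "Text Refine", "Redefine"}

# Parameter -> models it applies to, taken from the descriptions above.
APPLIES_TO = {
    "sharpen": _SHARPEN_MODELS,
    "denoise": _SHARPEN_MODELS,
    "fix_compression": {"Standard V2", "Low Resolution V2",
                        "High Fidelity V2", "Text Refine"},
    "strength": {"Text Refine"},
    "creativity": {"Redefine"},
    "texture": {"Redefine"},
    "prompt": {"Redefine"},
    "autoprompt": {"Redefine"},
    "detail": {"Recovery V2"},
    "enhancement_strength": {"Wonder 3"},
}

def inapplicable_params(model, params):
    """List params that the chosen model does not use, sorted by name."""
    return sorted(
        name for name in params
        if name in APPLIES_TO and model not in APPLIES_TO[name]
    )

request = {"image_url": "https://example.com/in.png",
           "sharpen": 0.3, "creativity": 3}
print(inapplicable_params("Standard V2", request))  # ['creativity']
```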

Framehood