Actor models

Temporarily disabled

Actor models — LoRA training, actor-consistent generation, and voice cloning — are temporarily disabled while we rework them. The actor tool and the actor-dependent actions (image(actor_sheet), video(scene), actor_id routing) are not currently available. This page is kept for reference.

Persistent characters: LoRA training, actor-consistent generation, and voice cloning.

Generations are charged in credits (see Credits & plans). Every generation model also accepts mock: true for a free placeholder result.

Actor LoRA Training `actor_lora_train`

Fine-tunes a Flux LoRA on a ZIP of 4–30 reference images and returns a URL to the trained LoRA as a downloadable .safetensors file, giving the actor a persistent identity.

Call it via — MCP tool actor action create (the create path submits training to actor_lora_train after registering the actor) · raw: POST /v1/jobs/actor_lora_train


Cost	500 cr per call
Mode / timeout	webhook / 20m

Parameters — the model's input schema:

Param	Type	Required	Default	Allowed / range	Description
`images_data_url`	string	✓	—	publicly accessible URL or base64 data URI	URL (or data URI) of a ZIP archive of training images. Use at least 4 images (more is better). The archive may also contain per-image caption `.txt` files and `*_mask.jpg` mask files sharing the image's name.
`trigger_word`	string		— (none)	—	Trigger word used in captions. If omitted, no trigger word is used; if no captions are supplied, the trigger word is used in place of captions.
`create_masks`	boolean		`true`	true / false	If true, segmentation masks weight the training loss (a face mask is used for people when possible).
`steps`	integer		— (unspecified)	—	Number of training steps for the LoRA.
`is_style`	boolean		`false`	true / false	If true, trains a style LoRA: deactivates segmentation and auto-captioning and uses the trigger word to specify the style.
`is_input_format_already_preprocessed`	boolean		`false`	true / false	If false, expects raw input (image + matching caption file by name). Set true if the data is already in the preprocessed format.
`data_archive_format`	string		— (inferred from URL)	e.g. `zip`	Archive format. If unspecified, inferred from the URL.

Our wrapper params (not part of the model schema): out (required — output filename for the resulting LoRA file) and mock (optional — returns a test placeholder instead of running real training). No format/size mapping applies to this model (it has no size field; our YAML format_field is empty).

Limits — the model states a practical minimum of ~4 images (more recommended); our wrapper documents a 4–30 reference-image range. No max resolution, file-size, or hard image-count cap is published for this endpoint, so none is asserted here.
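A minimal Python sketch of preparing the training archive, assuming the naming convention above means a caption `shot01.txt` and a mask `shot01_mask.jpg` sit next to `shot01.jpg`. It enforces the documented 4–30 image range, packs everything into an in-memory ZIP, and passes it as a base64 data URI (one of the two accepted `images_data_url` forms). The helper names and the flat input dict mixing model params with the wrapper's `out` and `mock` are assumptions for illustration, not the service's client API.

```python
import base64
import io
import zipfile
from pathlib import PurePosixPath

MIN_IMAGES, MAX_IMAGES = 4, 30  # range documented for actor_lora_train


def build_training_zip(images, captions=None, masks=None):
    """Pack training images into an in-memory ZIP (flat file names assumed).

    images:   {filename: bytes}, e.g. {"shot01.jpg": b"..."}
    captions: optional {image filename: caption text}, stored as shot01.txt
    masks:    optional {image filename: bytes}, stored as shot01_mask.jpg
    """
    captions = captions or {}
    masks = masks or {}
    if not MIN_IMAGES <= len(images) <= MAX_IMAGES:
        raise ValueError(f"expected {MIN_IMAGES}-{MAX_IMAGES} images, got {len(images)}")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in images.items():
            stem = PurePosixPath(name).stem
            zf.writestr(name, data)
            # Caption and mask files share the image's stem so they pair by name.
            if name in captions:
                zf.writestr(f"{stem}.txt", captions[name])
            if name in masks:
                zf.writestr(f"{stem}_mask.jpg", masks[name])
    return buf.getvalue()


def build_train_input(zip_bytes, out, trigger_word=None, mock=False):
    """Assemble a hypothetical actor_lora_train input dict with a data-URI archive."""
    data_uri = "data:application/zip;base64," + base64.b64encode(zip_bytes).decode("ascii")
    payload = {"images_data_url": data_uri, "out": out}
    if trigger_word is not None:
        payload["trigger_word"] = trigger_word
    if mock:
        payload["mock"] = True  # free placeholder result instead of real training
    return payload
```

Base64 encoding makes the payload about a third larger than the ZIP itself, so for large image sets a publicly accessible URL to an uploaded archive is the lighter option.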

Flux LoRA Inference `actor_lora_inference`

FLUX.1 [dev] text-to-image generation with one or more custom LoRA adaptations — used internally to render an actor with its trained likeness LoRA.

Call it via — image(create, actor_id=…, prompt=…) (also used internally by image(actor_sheet), image(animate) first-frame, actor(batch), and video(scene); the Worker injects the actor's LoRA path + scale and prepends the trigger_word) · raw: POST /v1/jobs/actor_lora_inference


Cost	Billed per megapixel — ≈7 cr per image at the ~1 MP presets
Mode / timeout	sync / 60s (from our YAML)

Parameters — the model's input schema:

Param	Type	Required	Default	Allowed / range	Description
`prompt`	string	✓	—	—	The prompt to generate an image from.
`image_size`	ImageSize | enum		`portrait_16_9` (our default; the model's own is `landscape_4_3`)	`square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9`; or an object `{width, height}` for a custom size	The size of the generated image.
`num_inference_steps`	integer		`28`	—	The number of inference steps to perform.
`seed`	integer		—	—	Same seed + same prompt + same model version yields the same image.
`loras`	list<LoraWeight>		—	each item `{path, scale}`: `path` (string, required), `scale` (float, default `1`)	The LoRAs to use; any number may be supplied, and they are merged.
`guidance_scale`	float		`3.5`	—	CFG scale — how closely the model sticks to the prompt.
`sync_mode`	boolean		—	—	If true, media is returned as a data URI and not stored in request history.
`num_images`	integer		`1`	—	Number of images to generate (always 1 for streaming output).
`enable_safety_checker`	boolean		`true`	—	Enables the safety checker.
`output_format`	enum		`jpeg`	`jpeg, png`	The format of the generated image.
`acceleration`	enum		`none`	`none, regular`	Acceleration level; `regular` balances speed and quality.
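The Worker-side injection described under "Call it via" (the actor's LoRA path and scale added to `loras`, the trigger word prepended to the prompt) can be sketched as follows. The actor record fields (`lora_path`, `lora_scale`, `trigger_word`), the single space used as separator, and the `extra_loras` argument are hypothetical; the real Worker's record shape and prompt formatting are not documented here.

```python
def build_actor_inference_input(actor, prompt, out, image_size="portrait_16_9", extra_loras=()):
    """Merge a (hypothetical) actor record into an actor_lora_inference input dict."""
    trigger = actor.get("trigger_word")
    # Prepend the trigger word so the likeness LoRA activates (space separator is an assumption).
    full_prompt = f"{trigger} {prompt}" if trigger else prompt
    # The actor's own LoRA comes first; scale defaults to 1, matching the model schema.
    loras = [{"path": actor["lora_path"], "scale": actor.get("lora_scale", 1.0)}]
    # Any number of further LoRAs may be supplied; the model merges them.
    loras.extend({"path": extra["path"], "scale": extra.get("scale", 1.0)} for extra in extra_loras)
    return {"prompt": full_prompt, "loras": loras, "image_size": image_size, "out": out}
```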

Our wrapper params (not part of the model schema): out (required — workdir-relative output filename), mock (optional — test placeholder). The image/video MCP tools accept the same friendly format names on the actor path as on the plain path and normalize them to this model's image_size enum before submitting: portrait/vertical/shorts/reels/9:16 → portrait_16_9 (default), landscape/horizontal/wide/16:9 → landscape_16_9, square/1:1 → square_hd, 3:4 → portrait_4_3, 4:3 → landscape_4_3. Matching is case-insensitive (the value is lowercased before lookup). Raw image_size enum values pass through unchanged, and an unrecognised value falls back to the default (portrait_16_9) rather than raising an error.
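The friendly-name normalization above can be sketched as a lookup table plus two fallbacks. One assumption: because `square` appears both as a friendly name (mapped to `square_hd`) and as a raw enum value, this sketch checks friendly names first, so `square` becomes `square_hd`; the actual tools may resolve that overlap differently. The function and constant names are illustrative.

```python
IMAGE_SIZE_ENUM = {
    "square_hd", "square", "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}
DEFAULT_IMAGE_SIZE = "portrait_16_9"

# Friendly format names -> image_size enum, as listed in the paragraph above.
FORMAT_ALIASES = {
    **dict.fromkeys(["portrait", "vertical", "shorts", "reels", "9:16"], "portrait_16_9"),
    **dict.fromkeys(["landscape", "horizontal", "wide", "16:9"], "landscape_16_9"),
    **dict.fromkeys(["square", "1:1"], "square_hd"),
    "3:4": "portrait_4_3",
    "4:3": "landscape_4_3",
}


def normalize_image_size(value):
    """Map a friendly format name or raw enum value to an image_size enum value."""
    if value is None:
        return DEFAULT_IMAGE_SIZE
    key = value.lower()  # case-insensitive matching
    if key in FORMAT_ALIASES:  # friendly name, checked first
        return FORMAT_ALIASES[key]
    if key in IMAGE_SIZE_ENUM:  # raw enum value passes through
        return key
    return DEFAULT_IMAGE_SIZE  # unrecognised: fall back instead of erroring
```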

Limits — image_size enum is fixed to the six named values above; custom sizes are passed as a {width, height} object (model default 512×512). num_images defaults to 1 and is forced to 1 for streaming output. No max-resolution / file-size / character limit is published for the model beyond these.

Actor Voice Clone (IVC) `actor_voice_clone`

Instant Voice Cloning from one or more audio samples; returns an ElevenLabs voice_id that can be stored on an actor and reused for text-to-speech.

Call it via — actor tool, create or update action with voice_sample_url set (the worker submits the clone job automatically and stores the returned voice_id on the actor) · raw: POST /v1/jobs/actor_voice_clone


Cost	1 cr per call
Mode / timeout	sync / 120s

Parameters — the model's input schema (POST /v1/voices/add, multipart form):

Param	Type	Required	Default	Allowed / range	Description
`name`	string	✓	—	—	The name that identifies this voice (shown in the voice dropdown).
`files`	file[] (multipart)	✓	—	audio recordings	A list of audio recordings intended for voice cloning.
`remove_background_noise`	boolean	—	`false`	`true` / `false`	If set, removes background noise from samples via the audio-isolation model. If samples have no background noise, this can reduce quality.
`description`	string | null	—	`null`	—	A description of the voice.
`labels`	map<string,string> | string | null	—	`null`	keys: language, accent, gender, age	Labels for the voice, keyed by language, accent, gender, or age.

Our wrapper params (not part of the model schema): out (required — output filename for the job result), mock (optional — test placeholder, skips real generation). Our YAML exposes the model's files field as sample_urls (an array of public audio URLs); the Go adapter downloads each URL and submits it as a multipart files entry. This model has no format→size mapping (format_field is empty).
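To illustrate what the adapter does with `sample_urls`, here is a Python sketch (not the Go adapter itself) that downloads each URL and encodes a `multipart/form-data` body with one `name` field and one `files` part per sample. The helper names, the injectable `fetch` parameter (used to avoid network access in tests), and deriving the filename and content type from the URL path are assumptions.

```python
import mimetypes
import posixpath
import urllib.parse
import urllib.request
import uuid


def download(url, timeout=30):
    """Fetch one public sample URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def encode_multipart(fields, files, boundary=None):
    """Encode multipart/form-data.

    fields: list of (name, str value); files: list of (field name, filename, bytes, content type).
    Returns (body bytes, Content-Type header value).
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []

    def add_part(headers, payload):
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.extend(f"{h}\r\n".encode() for h in headers)
        chunks.append(b"\r\n" + payload + b"\r\n")  # blank line, body, CRLF before next delimiter

    for name, value in fields:
        add_part([f'Content-Disposition: form-data; name="{name}"'], value.encode("utf-8"))
    for field, filename, data, ctype in files:
        add_part([f'Content-Disposition: form-data; name="{field}"; filename="{filename}"',
                  f"Content-Type: {ctype}"], data)
    chunks.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def voice_add_form_from_urls(name, sample_urls, fetch=download):
    """Download every sample URL and submit each one as a separate `files` part."""
    files = []
    for i, url in enumerate(sample_urls):
        path = urllib.parse.urlparse(url).path
        filename = posixpath.basename(path) or f"sample_{i}"
        ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
        files.append(("files", filename, fetch(url), ctype))
    return encode_multipart([("name", name)], files)
```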

Limits — the request is a multipart form accepting multiple audio files. The endpoint reference publishes no hard maximum file count, per-file size, or list of supported codecs, so none is asserted here; documented guidance is to provide clean samples totaling ~1–3 minutes. labels keys are restricted to language, accent, gender, or age.
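A client could check these constraints before submitting. This hypothetical helper flags unsupported `labels` keys and treats the ~1–3 minute total as a soft guideline rather than a hard limit. It expects `labels` as a dict and sample durations already measured in seconds, since decoding audio is out of scope for the sketch.

```python
ALLOWED_LABEL_KEYS = {"language", "accent", "gender", "age"}
# ~1-3 minutes of clean audio in total, per the guidance above (soft, not enforced by the API).
RECOMMENDED_TOTAL_SECONDS = (60, 180)


def check_voice_clone_request(sample_durations_s, labels=None):
    """Return a list of human-readable problems; an empty list means no issues were found."""
    if labels is not None and not isinstance(labels, dict):
        raise TypeError("this sketch expects labels as a dict")
    problems = []
    if not sample_durations_s:
        problems.append("at least one audio sample is required")
    unsupported = sorted(set(labels or {}) - ALLOWED_LABEL_KEYS)
    if unsupported:
        problems.append("unsupported label keys: " + ", ".join(unsupported))
    total = sum(sample_durations_s)
    low, high = RECOMMENDED_TOTAL_SECONDS
    if sample_durations_s and not low <= total <= high:
        problems.append(f"total sample length {total:.0f}s is outside the ~{low}-{high}s guidance")
    return problems
```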
