# Actor models

::: warning Temporarily disabled
Actor models — LoRA training, actor-consistent generation, and voice cloning —
are **temporarily disabled** while we rework them. The `actor` tool and the
actor-dependent actions (`image(actor_sheet)`, `video(scene)`, `actor_id`
routing) are not currently available. This page is kept for reference.
:::

Persistent characters: LoRA training, actor-consistent generation, and voice cloning.

> Generations are charged in credits (see [Credits & plans](/guide/billing)). Every generation model also accepts `mock: true` for a free placeholder result.

### Actor LoRA Training `actor_lora_train`

Fine-tunes a Flux LoRA on a ZIP of 4–30 reference images, producing a downloadable LoRA `.safetensors` URL for a persistent actor identity.

**Call it via** — MCP tool `actor`, action `create` (the `create` path registers the actor, then submits training to `actor_lora_train`) · raw: `POST /v1/jobs/actor_lora_train`

| | |
|---|---|
| **Cost** | 500 cr per call |
| **Mode / timeout** | webhook / 20m |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `images_data_url` | string | ✓ | — | publicly-accessible URL or base64 data URI | URL to a ZIP archive of training images. Use at least 4 (more is better). The archive may also contain per-image caption `.txt` files and `*_mask.jpg` mask files sharing the image's name. |
| `trigger_word` | string |  | — (none) | — | Trigger word used in captions. If omitted, no trigger word is used; if no captions are supplied, the trigger word is used in place of captions. |
| `create_masks` | boolean |  | `true` | true / false | If true, segmentation masks weight the training loss (a face mask is used for people when possible). |
| `steps` | integer |  | — (unspecified) | — | Number of training steps for the LoRA. |
| `is_style` | boolean |  | `false` | true / false | If true, trains a style LoRA: deactivates segmentation and auto-captioning and uses the trigger word to specify the style. |
| `is_input_format_already_preprocessed` | boolean |  | `false` | true / false | If false, expects raw input (image + matching caption file by name). Set true if the data is already in the preprocessed format. |
| `data_archive_format` | string |  | — (inferred from URL) | e.g. `zip` | Archive format. If unspecified, inferred from the URL. |

Our wrapper params (not part of the model schema): `out` (required — output filename for the resulting LoRA file) and `mock` (optional — returns a test placeholder instead of running real training). No `format`/size mapping applies to this model (it has no size field; our YAML `format_field` is empty).

**Limits** — the model states a practical minimum of ~4 images (more recommended); our wrapper documents a 4–30 reference-image range. No max resolution, file-size, or hard image-count cap is published for this endpoint, so none is asserted here.
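
A request body for this job could be assembled as in the sketch below. It is illustrative only: the flat layout that mixes model fields with the wrapper params `out` and `mock` is an assumption, and `build_train_payload` with its `image_count` argument is a hypothetical local helper (the count is checked client-side and never sent).

```python
import json

MIN_IMAGES = 4   # lower bound of the wrapper-documented range
MAX_IMAGES = 30  # upper bound of the wrapper-documented range


def build_train_payload(images_zip_url, out, image_count,
                        trigger_word=None, steps=None, mock=False):
    """Assemble a JSON-serialisable body for actor_lora_train (assumed layout)."""
    if not MIN_IMAGES <= image_count <= MAX_IMAGES:
        raise ValueError(
            f"expected {MIN_IMAGES}-{MAX_IMAGES} reference images, got {image_count}"
        )
    payload = {
        "images_data_url": images_zip_url,  # URL (or data URI) of the ZIP archive
        "out": out,                         # wrapper param: output filename
        "mock": mock,                       # wrapper param: free placeholder run
    }
    # Optional model fields are only sent when the caller sets them.
    if trigger_word is not None:
        payload["trigger_word"] = trigger_word
    if steps is not None:
        payload["steps"] = steps
    return payload


example = build_train_payload(
    "https://example.com/refs.zip", "hero.safetensors", 12,
    trigger_word="HERO_TOK", mock=True,
)
print(json.dumps(example, indent=2))
```

Passing `mock=True` while wiring things up avoids spending the 500 cr training cost on a test call.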

### Flux LoRA Inference `actor_lora_inference`

FLUX.1 [dev] text-to-image generation with one or more custom LoRA adaptations — used internally to render an actor with its trained likeness LoRA.

**Call it via** — `image(create, actor_id=…, prompt=…)` (also used internally by `image(actor_sheet)`, `image(animate)` first-frame, `actor(batch)`, and `video(scene)`; the Worker injects the actor's LoRA path and scale and prepends the `trigger_word`) · raw: `POST /v1/jobs/actor_lora_inference`

| | |
|---|---|
| **Cost** | Billed per megapixel — ≈7 cr per image at the ~1 MP presets |
| **Mode / timeout** | sync / 60s (from our YAML) |

**Parameters** — the model's input schema:

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `prompt` | string | ✓ | — | — | The prompt to generate an image from. |
| `image_size` | ImageSize \| enum | | `portrait_16_9` (our default; the model's own is `landscape_4_3`) | `square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9` — or an object `{width, height}` for custom size | The size of the generated image. |
| `num_inference_steps` | integer | | `28` | — | The number of inference steps to perform. |
| `seed` | integer | | — | — | Same seed + same prompt + same model version yields the same image. |
| `loras` | list&lt;LoraWeight&gt; | | — | each item is an object with `path` (string, required) and `scale` (float, default `1`) | The LoRAs to use; any number may be supplied, and they are merged. |
| `guidance_scale` | float | | `3.5` | — | CFG scale — how closely the model sticks to the prompt. |
| `sync_mode` | boolean | | — | — | If true, media is returned as a data URI and not stored in request history. |
| `num_images` | integer | | `1` | — | Number of images to generate (always 1 for streaming output). |
| `enable_safety_checker` | boolean | | `true` | — | Enables the safety checker. |
| `output_format` | enum | | `jpeg` | `jpeg, png` | The format of the generated image. |
| `acceleration` | enum | | `none` | `none, regular` | Acceleration level; `regular` balances speed and quality. |

Our wrapper params (not part of the model schema): `out` (required — workdir-relative output filename), `mock` (optional — test placeholder). The `image`/`video` MCP tools accept the same friendly `format` names on the actor path as on the plain path and normalize them to this model's `image_size` enum before submitting: `portrait`/`vertical`/`shorts`/`reels`/`9:16` → `portrait_16_9` (default), `landscape`/`horizontal`/`wide`/`16:9` → `landscape_16_9`, `square`/`1:1` → `square_hd`, `3:4` → `portrait_4_3`, `4:3` → `landscape_4_3`. Matching is case-insensitive (the value is lowercased before lookup). (Raw `image_size` enum values pass through unchanged; an unrecognised value falls back to the default rather than erroring.)
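
The `format` normalization described above can be sketched as a small lookup. This is not the actual tool code: `normalize_format` is a hypothetical name, and because `square` is both a friendly alias and a raw enum value, the sketch assumes the alias table is consulted first (so `square` becomes `square_hd`).

```python
DEFAULT_IMAGE_SIZE = "portrait_16_9"

# The six raw enum values accepted by the model's image_size field.
IMAGE_SIZE_ENUM = {
    "square_hd", "square",
    "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}

# Friendly format names and the enum value each one maps to.
FORMAT_ALIASES = {
    "portrait": "portrait_16_9",
    "vertical": "portrait_16_9",
    "shorts": "portrait_16_9",
    "reels": "portrait_16_9",
    "9:16": "portrait_16_9",
    "landscape": "landscape_16_9",
    "horizontal": "landscape_16_9",
    "wide": "landscape_16_9",
    "16:9": "landscape_16_9",
    "square": "square_hd",
    "1:1": "square_hd",
    "3:4": "portrait_4_3",
    "4:3": "landscape_4_3",
}


def normalize_format(value=None):
    """Map a friendly format name to an image_size enum value."""
    if value is None:
        return DEFAULT_IMAGE_SIZE
    key = value.lower()  # matching is case-insensitive
    if key in FORMAT_ALIASES:  # assumption: aliases win over raw enum names
        return FORMAT_ALIASES[key]
    if key in IMAGE_SIZE_ENUM:  # raw enum values pass through
        return key
    return DEFAULT_IMAGE_SIZE  # unrecognised values fall back, no error


print(normalize_format("Reels"))          # portrait_16_9
print(normalize_format("landscape_4_3"))  # landscape_4_3
print(normalize_format("banana"))         # portrait_16_9
```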

**Limits** — the `image_size` enum is fixed to the six named values above; custom sizes are passed as a `{width, height}` object, whose model-side default is `512×512`. `num_images` defaults to 1 and is forced to 1 for streaming output. No max-resolution / file-size / character limit is published for the model beyond these.
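
The Worker-side injection mentioned under "Call it via" (LoRA path and scale added, trigger word prepended) might produce a model input along these lines. The helper `build_inference_input` is hypothetical, and joining the trigger word to the prompt with a single space is an assumption; the real separator is not documented here.

```python
def build_inference_input(prompt, lora_path, trigger_word=None,
                          lora_scale=1.0, image_size="portrait_16_9", seed=None):
    """Build an actor_lora_inference model input with one actor LoRA attached."""
    # Assumption: the trigger word is prepended with a single space separator.
    full_prompt = f"{trigger_word} {prompt}" if trigger_word else prompt
    body = {
        "prompt": full_prompt,
        # Either an enum string or a {"width": ..., "height": ...} object.
        "image_size": image_size,
        "loras": [{"path": lora_path, "scale": lora_scale}],
    }
    if seed is not None:
        body["seed"] = seed  # same seed + prompt + model version repeats the image
    return body


body = build_inference_input(
    "walking through a rainy street at night",
    lora_path="https://example.com/hero.safetensors",
    trigger_word="HERO_TOK",
    image_size={"width": 768, "height": 1344},
    seed=42,
)
print(body["prompt"])
```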

### Actor Voice Clone (IVC) `actor_voice_clone`

Instant Voice Cloning from one or more audio samples; returns an ElevenLabs `voice_id` that can be stored on an actor and reused for text-to-speech.

**Call it via** — `actor` tool, `create` or `update` action with `voice_sample_url` set (the Worker submits the clone job automatically and stores the returned `voice_id` on the actor) · raw: `POST /v1/jobs/actor_voice_clone`

| | |
|---|---|
| **Cost** | 1 cr per call |
| **Mode / timeout** | sync / 120s |

**Parameters** — the model's input schema (the upstream ElevenLabs endpoint `POST /v1/voices/add`, multipart form):

| Param | Type | Required | Default | Allowed / range | Description |
|---|---|---|---|---|---|
| `name` | string | ✓ | — | — | The name that identifies this voice (shown in the voice dropdown). |
| `files` | file[] (multipart) | ✓ | — | audio recordings | A list of audio recordings intended for voice cloning. |
| `remove_background_noise` | boolean | — | `false` | `true` / `false` | If set, removes background noise from samples via the audio-isolation model. If samples have no background noise, this can reduce quality. |
| `description` | string \| null | — | `null` | — | A description of the voice. |
| `labels` | map&lt;string,string&gt; \| string \| null | — | `null` | keys: language, accent, gender, age | Labels for the voice (metadata; keys limited to those listed). |

**Our wrapper params** (not part of the model schema): `out` (required — output filename for the job result), `mock` (optional — test placeholder, skips real generation). Our YAML exposes the model's `files` field as `sample_urls` (an array of public audio URLs); the Go adapter downloads each URL and submits it as a multipart `files` entry. This model has no `format`→size mapping (`format_field` is empty).

**Limits** — the request is a multipart form that accepts multiple audio files. The endpoint reference publishes no hard limits on file count, per-file size, or supported codecs, so none are asserted here; the documented guidance is to provide clean samples totaling ~1–3 minutes. `labels` keys are restricted to language, accent, gender, or age.
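
A body for the raw job, using the wrapper's `sample_urls` field in place of `files`, could look like the sketch below. The flat payload layout and the helper `build_voice_clone_payload` are assumptions for illustration; only the field names and the label-key restriction come from this page.

```python
ALLOWED_LABEL_KEYS = {"language", "accent", "gender", "age"}


def build_voice_clone_payload(name, sample_urls, out, labels=None,
                              remove_background_noise=False, mock=False):
    """Assemble a JSON-serialisable body for actor_voice_clone (assumed layout)."""
    if not sample_urls:
        raise ValueError("at least one sample URL is required")
    if labels:
        unknown = set(labels) - ALLOWED_LABEL_KEYS
        if unknown:
            raise ValueError(f"unsupported label keys: {sorted(unknown)}")
    payload = {
        "name": name,
        "sample_urls": list(sample_urls),  # public audio URLs, fetched by the adapter
        "remove_background_noise": remove_background_noise,
        "out": out,    # wrapper param: output filename for the job result
        "mock": mock,  # wrapper param: skip real generation
    }
    if labels:
        payload["labels"] = dict(labels)
    return payload


payload = build_voice_clone_payload(
    "Hero voice",
    ["https://example.com/hero-sample-1.mp3"],
    out="hero_voice.json",
    labels={"language": "en", "gender": "female"},
    mock=True,
)
print(payload["sample_urls"])
```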