Action Protocol

Estuary characters can trigger actions in your application by embedding XML-style tags in their text responses. This lets AI characters do more than just talk -- they can animate, navigate, play sounds, or control any aspect of your application.

How Actions Work

Actions are embedded directly in bot_response text using self-closing XML tags:

Sure, I'd love to wave at you! <action name="wave" />

When the SDK receives a bot_response containing action tags:

  1. The tags are parsed to extract the action name and parameters
  2. Action events are dispatched to your registered handlers
  3. The tags are stripped from the display text (the default behavior)

This means the user sees: "Sure, I'd love to wave at you!" while your code receives a wave action to trigger an animation.

Action Tag Format

Actions use self-closing XML tags with a required name attribute:

<action name="actionName" />

Actions can include additional parameters as attributes:

<action name="navigate" target="kitchen" speed="fast" />
<action name="emote" type="laugh" intensity="0.8" />
<action name="play_sound" clip="doorbell" volume="0.5" />

Rules

  • The name attribute is required and identifies which handler to invoke
  • All other attributes are optional parameters passed to the handler
  • Attribute values are always strings (parse them to numbers/booleans in your handler)
  • Tags are case-insensitive (<action> and <Action> are equivalent)
  • Multiple actions can appear in a single response
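
Because attribute values always arrive as strings, a handler usually converts them before use. The following is a minimal sketch of that conversion, not part of any SDK: the helper names numberParam and booleanParam are hypothetical, and the fallback behavior for missing or malformed values is an assumption.

```typescript
// Hypothetical helpers for illustration; they are not part of the Estuary SDKs.
type ActionParameters = Record<string, string>;

// Reads a numeric parameter, returning `fallback` when it is missing or not a number.
function numberParam(params: ActionParameters, key: string, fallback: number): number {
  const raw = params[key];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const value = Number(raw);
  return isFinite(value) ? value : fallback;
}

// Reads a boolean parameter; only "true" and "false" (any casing) are accepted.
function booleanParam(params: ActionParameters, key: string, fallback: boolean): boolean {
  const raw = params[key];
  if (raw === undefined) {
    return fallback;
  }
  const normalized = raw.trim().toLowerCase();
  if (normalized === "true") return true;
  if (normalized === "false") return false;
  return fallback;
}

// Parameters as they would arrive from:
// <action name="play_sound" clip="doorbell" volume="0.5" />
const soundParams: ActionParameters = { clip: "doorbell", volume: "0.5" };
const volume = numberParam(soundParams, "volume", 1.0);
```

Falling back to a default rather than throwing keeps a malformed value from interrupting the conversation; whether that is the right trade-off depends on your application.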

Defining Custom Actions

Actions are defined as part of your character's configuration in the Estuary Dashboard. When creating or editing a character, you specify which actions are available and what they do. The LLM uses these definitions to decide when and how to emit actions.

Example character action configuration:

Available actions:
- wave: Wave at the user. Use when greeting or saying goodbye.
- sit: Sit down. Use when the conversation is relaxed.
- navigate(target): Move to a location. Use when the user asks you to go somewhere.
- play_sound(clip): Play a sound effect. Use for emphasis or reactions.

The LLM uses these descriptions to judge when each action is appropriate and emits the matching tags as a natural part of its responses.

Client-Side Handling

Parsing

SDKs include a built-in action parser that extracts action tags from bot_response text. Parsed actions contain:

  • Name -- The action identifier (from the name attribute)
  • Parameters -- A dictionary of all other attributes
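
To make the parsed shape concrete, here is a rough sketch of what such a parser might do. It is not the SDK's actual implementation: the ParsedAction interface and parseActions function are hypothetical names, and the regular expressions assume double-quoted attribute values that contain no ">" character.

```typescript
// Hypothetical shape for a parsed action; the real SDK type may differ.
interface ParsedAction {
  name: string;
  parameters: Record<string, string>;
}

function parseActions(text: string): ParsedAction[] {
  const actions: ParsedAction[] = [];
  // Self-closing <action ... /> tags; the "i" flag lets <Action> match too.
  const tagPattern = /<action\b([^>]*?)\/>/gi;
  let tag: RegExpExecArray | null;
  while ((tag = tagPattern.exec(text)) !== null) {
    const parameters: Record<string, string> = {};
    let name: string | undefined;
    // key="value" pairs; values are kept as strings.
    const attrPattern = /([\w-]+)\s*=\s*"([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(tag[1])) !== null) {
      // Assumption: the attribute key "name" is matched case-insensitively.
      if (attr[1].toLowerCase() === "name") {
        name = attr[2];
      } else {
        parameters[attr[1]] = attr[2];
      }
    }
    // The name attribute is required, so tags without one are skipped.
    if (name !== undefined) {
      actions.push({ name, parameters });
    }
  }
  return actions;
}
```

Calling parseActions on a response that contains several tags returns one entry per tag, in the order the tags appear in the text.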

Handler Registration

Each SDK provides a way to register handlers for specific action names. When an action is parsed from a response, the matching handler is invoked.

Unity (C#):

// Using EstuaryActionManager component
// Configure action bindings in the Inspector, or register in code:
actionManager.OnAnyActionReceived += (action) =>
{
    Debug.Log($"Action: {action.Name}");

    switch (action.Name)
    {
        case "wave":
            animator.SetTrigger("Wave");
            break;
        case "navigate":
            var target = action.GetParameter("target");
            navAgent.SetDestination(GetWaypoint(target));
            break;
        case "play_sound":
            var clip = action.GetParameter("clip");
            audioSource.PlayOneShot(GetClip(clip));
            break;
    }
};

TypeScript:

client.on('action', (action) => {
  console.log(`Action: ${action.name}`);

  if (action.name === 'wave') {
    playAnimation('wave');
  } else if (action.name === 'navigate') {
    navigateTo(action.parameters.target);
  }
});
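
As the number of actions grows, a per-name registry can replace a long if/else chain. The sketch below is one possible way to structure that in application code. ActionRegistry, its methods, and the ParsedAction shape are all hypothetical and not part of the SDK, and action names are assumed to match exactly.

```typescript
// Hypothetical types for illustration; the SDK's own action type may differ.
interface ParsedAction {
  name: string;
  parameters: Record<string, string>;
}

type ActionHandler = (action: ParsedAction) => void;

class ActionRegistry {
  // Null-prototype object so names like "constructor" never hit inherited keys.
  private handlers: Record<string, ActionHandler[]> = Object.create(null);

  // Adds a handler for one action name; several handlers per name are allowed.
  register(name: string, handler: ActionHandler): void {
    const list = this.handlers[name] || (this.handlers[name] = []);
    list.push(handler);
  }

  // Runs every handler registered for the action's name.
  // Returns false when no handler is registered, so callers can log unknown actions.
  dispatch(action: ParsedAction): boolean {
    const list = this.handlers[action.name];
    if (!list) {
      return false;
    }
    for (const handler of list) {
      handler(action);
    }
    return true;
  }
}

const registry = new ActionRegistry();
registry.register("wave", () => console.log("play wave animation"));
registry.register("navigate", (action) =>
  console.log(`navigate to ${action.parameters["target"]}`),
);

// Assumed wiring, mirroring the TypeScript example above:
// client.on('action', (action) => registry.dispatch(action));
```

Returning a boolean from dispatch makes it easy to notice when the character emits an action the application has no handler for.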

Stripping Action Tags

By default, SDKs strip action tags from the text before displaying it to the user. The original text with tags is still available if needed.

Original text                               | Display text
Let me wave! <action name="wave" />         | Let me wave!
<action name="sit" /> I'll sit down here.   | I'll sit down here.
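
If you ever need to reproduce this stripping yourself, for example on text that was stored with tags, a small function along these lines would work. This is a sketch under assumptions rather than the SDK's implementation: stripActionTags is a hypothetical name, and collapsing leftover whitespace is inferred from the examples above.

```typescript
// Hypothetical helper; the SDKs perform their own stripping by default.
function stripActionTags(text: string): string {
  return text
    // Remove every self-closing <action ... /> tag, in any letter case.
    .replace(/<action\b[^>]*?\/>/gi, "")
    // Collapse the double spaces a removed mid-sentence tag leaves behind.
    .replace(/[ \t]{2,}/g, " ")
    // Drop the leading or trailing space left by a tag at either end.
    .trim();
}

// Keep the original alongside the display text if the tags are needed later.
const originalText = 'Let me wave! <action name="wave" />';
const displayText = stripActionTags(originalText);
```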

Examples

Triggering Animations

<action name="emote" type="happy" />
<action name="gesture" animation="point_forward" />
<action name="idle" pose="thinking" />
<action name="navigate" target="door" speed="walk" />
<action name="look_at" target="user" />
<action name="turn" direction="left" degrees="90" />

Sound Effects

<action name="play_sound" clip="notification" volume="0.7" />
<action name="play_music" track="ambient_forest" />

UI Control

<action name="show_image" url="photo.jpg" />
<action name="open_panel" panel="inventory" />
<action name="highlight" element="submit_button" />

Next Steps

  • Conversation Protocol -- The full event reference
  • For SDK-specific action implementation details, see your SDK's action system guide