
Core Concepts

Understand the architecture and design patterns of the Estuary Unity SDK.

Architecture

The SDK is organized into three layers:

┌────────────────────────────────────────────────────────────────┐
│                        Your Game / App                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  COMPONENTS (MonoBehaviours, attached to GameObjects)          │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ EstuaryManager │ EstuaryCharacter │ EstuaryMicrophone  │    │
│  │ EstuaryAudioSource │ EstuaryWebcam │ EstuaryActionMgr  │    │
│  └────────────────────────────────────────────────────────┘    │
│                             │                                  │
│  CORE (Plain C# classes)    │                                  │
│  ┌──────────────────────────▼─────────────────────────────┐    │
│  │ EstuaryClient │ EstuaryConfig │ LiveKitVoiceManager    │    │
│  │ EstuaryEvents │ LiveKitVideoManager                    │    │
│  └────────────────────────────────────────────────────────┘    │
│                             │                                  │
│  MODELS & UTILITIES         │                                  │
│  ┌──────────────────────────▼─────────────────────────────┐    │
│  │ SessionInfo │ BotResponse │ BotVoice │ AgentAction     │    │
│  │ AudioConverter │ ActionParser │ Base64Helper           │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                │
└────────────────────────────────────────────────────────────────┘
                                │
           Socket.IO v4 (WebSocket) + LiveKit (WebRTC)
                                ▼
                        ┌────────────────┐
                        │ Estuary Server │
                        └────────────────┘

Components Layer

MonoBehaviour components that you attach to GameObjects in the Unity Editor. These provide the high-level API:

  • EstuaryManager -- Singleton that manages the connection
  • EstuaryCharacter -- Per-character conversation interface
  • EstuaryMicrophone -- Microphone capture for voice input
  • EstuaryAudioSource -- TTS audio playback
  • EstuaryWebcam -- Video streaming for spatial awareness
  • EstuaryActionManager -- Routes parsed actions to handlers

Core Layer

Plain C# classes that handle the protocol and transport:

  • EstuaryClient -- Socket.IO v4 client (manual WebSocket implementation, no third-party library)
  • EstuaryConfig -- ScriptableObject for global settings
  • EstuaryEvents -- C# delegate definitions for all events
  • LiveKitVoiceManager -- WebRTC voice via the LiveKit Unity SDK
  • LiveKitVideoManager -- WebRTC video streaming

Models & Utilities

Data classes that match the SDK Contract, plus helper utilities for audio conversion, action parsing, and Base64 encoding.


EstuaryManager (Singleton)

EstuaryManager is the central coordinator. It:

  1. Creates and owns the EstuaryClient (Socket.IO connection)
  2. Creates and owns the LiveKitVoiceManager (WebRTC voice)
  3. Routes server events to the active EstuaryCharacter
  4. Manages character registration and switching

Access it from anywhere:

// Check if connected
if (EstuaryManager.Instance.IsConnected)
{
    await EstuaryManager.Instance.SendTextAsync("Hello!");
}

// Check LiveKit state
if (EstuaryManager.Instance.IsLiveKitReady)
{
    Debug.Log("Voice is ready");
}

If you access Instance and no EstuaryManager exists in the scene, one is created automatically. It marks itself DontDestroyOnLoad, so it persists across scene loads.
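
The auto-creation behavior described above matches a common Unity lazy-singleton pattern. The sketch below illustrates that pattern with a hypothetical ExampleManager class. It is not the SDK's source, and EstuaryManager's actual implementation may differ.

```csharp
using UnityEngine;

// Hypothetical class illustrating a lazy, scene-persistent singleton.
public class ExampleManager : MonoBehaviour
{
    private static ExampleManager _instance;

    public static ExampleManager Instance
    {
        get
        {
            if (_instance == null)
            {
                // Reuse an instance already placed in the scene, if any.
                // (Newer Unity versions prefer FindFirstObjectByType.)
                _instance = FindObjectOfType<ExampleManager>();
            }
            if (_instance == null)
            {
                // Otherwise create one. Awake runs during AddComponent
                // and registers the new component as the instance.
                var go = new GameObject(nameof(ExampleManager));
                _instance = go.AddComponent<ExampleManager>();
            }
            return _instance;
        }
    }

    private void Awake()
    {
        // Destroy duplicates so only one instance survives.
        if (_instance != null && _instance != this)
        {
            Destroy(gameObject);
            return;
        }
        _instance = this;
        // Keep the root GameObject alive across scene loads.
        DontDestroyOnLoad(gameObject);
    }
}
```

One consequence of this pattern is that the first access to Instance may create a GameObject, so it is usually best to touch it from the main thread during startup rather than from a background callback.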


EstuaryConfig (ScriptableObject)

EstuaryConfig stores global settings as a Unity asset. Create one via Assets > Create > Estuary > Config.

| Field | Default | Description |
|---|---|---|
| ServerUrl | https://api.estuary-ai.com | Server URL |
| ApiKey | (empty) | Your est_... API key |
| VoiceMode | LiveKit | LiveKit (WebRTC) or WebSocket |
| RecordingSampleRate | 16000 | Microphone sample rate in Hz (set to 16 kHz automatically for LiveKit) |
| PlaybackSampleRate | 48000 | Expected TTS sample rate in Hz |
| AudioChunkDurationMs | 100 | Audio chunk duration in ms (WebSocket mode) |
| DebugLogging | false | Enable verbose logs |
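
To see how RecordingSampleRate and AudioChunkDurationMs interact, the following sketch computes the size of one audio chunk. The AudioChunkMath helper is hypothetical, and the 2 bytes per sample figure assumes 16-bit mono PCM, which this page does not confirm.

```csharp
// Hypothetical helper; not part of the SDK.
public static class AudioChunkMath
{
    // Samples in one chunk = sample rate (Hz) * chunk duration (ms) / 1000.
    public static int SamplesPerChunk(int sampleRateHz, int chunkDurationMs)
        => sampleRateHz * chunkDurationMs / 1000;

    // Assumption: 16-bit mono PCM, so 2 bytes per sample by default.
    public static int BytesPerChunk(int sampleRateHz, int chunkDurationMs, int bytesPerSample = 2)
        => SamplesPerChunk(sampleRateHz, chunkDurationMs) * bytesPerSample;
}
```

With the defaults (16000 Hz, 100 ms), SamplesPerChunk returns 1600 and, under the 16-bit assumption, BytesPerChunk returns 3200.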

You can set the API key at runtime to avoid embedding it in the asset:

EstuaryManager.Instance.Config.SetApiKeyRuntime("est_...");
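
As one possible way to supply the key at startup, the sketch below reads it from an environment variable and passes it to SetApiKeyRuntime. The component name and the ESTUARY_API_KEY variable are hypothetical, and the SDK's using directive is omitted because its namespace is not stated here.

```csharp
using System;
using UnityEngine;

// Hypothetical bootstrap component; add the SDK's using directive for EstuaryManager.
public class EstuaryApiKeyBootstrap : MonoBehaviour
{
    // Hypothetical environment variable name.
    private const string ApiKeyEnvVar = "ESTUARY_API_KEY";

    private void Awake()
    {
        string apiKey = Environment.GetEnvironmentVariable(ApiKeyEnvVar);
        if (string.IsNullOrEmpty(apiKey))
        {
            Debug.LogWarning($"{ApiKeyEnvVar} is not set; no runtime API key was applied.");
            return;
        }

        // Call documented above: sets the key without storing it in the asset.
        EstuaryManager.Instance.Config.SetApiKeyRuntime(apiKey);
    }
}
```

Environment variables suit desktop and editor workflows. On other platforms the key would more likely come from your own backend or a secure store.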

EstuaryCharacter (Per-Character)

Each AI character in your scene gets an EstuaryCharacter component. It holds character-specific settings and exposes conversation methods.

// Send a message
character.SendText("What's the weather like?");

// Start voice
character.StartVoiceSession();

// Listen for responses
character.OnBotResponse += (response) =>
{
    if (response.IsFinal)
        chatUI.AddMessage(response.Text);
};

Key properties:

| Property | Type | Description |
|---|---|---|
| CharacterId | string | Character UUID |
| PlayerId | string | End user identifier |
| IsConnected | bool | Connection status |
| IsVoiceSessionActive | bool | Whether voice is active |
| CurrentPartialResponse | string | Accumulated streaming text |
| CurrentSession | SessionInfo | Session details |

Event System

The SDK exposes every callback both as a C# delegate and as a UnityEvent.

C# Delegates

Subscribe in code for programmatic control:

character.OnBotResponse += (BotResponse response) => { ... };
character.OnTranscript += (SttResponse stt) => { ... };
character.OnInterrupt += (InterruptData data) => { ... };
character.OnActionReceived += (AgentAction action) => { ... };
character.OnConnected += (SessionInfo session) => { ... };
character.OnDisconnected += () => { ... };
character.OnError += (string error) => { ... };
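
Lambdas subscribed inline cannot be removed later unless you keep a reference to them. A minimal sketch of a listener that subscribes with named methods and unsubscribes when disabled follows. The CharacterEventLogger component is hypothetical, and the SDK's using directive is omitted because its namespace is not stated here.

```csharp
using UnityEngine;

// Hypothetical listener component; add the SDK's using directive for the Estuary types.
public class CharacterEventLogger : MonoBehaviour
{
    // Assign the character in the Inspector.
    [SerializeField] private EstuaryCharacter character;

    private void OnEnable()
    {
        character.OnBotResponse += HandleBotResponse;
        character.OnError += HandleError;
    }

    private void OnDisable()
    {
        // Unsubscribe so the character does not keep calling a disabled component.
        character.OnBotResponse -= HandleBotResponse;
        character.OnError -= HandleError;
    }

    private void HandleBotResponse(BotResponse response)
    {
        // Log only completed responses, not streaming partials.
        if (response.IsFinal)
            Debug.Log($"Bot: {response.Text}");
    }

    private void HandleError(string error)
    {
        Debug.LogError($"Estuary error: {error}");
    }
}
```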

UnityEvents (Inspector)

You can also bind events by drag-and-drop in the Inspector. The EstuaryCharacter component exposes these as serialized UnityEvents:

  • onConnected (SessionInfo)
  • onDisconnected
  • onBotResponse (BotResponse)
  • onVoiceReceived (BotVoice)
  • onTranscript (SttResponse)
  • onInterrupt
  • onError (string)
  • onActionReceived (AgentAction)

Connection Lifecycle

┌──────────────┐
│ Disconnected │
└──────┬───────┘
       │ Connect()
       ▼
┌──────────────┐
│  Connecting  │───────────────────────┐
└──────┬───────┘                       │
       │ WebSocket open + auth         │ Error
       ▼                               ▼
┌──────────────┐                ┌─────────────┐
│  Connected   │                │    Error    │
└──────┬───────┘                └─────────────┘
       │                               ▲
       │ connection lost               │
       ▼                               │
┌──────────────┐                       │
│ Reconnecting │───────────────────────┘
└──────────────┘  max attempts reached

Connection states are defined in the ConnectionState enum:

  • Disconnected -- Not connected
  • Connecting -- Attempting to connect
  • Connected -- Ready to communicate
  • Reconnecting -- Lost connection, retrying (up to 5 attempts with linear backoff)
  • Error -- Connection failed
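
To make "linear backoff" concrete, the sketch below computes the wait before each reconnect attempt. The ReconnectPolicy class is hypothetical, and the 1-second base delay is an assumption. Only the limit of 5 attempts comes from the text above.

```csharp
using System;

// Hypothetical helper; not part of the SDK.
public static class ReconnectPolicy
{
    // From the text above: the SDK retries up to 5 times.
    public const int MaxAttempts = 5;

    // Assumption: 1-second base delay; the SDK's actual value is not documented here.
    public static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(1);

    // Linear backoff: attempt n (1-based) waits n * BaseDelay.
    // Returns null once the attempts are exhausted (the Error state).
    public static TimeSpan? DelayForAttempt(int attempt)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        if (attempt > MaxAttempts) return null;
        return TimeSpan.FromTicks(BaseDelay.Ticks * attempt);
    }
}
```

Under these assumptions the waits grow by a fixed step (1 s, 2 s, 3 s, 4 s, 5 s), in contrast to exponential backoff, where each wait doubles.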

Main Thread Dispatching

The Socket.IO connection runs on a background thread. All events are dispatched to Unity's main thread via a ConcurrentQueue that EstuaryManager.Update() drains every frame. You can safely update UI, trigger animations, and call Unity APIs from any event callback.
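
The queue-and-drain mechanism can be sketched in plain C# as follows. MainThreadDispatcher is a hypothetical class written for illustration and is not the SDK's implementation. In Unity, Drain would be called from a MonoBehaviour's Update().

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical dispatcher; not part of the SDK.
public sealed class MainThreadDispatcher
{
    private readonly ConcurrentQueue<Action> _pending = new ConcurrentQueue<Action>();

    // Safe to call from any thread, e.g. a socket receive loop.
    public void Enqueue(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        _pending.Enqueue(action);
    }

    // Call once per frame from the main thread.
    // Runs every queued action in FIFO order and returns how many ran.
    public int Drain()
    {
        int executed = 0;
        while (_pending.TryDequeue(out Action action))
        {
            action();
            executed++;
        }
        return executed;
    }
}
```

Because callbacks only run inside Drain, they execute on whichever thread calls it, which is why handlers can touch Unity APIs. The trade-off is that an event is delivered on the next frame at the earliest.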


Next Steps