Core Concepts
Understand the architecture and design patterns of the Estuary Unity SDK.
Architecture
The SDK is organized into three layers:
┌────────────────────────────────────────────────────────────────┐
│ Your Game / App │
├────────────────────────────────────────────────────────────────┤
│ │
│ COMPONENTS (MonoBehaviours — attach to GameObjects) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ EstuaryManager │ EstuaryCharacter │ EstuaryMicrophone │ │
│ │ EstuaryAudioSource │ EstuaryWebcam │ EstuaryActionMgr │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ CORE (Plain C# classes) │ │
│ ┌──────────────────────────▼─────────────────────────────┐ │
│ │ EstuaryClient │ EstuaryConfig │ LiveKitVoiceManager │ │
│ │ EstuaryEvents │ LiveKitVideoManager │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ MODELS & UTILITIES │ │
│ ┌──────────────────────────▼─────────────────────────────┐ │
│ │ SessionInfo │ BotResponse │ BotVoice │ AgentAction │ │
│ │ AudioConverter │ ActionParser │ Base64Helper │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
│
Socket.IO v4 (WebSocket) + LiveKit (WebRTC)
▼
┌────────────────┐
│ Estuary Server │
└────────────────┘
Components Layer
MonoBehaviour components that you attach to GameObjects in the Unity Editor. These provide the high-level API:
- EstuaryManager -- Singleton that manages the connection
- EstuaryCharacter -- Per-character conversation interface
- EstuaryMicrophone -- Microphone capture for voice input
- EstuaryAudioSource -- TTS audio playback
- EstuaryWebcam -- Video streaming for spatial awareness
- EstuaryActionManager -- Routes parsed actions to handlers
Core Layer
Plain C# classes that handle the protocol and transport:
- EstuaryClient -- Socket.IO v4 client (manual WebSocket implementation, no third-party library)
- EstuaryConfig -- ScriptableObject for global settings
- EstuaryEvents -- C# delegate definitions for all events
- LiveKitVoiceManager -- WebRTC voice via the LiveKit Unity SDK
- LiveKitVideoManager -- WebRTC video streaming
Models & Utilities
Data classes matching the SDK Contract and helper utilities for audio conversion, action parsing, and base64 encoding.
EstuaryManager (Singleton)
EstuaryManager is the central coordinator. It:
- Creates and owns the EstuaryClient (Socket.IO connection)
- Creates and owns the LiveKitVoiceManager (WebRTC voice)
- Routes server events to the active EstuaryCharacter
- Manages character registration and switching
Access it from anywhere:
// Check if connected (call from an async method, since SendTextAsync is awaited)
if (EstuaryManager.Instance.IsConnected)
{
    await EstuaryManager.Instance.SendTextAsync("Hello!");
}

// Check LiveKit state
if (EstuaryManager.Instance.IsLiveKitReady)
{
    Debug.Log("Voice is ready");
}
If no EstuaryManager exists in the scene, the first access to Instance creates one automatically. The manager marks itself DontDestroyOnLoad, so it persists across scene loads.
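The lazy-creation behavior described above follows a common Unity singleton pattern. The sketch below shows that general pattern using a hypothetical `ExampleManager` class; it is not the SDK's actual EstuaryManager source, which may differ in detail.

```csharp
using UnityEngine;

// Hypothetical stand-in that illustrates the pattern only.
public class ExampleManager : MonoBehaviour
{
    private static ExampleManager _instance;

    public static ExampleManager Instance
    {
        get
        {
            if (_instance == null)
            {
                // Prefer an instance already placed in the scene.
                _instance = FindObjectOfType<ExampleManager>();

                if (_instance == null)
                {
                    // None found: create a host GameObject on demand.
                    // AddComponent runs Awake() immediately, which finishes setup.
                    var host = new GameObject(nameof(ExampleManager));
                    _instance = host.AddComponent<ExampleManager>();
                }
            }
            return _instance;
        }
    }

    private void Awake()
    {
        // Discard duplicates, for example when a scene containing a manager is reloaded.
        if (_instance != null && _instance != this)
        {
            Destroy(gameObject);
            return;
        }

        _instance = this;
        // Only works on root GameObjects; keeps the manager alive across scene loads.
        DontDestroyOnLoad(gameObject);
    }
}
```

With this pattern, reading `Instance` during application shutdown can create a fresh object, so teardown code should usually avoid touching the accessor.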
EstuaryConfig (ScriptableObject)
EstuaryConfig stores global settings as a Unity asset. Create one via Assets > Create > Estuary > Config.
| Field | Default | Description |
|---|---|---|
| ServerUrl | https://api.estuary-ai.com | Server URL |
| ApiKey | (empty) | Your est_... API key |
| VoiceMode | LiveKit | LiveKit (WebRTC) or WebSocket |
| RecordingSampleRate | 16000 | Mic sample rate in Hz (auto 16 kHz for LiveKit) |
| PlaybackSampleRate | 48000 | Expected TTS sample rate in Hz |
| AudioChunkDurationMs | 100 | WebSocket audio chunk duration in milliseconds |
| DebugLogging | false | Enable verbose logs |
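To see how RecordingSampleRate and AudioChunkDurationMs interact, the following sketch computes the size of one audio chunk. It assumes 16-bit mono PCM, which this page does not state, and `AudioChunkMath` is a hypothetical helper rather than part of the SDK.

```csharp
public static class AudioChunkMath
{
    // Number of samples captured during one chunk.
    public static int SamplesPerChunk(int sampleRateHz, int chunkDurationMs)
    {
        return sampleRateHz * chunkDurationMs / 1000;
    }

    // Chunk size in bytes. The defaults assume 16-bit (2-byte) mono samples.
    public static int BytesPerChunk(int sampleRateHz, int chunkDurationMs,
                                    int bytesPerSample = 2, int channels = 1)
    {
        return SamplesPerChunk(sampleRateHz, chunkDurationMs) * bytesPerSample * channels;
    }
}
```

With the default values from the table (16000 Hz, 100 ms), one chunk holds 1600 samples, or 3200 bytes under the 16-bit mono assumption.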
You can set the API key at runtime to avoid embedding it in the asset:
EstuaryManager.Instance.Config.SetApiKeyRuntime("est_...");
EstuaryCharacter (Per-Character)
Each AI character in your scene gets an EstuaryCharacter component. It holds character-specific settings and exposes conversation methods.
// Send a message
character.SendText("What's the weather like?");

// Start voice
character.StartVoiceSession();

// Listen for responses
character.OnBotResponse += (response) =>
{
    if (response.IsFinal)
        chatUI.AddMessage(response.Text);
};
Key properties:
| Property | Type | Description |
|---|---|---|
| CharacterId | string | Character UUID |
| PlayerId | string | End user identifier |
| IsConnected | bool | Connection status |
| IsVoiceSessionActive | bool | Whether voice is active |
| CurrentPartialResponse | string | Accumulated streaming text |
| CurrentSession | SessionInfo | Session details |
Event System
The SDK uses C# delegates and UnityEvents for all callbacks.
C# Delegates
Subscribe in code for programmatic control:
character.OnBotResponse += (BotResponse response) => { ... };
character.OnTranscript += (SttResponse stt) => { ... };
character.OnInterrupt += (InterruptData data) => { ... };
character.OnActionReceived += (AgentAction action) => { ... };
character.OnConnected += (SessionInfo session) => { ... };
character.OnDisconnected += () => { ... };
character.OnError += (string error) => { ... };
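Lambda subscriptions like those above cannot be removed later, so a component that can be disabled or destroyed usually subscribes with named methods instead. The sketch below shows one way to do that. `CharacterEventLogger` is a hypothetical class, and it assumes the SDK's namespace (not shown on this page) is imported.

```csharp
using UnityEngine;
// Also import the Estuary SDK namespace here; its name is not shown on this page.

// Hypothetical example component: logs a few character events.
public class CharacterEventLogger : MonoBehaviour
{
    // Assign the EstuaryCharacter in the Inspector.
    [SerializeField] private EstuaryCharacter character;

    private void OnEnable()
    {
        character.OnBotResponse += HandleBotResponse;
        character.OnError += HandleError;
        character.OnDisconnected += HandleDisconnected;
    }

    private void OnDisable()
    {
        // Mirror every += with a -= so the character never calls a disabled component.
        character.OnBotResponse -= HandleBotResponse;
        character.OnError -= HandleError;
        character.OnDisconnected -= HandleDisconnected;
    }

    private void HandleBotResponse(BotResponse response)
    {
        // Only log completed responses, not streaming partials.
        if (response.IsFinal)
            Debug.Log($"Bot: {response.Text}");
    }

    private void HandleError(string error)
    {
        Debug.LogError($"Estuary error: {error}");
    }

    private void HandleDisconnected()
    {
        Debug.Log("Estuary character disconnected");
    }
}
```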
UnityEvents (Inspector)
Drag-and-drop event binding in the Inspector. The EstuaryCharacter component exposes these as serialized UnityEvents:
- onConnected(SessionInfo)
- onDisconnected
- onBotResponse(BotResponse)
- onVoiceReceived(BotVoice)
- onTranscript(SttResponse)
- onInterrupt
- onError(string)
- onActionReceived(AgentAction)
Connection Lifecycle
┌──────────────┐
│ Disconnected │
└──────┬───────┘
│ Connect()
▼
┌──────────────┐
│ Connecting │───────────────────────┐
└──────┬───────┘ │
│ WebSocket open + auth │ Error
▼ ▼
┌──────────────┐ ┌─────────────┐
│ Connected │ │ Error │
└──────┬───────┘ └─────────────┘
│ ▲
│ connection lost │
▼ │
┌──────────────┐ │
│ Reconnecting │───────────────────────┘
└──────────────┘ max attempts reached
Connection states are defined in the ConnectionState enum:
- Disconnected -- Not connected
- Connecting -- Attempting to connect
- Connected -- Ready to communicate
- Reconnecting -- Lost connection, retrying (up to 5 attempts with linear backoff)
- Error -- Connection failed
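The retry behavior of the Reconnecting state (up to 5 attempts with linear backoff) can be sketched in plain C#. This is an illustration, not the SDK's implementation: `ReconnectSketch`, the `tryConnect` callback, and the base delay are all hypothetical, since this page does not give the actual delay values.

```csharp
using System;
using System.Threading.Tasks;

public static class ReconnectSketch
{
    // Attempt limit taken from the state description above.
    public const int MaxAttempts = 5;

    // Linear backoff: attempt 1 waits 1x the base delay, attempt 2 waits 2x, and so on.
    public static TimeSpan DelayForAttempt(int attempt, TimeSpan baseDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        return TimeSpan.FromTicks(baseDelay.Ticks * attempt);
    }

    // Returns true if a retry succeeded (Reconnecting -> Connected),
    // or false once every attempt has failed (Reconnecting -> Error).
    public static async Task<bool> RunAsync(Func<Task<bool>> tryConnect, TimeSpan baseDelay)
    {
        for (int attempt = 1; attempt <= MaxAttempts; attempt++)
        {
            await Task.Delay(DelayForAttempt(attempt, baseDelay));
            if (await tryConnect())
                return true;
        }
        return false;
    }
}
```

With a hypothetical base delay of 2 seconds, the waits before the five attempts would be 2, 4, 6, 8, and 10 seconds.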
Main Thread Dispatching
The Socket.IO connection runs on a background thread. All events are dispatched to Unity's main thread via a ConcurrentQueue that EstuaryManager.Update() drains every frame. You can safely update UI, trigger animations, and call Unity APIs from any event callback.
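The queue-and-drain pattern described above can be sketched as follows. `MainThreadQueue` is a hypothetical class for illustration; the SDK's internal dispatcher may be structured differently.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class MainThreadQueue
{
    private readonly ConcurrentQueue<Action> _pending = new ConcurrentQueue<Action>();

    // Safe to call from any thread, for example the socket's receive thread.
    public void Enqueue(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        _pending.Enqueue(action);
    }

    // Call once per frame from a MonoBehaviour's Update() on the main thread.
    // Returns the number of callbacks that ran.
    public int Drain()
    {
        // Only run what was queued when the drain started, so a callback that
        // enqueues more work cannot stall the current frame.
        int budget = _pending.Count;
        int executed = 0;

        while (executed < budget && _pending.TryDequeue(out Action action))
        {
            action();
            executed++;
        }
        return executed;
    }
}
```

Because callbacks only run inside `Drain()`, they execute on whichever thread calls it. In Unity that is the main thread when `Drain()` is called from `Update()`.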
Next Steps
- Text Chat -- Build a text chat interface
- Voice Connection -- Set up voice input and output
- API Reference -- Full component documentation