Core Concepts
Understand the architecture and key concepts of the Estuary SDK to build effective conversational AI experiences.
SDK Architecture
The Estuary SDK is organized into three main layers:
┌─────────────────────────────────────────────────────────────────┐
│ Your Application │
├─────────────────────────────────────────────────────────────────┤
│ │
│ COMPONENTS LAYER (High-Level API) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ EstuaryCredentials │ EstuaryCharacter │ EstuaryMicrophone│ │
│ │ EstuaryManager │ EstuaryActionManager │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ CORE LAYER (Low-Level) │ │
│ ┌──────────────────────────↓──────────────────────────────┐ │
│ │ EstuaryClient │ EstuaryConfig │ EstuaryEvents │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ UTILITIES │ │
│ ┌──────────────────────────↓──────────────────────────────┐ │
│ │ AudioConverter │ Models (BotResponse, SessionInfo) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│
▼ WebSocket (Socket.IO v4)
┌───────────────────┐
│ Estuary Server │
└───────────────────┘
Connection Lifecycle
Understanding the connection lifecycle is essential for building robust applications.
Connection States
The SDK tracks connection state through the ConnectionState enum:
enum ConnectionState {
  Disconnected = 'disconnected',  // Not connected
  Connecting = 'connecting',      // Attempting to connect
  Connected = 'connected',        // Ready to communicate
  Reconnecting = 'reconnecting',  // Lost connection, retrying
  Error = 'error'                 // Connection failed
}
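A common use of this enum is driving a status indicator in your UI. The sketch below maps each state to a user-facing label; the enum mirrors the definition above, but the label strings and the `statusLabel` helper are our own illustration, not part of the SDK.

```typescript
// Mirror of the SDK's ConnectionState enum (see above).
enum ConnectionState {
  Disconnected = 'disconnected',
  Connecting = 'connecting',
  Connected = 'connected',
  Reconnecting = 'reconnecting',
  Error = 'error'
}

// Map each connection state to a short label for a status indicator.
function statusLabel(state: ConnectionState): string {
  switch (state) {
    case ConnectionState.Connected:
      return 'Online';
    case ConnectionState.Connecting:
    case ConnectionState.Reconnecting:
      return 'Connecting…';
    case ConnectionState.Error:
      return 'Connection failed';
    default:
      return 'Offline';
  }
}
```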
Connection Flow
┌──────────────┐
│ Disconnected │
└──────┬───────┘
│ connect()
▼
┌──────────────┐
│ Connecting │──────────────────────┐
└──────┬───────┘ │
│ WebSocket open │ Error
▼ ▼
┌──────────────┐ ┌──────────────┐
│Authenticating│ │ Error │
└──────┬───────┘ └──────────────┘
│ session_info received ▲
▼ │
┌──────────────┐ │
│ Connected │───────────────────────┘
└──────┬───────┘ connection lost
│
▼
┌──────────────┐
│ Reconnecting │ (auto-reconnect enabled)
└──────────────┘
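The flow above can be encoded as a transition table, which is handy for sanity-checking state handling in your own code. Note that the diagram's Authenticating phase is not a member of ConnectionState, so this sketch folds it into `connecting` (an assumption); the `error → connecting` and `connected → disconnected` edges (a later `connect()` call and an explicit disconnect) are also our assumptions, not shown in the diagram.

```typescript
type State = 'disconnected' | 'connecting' | 'connected' | 'reconnecting' | 'error';

// Allowed transitions, following the connection-flow diagram above.
const allowedTransitions: Record<State, State[]> = {
  disconnected: ['connecting'],                          // connect()
  connecting: ['connected', 'error'],                    // WebSocket open + session_info, or failure
  connected: ['reconnecting', 'error', 'disconnected'],  // connection lost, or explicit disconnect
  reconnecting: ['connected', 'error', 'disconnected'],  // retry outcome
  error: ['connecting']                                  // a later connect() call (assumption)
};

function canTransition(from: State, to: State): boolean {
  return allowedTransitions[from].includes(to);
}
```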
Event System
The SDK uses an event-driven architecture. All major components extend EventEmitter.
Core Events
| Event | Data | Description |
|---|---|---|
| sessionConnected | SessionInfo | Successfully connected and authenticated |
| disconnected | reason: string | Connection closed |
| botResponse | BotResponse | AI text response received |
| botVoice | BotVoice | AI voice audio chunk received |
| sttResponse | SttResponse | Speech-to-text transcription |
| interrupt | InterruptData | Conversation interrupted |
| error | string | Error occurred |
| connectionStateChanged | ConnectionState | Connection state changed |
Event Subscription Pattern
// Subscribe to events
character.on('sessionConnected', (session: SessionInfo) => {
  print(`Connected with session: ${session.sessionId}`);
});

const onBotResponse = (response: BotResponse) => {
  if (response.isFinal) {
    print(`AI said: ${response.text}`);
  }
};
character.on('botResponse', onBotResponse);

// Unsubscribe when done (pass the same handler reference)
character.off('botResponse', onBotResponse);
Data Models
SessionInfo
Returned when a connection is established:
interface SessionInfo {
  sessionId: string;       // Unique session identifier
  conversationId: string;  // Persists across sessions for same user
  characterId: string;     // The AI character ID
  playerId: string;        // Your user/player ID
}
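Because conversationId persists across sessions for the same user, you can tell whether a fresh connection resumed an earlier conversation. A minimal sketch, assuming you keep the previous SessionInfo around; the `resumesConversation` helper is our own, not an SDK function:

```typescript
// Mirror of the SessionInfo interface above.
interface SessionInfo {
  sessionId: string;
  conversationId: string;
  characterId: string;
  playerId: string;
}

// True if the new session continues the same conversation as a prior one.
function resumesConversation(prev: SessionInfo | null, next: SessionInfo): boolean {
  return prev !== null &&
    prev.playerId === next.playerId &&
    prev.conversationId === next.conversationId;
}
```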
BotResponse
AI text responses:
interface BotResponse {
  text: string;            // The response text
  isFinal: boolean;        // True if complete response
  partial: boolean;        // True if streaming chunk
  messageId: string;       // Unique message ID
  chunkIndex: number;      // Chunk number for streaming
  isInterjection: boolean; // Proactive message (not a reply)
}
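Streamed responses can be reassembled by collecting chunks per messageId and ordering them by chunkIndex. The sketch below assumes chunk indices start at 0 and that each chunk (including the final one) carries only its own piece of text; if the final event instead carries the full text, you would use it directly. The `ResponseAssembler` class is our own illustration:

```typescript
// Mirror of the BotResponse interface above.
interface BotResponse {
  text: string;
  isFinal: boolean;
  partial: boolean;
  messageId: string;
  chunkIndex: number;
  isInterjection: boolean;
}

class ResponseAssembler {
  private chunks = new Map<string, string[]>();

  // Returns the assembled text when the final chunk arrives, otherwise null.
  push(r: BotResponse): string | null {
    const parts = this.chunks.get(r.messageId) ?? [];
    parts[r.chunkIndex] = r.text;
    this.chunks.set(r.messageId, parts);
    if (r.isFinal) {
      this.chunks.delete(r.messageId);
      return parts.join('');
    }
    return null;
  }
}
```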
BotVoice
AI voice audio data:
interface BotVoice {
  audio: string;       // Base64-encoded PCM16 audio
  chunkIndex: number;  // Audio chunk number
  isFinal: boolean;    // True if last chunk
  sampleRate: number;  // Audio sample rate (usually 24000)
}
SttResponse
Speech-to-text transcription:
interface SttResponse {
  text: string;        // Transcribed text
  isFinal: boolean;    // True if final transcription
  confidence: number;  // Confidence score (0-1)
}
Audio Pipeline
The SDK handles audio in both directions:
Voice Input (User to Server)
┌────────────────────┐ ┌───────────────────┐ ┌─────────────────┐
│ MicrophoneRecorder │--→ │ EstuaryMicrophone │--→ │ EstuaryClient │
│ (Float32) │ │ (PCM16 → Base64) │ │ (WebSocket) │
└────────────────────┘ └───────────────────┘ └─────────────────┘
- MicrophoneRecorder captures Float32 audio at 16kHz
- EstuaryMicrophone converts to PCM16 and Base64 encodes
- EstuaryClient streams to server via WebSocket
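The input conversion can be sketched as follows: Float32 samples in [-1, 1] are clamped, scaled to signed 16-bit PCM (little-endian), and Base64-encoded. This is a standalone illustration of the transform the pipeline describes; EstuaryMicrophone's actual internals may differ in detail.

```typescript
// Convert Float32 samples ([-1, 1]) to Base64-encoded little-endian PCM16.
function float32ToPcm16Base64(samples: Float32Array): string {
  const pcm = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767.
    pcm.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return pcm.toString('base64');
}
```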
Voice Output (Server to User)
┌─────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ EstuaryClient │---→│ SimpleAutoConnect│---→│DynamicAudioOutput│
│ (WebSocket) │ │ (Base64 → PCM16) │ │ (AudioComp) │
└─────────────────┘ └──────────────────┘ └──────────────────┘
- EstuaryClient receives Base64 audio from server
- SimpleAutoConnect decodes to PCM16 bytes
- DynamicAudioOutput plays through AudioComponent
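The output conversion is the reverse: Base64 audio from the server is decoded to PCM16 bytes and, if your playback path needs floats, scaled back to Float32. Again a standalone sketch of the described transform, not SimpleAutoConnect's actual implementation:

```typescript
// Decode Base64 little-endian PCM16 audio to Float32 samples.
function base64Pcm16ToFloat32(b64: string): Float32Array {
  const pcm = Buffer.from(b64, 'base64');
  const out = new Float32Array(pcm.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = pcm.readInt16LE(i * 2) / 0x8000; // scale to roughly [-1, 1)
  }
  return out;
}
```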
Audio Specifications
| Direction | Sample Rate | Format | Encoding |
|---|---|---|---|
| Input (STT) | 16,000 Hz | PCM16 | Base64 |
| Output (TTS) | 24,000 Hz | PCM16 | Base64 |
Singleton Pattern
Several SDK components use the singleton pattern for easy global access:
EstuaryCredentials
// Access from anywhere
if (EstuaryCredentials.hasInstance) {
  const apiKey = EstuaryCredentials.instance.apiKey;
  const characterId = EstuaryCredentials.instance.characterId;
}
EstuaryManager
// Central connection manager
EstuaryManager.instance.connect();
EstuaryManager.instance.sendText("Hello!");
EstuaryActions (Global Event System)
// Subscribe to actions from any script
EstuaryActions.on("wave", (action) => {
  playWaveAnimation();
});
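The shared shape of these components — a lazily created single instance exposing on/off/emit — can be sketched as below. This is a minimal stand-in to illustrate the pattern, not EstuaryActions' real implementation; all names here are our own.

```typescript
type ActionHandler = (payload: unknown) => void;

// Minimal singleton event bus: one shared instance, handlers keyed by action name.
class ActionBus {
  private static _instance: ActionBus | null = null;
  private handlers = new Map<string, Set<ActionHandler>>();

  static get instance(): ActionBus {
    if (!ActionBus._instance) {
      ActionBus._instance = new ActionBus(); // created lazily on first access
    }
    return ActionBus._instance;
  }

  on(action: string, handler: ActionHandler): void {
    let set = this.handlers.get(action);
    if (!set) {
      set = new Set();
      this.handlers.set(action, set);
    }
    set.add(handler);
  }

  off(action: string, handler: ActionHandler): void {
    this.handlers.get(action)?.delete(handler);
  }

  emit(action: string, payload?: unknown): void {
    this.handlers.get(action)?.forEach((h) => h(payload));
  }
}
```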
Configuration Options
The EstuaryConfig interface provides connection settings:
interface EstuaryConfig {
  serverUrl: string;              // Estuary server URL
  apiKey?: string;                // Your API key
  characterId: string;            // Character UUID
  playerId: string;               // User identifier

  // Audio settings
  recordingSampleRate?: number;   // Default: 16000
  playbackSampleRate?: number;    // Default: 24000
  audioChunkDurationMs?: number;  // Default: 100

  // Connection settings
  autoReconnect?: boolean;        // Default: true
  maxReconnectAttempts?: number;  // Default: 5
  reconnectDelayMs?: number;      // Default: 2000

  // Debug
  debugLogging?: boolean;         // Default: false
}
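The defaults documented in the comments can be applied by spreading user values over a defaults object, so any field the caller sets wins. The `withDefaults` helper is our own sketch (the SDK presumably applies these defaults internally); the interface mirrors the one above.

```typescript
// Mirror of the EstuaryConfig interface above.
interface EstuaryConfig {
  serverUrl: string;
  apiKey?: string;
  characterId: string;
  playerId: string;
  recordingSampleRate?: number;
  playbackSampleRate?: number;
  audioChunkDurationMs?: number;
  autoReconnect?: boolean;
  maxReconnectAttempts?: number;
  reconnectDelayMs?: number;
  debugLogging?: boolean;
}

// Fill in the documented defaults; caller-provided fields take precedence.
function withDefaults(config: EstuaryConfig): EstuaryConfig {
  return {
    recordingSampleRate: 16000,
    playbackSampleRate: 24000,
    audioChunkDurationMs: 100,
    autoReconnect: true,
    maxReconnectAttempts: 5,
    reconnectDelayMs: 2000,
    debugLogging: false,
    ...config,
  };
}
```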
Best Practices
Always Handle Disconnections
character.on('disconnected', () => {
  // Clean up UI state
  // Stop animations
  // Show reconnecting indicator
});

character.on('error', (error) => {
  // Log error for debugging
  // Show user-friendly message
  print(`Connection error: ${error}`);
});
Use Debug Mode During Development
// Enable in EstuaryCredentials
debugMode: true
Start Voice Session Before Streaming
character.on('sessionConnected', (session) => {
  // MUST call this before streaming audio
  character.startVoiceSession();
  microphone.startRecording();
});
Handle Interrupts Gracefully
character.on('interrupt', () => {
  // Stop current audio playback
  dynamicAudioOutput.interruptAudioOutput();
  // Reset any UI state
  hideResponseIndicator();
});
Clean Up Resources
onDestroy() {
  if (this.microphone) {
    this.microphone.dispose();
  }
  if (this.character) {
    this.character.dispose();
  }
}
Next Steps
- Voice Connection - Implement voice conversations
- User Management - Handle user persistence
- Action System - React to AI actions