Core Concepts

Understand the architecture and key concepts of the Estuary SDK to build effective conversational AI experiences.

SDK Architecture

The Estuary SDK is organized into three main layers:

┌─────────────────────────────────────────────────────────────────┐
│                        Your Application                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  COMPONENTS LAYER (High-Level API)                              │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ EstuaryCredentials │ EstuaryCharacter │ EstuaryMicrophone│   │
│  │ EstuaryManager │ EstuaryActionManager                    │   │
│  └──────────────────────────────────────────────────────────┘   │
│                             │                                   │
│  CORE LAYER (Low-Level)     │                                   │
│  ┌──────────────────────────↓───────────────────────────────┐   │
│  │ EstuaryClient │ EstuaryConfig │ EstuaryEvents            │   │
│  └──────────────────────────────────────────────────────────┘   │
│                             │                                   │
│  UTILITIES                  │                                   │
│  ┌──────────────────────────↓───────────────────────────────┐   │
│  │ AudioConverter │ Models (BotResponse, SessionInfo)       │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

                                 ▼ WebSocket (Socket.IO v4)
                       ┌───────────────────┐
                       │  Estuary Server   │
                       └───────────────────┘

Connection Lifecycle

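A connection moves through several states between connect() and a live session. Handle each of them, including errors and reconnection, so your application stays usable when the network drops.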

Connection States

The SDK tracks connection state through the ConnectionState enum:

enum ConnectionState {
  Disconnected = 'disconnected', // Not connected
  Connecting = 'connecting',     // Attempting to connect
  Connected = 'connected',       // Ready to communicate
  Reconnecting = 'reconnecting', // Lost connection, retrying
  Error = 'error'                // Connection failed
}

Connection Flow

┌──────────────┐
│ Disconnected │
└──────┬───────┘
       │ connect()
       ▼
┌──────────────┐
│  Connecting  │──────────────────────┐
└──────┬───────┘                      │
       │ WebSocket open               │ Error
       ▼                              ▼
┌──────────────┐               ┌──────────────┐
│Authenticating│               │    Error     │
└──────┬───────┘               └──────────────┘
       │ session_info received        ▲
       ▼                              │
┌──────────────┐                      │
│  Connected   │──────────────────────┘
└──────┬───────┘ connection lost
       │
       ▼
┌──────────────┐
│ Reconnecting │ (auto-reconnect enabled)
└──────────────┘
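
The flow above can be read as a small transition table. The following is a minimal sketch, not SDK code: `FlowState`, `allowedTransitions`, and `canTransition` are hypothetical names. It uses its own string union because the diagram shows an Authenticating step that the ConnectionState enum does not list, and it assumes no outgoing edges for Reconnecting and Error because the diagram does not show any.

```typescript
// Hypothetical model of the connection flow diagram; the SDK may allow
// transitions that are not drawn there.
type FlowState =
  | 'disconnected'
  | 'connecting'
  | 'authenticating'
  | 'connected'
  | 'reconnecting'
  | 'error';

// Edges read directly off the diagram. Reconnecting and Error have no
// outgoing edges in the diagram, so none are assumed here.
const allowedTransitions: Record<FlowState, FlowState[]> = {
  disconnected: ['connecting'],
  connecting: ['authenticating', 'error'],
  authenticating: ['connected'],
  connected: ['reconnecting', 'error'],
  reconnecting: [],
  error: [],
};

// Returns true if the diagram contains an edge from `from` to `to`.
function canTransition(from: FlowState, to: FlowState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}
```

A table like this can help in tests or UI code, for example to flag a state change that the diagram does not account for.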

Event System

The SDK uses an event-driven architecture. All major components extend EventEmitter.

Core Events

| Event | Data | Description |
|---|---|---|
| sessionConnected | SessionInfo | Successfully connected and authenticated |
| disconnected | reason: string | Connection closed |
| botResponse | BotResponse | AI text response received |
| botVoice | BotVoice | AI voice audio chunk received |
| sttResponse | SttResponse | Speech-to-text transcription |
| interrupt | InterruptData | Conversation interrupted |
| error | string | Error occurred |
| connectionStateChanged | ConnectionState | Connection state changed |

Event Subscription Pattern

// Subscribe to events
character.on('connected', (session: SessionInfo) => {
  print(`Connected with session: ${session.sessionId}`);
});

character.on('botResponse', (response: BotResponse) => {
  if (response.isFinal) {
    print(`AI said: ${response.text}`);
  }
});

// Unsubscribe when done. Pass the same function reference that was given
// to on(); myHandler stands for a handler you registered earlier.
character.off('botResponse', myHandler);
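
The subscribe and unsubscribe calls above rely on handler identity: a handler can only be removed if the same function reference is passed again. The sketch below illustrates this with a minimal stand-in emitter. `TinyEmitter` and `DemoEvents` are hypothetical and are not the SDK's EventEmitter; they only mirror the on/off call shape.

```typescript
// Minimal stand-in emitter (hypothetical, NOT the SDK's EventEmitter).
class TinyEmitter<Events> {
  private handlers: { [eventName: string]: Array<(payload: any) => void> } = {};

  on<K extends keyof Events & string>(eventName: K, handler: (payload: Events[K]) => void): void {
    const list = this.handlers[eventName] || [];
    list.push(handler);
    this.handlers[eventName] = list;
  }

  // Removes a handler only if it is the exact function passed to on().
  off<K extends keyof Events & string>(eventName: K, handler: (payload: Events[K]) => void): void {
    const list = this.handlers[eventName] || [];
    this.handlers[eventName] = list.filter((h) => h !== handler);
  }

  emit<K extends keyof Events & string>(eventName: K, payload: Events[K]): void {
    const list = this.handlers[eventName] || [];
    // Iterate over a copy so handlers may unsubscribe while being called.
    list.slice().forEach((h) => h(payload));
  }
}

// Reduced payload shape for the demo; BotResponse has more fields.
interface DemoEvents {
  botResponse: { text: string; isFinal: boolean };
}

const emitter = new TinyEmitter<DemoEvents>();
const finalTexts: string[] = [];

// Keep the handler in a variable so the same reference can be removed later.
const onBotResponse = (response: { text: string; isFinal: boolean }): void => {
  if (response.isFinal) {
    finalTexts.push(response.text);
  }
};

emitter.on('botResponse', onBotResponse);
emitter.emit('botResponse', { text: 'Hello', isFinal: true });
emitter.off('botResponse', onBotResponse);
emitter.emit('botResponse', { text: 'Not delivered', isFinal: true });
```

An inline arrow function passed to on() cannot be removed this way, because a second inline arrow is a different function object.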

Data Models

SessionInfo

Returned when a connection is established:

interface SessionInfo {
  sessionId: string;      // Unique session identifier
  conversationId: string; // Persists across sessions for same user
  characterId: string;    // The AI character ID
  playerId: string;       // Your user/player ID
}

BotResponse

AI text responses:

interface BotResponse {
  text: string;            // The response text
  isFinal: boolean;        // True if complete response
  partial: boolean;        // True if streaming chunk
  messageId: string;       // Unique message ID
  chunkIndex: number;      // Chunk number for streaming
  isInterjection: boolean; // Proactive message (not reply)
}
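
The messageId and chunkIndex fields suggest that streamed chunks can be grouped and ordered on the client. The sketch below shows one way to do that. It assumes that each chunk's text holds only a new piece of the response (not the cumulative text) and that the chunk flagged isFinal arrives last; neither is confirmed by this document. `ResponseAssembler` is a hypothetical helper, not part of the SDK.

```typescript
// Same shape as the BotResponse interface above.
interface BotResponse {
  text: string;
  isFinal: boolean;
  partial: boolean;
  messageId: string;
  chunkIndex: number;
  isInterjection: boolean;
}

// Hypothetical helper. ASSUMPTION: chunk text is incremental, not cumulative.
class ResponseAssembler {
  private pending: { [messageId: string]: BotResponse[] } = {};

  // Buffers a chunk. Returns the joined text once the chunk flagged isFinal
  // arrives for that message, otherwise null.
  add(chunk: BotResponse): string | null {
    const chunks = this.pending[chunk.messageId] || [];
    chunks.push(chunk);
    this.pending[chunk.messageId] = chunks;

    if (!chunk.isFinal) {
      return null;
    }

    delete this.pending[chunk.messageId];
    return chunks
      .sort((a, b) => a.chunkIndex - b.chunkIndex) // tolerate out-of-order arrival
      .map((c) => c.text)
      .join('');
  }
}
```

If the server instead sends cumulative text in each chunk, the final chunk's text alone would already be the full response and no joining is needed.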

BotVoice

AI voice audio data:

interface BotVoice {
  audio: string;      // Base64-encoded PCM16 audio
  chunkIndex: number; // Audio chunk number
  isFinal: boolean;   // True if last chunk
  sampleRate: number; // Audio sample rate (usually 24000)
}

SttResponse

Speech-to-text transcription:

interface SttResponse {
  text: string;       // Transcribed text
  isFinal: boolean;   // True if final transcription
  confidence: number; // Confidence score (0-1)
}

Audio Pipeline

The SDK handles audio in both directions:

Voice Input (User to Server)

┌────────────────────┐     ┌───────────────────┐     ┌─────────────────┐
│ MicrophoneRecorder │ ──→ │ EstuaryMicrophone │ ──→ │  EstuaryClient  │
│     (Float32)      │     │ (PCM16 → Base64)  │     │   (WebSocket)   │
└────────────────────┘     └───────────────────┘     └─────────────────┘

  1. MicrophoneRecorder captures Float32 audio at 16 kHz
  2. EstuaryMicrophone converts to PCM16 and Base64-encodes it
  3. EstuaryClient streams to the server via WebSocket
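
Step 2 above (Float32 to PCM16 to Base64) can be sketched in plain TypeScript as follows. This is an illustration, not the SDK's AudioConverter: `floatToPcm16Bytes` and `bytesToBase64` are hypothetical names, and little-endian byte order is an assumption, since the document only says PCM16.

```typescript
// Converts Float32 samples in [-1, 1] to PCM16 bytes.
// ASSUMPTION: little-endian byte order (not stated in the document).
function floatToPcm16Bytes(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow int16.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive by 32767, to fit int16.
    const scaled = clamped < 0 ? clamped * 32768 : clamped * 32767;
    view.setInt16(i * 2, Math.round(scaled), true);
  }
  return bytes;
}

const BASE64_CHARS =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Standard Base64 (RFC 4648) with '=' padding, written out by hand so the
// sketch does not depend on btoa or Buffer being available.
function bytesToBase64(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasSecond ? bytes[i + 1] : 0;
    const b2 = hasThird ? bytes[i + 2] : 0;
    out += BASE64_CHARS.charAt(b0 >> 2);
    out += BASE64_CHARS.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += hasSecond ? BASE64_CHARS.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += hasThird ? BASE64_CHARS.charAt(b2 & 63) : '=';
  }
  return out;
}

// Silence, full-scale positive, full-scale negative.
const encodedDemo = bytesToBase64(floatToPcm16Bytes(new Float32Array([0, 1, -1])));
// encodedDemo === 'AAD/fwCA'
```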

Voice Output (Server to User)

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  EstuaryClient  │ ──→ │ SimpleAutoConnect│ ──→ │DynamicAudioOutput│
│   (WebSocket)   │     │ (Base64 → PCM16) │     │   (AudioComp)    │
└─────────────────┘     └──────────────────┘     └──────────────────┘

  1. EstuaryClient receives Base64 audio from the server
  2. SimpleAutoConnect decodes it to PCM16 bytes
  3. DynamicAudioOutput plays it through the AudioComponent
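
The decode step above (Base64 to PCM16) can be sketched in the same spirit. This is an illustration, not the SDK's code: `base64ToBytes` and `pcm16BytesToFloat` are hypothetical names, little-endian byte order is an assumption, and the conversion to normalized floats is an optional extra step that the pipeline above does not mention.

```typescript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes standard Base64 (RFC 4648) by hand, without atob or Buffer.
function base64ToBytes(encoded: string): Uint8Array {
  const clean = encoded.replace(/=+$/, ''); // drop padding
  const bytes = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let bitBuffer = 0;
  let bitCount = 0;
  let outIndex = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value < 0) {
      throw new Error('Invalid Base64 character at index ' + i);
    }
    bitBuffer = (bitBuffer << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes[outIndex++] = (bitBuffer >> bitCount) & 0xff;
      bitBuffer &= (1 << bitCount) - 1; // keep only the unread bits
    }
  }
  return bytes;
}

// Interprets bytes as PCM16 and scales to floats in [-1, 1).
// ASSUMPTION: little-endian byte order (not stated in the document).
function pcm16BytesToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// 'AAD/fwCA' is the Base64 form of the int16 values 0, 32767, -32768.
const decodedDemo = pcm16BytesToFloat(base64ToBytes('AAD/fwCA'));
```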

Audio Specifications

| Direction | Sample Rate | Format | Encoding |
|---|---|---|---|
| Input (STT) | 16,000 Hz | PCM16 | Base64 |
| Output (TTS) | 24,000 Hz | PCM16 | Base64 |
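
These specifications determine how large each audio chunk is. The sketch below works through the arithmetic for a 100 ms chunk, which is the default audioChunkDurationMs listed under Configuration Options. It assumes mono audio, which this document does not state; `chunkSizes` is a hypothetical helper.

```typescript
// Computes per-chunk sizes. ASSUMPTION: mono audio (one channel).
function chunkSizes(sampleRateHz: number, chunkDurationMs: number) {
  const samples = Math.round((sampleRateHz * chunkDurationMs) / 1000);
  const pcmBytes = samples * 2; // PCM16 uses 2 bytes per sample
  const base64Chars = 4 * Math.ceil(pcmBytes / 3); // Base64: 4 chars per 3 bytes
  return { samples, pcmBytes, base64Chars };
}

const inputChunk = chunkSizes(16000, 100);
// { samples: 1600, pcmBytes: 3200, base64Chars: 4268 }

const outputChunk = chunkSizes(24000, 100);
// { samples: 2400, pcmBytes: 4800, base64Chars: 6400 }
```

Base64 adds roughly one third to the payload size, which is worth keeping in mind when estimating bandwidth.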

Singleton Pattern

Several SDK components use the singleton pattern for easy global access:

EstuaryCredentials

// Access from anywhere
if (EstuaryCredentials.hasInstance) {
  const apiKey = EstuaryCredentials.instance.apiKey;
  const characterId = EstuaryCredentials.instance.characterId;
}

EstuaryManager

// Central connection manager
EstuaryManager.instance.connect();
EstuaryManager.instance.sendText("Hello!");

EstuaryActions (Global Event System)

// Subscribe to actions from any script
EstuaryActions.on("wave", (action) => {
  playWaveAnimation();
});
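
For readers unfamiliar with the pattern, the hasInstance / instance pair used above can be sketched as follows. `CredentialsStore` is a hypothetical stand-in, not the SDK's EstuaryCredentials, and how the SDK actually registers its instances (for example from a scene component) is not covered in this document.

```typescript
// Hypothetical stand-in for a singleton component such as EstuaryCredentials.
class CredentialsStore {
  private static current: CredentialsStore | null = null;

  private constructor(
    public readonly apiKey: string,
    public readonly characterId: string
  ) {}

  // Registers the single instance. A second call is rejected in this sketch;
  // whether the SDK rejects or replaces duplicates is not stated here.
  static initialize(apiKey: string, characterId: string): CredentialsStore {
    if (CredentialsStore.current !== null) {
      throw new Error('CredentialsStore is already initialized');
    }
    CredentialsStore.current = new CredentialsStore(apiKey, characterId);
    return CredentialsStore.current;
  }

  // Lets callers check for an instance without triggering an error.
  static get hasInstance(): boolean {
    return CredentialsStore.current !== null;
  }

  // Throws if accessed too early, which is why callers check hasInstance.
  static get instance(): CredentialsStore {
    if (CredentialsStore.current === null) {
      throw new Error('CredentialsStore has not been initialized');
    }
    return CredentialsStore.current;
  }
}

const readyBeforeInit = CredentialsStore.hasInstance; // false
CredentialsStore.initialize('demo-api-key', 'demo-character-id');
const readyAfterInit = CredentialsStore.hasInstance; // true
const demoCharacterId = CredentialsStore.instance.characterId;
```

The hasInstance check matters when scripts may run before the singleton has been set up, since reading instance too early would otherwise fail.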

Configuration Options

The EstuaryConfig interface provides connection settings:

interface EstuaryConfig {
  serverUrl: string;              // Estuary server URL
  apiKey?: string;                // Your API key
  characterId: string;            // Character UUID
  playerId: string;               // User identifier

  // Audio settings
  recordingSampleRate?: number;   // Default: 16000
  playbackSampleRate?: number;    // Default: 24000
  audioChunkDurationMs?: number;  // Default: 100

  // Connection settings
  autoReconnect?: boolean;        // Default: true
  maxReconnectAttempts?: number;  // Default: 5
  reconnectDelayMs?: number;      // Default: 2000

  // Debug
  debugLogging?: boolean;         // Default: false
}
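
To make the defaults concrete, here is a minimal sketch of how the optional fields could be resolved. The default values are the ones documented in the interface above; `resolveConfig`, `orDefault`, and `ResolvedEstuaryConfig` are hypothetical names, the URL and IDs are placeholders, and the SDK may apply its defaults differently.

```typescript
// Copied from the EstuaryConfig interface above.
interface EstuaryConfig {
  serverUrl: string;
  apiKey?: string;
  characterId: string;
  playerId: string;
  recordingSampleRate?: number;
  playbackSampleRate?: number;
  audioChunkDurationMs?: number;
  autoReconnect?: boolean;
  maxReconnectAttempts?: number;
  reconnectDelayMs?: number;
  debugLogging?: boolean;
}

// Every field required except apiKey, which has no documented default.
type ResolvedEstuaryConfig = Required<Omit<EstuaryConfig, 'apiKey'>> & { apiKey?: string };

// Falls back only when the value is undefined, so false and 0 are kept.
function orDefault<T>(value: T | undefined, fallback: T): T {
  return value === undefined ? fallback : value;
}

// Hypothetical helper that fills in the documented defaults.
function resolveConfig(config: EstuaryConfig): ResolvedEstuaryConfig {
  return {
    serverUrl: config.serverUrl,
    apiKey: config.apiKey,
    characterId: config.characterId,
    playerId: config.playerId,
    recordingSampleRate: orDefault(config.recordingSampleRate, 16000),
    playbackSampleRate: orDefault(config.playbackSampleRate, 24000),
    audioChunkDurationMs: orDefault(config.audioChunkDurationMs, 100),
    autoReconnect: orDefault(config.autoReconnect, true),
    maxReconnectAttempts: orDefault(config.maxReconnectAttempts, 5),
    reconnectDelayMs: orDefault(config.reconnectDelayMs, 2000),
    debugLogging: orDefault(config.debugLogging, false),
  };
}

// Placeholder values; only maxReconnectAttempts overrides a default.
const resolvedDemo = resolveConfig({
  serverUrl: 'wss://example.invalid',
  characterId: 'demo-character-id',
  playerId: 'demo-player-id',
  maxReconnectAttempts: 3,
});
```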

Best Practices

Always Handle Disconnections

character.on('disconnected', () => {
  // Clean up UI state
  // Stop animations
  // Show reconnecting indicator
});

character.on('error', (error) => {
  // Log error for debugging
  // Show user-friendly message
  print(`Connection error: ${error}`);
});

Use Debug Mode During Development

// Enable in EstuaryCredentials
debugMode: true

Start Voice Session Before Streaming

character.on('connected', (session) => {
  // MUST call this before streaming audio
  character.startVoiceSession();
  microphone.startRecording();
});
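
One way to enforce this ordering in your own code is a small guard that drops audio until the voice session has started. The sketch below is hypothetical and not part of the SDK: `VoiceSessionGate` and its methods are illustrative names, and the send callback stands in for whatever actually streams a chunk.

```typescript
// Hypothetical guard, not part of the SDK: forwards audio chunks only after
// the voice session has been marked as started.
class VoiceSessionGate {
  private started = false;

  constructor(private readonly send: (base64Chunk: string) => void) {}

  // Call right after starting the voice session.
  markStarted(): void {
    this.started = true;
  }

  // Call on disconnect so stale audio is not sent on the next connection.
  markStopped(): void {
    this.started = false;
  }

  // Returns true if the chunk was forwarded, false if it was dropped.
  push(base64Chunk: string): boolean {
    if (!this.started) {
      return false;
    }
    this.send(base64Chunk);
    return true;
  }
}

const sentChunks: string[] = [];
const gate = new VoiceSessionGate((chunk) => {
  sentChunks.push(chunk);
});

const forwardedEarly = gate.push('AAAA'); // dropped: session not started yet
gate.markStarted();
const forwardedLater = gate.push('AAAA'); // forwarded
```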

Handle Interrupts Gracefully

character.on('interrupt', () => {
  // Stop current audio playback
  dynamicAudioOutput.interruptAudioOutput();

  // Reset any UI state
  hideResponseIndicator();
});

Clean Up Resources

onDestroy() {
  if (this.microphone) {
    this.microphone.dispose();
  }
  if (this.character) {
    this.character.dispose();
  }
}

Next Steps