Core Concepts
Understand the architecture and key concepts of the Estuary SDK to build effective conversational AI experiences.
SDK Architecture
The Estuary SDK is organized into three main layers:
┌─────────────────────────────────────────────────────────────────┐
│ Your Application │
├─────────────────────────────────────────────────────────────────┤
│ │
│ COMPONENTS LAYER (High-Level API) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ EstuaryCredentials │ EstuaryCharacter │ EstuaryMicrophone│ │
│ │ EstuaryManager │ EstuaryActionManager │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ CORE LAYER (Low-Level) │ │
│ ┌──────────────────────────↓──────────────────────────────┐ │
│ │ EstuaryClient │ EstuaryConfig │ EstuaryEvents │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ UTILITIES │ │
│ ┌──────────────────────────↓──────────────────────────────┐ │
│ │ AudioConverter │ Models (BotResponse, SessionInfo) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│
▼ WebSocket (Socket.IO v4)
┌───────────────────┐
│ Estuary Server │
└───────────────────┘
Connection Lifecycle
Understanding the connection lifecycle is essential for building robust applications.
Connection States
The SDK tracks connection state through the ConnectionState enum:
enum ConnectionState {
  Disconnected = 'disconnected',  // Not connected
  Connecting = 'connecting',      // Attempting to connect
  Connected = 'connected',        // Ready to communicate
  Reconnecting = 'reconnecting',  // Lost connection, retrying
  Error = 'error'                 // Connection failed
}
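A common use of this enum is driving a status indicator in your UI. The sketch below maps each state to a user-facing label; the enum mirrors the definition above, but the label strings and the `statusLabel` helper are our own illustration, not part of the SDK.

```typescript
// Mirror of the SDK's ConnectionState enum (see above).
enum ConnectionState {
  Disconnected = 'disconnected',
  Connecting = 'connecting',
  Connected = 'connected',
  Reconnecting = 'reconnecting',
  Error = 'error'
}

// Map each connection state to a short label for a status indicator.
function statusLabel(state: ConnectionState): string {
  switch (state) {
    case ConnectionState.Connected:
      return 'Online';
    case ConnectionState.Connecting:
    case ConnectionState.Reconnecting:
      return 'Connecting…';
    case ConnectionState.Error:
      return 'Connection failed';
    default:
      return 'Offline';
  }
}
```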
Connection Flow
┌──────────────┐
│ Disconnected │
└──────┬───────┘
│ connect()
▼
┌──────────────┐
│ Connecting │──────────────────────┐
└──────┬───────┘ │
│ WebSocket open │ Error
▼ ▼
┌──────────────┐ ┌──────────────┐
│Authenticating│ │ Error │
└──────┬───────┘ └──────────────┘
│ session_info received ▲
▼ │
┌──────────────┐ │
│ Connected │───────────────────────┘
└──────┬───────┘ connection lost
│
▼
┌──────────────┐
│ Reconnecting │ (auto-reconnect enabled)
└──────────────┘
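The flow above can be encoded as a transition table, which is handy for sanity-checking state handling in your own code. Note that the diagram's Authenticating phase is not a member of ConnectionState, so this sketch folds it into `connecting` (an assumption); the `error → connecting` and `connected → disconnected` edges (a later `connect()` call and an explicit disconnect) are also our assumptions, not shown in the diagram.

```typescript
type State = 'disconnected' | 'connecting' | 'connected' | 'reconnecting' | 'error';

// Allowed transitions, following the connection-flow diagram above.
const allowedTransitions: Record<State, State[]> = {
  disconnected: ['connecting'],                          // connect()
  connecting: ['connected', 'error'],                    // WebSocket open + session_info, or failure
  connected: ['reconnecting', 'error', 'disconnected'],  // connection lost, or explicit disconnect
  reconnecting: ['connected', 'error', 'disconnected'],  // retry outcome
  error: ['connecting']                                  // a later connect() call (assumption)
};

function canTransition(from: State, to: State): boolean {
  return allowedTransitions[from].includes(to);
}
```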
Event System
The SDK uses an event-driven architecture. All major components extend EventEmitter.
Core Events
| Event | Data | Description |
|---|---|---|
| sessionConnected | SessionInfo | Successfully connected and authenticated |
| disconnected | reason: string | Connection closed |
| botResponse | BotResponse | AI text response received |
| botVoice | BotVoice | AI voice audio chunk received |
| sttResponse | SttResponse | Speech-to-text transcription |
| interrupt | InterruptData | Conversation interrupted |
| error | string | Error occurred |
| connectionStateChanged | ConnectionState | Connection state changed |
Event Subscription Pattern
// Subscribe to events
character.on('sessionConnected', (session: SessionInfo) => {
  print(`Connected with session: ${session.sessionId}`);
});

const onBotResponse = (response: BotResponse) => {
  if (response.isFinal) {
    print(`AI said: ${response.text}`);
  }
};
character.on('botResponse', onBotResponse);

// Unsubscribe when done (pass the same handler reference)
character.off('botResponse', onBotResponse);
Data Models
SessionInfo
Returned when a connection is established:
interface SessionInfo {
  sessionId: string;       // Unique session identifier
  conversationId: string;  // Persists across sessions for same user
  characterId: string;     // The AI character ID
  playerId: string;        // Your user/player ID
}
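Because conversationId persists across sessions for the same user, you can tell whether a fresh connection resumed an earlier conversation. A minimal sketch, assuming you keep the previous SessionInfo around; the `resumesConversation` helper is our own, not an SDK function:

```typescript
// Mirror of the SessionInfo interface above.
interface SessionInfo {
  sessionId: string;
  conversationId: string;
  characterId: string;
  playerId: string;
}

// True if the new session continues the same conversation as a prior one.
function resumesConversation(prev: SessionInfo | null, next: SessionInfo): boolean {
  return prev !== null &&
    prev.playerId === next.playerId &&
    prev.conversationId === next.conversationId;
}
```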
BotResponse
AI text responses:
interface BotResponse {
  text: string;            // The response text
  isFinal: boolean;        // True if complete response
  partial: boolean;        // True if streaming chunk
  messageId: string;       // Unique message ID
  chunkIndex: number;      // Chunk number for streaming
  isInterjection: boolean; // Proactive message (not a reply)
}
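Streamed responses can be reassembled by collecting chunks per messageId and ordering them by chunkIndex. The sketch below assumes chunk indices start at 0 and that each chunk (including the final one) carries only its own piece of text; if the final event instead carries the full text, you would use it directly. The `ResponseAssembler` class is our own illustration:

```typescript
// Mirror of the BotResponse interface above.
interface BotResponse {
  text: string;
  isFinal: boolean;
  partial: boolean;
  messageId: string;
  chunkIndex: number;
  isInterjection: boolean;
}

class ResponseAssembler {
  private chunks = new Map<string, string[]>();

  // Returns the assembled text when the final chunk arrives, otherwise null.
  push(r: BotResponse): string | null {
    const parts = this.chunks.get(r.messageId) ?? [];
    parts[r.chunkIndex] = r.text;
    this.chunks.set(r.messageId, parts);
    if (r.isFinal) {
      this.chunks.delete(r.messageId);
      return parts.join('');
    }
    return null;
  }
}
```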
BotVoice
AI voice audio data:
interface BotVoice {
  audio: string;       // Base64-encoded PCM16 audio
  chunkIndex: number;  // Audio chunk number
  isFinal: boolean;    // True if last chunk
  sampleRate: number;  // Audio sample rate (usually 24000)
}
SttResponse
Speech-to-text transcription:
interface SttResponse {
  text: string;        // Transcribed text
  isFinal: boolean;    // True if final transcription
  confidence: number;  // Confidence score (0-1)
}
Audio Pipeline
The SDK handles audio in both directions:
Voice Input (User to Server)
┌────────────────────┐ ┌───────────────────┐ ┌─────────────────┐
│ MicrophoneRecorder │--→ │ EstuaryMicrophone │--→ │ EstuaryClient │
│ (Float32) │ │ (PCM16 → Base64) │ │ (WebSocket) │
└────────────────────┘ └───────────────────┘ └─────────────────┘
- MicrophoneRecorder captures Float32 audio at 16kHz
- EstuaryMicrophone converts to PCM16 and Base64 encodes
- EstuaryClient streams to server via WebSocket
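The input conversion can be sketched as follows: Float32 samples in [-1, 1] are clamped, scaled to signed 16-bit PCM (little-endian), and Base64-encoded. This is a standalone illustration of the transform the pipeline describes; EstuaryMicrophone's actual internals may differ in detail.

```typescript
// Convert Float32 samples ([-1, 1]) to Base64-encoded little-endian PCM16.
function float32ToPcm16Base64(samples: Float32Array): string {
  const pcm = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767.
    pcm.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return pcm.toString('base64');
}
```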
Voice Output (Server to User)
┌─────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ EstuaryClient │---→│ SimpleAutoConnect│---→│DynamicAudioOutput│
│ (WebSocket) │ │ (Base64 → PCM16) │ │ (AudioComp) │
└─────────────────┘ └──────────────────┘ └──────────────────┘
- EstuaryClient receives Base64 audio from server
- SimpleAutoConnect decodes to PCM16 bytes
- DynamicAudioOutput plays through AudioComponent
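The output conversion is the reverse: Base64 audio from the server is decoded to PCM16 bytes and, if your playback path needs floats, scaled back to Float32. Again a standalone sketch of the described transform, not SimpleAutoConnect's actual implementation:

```typescript
// Decode Base64 little-endian PCM16 audio to Float32 samples.
function base64Pcm16ToFloat32(b64: string): Float32Array {
  const pcm = Buffer.from(b64, 'base64');
  const out = new Float32Array(pcm.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = pcm.readInt16LE(i * 2) / 0x8000; // scale to roughly [-1, 1)
  }
  return out;
}
```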
Audio Specifications
| Direction | Sample Rate | Format | Encoding |
|---|---|---|---|
| Input (STT) | 16,000 Hz | PCM16 | Base64 |
| Output (TTS) | 24,000 Hz | PCM16 | Base64 |
Singleton Pattern
Several SDK components use the singleton pattern for easy global access:
EstuaryCredentials
// Access from anywhere
if (EstuaryCredentials.hasInstance) {
  const apiKey = EstuaryCredentials.instance.apiKey;
  const characterId = EstuaryCredentials.instance.characterId;
}
EstuaryManager
// Central connection manager
EstuaryManager.instance.connect();
EstuaryManager.instance.sendText("Hello!");
EstuaryActions (Global Event System)
// Subscribe to actions from any script
EstuaryActions.on("wave", (action) => {
  playWaveAnimation();
});
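The shared shape of these components — a lazily created single instance exposing on/off/emit — can be sketched as below. This is a minimal stand-in to illustrate the pattern, not EstuaryActions' real implementation; all names here are our own.

```typescript
type ActionHandler = (payload: unknown) => void;

// Minimal singleton event bus: one shared instance, handlers keyed by action name.
class ActionBus {
  private static _instance: ActionBus | null = null;
  private handlers = new Map<string, Set<ActionHandler>>();

  static get instance(): ActionBus {
    if (!ActionBus._instance) {
      ActionBus._instance = new ActionBus(); // created lazily on first access
    }
    return ActionBus._instance;
  }

  on(action: string, handler: ActionHandler): void {
    let set = this.handlers.get(action);
    if (!set) {
      set = new Set();
      this.handlers.set(action, set);
    }
    set.add(handler);
  }

  off(action: string, handler: ActionHandler): void {
    this.handlers.get(action)?.delete(handler);
  }

  emit(action: string, payload?: unknown): void {
    this.handlers.get(action)?.forEach((h) => h(payload));
  }
}
```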
Configuration Options
The EstuaryConfig interface provides connection settings:
interface EstuaryConfig {
  serverUrl: string;              // Estuary server URL
  apiKey?: string;                // Your API key
  characterId: string;            // Character UUID
  playerId: string;               // User identifier

  // Audio settings
  recordingSampleRate?: number;   // Default: 16000
  playbackSampleRate?: number;    // Default: 24000
  audioChunkDurationMs?: number;  // Default: 100

  // Connection settings
  autoReconnect?: boolean;        // Default: true
  maxReconnectAttempts?: number;  // Default: 5
  reconnectDelayMs?: number;      // Default: 2000

  // Debug
  debugLogging?: boolean;         // Default: false
}
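The defaults documented in the comments can be applied by spreading user values over a defaults object, so any field the caller sets wins. The `withDefaults` helper is our own sketch (the SDK presumably applies these defaults internally); the interface mirrors the one above.

```typescript
// Mirror of the EstuaryConfig interface above.
interface EstuaryConfig {
  serverUrl: string;
  apiKey?: string;
  characterId: string;
  playerId: string;
  recordingSampleRate?: number;
  playbackSampleRate?: number;
  audioChunkDurationMs?: number;
  autoReconnect?: boolean;
  maxReconnectAttempts?: number;
  reconnectDelayMs?: number;
  debugLogging?: boolean;
}

// Fill in the documented defaults; caller-provided fields take precedence.
function withDefaults(config: EstuaryConfig): EstuaryConfig {
  return {
    recordingSampleRate: 16000,
    playbackSampleRate: 24000,
    audioChunkDurationMs: 100,
    autoReconnect: true,
    maxReconnectAttempts: 5,
    reconnectDelayMs: 2000,
    debugLogging: false,
    ...config,
  };
}
```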
Best Practices
Always Handle Disconnections
character.on('disconnected', () => {
  // Clean up UI state
  // Stop animations
  // Show reconnecting indicator
});

character.on('error', (error) => {
  // Log error for debugging
  // Show user-friendly message
  print(`Connection error: ${error}`);
});
Use Debug Mode During Development
// Enable in EstuaryCredentials
debugMode: true
Start Voice Session Before Streaming
character.on('sessionConnected', (session) => {
  // MUST call this before streaming audio
  character.startVoiceSession();
  microphone.startRecording();
});
Handle Interrupts Gracefully
character.on('interrupt', () => {
  // Stop current audio playback
  dynamicAudioOutput.interruptAudioOutput();
  // Reset any UI state
  hideResponseIndicator();
});
Clean Up Resources
onDestroy() {
  if (this.microphone) {
    this.microphone.dispose();
  }
  if (this.character) {
    this.character.dispose();
  }
}
Next Steps
- Voice Connection - Implement voice conversations
- User Management - Handle user persistence
- Action System - React to AI actions