Platform Overview
Estuary is a real-time AI conversation platform that gives characters persistent memory, voice, vision, and the ability to take actions in your application. This page describes the capabilities that all Estuary SDKs interact with.
Core Capabilities
Persistent Memory
Every conversation is remembered. The memory system tracks:
- Facts and preferences -- What the character knows about each user
- Events and relationships -- What happened and who is connected to whom
- Character self-knowledge -- The character's own evolving identity
- Core facts -- Key structured information (name, location, interests) always in context
- Entities -- People, places, and things mentioned across conversations
See Memory System for details.
Real-Time Voice
Estuary supports two voice transports:
- WebSocket voice -- Audio streamed as base64-encoded PCM over Socket.IO. Works everywhere, but with higher latency than LiveKit voice.
- LiveKit voice -- Audio streamed via WebRTC. Low latency with acoustic echo cancellation (AEC). Preferred when available.
Both transports share the same server-side pipeline: speech-to-text, LLM response generation, and text-to-speech.
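To make the WebSocket voice framing concrete, here is a minimal sketch of turning raw samples into base64-encoded PCM and back. The sample format (signed 16-bit, little-endian) and the `audio` payload field are assumptions for illustration; this page does not specify them, so check the Conversation Protocol reference for the real values.

```python
import base64
import struct


def encode_pcm_chunk(samples: list[int]) -> str:
    """Pack samples as signed 16-bit little-endian PCM, then base64-encode.

    The 16-bit little-endian format is an assumption, not a documented value.
    """
    pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)
    return base64.b64encode(pcm_bytes).decode("ascii")


def decode_pcm_chunk(encoded: str) -> list[int]:
    """Reverse encode_pcm_chunk: base64-decode, then unpack 16-bit samples."""
    pcm_bytes = base64.b64decode(encoded)
    count = len(pcm_bytes) // 2  # two bytes per 16-bit sample
    return list(struct.unpack(f"<{count}h", pcm_bytes))


samples = [0, 1000, -1000, 32767, -32768]
chunk = encode_pcm_chunk(samples)

# Hypothetical payload shape for a stream_audio event; the field name
# "audio" is illustrative only.
payload = {"audio": chunk}
```

Base64 inflates the audio by roughly a third, which is one reason this transport costs more bandwidth per second of speech than a binary WebRTC stream.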
Action System
Characters can trigger actions in your application by embedding XML tags in their responses:
<action name="wave" />
<action name="navigate" target="kitchen" />
SDKs parse these tags and dispatch them to your application code. See Action Protocol for details.
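The SDKs handle this parsing for you, but the idea can be sketched in a few lines. The following is an illustrative approximation, not the SDKs' actual implementation: `extract_actions` and `dispatch` are hypothetical names, the regular expressions only cover self-closing tags with double-quoted attributes like the two examples above, and stripping the tags from the displayed text is an assumption.

```python
import re

# Matches self-closing tags such as <action name="wave" />.
ACTION_TAG = re.compile(r"<action\b([^>]*?)/>")
# Matches key="value" attribute pairs inside a tag.
ATTRIBUTE = re.compile(r'([\w-]+)="([^"]*)"')


def extract_actions(text):
    """Return (text_without_tags, actions) for one character response."""
    actions = []
    for match in ACTION_TAG.finditer(text):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        name = attrs.pop("name", None)
        if name is not None:  # ignore tags without a name attribute
            actions.append({"name": name, "params": attrs})
    clean_text = " ".join(ACTION_TAG.sub("", text).split())
    return clean_text, actions


def dispatch(actions, handlers):
    """Call the handler registered for each action; skip unknown names."""
    for action in actions:
        handler = handlers.get(action["name"])
        if handler is not None:
            handler(action["params"])


response = 'Sure, follow me! <action name="navigate" target="kitchen" /> <action name="wave" />'
clean_text, actions = extract_actions(response)

log = []
dispatch(actions, {
    "navigate": lambda params: log.append(("navigate", params["target"])),
    "wave": lambda params: log.append(("wave", None)),
})
```

Keeping the handler table as plain data means an application can decide which action names it supports, and unknown actions are ignored rather than raising.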
How SDKs Connect
All Estuary SDKs communicate with the gateway over two channels:
Socket.IO (Primary)
Socket.IO v4 over WebSocket is the primary transport. SDKs connect to the /sdk namespace with authentication credentials and exchange events for text, voice, vision, and control signals.
Client                     Server
│                               │
│──── connect /sdk (auth) ─────>│
│<──── session_info ────────────│
│                               │
│──── text {text} ─────────────>│
│<──── bot_response (stream) ───│
│<──── bot_voice (stream) ──────│
│                               │
│──── start_voice ─────────────>│
│──── stream_audio ────────────>│
│<──── stt_response ────────────│
│                               │
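On the client side, handling this exchange amounts to registering one handler per event name. The sketch below uses a small in-memory `EventRouter` as a stand-in for a real Socket.IO client so that it runs without a network connection. The event names come from the diagram above; the payload fields (`session_id`, `text`) and the simulated server events are assumptions made for illustration.

```python
class EventRouter:
    """Minimal stand-in for a Socket.IO client's event registration."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Register a handler for an event name."""
        self._handlers.setdefault(event, []).append(handler)

    def receive(self, event, payload):
        """Simulate the server delivering an event to the client."""
        for handler in self._handlers.get(event, []):
            handler(payload)


router = EventRouter()
state = {"session": None, "reply_chunks": [], "transcripts": []}

# Payload field names below are hypothetical.
router.on("session_info", lambda p: state.update(session=p["session_id"]))
router.on("bot_response", lambda p: state["reply_chunks"].append(p["text"]))
router.on("stt_response", lambda p: state["transcripts"].append(p["text"]))

# Simulated server traffic in the order shown in the diagram.
router.receive("session_info", {"session_id": "abc123"})
router.receive("bot_response", {"text": "Hello "})
router.receive("bot_response", {"text": "there"})
router.receive("stt_response", {"text": "hi, can you hear me?"})

full_reply = "".join(state["reply_chunks"])
```

Because `bot_response` arrives as a stream, the handler appends each chunk and the full reply is assembled afterwards rather than read from a single message.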
REST API
The REST API provides access to configuration and data that does not require real-time streaming:
- Character management
- Conversation history
- Memory and knowledge graph queries
- Analytics and usage data
REST endpoints use the X-API-Key header for authentication.
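As a sketch of what an authenticated REST call looks like, the snippet below builds a request carrying the X-API-Key header using only the Python standard library. The base URL and the `/characters` path are placeholders, not documented endpoints, and `build_request` is a hypothetical helper.

```python
import urllib.request

# Placeholder base URL; substitute your actual Estuary gateway address.
BASE_URL = "https://api.example.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request authenticated with the X-API-Key header."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


# "/characters" is an illustrative path, not a documented endpoint.
request = build_request("/characters", "YOUR_API_KEY")

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

The request is only constructed here, not sent, so the example stays runnable without credentials.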
Next Steps
- Authentication -- How API keys and session handshakes work
- Conversation Protocol -- Full Socket.IO event reference
- Memory System -- How memory works across conversations
- Action Protocol -- Triggering application actions from AI responses