Platform Overview
Estuary is a real-time AI conversation platform that gives characters persistent memory, voice, spatial awareness, and the ability to take actions in your application. This page describes the architecture that all Estuary SDKs interact with.
Architecture
Estuary uses a Gateway-Worker architecture designed for low-latency streaming:
┌──────────────┐       ┌───────────────┐       ┌────────────────┐
│   Your App   │──────>│    Gateway    │──────>│  Redis Queue   │
│    (SDK)     │<──────│  (FastAPI +   │<──────│                │
│              │       │  Socket.IO)   │       └───────┬────────┘
└──────────────┘       └───────────────┘               │
       │                                               ▼
       │                                       ┌────────────────┐
┌──────┴──────┐                                │    Workers     │
│   LiveKit   │                                │  (Stateless,   │
│  (WebRTC)   │                                │   Scalable)    │
└─────────────┘                                └────────────────┘
                                                       │
                                               ┌───────┴───────┐
                                               │   Providers   │
                                               │   STT, LLM,   │
                                               │   TTS, VLM    │
                                               └───────────────┘
Gateway
The gateway is a FastAPI server with Socket.IO that handles all client connections. It:
- Authenticates SDK connections using API keys
- Manages WebSocket sessions via Socket.IO v4
- Streams audio to/from clients for speech-to-text
- Negotiates LiveKit WebRTC rooms for low-latency voice
- Routes work to stateless workers via a Redis queue
- Publishes responses back to clients via Redis pub/sub
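The last two responsibilities can be sketched as a job envelope pushed onto the queue. This is a minimal illustration, not Estuary's actual wire format: the field names and `responses:{session_id}` channel naming are assumptions, and a real deployment would use Redis `LPUSH`/`BRPOP` and pub/sub rather than an in-memory list.

```python
import json
import uuid

def enqueue_request(queue: list, session_id: str, text: str) -> str:
    """Wrap a client message in a job envelope and push it onto the queue."""
    request_id = str(uuid.uuid4())
    job = {
        "request_id": request_id,
        "session_id": session_id,
        # Workers publish results to this channel; the gateway subscribes
        # and forwards them to the matching Socket.IO session.
        "reply_channel": f"responses:{session_id}",
        "payload": {"type": "text", "text": text},
    }
    queue.insert(0, json.dumps(job))  # stands in for LPUSH; workers BRPOP
    return request_id
```

Because workers are stateless, everything they need to route the response back travels inside the job itself.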
Workers
Workers are stateless processes that pick up requests from the Redis queue. Each request flows through a pipeline:
- STT -- Transcribe speech to text (Deepgram)
- LLM -- Generate a response (OpenAI) with memory context
- TTS -- Convert the response to speech (ElevenLabs)
Workers stream results back as they become available. Text chunks arrive before the full response is complete, and audio chunks begin streaming as soon as the first sentence is synthesized.
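The streaming behavior above can be sketched with generators: text is emitted sentence by sentence, and each sentence's audio is synthesized as soon as that sentence is complete. The stub functions stand in for Deepgram, the LLM, and ElevenLabs; none of the names here are the actual Estuary worker API.

```python
import re
from typing import Iterator

def transcribe(audio: bytes) -> str:
    return "hello there"  # stand-in for a Deepgram STT call

def generate(prompt: str) -> Iterator[str]:
    # Stand-in for a streaming LLM call; yields partial text chunks.
    yield from ["Hi! ", "Nice to ", "meet you. ", "How are you?"]

def sentences(chunks: Iterator[str]) -> Iterator[str]:
    # Buffer chunks and emit complete sentences so TTS can start early.
    buf = ""
    for chunk in chunks:
        buf += chunk
        while (m := re.search(r"[.!?]\s*", buf)):
            yield buf[: m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()

def synthesize(sentence: str) -> bytes:
    return sentence.encode()  # stand-in for an ElevenLabs TTS call

def run_pipeline(audio: bytes) -> Iterator[tuple[str, object]]:
    text = transcribe(audio)
    for sentence in sentences(generate(text)):
        yield ("text", sentence)               # text chunk streams first
        yield ("audio", synthesize(sentence))  # audio for that sentence follows
```

Note that the first `("audio", ...)` event is produced after only the first sentence, long before the LLM has finished the full response.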
Infrastructure
| Component | Purpose |
|---|---|
| PostgreSQL (pgvector) | Persistent storage: characters, conversations, memory, knowledge graph |
| Redis | Job queue (BRPOP), pub/sub for response routing, session state, rate limiting |
| LiveKit | WebRTC server for low-latency voice and video streaming |
Core Capabilities
Persistent Memory
Every conversation is remembered. The memory system tracks:
- Episodic memory -- What happened in each conversation
- Semantic memory -- Facts and knowledge extracted over time
- Emotional memory -- Relationship dynamics and sentiment
- Core facts -- Key information about each user
- Entities -- People, places, and things mentioned across conversations
See Memory System for details.
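One way the memory types above might be modeled as stored records is sketched below. The field names and the `kind` vocabulary are illustrative assumptions, not Estuary's actual pgvector schema.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    kind: str     # "episodic", "semantic", "emotional", "core_fact", or "entity"
    content: str  # the remembered text, e.g. "User prefers tea over coffee"
    user_id: str
    embedding: list[float] = field(default_factory=list)  # for pgvector similarity search
    importance: float = 0.5  # hypothetical ranking for LLM context selection
```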
Real-Time Voice
Estuary supports two voice transports:
- WebSocket voice -- Audio streamed as base64-encoded PCM over Socket.IO. Works everywhere, higher latency.
- LiveKit voice -- Audio streamed via WebRTC. Low latency with acoustic echo cancellation (AEC). Preferred when available.
Both modes use the same server-side pipeline: Deepgram for STT, the LLM for response generation, and ElevenLabs for TTS.
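For the WebSocket transport, a client frame might be built roughly as follows. The payload keys and the 16 kHz / signed 16-bit little-endian format are assumptions for illustration; only "base64-encoded PCM over Socket.IO" comes from the description above.

```python
import base64
import struct

def encode_audio_frame(samples: list[int], sample_rate: int = 16000) -> dict:
    """Pack 16-bit little-endian PCM samples and base64-encode them."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return {
        "audio": base64.b64encode(pcm).decode("ascii"),  # safe to send as JSON text
        "sample_rate": sample_rate,
        "encoding": "pcm_s16le",  # hypothetical label for this frame format
    }
```

Base64 inflates the payload by roughly a third, which is one reason the WebRTC path has lower latency and overhead.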
Spatial Awareness
With a camera feed, Estuary builds a real-time scene graph of the user's environment:
- Object detection and tracking
- Spatial relationships between objects
- Surface detection (tables, floors, walls)
- Room identification
- User activity recognition
The scene graph is injected into the LLM context so characters can reference what they see.
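A toy version of that injection step is sketched below: objects and spatial relations rendered as prompt text. The node/edge shapes and the wording are illustrative assumptions, not Estuary's actual scene-graph format.

```python
def scene_to_context(objects: list[dict], relations: list[tuple]) -> str:
    """Render detected objects and spatial relations as LLM context text."""
    labels = {o["id"]: o["label"] for o in objects}
    lines = ["You can currently see: " + ", ".join(labels.values()) + "."]
    for subject, relation, obj in relations:
        lines.append(f"The {labels[subject]} is {relation} the {labels[obj]}.")
    return "\n".join(lines)

objects = [
    {"id": "cup1", "label": "cup"},
    {"id": "table1", "label": "table"},  # a detected surface
]
relations = [("cup1", "on", "table1")]  # a spatial-relationship edge
```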
Action System
Characters can trigger actions in your application by embedding XML tags in their responses:
<action name="wave" />
<action name="navigate" target="kitchen" />
SDKs parse these tags and dispatch them to your application code. See Action Protocol for details.
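A minimal version of that parsing step could look like this. The regex handles only the self-closing form shown above; the real SDK parsers and the normative attribute rules are covered in Action Protocol.

```python
import re

ACTION_RE = re.compile(r'<action\s+([^/>]*?)\s*/>')  # self-closing tags only
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_actions(text: str) -> list[dict]:
    """Extract action tags from a response as attribute dicts."""
    return [dict(ATTR_RE.findall(m.group(1))) for m in ACTION_RE.finditer(text)]
```

The surrounding prose is left untouched, so the same response can be shown to the user with the tags stripped while the actions are dispatched to application code.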
How SDKs Connect
All Estuary SDKs communicate with the gateway over two channels:
Socket.IO (Primary)
Socket.IO v4 over WebSocket is the primary transport. SDKs connect to the /sdk namespace with authentication credentials and exchange events for text, voice, vision, and control signals.
Client                          Server
  │                               │
  │──── connect /sdk (auth) ─────>│
  │<──── session_info ────────────│
  │                               │
  │──── text {text} ─────────────>│
  │<──── bot_response (stream) ───│
  │<──── bot_voice (stream) ──────│
  │                               │
  │──── start_voice ─────────────>│
  │──── stream_audio ────────────>│
  │<──── stt_response ────────────│
  │                               │
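The ordering in the diagram can be expressed as a small client-side validity check. The transition table below is a simplified reading of the diagram, not the normative protocol (see Conversation Protocol for the full event reference).

```python
# Which events may follow which, per the sequence diagram; None = start.
VALID_AFTER = {
    None: {"connect"},
    "connect": {"session_info"},
    "session_info": {"text", "start_voice"},
    "text": {"bot_response"},
    "bot_response": {"bot_response", "bot_voice"},
    "bot_voice": {"bot_voice", "text", "start_voice"},
    "start_voice": {"stream_audio"},
    "stream_audio": {"stream_audio", "stt_response"},
    "stt_response": {"text", "start_voice", "stream_audio"},
}

def is_valid_sequence(events: list[str]) -> bool:
    """Check that a flat event trace respects the diagram's ordering."""
    prev = None
    for event in events:
        if event not in VALID_AFTER.get(prev, set()):
            return False
        prev = event
    return True
```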
REST API
The REST API provides access to configuration and data that does not require real-time streaming:
- Character management
- Conversation history
- Memory and knowledge graph queries
- Analytics and usage data
REST endpoints use the X-API-Key header for authentication.
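An authenticated request might be constructed as follows. Only the `X-API-Key` header is documented above; the base URL and the `/v1/characters` path are placeholders, not real endpoints.

```python
import urllib.request

def build_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request carrying the X-API-Key authentication header."""
    req = urllib.request.Request(f"{base_url}{path}")
    req.add_header("X-API-Key", api_key)
    return req
```

Sending it is then a plain `urllib.request.urlopen(req)` (or the equivalent in your HTTP client of choice).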
Next Steps
- Authentication -- How API keys and session handshakes work
- Conversation Protocol -- Full Socket.IO event reference
- Memory System -- How memory works across conversations
- Action Protocol -- Triggering application actions from AI responses