Voice Connection
Learn how to implement real-time voice conversations with AI characters using the Estuary SDK.
Overview
The Estuary SDK provides full-duplex voice communication:
- Voice Input: Capture user speech via microphone → Speech-to-Text (Deepgram)
- Voice Output: AI responses → Text-to-Speech (ElevenLabs) → Audio playback
┌─────────────────────────────────────────────────────────────────┐
│ Voice Conversation Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ USER SPEAKS │
│ ┌──────────┐ ┌─────────────┐ ┌───────────────────────┐ │
│ │ Mic │---→│ SDK Encodes │---→│ Estuary Server │ │
│ │ (16kHz) │ │ (Base64) │ │ ┌─────────────────┐ │ │
│ └──────────┘ └─────────────┘ │ │ Deepgram (STT) │ │ │
│ │ └────────┬────────┘ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ AI RESPONDS │ │ AI Character │ │ │
│ ┌──────────┐ ┌─────────────┐ │ └────────┬────────┘ │ │
│ │ Speaker │←---│ SDK Decodes │←---│ ▼ │ │
│ │ (24kHz) │ │ (Base64) │ │ ┌─────────────────┐ │ │
│ └──────────┘ └─────────────┘ │ │ ElevenLabs TTS │ │ │
│ │ └─────────────────┘ │ │
│ └───────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Quick Start with SimpleAutoConnect
The easiest way to implement voice is using the SimpleAutoConnect component:
Step 1: Scene Setup
Create these SceneObjects:
| Object Name | Components |
|---|---|
| Estuary Credentials | EstuaryCredentials script |
| Estuary Connection | SimpleAutoConnect script |
| Microphone | MicrophoneRecorder script |
| Audio Output | DynamicAudioOutput script + AudioComponent |
| Internet Module | InternetModule resource |
Step 2: Configure SimpleAutoConnect
In the Inspector, connect:
- credentialsObject → Estuary Credentials
- internetModule → Internet Module
- microphoneRecorderObject → Microphone
- dynamicAudioOutputObject → Audio Output
That's It!
SimpleAutoConnect handles:
- Connection management
- Microphone streaming
- Voice Activity Detection (server-side via Deepgram)
- Audio playback
- Interrupt handling
- Auto-reconnection
Manual Voice Implementation
For more control, you can implement voice manually.
Set Up the Character
import { EstuaryCharacter, EstuaryConfig, setInternetModule } from 'estuary-lens-studio-sdk';

@component
export class VoiceController extends BaseScriptComponent {
  @input
  internetModule: InternetModule;

  private character: EstuaryCharacter;

  onAwake() {
    // REQUIRED: Set up InternetModule for WebSocket
    setInternetModule(this.internetModule);

    // Create character
    this.character = new EstuaryCharacter(
      "your-character-id",
      "unique-player-id"
    );

    // Configure and connect
    const config: EstuaryConfig = {
      serverUrl: "wss://api.estuary-ai.com",
      apiKey: "your-api-key",
      characterId: "your-character-id",
      playerId: "unique-player-id",
      debugLogging: true
    };
    this.character.initialize(config);
  }
}
Set Up Microphone Input
import { EstuaryMicrophone, MicrophoneRecorder } from 'estuary-lens-studio-sdk';

@component
export class VoiceController extends BaseScriptComponent {
  @input
  microphoneRecorderObject: SceneObject;

  private microphone: EstuaryMicrophone;

  onAwake() {
    // ... character setup ...

    // Create microphone handler
    this.microphone = new EstuaryMicrophone(this.character);
    this.microphone.debugLogging = true;

    // Find MicrophoneRecorder on the SceneObject
    const recorder = this.findMicrophoneRecorder(this.microphoneRecorderObject);
    if (recorder) {
      this.microphone.setMicrophoneRecorder(recorder);
    }

    // Connect microphone to character
    this.character.microphone = this.microphone;
  }

  private findMicrophoneRecorder(obj: SceneObject): MicrophoneRecorder | null {
    const count = obj.getComponentCount("Component.ScriptComponent");
    for (let i = 0; i < count; i++) {
      const comp = obj.getComponentByIndex("Component.ScriptComponent", i) as any;
      if (comp?.onAudioFrame && typeof comp.startRecording === 'function') {
        return comp as MicrophoneRecorder;
      }
    }
    return null;
  }
}
Handle Voice Events
private setupEventHandlers() {
  // When connected, start the voice session
  this.character.on('connected', (session) => {
    print(`Connected: ${session.sessionId}`);

    // IMPORTANT: Start the voice session before streaming audio
    this.character.startVoiceSession();

    // Start the microphone
    this.microphone.startRecording();
  });

  // Handle transcription (what the user said)
  this.character.on('transcript', (stt) => {
    if (stt.isFinal) {
      print(`[You] ${stt.text}`);
    }
  });

  // Handle the AI text response
  this.character.on('botResponse', (response) => {
    if (response.isFinal) {
      print(`[AI] ${response.text}`);
    }
  });

  // Handle the AI voice response
  this.character.on('voiceReceived', (voice) => {
    // Play audio via DynamicAudioOutput
    this.playVoiceAudio(voice);
  });

  // Handle interrupts (user speaks while the AI is talking)
  this.character.on('interrupt', () => {
    // Stop current audio playback
    this.stopAudioPlayback();
  });
}
Audio Playback
interface DynamicAudioOutput {
  initialize(sampleRate: number): void;
  addAudioFrame(uint8Array: Uint8Array, channels: number): void;
  interruptAudioOutput(): void;
}

private dynamicAudioOutput: DynamicAudioOutput;
private audioInitialized: boolean = false;

private setupAudioOutput(audioOutputObject: SceneObject) {
  // Find the DynamicAudioOutput component
  const count = audioOutputObject.getComponentCount("Component.ScriptComponent");
  for (let i = 0; i < count; i++) {
    const comp = audioOutputObject.getComponentByIndex("Component.ScriptComponent", i) as any;
    if (comp?.initialize && comp?.addAudioFrame) {
      this.dynamicAudioOutput = comp;
      break;
    }
  }

  if (this.dynamicAudioOutput) {
    // Initialize with the TTS sample rate (24kHz for ElevenLabs)
    this.dynamicAudioOutput.initialize(24000);
    this.audioInitialized = true;
  }
}

private playVoiceAudio(voice: BotVoice) {
  if (!this.dynamicAudioOutput || !voice.audio) return;

  // Decode Base64 to PCM bytes
  const pcmBytes = Base64.decode(voice.audio);

  // Play audio (mono = 1 channel)
  this.dynamicAudioOutput.addAudioFrame(pcmBytes, 1);
}

private stopAudioPlayback() {
  if (this.dynamicAudioOutput) {
    this.dynamicAudioOutput.interruptAudioOutput();
  }
}
Voice Session Management
Starting a Voice Session
You must start a voice session before streaming audio:
character.on('connected', (session) => {
  // This enables audio streaming to the server
  character.startVoiceSession();

  // Now you can start recording
  microphone.startRecording();
});
Ending a Voice Session
// Stop voice input (but keep connection open)
character.endVoiceSession();
microphone.stopRecording();
Voice Session State
// Check if voice session is active
if (character.isVoiceSessionActive) {
  // Audio streaming is enabled
}
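Because the session flag and the microphone are controlled separately, it is easy for them to drift apart. A minimal sketch that toggles both as one unit, using only the calls shown above (setVoiceActive is a hypothetical helper name):

private setVoiceActive(active: boolean) {
  if (active && !this.character.isVoiceSessionActive) {
    // Enable audio streaming before capturing, per the order above
    this.character.startVoiceSession();
    this.microphone.startRecording();
  } else if (!active && this.character.isVoiceSessionActive) {
    // Stop capturing first, then close the voice session
    this.microphone.stopRecording();
    this.character.endVoiceSession();
  }
}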
Handling Interrupts
When the user speaks while the AI is responding, an interrupt is triggered:
character.on('interrupt', (data) => {
  // 1. Stop audio playback immediately
  dynamicAudioOutput.interruptAudioOutput();

  // 2. Clear any pending response text
  // 3. Update UI to show the user is speaking
  print("User interrupted - stopping AI audio");
});
The server automatically:
- Stops generating the current response
- Clears the TTS queue
- Starts processing the new user input
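On the client, "clear any pending response text" (step 2 in the handler above) usually means resetting whatever partial botResponse text you have buffered. A sketch, assuming you keep the latest partial in a hypothetical pendingText field:

private pendingText: string = "";

private setupInterruptHandling() {
  // Keep the most recent partial so the UI can show text as it streams in
  this.character.on('botResponse', (response) => {
    this.pendingText = response.text;
  });

  this.character.on('interrupt', () => {
    this.stopAudioPlayback(); // stop speaker output immediately
    this.pendingText = "";    // discard the half-finished response
  });
}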
Text-Only Fallback
You can also send text messages without voice:
// Send text directly to the AI
character.sendText("Hello, how are you?");
// Listen for text response
character.on('botResponse', (response) => {
  if (response.isFinal) {
    print(`AI responded: ${response.text}`);
  }
});
Audio Format Details
Input Audio (Microphone to Server)
| Property | Value |
|---|---|
| Sample Rate | 16,000 Hz |
| Format | 16-bit PCM (signed, little-endian) |
| Channels | Mono (1) |
| Encoding | Base64 string |
| Chunk Size | ~100ms of audio |
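If you feed audio to the server yourself instead of going through MicrophoneRecorder, each chunk must match this table. A sketch of the conversion, assuming float samples in [-1, 1] and that Base64.encode is the counterpart of the Base64.decode call used earlier:

// Convert 16 kHz mono float samples to a Base64 PCM16 chunk.
// At 16,000 Hz, ~100 ms is about 1,600 samples (3,200 bytes).
function encodeAudioChunk(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range (little-endian)
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return Base64.encode(bytes);
}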
Output Audio (Server to Speaker)
| Property | Value |
|---|---|
| Sample Rate | 24,000 Hz (ElevenLabs default) |
| Format | 16-bit PCM (signed, little-endian) |
| Channels | Mono (1) |
| Encoding | Base64 string |
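Since the payload is 16-bit mono, every sample is 2 bytes, so the duration of a decoded chunk is bytes / (2 × 24000). A small helper (hypothetical, but handy for timing "speaking" indicators):

// Seconds of audio in one decoded TTS payload:
// 2 bytes per sample, 1 channel, 24,000 samples per second
function outputDurationSeconds(pcmBytes: Uint8Array): number {
  return pcmBytes.length / (2 * 24000);
}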
Troubleshooting
No Audio Being Sent
WARNING: Audio dropped: voice session not active! Call startVoiceSession() first.
Solution: Call character.startVoiceSession() after connecting and before you start recording.
No Audio Playback
- Ensure DynamicAudioOutput has an AudioComponent attached
- Verify an AudioTrack asset is assigned
- Check initialize() was called with the correct sample rate (24000)
Audio Cuts Out
The SDK includes throttling to prevent WebSocket buffer overflow. If audio still cuts out:
- Check network stability
- Reduce other network traffic in your Lens
WebSocket NOT available in Lens Studio Preview
This is expected - WebSocket only works on actual Spectacles hardware. Deploy to device for testing.
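To avoid noisy errors in Preview, you can skip the connection attempt when running in the editor. A sketch, assuming Lens Studio's global.deviceInfoSystem.isEditor() check:

// Only attempt the WebSocket connection on device
if (global.deviceInfoSystem.isEditor()) {
  print("Lens Studio Preview detected - skipping voice connection");
} else {
  this.character.initialize(config);
}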
Best Practices
Always Initialize Audio Early
// Initialize audio output immediately after finding the component
dynamicAudioOutput.initialize(24000);
Handle Connection Loss
character.on('disconnected', () => {
  microphone.stopRecording();
  // Show UI indicator
});

character.on('error', (error) => {
  print(`Voice error: ${error}`);
  // Attempt recovery or show message
});
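For a simple manual retry (SimpleAutoConnect does this for you), the disconnected handler can schedule a reconnect with a DelayedCallbackEvent. A sketch, assuming it runs inside a component (so this.createEvent is available), that the config object from earlier is kept on the component, and that initialize() can be called again to re-open the connection:

character.on('disconnected', () => {
  microphone.stopRecording();

  // Retry after a short back-off
  const retry = this.createEvent("DelayedCallbackEvent");
  retry.bind(() => character.initialize(this.config));
  retry.reset(3.0); // seconds
});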
Provide Visual Feedback
Users benefit from knowing:
- When their voice is being captured
- When the AI is "thinking"
- When the AI is speaking
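One minimal way to provide this is to map the SDK events you already handle onto a status label. A sketch, assuming a hypothetical statusText Text component input:

@input
statusText: Text; // hypothetical label for voice state

private setupStatusFeedback() {
  this.character.on('transcript', (stt) => {
    // Partials arrive while the user is talking; a final means the AI is working
    this.statusText.text = stt.isFinal ? "Thinking..." : "Listening...";
  });
  this.character.on('voiceReceived', () => {
    this.statusText.text = "Speaking...";
  });
  this.character.on('interrupt', () => {
    this.statusText.text = "Listening...";
  });
}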
Respect Voice Sessions
// Don't stream audio without an active session
if (character.isConnected && character.isVoiceSessionActive) {
  character.streamAudio(audioBase64);
}
Next Steps
- User Management - Persist conversations across sessions
- Action System - Trigger actions from voice responses
- API Reference - Detailed component documentation