Voice Connection

Learn how to implement real-time voice conversations with AI characters using the Estuary SDK.

Overview

The Estuary SDK provides full-duplex voice communication:

  • Voice Input: Capture user speech via microphone → Speech-to-Text (Deepgram)
  • Voice Output: AI responses → Text-to-Speech (ElevenLabs) → Audio playback
┌─────────────────────────────────────────────────────────────────┐
│ Voice Conversation Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ USER SPEAKS │
│ ┌──────────┐ ┌─────────────┐ ┌───────────────────────┐ │
│ │ Mic │---→│ SDK Encodes │---→│ Estuary Server │ │
│ │ (16kHz) │ │ (Base64) │ │ ┌─────────────────┐ │ │
│ └──────────┘ └─────────────┘ │ │ Deepgram (STT) │ │ │
│ │ └────────┬────────┘ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ AI RESPONDS │ │ AI Character │ │ │
│ ┌──────────┐ ┌─────────────┐ │ └────────┬────────┘ │ │
│ │ Speaker │←---│ SDK Decodes │←---│ ▼ │ │
│ │ (24kHz) │ │ (Base64) │ │ ┌─────────────────┐ │ │
│ └──────────┘ └─────────────┘ │ │ ElevenLabs TTS │ │ │
│ │ └─────────────────┘ │ │
│ └───────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Quick Start with SimpleAutoConnect

The easiest way to implement voice is using the SimpleAutoConnect component:

Step 1: Scene Setup

Create these SceneObjects:

Object Name           Components
Estuary Credentials   EstuaryCredentials script
Estuary Connection    SimpleAutoConnect script
Microphone            MicrophoneRecorder script
Audio Output          DynamicAudioOutput script + AudioComponent
Internet Module       InternetModule resource

Step 2: Configure SimpleAutoConnect

In the Inspector, connect:

credentialsObject        → Estuary Credentials
internetModule           → Internet Module
microphoneRecorderObject → Microphone
dynamicAudioOutputObject → Audio Output

That's It!

SimpleAutoConnect handles:

  • Connection management
  • Microphone streaming
  • Voice Activity Detection (server-side via Deepgram)
  • Audio playback
  • Interrupt handling
  • Auto-reconnection
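The auto-reconnection behavior above can be approximated as exponential backoff with a ceiling. The actual retry policy is internal to SimpleAutoConnect, so treat this as an illustrative sketch of the pattern, not the SDK's implementation:

```typescript
// Illustrative only: exponential backoff with a cap, similar in spirit to
// what an auto-reconnecting component might do. The real SimpleAutoConnect
// policy is internal to the SDK.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
    // attempt 0 → 500 ms, attempt 1 → 1000 ms, doubling until capped at maxMs
    return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

Capping the delay keeps recovery responsive after long outages while still avoiding a reconnect storm against the server.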

Manual Voice Implementation

For more control, you can implement voice manually.

Set Up the Character

import { EstuaryCharacter, EstuaryConfig, setInternetModule } from 'estuary-lens-studio-sdk';

@component
export class VoiceController extends BaseScriptComponent {

    @input
    internetModule: InternetModule;

    private character: EstuaryCharacter;

    onAwake() {
        // REQUIRED: Set up InternetModule for WebSocket
        setInternetModule(this.internetModule);

        // Create character
        this.character = new EstuaryCharacter(
            "your-character-id",
            "unique-player-id"
        );

        // Configure and connect
        const config: EstuaryConfig = {
            serverUrl: "wss://api.estuary-ai.com",
            apiKey: "your-api-key",
            characterId: "your-character-id",
            playerId: "unique-player-id",
            debugLogging: true
        };

        this.character.initialize(config);
    }
}

Set Up Microphone Input

import { EstuaryMicrophone, MicrophoneRecorder } from 'estuary-lens-studio-sdk';

@component
export class VoiceController extends BaseScriptComponent {

    @input
    microphoneRecorderObject: SceneObject;

    private microphone: EstuaryMicrophone;

    onAwake() {
        // ... character setup ...

        // Create microphone handler
        this.microphone = new EstuaryMicrophone(this.character);
        this.microphone.debugLogging = true;

        // Find MicrophoneRecorder on the SceneObject
        const recorder = this.findMicrophoneRecorder(this.microphoneRecorderObject);
        if (recorder) {
            this.microphone.setMicrophoneRecorder(recorder);
        }

        // Connect microphone to character
        this.character.microphone = this.microphone;
    }

    private findMicrophoneRecorder(obj: SceneObject): MicrophoneRecorder | null {
        const count = obj.getComponentCount("Component.ScriptComponent");
        for (let i = 0; i < count; i++) {
            const comp = obj.getComponentByIndex("Component.ScriptComponent", i) as any;
            if (comp?.onAudioFrame && typeof comp.startRecording === 'function') {
                return comp as MicrophoneRecorder;
            }
        }
        return null;
    }
}

Handle Voice Events

private setupEventHandlers() {
    // When connected, start voice session
    this.character.on('connected', (session) => {
        print(`Connected: ${session.sessionId}`);

        // IMPORTANT: Start voice session before streaming audio
        this.character.startVoiceSession();

        // Start microphone
        this.microphone.startRecording();
    });

    // Handle transcription (what the user said)
    this.character.on('transcript', (stt) => {
        if (stt.isFinal) {
            print(`[You] ${stt.text}`);
        }
    });

    // Handle AI text response
    this.character.on('botResponse', (response) => {
        if (response.isFinal) {
            print(`[AI] ${response.text}`);
        }
    });

    // Handle AI voice response
    this.character.on('voiceReceived', (voice) => {
        // Play audio via DynamicAudioOutput
        this.playVoiceAudio(voice);
    });

    // Handle interrupts (user speaks while the AI is talking)
    this.character.on('interrupt', () => {
        // Stop current audio playback
        this.stopAudioPlayback();
    });
}

Audio Playback

interface DynamicAudioOutput {
    initialize(sampleRate: number): void;
    addAudioFrame(uint8Array: Uint8Array, channels: number): void;
    interruptAudioOutput(): void;
}

private dynamicAudioOutput: DynamicAudioOutput;
private audioInitialized: boolean = false;

private setupAudioOutput(audioOutputObject: SceneObject) {
    // Find DynamicAudioOutput component
    const count = audioOutputObject.getComponentCount("Component.ScriptComponent");
    for (let i = 0; i < count; i++) {
        const comp = audioOutputObject.getComponentByIndex("Component.ScriptComponent", i) as any;
        if (comp?.initialize && comp?.addAudioFrame) {
            this.dynamicAudioOutput = comp;
            break;
        }
    }

    if (this.dynamicAudioOutput) {
        // Initialize with the TTS sample rate (24 kHz for ElevenLabs)
        this.dynamicAudioOutput.initialize(24000);
        this.audioInitialized = true;
    }
}

private playVoiceAudio(voice: BotVoice) {
    if (!this.dynamicAudioOutput || !voice.audio) return;

    // Decode Base64 to PCM bytes
    const pcmBytes = Base64.decode(voice.audio);

    // Play audio (mono = 1 channel)
    this.dynamicAudioOutput.addAudioFrame(pcmBytes, 1);
}

private stopAudioPlayback() {
    if (this.dynamicAudioOutput) {
        this.dynamicAudioOutput.interruptAudioOutput();
    }
}

Voice Session Management

Starting a Voice Session

You must start a voice session before streaming audio:

character.on('connected', (session) => {
    // This enables audio streaming to the server
    character.startVoiceSession();

    // Now you can start recording
    microphone.startRecording();
});

Ending a Voice Session

// Stop voice input (but keep connection open)
character.endVoiceSession();
microphone.stopRecording();

Voice Session State

// Check if voice session is active
if (character.isVoiceSessionActive) {
// Audio streaming is enabled
}

Handling Interrupts

When the user speaks while the AI is responding, an interrupt is triggered:

character.on('interrupt', (data) => {
    // 1. Stop audio playback immediately
    dynamicAudioOutput.interruptAudioOutput();

    // 2. Clear any pending response text
    // 3. Update UI to show the user is speaking

    print("User interrupted - stopping AI audio");
});

The server automatically:

  • Stops generating the current response
  • Clears the TTS queue
  • Starts processing the new user input
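On the client side, one robust way to honor an interrupt is to tag outgoing playback with a generation counter and drop any frames that were already in flight when the interrupt fired. This is a sketch of the pattern, not SDK code; the class name and methods are hypothetical:

```typescript
// Sketch: a generation counter that invalidates in-flight audio frames
// after an interrupt, so stale AI speech never reaches the speaker.
class PlaybackGate {
    private generation = 0;

    // Call from the 'interrupt' handler: frames tagged with an older
    // generation are silently dropped.
    interrupt(): void {
        this.generation++;
    }

    // Tag a frame with the generation that was current when it arrived.
    currentGeneration(): number {
        return this.generation;
    }

    // Returns true if the frame should still be played.
    shouldPlay(frameGeneration: number): boolean {
        return frameGeneration === this.generation;
    }
}
```

Checking `shouldPlay()` just before calling `addAudioFrame()` guards against frames that were queued before the interrupt but decoded after it.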

Text-Only Fallback

You can also send text messages without voice:

// Send text directly to the AI
character.sendText("Hello, how are you?");

// Listen for text response
character.on('botResponse', (response) => {
    if (response.isFinal) {
        print(`AI responded: ${response.text}`);
    }
});

Audio Format Details

Input Audio (Microphone to Server)

Property      Value
Sample Rate   16,000 Hz
Format        16-bit PCM (signed, little-endian)
Channels      Mono (1)
Encoding      Base64 string
Chunk Size    ~100 ms of audio
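These numbers compose: at 16 kHz, 16-bit mono, 100 ms of audio is 16,000 × 0.1 = 1,600 samples = 3,200 bytes of raw PCM, which Base64 then inflates by roughly 4/3. A plain-TypeScript sketch of the arithmetic (the function name is ours, not the SDK's):

```typescript
// Raw PCM bytes in one microphone chunk, from the format table above.
function pcmChunkBytes(sampleRateHz: number, bitDepth: number,
                       channels: number, chunkMs: number): number {
    const samples = (sampleRateHz * chunkMs) / 1000;
    return samples * (bitDepth / 8) * channels;
}

// 16 kHz, 16-bit, mono, ~100 ms → 3200 bytes of PCM,
// which Base64 expands to ceil(3200 / 3) * 4 = 4268 characters.
const rawBytes = pcmChunkBytes(16000, 16, 1, 100);
const base64Chars = Math.ceil(rawBytes / 3) * 4;
```

Knowing the per-chunk size is useful when budgeting WebSocket bandwidth for the Lens.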

Output Audio (Server to Speaker)

Property      Value
Sample Rate   24,000 Hz (ElevenLabs default)
Format        16-bit PCM (signed, little-endian)
Channels      Mono (1)
Encoding      Base64 string
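Going the other direction, the playback duration of a received voice frame follows from the output format: each 16-bit sample is 2 bytes, so a decoded payload of N bytes is N/2 samples, or (N/2)/24,000 seconds. A small sketch (function name is ours):

```typescript
// Duration in milliseconds of a decoded 16-bit mono PCM payload at the
// given sample rate (24,000 Hz for the ElevenLabs output described above).
function pcmDurationMs(byteLength: number, sampleRateHz: number): number {
    const samples = byteLength / 2; // 16-bit = 2 bytes per sample
    return (samples / sampleRateHz) * 1000;
}

// A 48,000-byte decoded frame at 24 kHz is exactly one second of audio.
const frameMs = pcmDurationMs(48000, 24000);
```

This is handy for estimating how long the speaker will stay busy after the last `voiceReceived` event.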

Troubleshooting

No Audio Being Sent

WARNING: Audio dropped: voice session not active! Call startVoiceSession() first.

Solution: Call character.startVoiceSession() after connection before recording.

No Audio Playback

  1. Ensure DynamicAudioOutput has an AudioComponent attached
  2. Verify an AudioTrack asset is assigned
  3. Check initialize() was called with correct sample rate (24000)

Audio Cuts Out

The SDK includes throttling to prevent WebSocket buffer overflow. If audio still cuts out:

  • Check network stability
  • Reduce other network traffic in your Lens

WebSocket NOT available in Lens Studio Preview

This is expected: WebSocket connections only work on actual Spectacles hardware, not in the Lens Studio Preview. Deploy to a device for testing.


Best Practices

Always Initialize Audio Early

// Initialize audio output immediately after finding the component
dynamicAudioOutput.initialize(24000);

Handle Connection Loss

character.on('disconnected', () => {
    microphone.stopRecording();
    // Show UI indicator
});

character.on('error', (error) => {
    print(`Voice error: ${error}`);
    // Attempt recovery or show a message
});

Provide Visual Feedback

Users benefit from knowing:

  • When their voice is being captured
  • When the AI is "thinking"
  • When the AI is speaking
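One way to drive those indicators is a tiny state machine fed by the events shown earlier. This sketch uses our own event labels (e.g. `transcriptPartial`, `playbackFinished`) purely for illustration; map them to the actual SDK events (`transcript`, `botResponse`, `voiceReceived`, `interrupt`) in your own handlers:

```typescript
// Sketch: conversation state for UI feedback. Event names here are
// illustrative labels, not SDK event names.
type VoiceUIState = 'idle' | 'listening' | 'thinking' | 'speaking';

function nextState(current: VoiceUIState, event: string): VoiceUIState {
    switch (event) {
        case 'transcriptPartial': return 'listening'; // user is talking
        case 'transcriptFinal':   return 'thinking';  // waiting on the AI
        case 'voiceReceived':     return 'speaking';  // AI audio arriving
        case 'playbackFinished':  return 'idle';
        case 'interrupt':         return 'listening'; // user barged in
        default:                  return current;     // ignore unknown events
    }
}
```

Keeping the state transitions in one pure function makes the UI logic easy to test independently of the audio pipeline.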

Respect Voice Sessions

// Don't stream audio without an active session
if (character.isConnected && character.isVoiceSessionActive) {
    character.streamAudio(audioBase64);
}

Next Steps