Voice (WebSocket)
Add real-time voice conversations using the WebSocket voice transport. The SDK captures microphone audio, streams it to the server for speech-to-text, and plays back the character's voice response.
How It Works
┌────────────┐  PCM → Base64   ┌──────────────┐  stream_audio   ┌────────────────┐
│ Microphone │────────────────►│ VoiceManager │────────────────►│ Estuary Server │
└────────────┘                 └──────────────┘                 └───────┬────────┘
                                                                        │
┌────────────┐  decode + play  ┌──────────────┐       bot_voice         │
│ Speaker    │◄────────────────│ AudioPlayer  │◄────────────────────────┘
└────────────┘                 └──────────────┘
- The SDK requests microphone permission and captures audio at 16 kHz.
- Audio is encoded as base64 PCM and streamed to the server via stream_audio WebSocket events.
- The server runs speech-to-text (Deepgram) and emits sttResponse events with transcription results.
- Once a final transcription is produced, the server processes it through the AI pipeline.
- The response arrives as botResponse (text) and botVoice (audio) events.
- The built-in AudioPlayer decodes and plays the voice audio automatically.
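The capture-side encoding step above (PCM → base64) can be sketched as follows. This is an illustration only, not the SDK's internal code; the little-endian Int16 sample layout and the 160-sample chunk size are assumptions.

```typescript
// Sketch of the PCM → base64 step the SDK performs before emitting a
// stream_audio event. Illustrative only -- the actual chunk size and
// sample layout are assumptions, not documented SDK internals.

// Convert 16 kHz Int16 PCM samples into a base64 string.
function pcmToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  return Buffer.from(bytes).toString('base64');
}

// Example: 10 ms of silence at 16 kHz = 160 samples (320 bytes).
const chunk = new Int16Array(160);
const payload = pcmToBase64(chunk);
console.log(payload.length); // 428 characters: 4 × ceil(320 / 3)
```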
Starting Voice
await client.connect();
// Start voice input -- requests microphone permission
await client.startVoice();
By default, the SDK uses voiceTransport: 'auto', which prefers LiveKit if livekit-client is installed, and falls back to WebSocket otherwise. To explicitly use WebSocket:
const client = new EstuaryClient({
serverUrl: 'https://api.estuary-ai.com',
apiKey: 'est_your_api_key',
characterId: 'your-character-uuid',
playerId: 'user-123',
voiceTransport: 'websocket',
});
startVoice() requires a browser environment with getUserMedia support. It throws EstuaryError with code MICROPHONE_DENIED if the user denies microphone permission, or VOICE_NOT_SUPPORTED if no voice transport is available.
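A typical guard around startVoice() maps these error codes to user-facing messages. The helper below is a sketch of that pattern; only the MICROPHONE_DENIED and VOICE_NOT_SUPPORTED codes come from the SDK, while the message wording and fallback branch are illustrative.

```typescript
// Map startVoice() failure codes to user-facing messages.
// MICROPHONE_DENIED and VOICE_NOT_SUPPORTED are the codes documented
// above; the message wording and default branch are our own.
function voiceErrorMessage(err: { code?: string; message: string }): string {
  switch (err.code) {
    case 'MICROPHONE_DENIED':
      return 'Please allow microphone access to use voice chat.';
    case 'VOICE_NOT_SUPPORTED':
      return 'Voice is not available in this environment -- falling back to text.';
    default:
      return `Voice error: ${err.message}`;
  }
}

// Usage (hypothetical showBanner is your own UI code):
// try {
//   await client.startVoice();
// } catch (err) {
//   showBanner(voiceErrorMessage(err as { code?: string; message: string }));
// }
```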
Speech-to-Text Events
As the user speaks, the server streams back transcription results:
client.on('sttResponse', (response) => {
if (response.isFinal) {
console.log('User said:', response.text);
} else {
// Interim transcription -- useful for showing live captions
console.log('Hearing:', response.text);
}
});
After a final sttResponse, the server automatically triggers the AI pipeline. The response flows through the same botResponse and botVoice events used by text chat.
Bot Voice Audio
Voice audio arrives as base64-encoded PCM chunks via the botVoice event:
client.on('botVoice', (voice) => {
console.log(`Audio chunk ${voice.chunkIndex} for message ${voice.messageId}`);
if (voice.isFinal) {
console.log('Last audio chunk received');
}
});
In a browser environment, the SDK's built-in AudioPlayer automatically decodes and plays these chunks. You do not need to handle playback manually unless you want custom audio processing.
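If you do want custom processing, the decode step might look like the sketch below. It assumes the chunks are little-endian Int16 mono PCM, which is an inference from the capture format, not a documented guarantee of the wire protocol.

```typescript
// Sketch of custom playback handling: decode a base64 voice chunk into
// Float32 samples suitable for the Web Audio API. Assumes little-endian
// Int16 mono PCM -- an assumption, not a documented wire format.
function decodeVoiceChunk(base64Audio: string): Float32Array {
  const bytes = Buffer.from(base64Audio, 'base64');
  // DataView tolerates unaligned byte offsets and fixes the endianness.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.length);
  const floats = new Float32Array(bytes.length / 2);
  for (let i = 0; i < floats.length; i++) {
    floats[i] = view.getInt16(i * 2, true) / 32768; // scale Int16 to [-1, 1)
  }
  return floats;
}
```

In a browser you would then copy the samples into an AudioBuffer and schedule it on an AudioContext; the built-in AudioPlayer handles this for you.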
Playback Events
Track audio playback state with these events:
client.on('audioPlaybackStarted', (messageId) => {
console.log('Started playing audio for:', messageId);
});
client.on('audioPlaybackComplete', (messageId) => {
console.log('Finished playing audio for:', messageId);
// The SDK automatically notifies the server via audio_playback_complete
});
The audioPlaybackComplete notification tells the server that the client has finished playing the audio, which allows the server to track conversation pacing.
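One common use of these events is driving a "character is speaking" indicator in the UI. The tracker below is an illustration of that pattern, not part of the SDK:

```typescript
// Illustrative helper (not part of the SDK): track which messages are
// currently playing so the UI can show a "speaking" indicator.
class PlaybackTracker {
  private playing = new Set<string>();

  started(messageId: string): void {
    this.playing.add(messageId);
  }

  completed(messageId: string): void {
    this.playing.delete(messageId);
  }

  get isSpeaking(): boolean {
    return this.playing.size > 0;
  }
}

// Wire it to the playback events:
// const tracker = new PlaybackTracker();
// client.on('audioPlaybackStarted', (id) => tracker.started(id));
// client.on('audioPlaybackComplete', (id) => tracker.completed(id));
```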
Stopping Voice
client.stopVoice();
This stops the microphone, cleans up the voice manager, and emits a voiceStopped event. The client remains connected for text chat.
Muting
Toggle the microphone without stopping the voice session:
client.toggleMute();
console.log('Muted:', client.isMuted);
When muted, the microphone stream is paused but the voice session remains active. The character's voice responses continue to play.
toggleMute() throws EstuaryError with code VOICE_NOT_ACTIVE if voice has not been started.
Voice Lifecycle Events
client.on('voiceStarted', () => {
console.log('Voice session started');
showMicrophoneIndicator();
});
client.on('voiceStopped', () => {
console.log('Voice session stopped');
hideMicrophoneIndicator();
});
Example: Voice Chat
import { EstuaryClient } from '@estuary-ai/sdk';
const client = new EstuaryClient({
serverUrl: 'https://api.estuary-ai.com',
apiKey: 'est_your_api_key',
characterId: 'your-character-uuid',
playerId: 'user-123',
voiceTransport: 'websocket',
});
client.on('sttResponse', (response) => {
if (response.isFinal) {
console.log('You:', response.text);
}
});
client.on('botResponse', (response) => {
if (response.isFinal) {
console.log('Bot:', response.text);
}
});
client.on('error', (err) => {
console.error('Error:', err.message);
});
async function main() {
await client.connect();
console.log('Connected! Starting voice...');
await client.startVoice();
console.log('Listening -- speak into your microphone.');
}
main().catch(console.error);
Next Steps
- Voice (LiveKit) -- Lower latency voice with WebRTC
- Text Chat -- Send text alongside voice
- API Reference: EstuaryClient -- Full method reference