Voice (WebSocket)
Add real-time voice conversations using the WebSocket voice transport. The SDK captures microphone audio, streams it to the server for speech-to-text, and plays back the character's voice response.
How It Works
┌────────────┐  PCM → Base64   ┌──────────────┐  stream_audio   ┌────────────────┐
│ Microphone │────────────────►│ VoiceManager │────────────────►│ Estuary Server │
└────────────┘                 └──────────────┘                 └───────┬────────┘
                                                                        │
┌────────────┐  decode + play  ┌──────────────┐       bot_voice         │
│ Speaker    │◄────────────────│ AudioPlayer  │◄────────────────────────┘
└────────────┘                 └──────────────┘
- The SDK requests microphone permission and captures audio at 16 kHz.
- Audio is encoded as base64 PCM and streamed to the server via stream_audio WebSocket events.
- The server runs speech-to-text (Deepgram) and emits sttResponse events with transcription results.
- Once a final transcription is produced, the server processes it through the AI pipeline.
- The response arrives as botResponse (text) and botVoice (audio) events.
- The built-in AudioPlayer decodes and plays the voice audio automatically.
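The capture-side encoding step above (PCM → base64) can be sketched as follows. This is an illustration only, not the SDK's internal code; the little-endian Int16 sample layout and the 160-sample chunk size are assumptions.

```typescript
// Sketch of the PCM → base64 step the SDK performs before emitting a
// stream_audio event. Illustrative only -- the actual chunk size and
// sample layout are assumptions, not documented SDK internals.

// Convert 16 kHz Int16 PCM samples into a base64 string.
function pcmToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  return Buffer.from(bytes).toString('base64');
}

// Example: 10 ms of silence at 16 kHz = 160 samples (320 bytes).
const chunk = new Int16Array(160);
const payload = pcmToBase64(chunk);
console.log(payload.length); // 428 characters: 4 × ceil(320 / 3)
```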
Starting Voice
await client.connect();
// Start voice input -- requests microphone permission
await client.startVoice();
By default, the SDK uses voiceTransport: 'auto', which prefers LiveKit if livekit-client is installed, and falls back to WebSocket otherwise. To explicitly use WebSocket:
const client = new EstuaryClient({
serverUrl: 'https://api.estuary-ai.com',
apiKey: 'est_your_api_key',
characterId: 'your-character-uuid',
playerId: 'user-123',
voiceTransport: 'websocket',
});
startVoice() requires a browser environment with getUserMedia support. It throws EstuaryError with code MICROPHONE_DENIED if the user denies microphone permission, or VOICE_NOT_SUPPORTED if no voice transport is available.
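A typical guard around startVoice() maps these error codes to user-facing messages. The helper below is a sketch of that pattern; only the MICROPHONE_DENIED and VOICE_NOT_SUPPORTED codes come from the SDK, while the message wording and fallback branch are illustrative.

```typescript
// Map startVoice() failure codes to user-facing messages.
// MICROPHONE_DENIED and VOICE_NOT_SUPPORTED are the codes documented
// above; the message wording and default branch are our own.
function voiceErrorMessage(err: { code?: string; message: string }): string {
  switch (err.code) {
    case 'MICROPHONE_DENIED':
      return 'Please allow microphone access to use voice chat.';
    case 'VOICE_NOT_SUPPORTED':
      return 'Voice is not available in this environment -- falling back to text.';
    default:
      return `Voice error: ${err.message}`;
  }
}

// Usage (hypothetical showBanner is your own UI code):
// try {
//   await client.startVoice();
// } catch (err) {
//   showBanner(voiceErrorMessage(err as { code?: string; message: string }));
// }
```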
Speech-to-Text Events
As the user speaks, the server streams back transcription results:
client.on('sttResponse', (response) => {
if (response.isFinal) {
console.log('User said:', response.text);
} else {
// Interim transcription -- useful for showing live captions
console.log('Hearing:', response.text);
}
});
After a final sttResponse, the server automatically triggers the AI pipeline. The response flows through the same botResponse and botVoice events used by text chat.
Bot Voice Audio
Voice audio arrives as base64-encoded PCM chunks via the botVoice event:
client.on('botVoice', (voice) => {
console.log(`Audio chunk ${voice.chunkIndex} for message ${voice.messageId}`);
if (voice.isFinal) {
console.log('Last audio chunk received');
}
});
In a browser environment, the SDK's built-in AudioPlayer automatically decodes and plays these chunks. You do not need to handle playback manually unless you want custom audio processing.
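If you do want custom processing, the decode step might look like the sketch below. It assumes the chunks are little-endian Int16 mono PCM, which is an inference from the capture format, not a documented guarantee of the wire protocol.

```typescript
// Sketch of custom playback handling: decode a base64 voice chunk into
// Float32 samples suitable for the Web Audio API. Assumes little-endian
// Int16 mono PCM -- an assumption, not a documented wire format.
function decodeVoiceChunk(base64Audio: string): Float32Array {
  const bytes = Buffer.from(base64Audio, 'base64');
  // DataView tolerates unaligned byte offsets and fixes the endianness.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.length);
  const floats = new Float32Array(bytes.length / 2);
  for (let i = 0; i < floats.length; i++) {
    floats[i] = view.getInt16(i * 2, true) / 32768; // scale Int16 to [-1, 1)
  }
  return floats;
}
```

In a browser you would then copy the samples into an AudioBuffer and schedule it on an AudioContext; the built-in AudioPlayer handles this for you.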
Playback Events
Track audio playback state with these events:
client.on('audioPlaybackStarted', (messageId) => {
console.log('Started playing audio for:', messageId);
});
client.on('audioPlaybackComplete', (messageId) => {
console.log('Finished playing audio for:', messageId);
// The SDK automatically notifies the server via audio_playback_complete
});
The audioPlaybackComplete notification tells the server that the client has finished playing the audio, which allows the server to track conversation pacing.
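One common use of these events is driving a "character is speaking" indicator in the UI. The tracker below is an illustration of that pattern, not part of the SDK:

```typescript
// Illustrative helper (not part of the SDK): track which messages are
// currently playing so the UI can show a "speaking" indicator.
class PlaybackTracker {
  private playing = new Set<string>();

  started(messageId: string): void {
    this.playing.add(messageId);
  }

  completed(messageId: string): void {
    this.playing.delete(messageId);
  }

  get isSpeaking(): boolean {
    return this.playing.size > 0;
  }
}

// Wire it to the playback events:
// const tracker = new PlaybackTracker();
// client.on('audioPlaybackStarted', (id) => tracker.started(id));
// client.on('audioPlaybackComplete', (id) => tracker.completed(id));
```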
Stopping Voice
client.stopVoice();
This stops the microphone, cleans up the voice manager, and emits a voiceStopped event. The client remains connected for text chat.
Muting
Toggle the microphone without stopping the voice session:
client.toggleMute();
console.log('Muted:', client.isMuted);
When muted, the microphone stream is paused but the voice session remains active. The character's voice responses continue to play.
toggleMute() throws EstuaryError with code VOICE_NOT_ACTIVE if voice has not been started.
Voice Lifecycle Events
client.on('voiceStarted', () => {
console.log('Voice session started');
showMicrophoneIndicator();
});
client.on('voiceStopped', () => {
console.log('Voice session stopped');
hideMicrophoneIndicator();
});
Example: Voice Chat
import { EstuaryClient } from '@estuary-ai/sdk';
const client = new EstuaryClient({
serverUrl: 'https://api.estuary-ai.com',
apiKey: 'est_your_api_key',
characterId: 'your-character-uuid',
playerId: 'user-123',
voiceTransport: 'websocket',
});
client.on('sttResponse', (response) => {
if (response.isFinal) {
console.log('You:', response.text);
}
});
client.on('botResponse', (response) => {
if (response.isFinal) {
console.log('Bot:', response.text);
}
});
client.on('error', (err) => {
console.error('Error:', err.message);
});
async function main() {
await client.connect();
console.log('Connected! Starting voice...');
await client.startVoice();
console.log('Listening -- speak into your microphone.');
}
main().catch(console.error);
Next Steps
- Voice (LiveKit) -- Lower latency voice with WebRTC
- Text Chat -- Send text alongside voice
- API Reference: EstuaryClient -- Full method reference