
Input Components

EstuaryMicrophone

MonoBehaviour -- Captures microphone audio and sends it to the server. Supports both WebSocket (Unity Microphone API) and LiveKit (WebRTC RtcAudioSource) modes.

Namespace: Estuary

Inspector Fields

| Field | Type | Default | Description |
|---|---|---|---|
| SampleRate | int | 16000 | Recording sample rate in Hz |
| ChunkDurationMs | int | 100 | Audio chunk size in milliseconds |
| PushToTalkKey | KeyCode | None | Key for push-to-talk; None = always-on |
| UseVoiceActivityDetection | bool | false | Enable client-side VAD (WebSocket mode) |
| VadThreshold | float | 0.5 | VAD sensitivity: 0 = most sensitive, 1 = least sensitive |

Properties

| Property | Type | Access | Description |
|---|---|---|---|
| IsRecording | bool | get | true while the microphone is capturing |
| IsMuted | bool | get | true when muted (LiveKit mode) |
| CurrentVolume | float | get | Current audio volume level (0–1) |
| IsSpeechDetected | bool | get | true when VAD detects speech |

Methods

| Method | Returns | Description |
|---|---|---|
| StartRecording() | void | Start capturing audio |
| StopRecording() | void | Stop capturing audio |
| Configure(int sampleRate, int chunkDurationMs) | void | Reconfigure audio settings |
| Mute() | void | Mute the microphone (LiveKit: mutes the track) |
| Unmute() | void | Unmute the microphone |
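A minimal usage sketch built only from the methods above. The `MicDriver` class name and attaching `EstuaryMicrophone` to the same GameObject are assumptions for illustration:

```csharp
using UnityEngine;
using Estuary;

// Minimal driver: reconfigures and starts the microphone on Start(),
// and stops it cleanly when the object is destroyed.
public class MicDriver : MonoBehaviour
{
    private EstuaryMicrophone mic;

    void Start()
    {
        mic = GetComponent<EstuaryMicrophone>();
        mic.Configure(16000, 100); // sample rate and chunk size; these match the defaults
        mic.StartRecording();
    }

    void OnDestroy()
    {
        if (mic != null && mic.IsRecording)
            mic.StopRecording();
    }
}
```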

Events

| Event | Signature | Description |
|---|---|---|
| OnRecordingStarted | Action | Microphone started capturing |
| OnRecordingStopped | Action | Microphone stopped capturing |
| OnVolumeChanged | `Action<float>` | Volume level changed (0–1) |
| OnSpeechDetected | Action | User started speaking (VAD) |
| OnSilenceDetected | Action | User stopped speaking (VAD) |
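A sketch of subscribing to these events. The `MicEventLogger` class is illustrative; handlers are detached in OnDisable to avoid dangling delegates:

```csharp
using UnityEngine;
using Estuary;

// Logs microphone events. Assumes an EstuaryMicrophone on the same GameObject.
public class MicEventLogger : MonoBehaviour
{
    private EstuaryMicrophone mic;

    void OnEnable()
    {
        mic = GetComponent<EstuaryMicrophone>();
        mic.OnRecordingStarted += HandleStarted;
        mic.OnVolumeChanged += HandleVolume;
        mic.OnSpeechDetected += HandleSpeech;
        mic.OnSilenceDetected += HandleSilence;
    }

    void OnDisable()
    {
        mic.OnRecordingStarted -= HandleStarted;
        mic.OnVolumeChanged -= HandleVolume;
        mic.OnSpeechDetected -= HandleSpeech;
        mic.OnSilenceDetected -= HandleSilence;
    }

    void HandleStarted() => Debug.Log("Mic recording started");
    void HandleVolume(float level) => Debug.Log($"Volume: {level:F2}");
    void HandleSpeech() => Debug.Log("Speech detected");
    void HandleSilence() => Debug.Log("Silence detected");
}
```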

WebSocket vs LiveKit Behavior

| Behavior | WebSocket Mode | LiveKit Mode |
|---|---|---|
| Capture | Unity Microphone.Start() | LiveKit RtcAudioSource |
| Transport | Base64 PCM over Socket.IO | WebRTC audio track |
| AEC | None (manual echo handling) | Native platform AEC |
| Mute | Stops sending chunks | Mutes the WebRTC track |
| VAD | Client-side amplitude check | Server-side (Deepgram); client VAD used for interrupt detection |
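For intuition, the "client-side amplitude check" row can be illustrated with an RMS-threshold test. This is not the SDK's actual implementation; the cutoff mapping and constants below are invented for the example:

```csharp
using UnityEngine;

public static class AmplitudeVad
{
    // Illustrative only: classify a chunk of PCM samples (-1..1 floats) as
    // speech when its RMS energy exceeds a cutoff derived from VadThreshold.
    public static bool IsSpeech(float[] samples, float vadThreshold)
    {
        double sumSquares = 0;
        foreach (float s in samples)
            sumSquares += s * s;
        float rms = (float)System.Math.Sqrt(sumSquares / samples.Length);

        // Hypothetical mapping: higher threshold -> higher cutoff -> less sensitive,
        // matching the documented semantics (0 = most sensitive, 1 = least).
        float cutoff = Mathf.Lerp(0.005f, 0.1f, vadThreshold);
        return rms > cutoff;
    }
}
```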

Push-to-Talk

When PushToTalkKey is set to a key other than None:

  • WebSocket mode: Audio is only captured and sent while the key is held
  • LiveKit mode: The microphone track is muted when the key is released and unmuted when pressed
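The SDK applies this behavior internally when PushToTalkKey is set; an equivalent manual version, written against the public API, might look like this (the `ManualPushToTalk` class and the `T` key binding are illustrative):

```csharp
using UnityEngine;
using Estuary;

// Sketch of push-to-talk using Mute()/Unmute() directly, for illustration.
// Assumes an EstuaryMicrophone on the same GameObject.
public class ManualPushToTalk : MonoBehaviour
{
    public KeyCode talkKey = KeyCode.T; // hypothetical binding
    private EstuaryMicrophone mic;

    void Start()
    {
        mic = GetComponent<EstuaryMicrophone>();
        mic.Mute(); // start muted until the key is held
    }

    void Update()
    {
        if (Input.GetKeyDown(talkKey))
            mic.Unmute();   // LiveKit: unmutes the WebRTC track
        else if (Input.GetKeyUp(talkKey))
            mic.Mute();     // LiveKit: mutes the track; WebSocket: stops sending chunks
    }
}
```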

EstuaryWebcam

MonoBehaviour -- Streams camera video to the Estuary world model for spatial awareness and scene understanding.

Namespace: Estuary

Inspector Fields

| Field | Type | Default | Description |
|---|---|---|---|
| StreamMode | WebcamStreamMode | LiveKit | LiveKit (WebRTC) or WebSocket (MJPEG) |
| AutoFallback | bool | true | Fall back to WebSocket if LiveKit is unavailable |
| TargetFps | int | 10 | Capture frame rate |
| TargetWidth | int | 1280 | Capture width in pixels |
| TargetHeight | int | 720 | Capture height in pixels |
| AutoStartOnConnect | bool | false | Start streaming when the connection is ready |
| AutoSubscribeSceneGraph | bool | true | Subscribe to scene graph updates automatically |
| SendPose | bool | false | Send device pose data (AR applications) |
| CameraTransform | Transform | – | Camera transform for pose data |
| UseFrontCamera | bool | false | Prefer front-facing camera (mobile/AR) |

Properties

| Property | Type | Access | Description |
|---|---|---|---|
| IsStreaming | bool | get | true while video is being streamed |
| WebcamTexture | WebCamTexture | get | The active webcam texture (for preview rendering) |
| AvailableDevices | WebCamDevice[] | get | List of available camera devices |
| CurrentSceneGraph | SceneGraph | get | Most recent scene graph |
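The WebcamTexture property can drive a local preview. A sketch (the `WebcamPreview` class and RawImage wiring are assumptions; both references are assigned in the Inspector):

```csharp
using UnityEngine;
using UnityEngine.UI;
using Estuary;

// Renders the active webcam texture into a UI RawImage as a local preview.
public class WebcamPreview : MonoBehaviour
{
    public EstuaryWebcam webcam;
    public RawImage previewImage;

    void Update()
    {
        if (webcam.IsStreaming && webcam.WebcamTexture != null)
            previewImage.texture = webcam.WebcamTexture;
    }
}
```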

Methods

| Method | Returns | Description |
|---|---|---|
| StartStreaming(string sessionId) | void | Begin streaming video |
| StopStreaming() | void | Stop streaming video |
| SetDevice(string deviceName) | void | Switch to a specific camera by name |
| SubscribeToSceneGraphAsync() | Task | Subscribe to scene graph updates |
| UnsubscribeFromSceneGraphAsync() | Task | Unsubscribe from scene graph updates |
| SendPoseAsync(Matrix4x4 localToWorld) | Task | Send a device pose matrix |
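A usage sketch combining these methods. The `WebcamDriver` class, the "session-123" id, and the one-second pose interval are illustrative values, not SDK requirements:

```csharp
using UnityEngine;
using Estuary;

// Starts video streaming and periodically sends the camera pose.
// webcam and arCamera are assigned in the Inspector.
public class WebcamDriver : MonoBehaviour
{
    public EstuaryWebcam webcam;
    public Transform arCamera;   // e.g. the AR camera's transform
    private float poseTimer;

    void Start()
    {
        webcam.StartStreaming("session-123"); // hypothetical session id
    }

    void Update()
    {
        poseTimer += Time.deltaTime;
        if (poseTimer >= 1f && webcam.IsStreaming)
        {
            poseTimer = 0f;
            // Fire-and-forget: discard the Task rather than block Update().
            _ = webcam.SendPoseAsync(arCamera.localToWorldMatrix);
        }
    }
}
```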

Events

| Event | Signature | Description |
|---|---|---|
| OnSceneGraphUpdated | `Action<SceneGraph>` | Scene graph update received |
| OnRoomIdentified | `Action<RoomIdentified>` | Room identification result received |
| OnStreamingStarted | Action | Video streaming started |
| OnStreamingStopped | Action | Video streaming stopped |
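A sketch of listening for scene graph results (the `SceneGraphListener` class is illustrative; it assumes AutoSubscribeSceneGraph is enabled or SubscribeToSceneGraphAsync has been called):

```csharp
using UnityEngine;
using Estuary;

// Logs scene graph and room identification results from the backend.
public class SceneGraphListener : MonoBehaviour
{
    public EstuaryWebcam webcam;

    void OnEnable()
    {
        webcam.OnSceneGraphUpdated += HandleSceneGraph;
        webcam.OnRoomIdentified += HandleRoom;
    }

    void OnDisable()
    {
        webcam.OnSceneGraphUpdated -= HandleSceneGraph;
        webcam.OnRoomIdentified -= HandleRoom;
    }

    void HandleSceneGraph(SceneGraph graph) => Debug.Log("Scene graph updated");
    void HandleRoom(RoomIdentified room) => Debug.Log("Room identified");
}
```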

Streaming Modes

LiveKit mode:

  1. Captures frames from WebCamTexture via DirectWebcamVideoSource
  2. Publishes a LiveKit video track to the shared room (same room as voice)
  3. Notifies the backend to subscribe to the video track (enable_livekit_video)

WebSocket mode:

  1. Captures frames from WebCamTexture
  2. Encodes frames as JPEG at TargetFps
  3. Sends frames as base64-encoded video_frame events over Socket.IO
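The capture-and-encode steps above can be sketched with Unity's texture APIs. This is illustrative, not the SDK's internal code; the JPEG quality value is an assumption:

```csharp
using System;
using UnityEngine;

public static class FrameEncoder
{
    // Illustrative only: copy a frame out of a WebCamTexture, JPEG-encode it,
    // and base64-encode the bytes as a video_frame payload would require.
    public static string EncodeFrame(WebCamTexture source)
    {
        var frame = new Texture2D(source.width, source.height, TextureFormat.RGB24, false);
        frame.SetPixels(source.GetPixels());
        frame.Apply();

        byte[] jpeg = frame.EncodeToJPG(75);      // quality 75 is an assumed value
        UnityEngine.Object.Destroy(frame);        // avoid leaking the temp texture
        return Convert.ToBase64String(jpeg);
    }
}
```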

Scene Graph Updates

When subscribed, the backend processes video frames through its world model pipeline (object detection, scene understanding, spatial reasoning) and returns structured SceneGraph data. See Data Models for the full SceneGraph schema.