
Video Streaming

The EstuaryWebcam component streams camera video to the Estuary world model for spatial awareness. The character can see and describe the user's environment, detect objects, and track scene changes.

Streaming Modes

| Mode | Transport | Latency | Codec | Platform |
|---|---|---|---|---|
| LiveKit | WebRTC video track | Low | VP8/H264 | Desktop, Android, iOS |
| WebSocket | Socket.IO MJPEG | Higher | JPEG | All platforms |

LiveKit is the preferred mode. If LiveKit is unavailable and Auto Fallback is enabled, the component automatically falls back to WebSocket streaming.
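The fallback rule can be expressed as a small decision function. The sketch below restates the documented behavior only: `StreamModeSketch` and `StreamModeSelector` are hypothetical names, the component's real LiveKit availability check is not described on this page (so it is reduced to a boolean), and returning null when fallback is disabled is an assumption.

```csharp
using System;

// Hypothetical stand-in for the SDK's WebcamStreamMode enum, so the sketch compiles on its own.
public enum StreamModeSketch { LiveKit, WebSocket }

public static class StreamModeSelector
{
    // Picks the transport to use.
    // Assumption: "LiveKit unavailable" is a single boolean, and null means "cannot stream".
    public static StreamModeSketch? Resolve(StreamModeSketch requested, bool autoFallback, bool liveKitAvailable)
    {
        if (requested == StreamModeSketch.WebSocket)
            return StreamModeSketch.WebSocket;      // explicitly forced; works on all platforms

        if (liveKitAvailable)
            return StreamModeSketch.LiveKit;        // preferred, lower latency

        // LiveKit requested but unavailable: fall back only if Auto Fallback is on.
        return autoFallback ? StreamModeSketch.WebSocket : (StreamModeSketch?)null;
    }
}
```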


Setup

Add the Component

  1. Create a new GameObject (e.g., "Webcam")
  2. Add the Estuary Webcam component
  3. Configure the settings:

| Field | Default | Description |
|---|---|---|
| Stream Mode | LiveKit | LiveKit or WebSocket |
| Auto Fallback | true | Fall back to WebSocket if LiveKit is unavailable |
| Target Fps | 10 | Frames per second (lower = less bandwidth) |
| Target Width | 1280 | Capture resolution width, in pixels |
| Target Height | 720 | Capture resolution height, in pixels |
| Auto Start On Connect | false | Start streaming when EstuaryManager connects |
| Auto Subscribe Scene Graph | true | Subscribe to scene graph updates |
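Target Fps limits how many frames are sent per second. One common way to enforce such a limit is to skip any frame that arrives sooner than 1 / fps seconds after the last one sent. The `FramePacer` class below is a hypothetical sketch of that idea, not the component's implementation.

```csharp
using System;

// Hypothetical helper: decides whether a newly captured frame should be sent,
// given a Target Fps value. Frames arriving too early are skipped.
public sealed class FramePacer
{
    private readonly double _intervalSeconds;
    private double _lastSentTime = double.NegativeInfinity;

    public FramePacer(int targetFps)
    {
        if (targetFps <= 0) throw new ArgumentOutOfRangeException(nameof(targetFps));
        _intervalSeconds = 1.0 / targetFps;   // 10 FPS -> 0.1 s between frames
    }

    public double IntervalSeconds => _intervalSeconds;

    // `now` must increase monotonically, in seconds
    // (in Unity, for example, Time.realtimeSinceStartup).
    public bool ShouldSend(double now)
    {
        if (now - _lastSentTime < _intervalSeconds) return false;
        _lastSentTime = now;
        return true;
    }
}
```

At the defaults (1280 × 720 at 10 FPS) the capture side handles 1280 × 720 × 10 = 9,216,000 pixels per second before compression; halving Target Fps or either dimension halves that figure.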

Starting and Stopping

// Start streaming with a session ID
webcam.StartStreaming(sessionId);

// Stop streaming
webcam.StopStreaming();

// Auto-start: set autoStartOnConnect = true in the Inspector
// and the component starts streaming once LiveKit is ready

LiveKit Video

In LiveKit mode, the component:

  1. Captures frames from WebCamTexture via DirectWebcamVideoSource
  2. Publishes a LiveKit video track to the shared room (same room as voice)
  3. Notifies the backend to subscribe to the video track (enable_livekit_video)

The backend processes frames through the world model pipeline: object detection, scene understanding, and graph construction.

// Show the webcam texture in a UI preview
// (RawImage requires: using UnityEngine.UI;)
RawImage preview = GetComponent<RawImage>();
preview.texture = webcam.WebcamTexture;

WebSocket Video (Fallback)

In WebSocket mode, the component:

  1. Captures frames from WebCamTexture
  2. Encodes frames as JPEG
  3. Sends frames as base64-encoded video_frame events at the target FPS

// Force WebSocket mode
webcam.StreamMode = WebcamStreamMode.WebSocket;
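To make the encoding step concrete, here is a minimal sketch of Base64-encoding one JPEG frame and computing the resulting text size. `FrameEncodingSketch` is a hypothetical helper, not part of the SDK, and the full video_frame message format is not shown on this page.

```csharp
using System;

// Hypothetical helper illustrating the Base64 step used in WebSocket mode.
public static class FrameEncodingSketch
{
    // Encodes the raw JPEG bytes of one frame as Base64 text.
    public static string ToBase64(byte[] jpegBytes) => Convert.ToBase64String(jpegBytes);

    // Length of the Base64 text for `byteCount` input bytes:
    // every 3-byte group becomes 4 characters, and the last group is padded.
    public static long Base64Length(long byteCount) => 4 * ((byteCount + 2) / 3);
}
```

Base64 turns every 3 bytes into 4 characters, so a 30,720-byte (30 KB) JPEG becomes 40,960 characters, about a third larger before any transport overhead. JPEG data starts with the bytes FF D8 FF, which is why Base64-encoded JPEGs typically begin with `/9j/`. In Unity, the JPEG bytes could come from, for example, `Texture2D.EncodeToJPG`.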

Scene Graph

The world model builds a scene graph from the video feed. Subscribe to updates:

webcam.OnSceneGraphUpdated += (SceneGraph graph) =>
{
    Debug.Log($"Scene: {graph.EntityCount} entities");
    Debug.Log($"Location: {graph.LocationType}");
    Debug.Log($"Activity: {graph.UserActivity}");

    foreach (var entity in graph.Entities)
    {
        Debug.Log($" {entity.ClassName}: {entity.Label} at {entity.Position}");
    }
};

webcam.OnRoomIdentified += (RoomIdentified room) =>
{
    Debug.Log($"Room: {room.RoomName} ({room.Status})");
};
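The handler above logs each update on its own. To track scene changes between updates, one option is to compare consecutive snapshots. This sketch uses a hypothetical `EntitySketch` stand-in carrying only `ClassName` and `Label`, and it assumes labels identify the same entity across updates, which this page does not guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SDK's entity type (only the fields used here).
public sealed class EntitySketch
{
    public string ClassName { get; }
    public string Label { get; }
    public EntitySketch(string className, string label) { ClassName = className; Label = label; }
}

public static class SceneDiff
{
    // Compares two snapshots by label and returns the labels that appeared
    // and disappeared, each sorted ordinally for stable output.
    // Assumption: a label refers to the same entity in both snapshots.
    public static (List<string> Added, List<string> Removed) Compare(
        IEnumerable<EntitySketch> previous, IEnumerable<EntitySketch> current)
    {
        var before = new HashSet<string>(previous.Select(e => e.Label));
        var after = new HashSet<string>(current.Select(e => e.Label));

        var added = after.Except(before).OrderBy(l => l, StringComparer.Ordinal).ToList();
        var removed = before.Except(after).OrderBy(l => l, StringComparer.Ordinal).ToList();
        return (added, removed);
    }
}
```

Inside the OnSceneGraphUpdated handler you could map graph.Entities into such stand-ins (assuming Label is a string), compare them with the previous snapshot, and then keep the current one for the next update.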

Manual Subscription

// Subscribe
await webcam.SubscribeToSceneGraphAsync();

// Unsubscribe
await webcam.UnsubscribeFromSceneGraphAsync();

Camera Capture (On-Demand)

For on-demand image capture (not continuous streaming), the server can request a camera image:

// The server sends camera_capture events when it wants an image
// (e.g., when the character detects vision intent in user speech)
// Handle this event in your code: capture a frame and reply with a camera_image event

The camera_image event payload:

{
  "image": "<base64 JPEG>",
  "mime_type": "image/jpeg",
  "text": "What do you see?"
}
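As a sketch of building that reply, the helper below produces a JSON body with the three documented fields. `CameraImagePayload` is a hypothetical name; it uses System.Text.Json so it runs outside Unity, and how the SDK actually transmits the camera_image event is not covered here.

```csharp
using System;
using System.Text.Json;

public static class CameraImagePayload
{
    // Builds { "image": <base64 JPEG>, "mime_type": "image/jpeg", "text": <prompt> }.
    public static string Build(byte[] jpegBytes, string text)
    {
        var payload = new
        {
            image = Convert.ToBase64String(jpegBytes), // raw JPEG bytes -> Base64 text
            mime_type = "image/jpeg",
            text = text
        };
        return JsonSerializer.Serialize(payload);
    }
}
```

System.Text.Json may not be available in a Unity project; there, JsonUtility.ToJson on a [Serializable] class with public image, mime_type, and text fields produces the same shape.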

Device Pose (AR)

For AR applications, send device pose data alongside video:

// Enable in the Inspector, or set from code:
webcam.SendPose = true;
webcam.CameraTransform = arCamera.transform;

// Or send manually
await webcam.SendPoseAsync(arCamera.transform.localToWorldMatrix);
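localToWorldMatrix packs a transform's rotation, position, and scale into one 4x4 matrix. The sketch below uses System.Numerics (so it runs outside Unity) to compose and decompose such a pose; `PoseSketch` is a hypothetical helper, and the exact pose layout the backend expects is not described on this page.

```csharp
using System;
using System.Numerics;

public static class PoseSketch
{
    // Builds a rigid pose: rotate, then translate.
    // System.Numerics uses row vectors, so translation lands in M41..M43;
    // Unity's Matrix4x4 stores it in the fourth column (GetColumn(3)) instead.
    public static Matrix4x4 Compose(Vector3 position, Quaternion rotation) =>
        Matrix4x4.CreateFromQuaternion(rotation) * Matrix4x4.CreateTranslation(position);

    // Recovers position and rotation from a pose matrix (scale is discarded).
    public static (Vector3 Position, Quaternion Rotation) Decompose(Matrix4x4 pose)
    {
        if (!Matrix4x4.Decompose(pose, out _, out Quaternion rotation, out Vector3 translation))
            throw new ArgumentException("Matrix is not a valid TRS transform.", nameof(pose));
        return (translation, rotation);
    }
}
```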

Camera Selection

// List available cameras
var devices = webcam.AvailableDevices;
foreach (var device in devices)
{
    Debug.Log($"{device.name} (front: {device.isFrontFacing})");
}

// Switch camera
webcam.SetDevice("HD Webcam");

// Use front-facing camera (mobile/AR)
webcam.UseFrontCamera = true;
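Choosing a camera by facing is a small selection rule over the device list. The sketch below uses a hypothetical `CameraInfo` stand-in for Unity's WebCamDevice (its name and isFrontFacing fields) so it compiles without Unity; `CameraPicker` is also hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for Unity's WebCamDevice (name + isFrontFacing).
public readonly struct CameraInfo
{
    public string Name { get; }
    public bool IsFrontFacing { get; }
    public CameraInfo(string name, bool isFrontFacing) { Name = name; IsFrontFacing = isFrontFacing; }
}

public static class CameraPicker
{
    // Returns the first camera whose facing matches the preference,
    // otherwise the first available camera, or null if the list is empty.
    public static string Pick(IReadOnlyList<CameraInfo> devices, bool preferFront)
    {
        if (devices.Count == 0) return null;
        foreach (var d in devices)
            if (d.IsFrontFacing == preferFront) return d.Name;
        return devices[0].Name;
    }
}
```

The returned name can then be passed to `webcam.SetDevice(...)`.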

Next Steps