Camera Module

Enable AI vision capabilities by capturing and analyzing images from the device camera.

Overview

The Camera Module allows your AI characters to "see" what the user is looking at through the device camera. This enables powerful multimodal interactions:

  • Visual identification - "What breed of dog is this?"
  • Object analysis - "Is this fruit ripe enough to eat?"
  • Scene understanding - "What do you think of this painting?"
  • Reading assistance - "Can you read this sign for me?"
Camera Module Flow

User: "Hey, what do you think of this vase I'm looking at?"

┌──────────────────┐      ┌──────────────────────────────┐
│ VisionIntent     │ ───→ │ Detects visual intent        │
│ Detector         │      │ (heuristic)                  │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌──────────────────┐      ┌──────────────────────────────┐
│ Vision Pending   │ ←─── │ Signals server that image    │
│ Signal to Server │      │ is about to be sent          │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌──────────────────┐      ┌──────────────────────────────┐
│ EstuaryCamera    │ ───→ │ Captures image using         │
│ (Example)        │      │ Spectacles CameraModule      │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌──────────────────┐      ┌──────────────────────────────┐
│ EstuaryManager   │ ───→ │ Sends Base64 image to        │
│ .sendCameraImage │      │ server for AI analysis       │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌────────────────────────────────────────────────────────┐
│ AI: "That's a beautiful ceramic vase with blue floral  │
│      patterns! It appears to be hand-painted..."       │
└────────────────────────────────────────────────────────┘

How It Works

The Camera Module operates through two complementary systems:

1. Server-Side Detection (Explicit Commands)

For explicit visual requests like:

  • "What am I looking at?"
  • "Describe what you see"
  • "Take a picture and tell me about it"

The server automatically detects these commands and sends a cameraCaptureRequest event to the client.

2. VisionIntentDetector (Natural Language)

For natural language that implies visual context:

  • "Hey what do you think of this vase I'm looking at?"
  • "Can you help me identify this plant?"
  • "Is this ripe enough to eat?"
  • "What breed of dog is this?"

The VisionIntentDetector (in src/Components/VisionIntentDetector.ts) analyzes speech transcripts using smart heuristic detection to understand when the user wants the AI to see something.


Quick Start

Prerequisites

  • Estuary SDK set up with EstuaryVoiceConnection or EstuaryManager
  • EstuaryCredentials configured
  • Spectacles hardware (CameraModule is device-only)
  • Extended Permissions enabled in Project Settings (for development)

Step 1: Add EstuaryCamera Component

The EstuaryCamera example component handles camera capture. Copy it from the SDK's Examples/ folder to your project:

Examples/EstuaryCamera.ts

Then add it to a SceneObject in your scene.

Step 2: Add VisionIntentDetectorComponent

For natural language camera activation, add the VisionIntentDetectorComponent from the core SDK:

import { VisionIntentDetectorComponent } from 'estuary-lens-studio-sdk';

This component is located in src/Components/VisionIntentDetector.ts.

Step 3: Configure Settings

Settings are split between two components:

EstuaryVoiceConnection (vision intent detection):

Setting                       Default   Description
enableVisionIntentDetection   true      Enable natural language camera activation
visionConfidenceThreshold     0.7       Confidence threshold for triggering camera (0-1)

EstuaryCamera (camera capture):

Setting                      Default   Description
captureResolution            512       Image resolution (smaller dimension in pixels)
enableVisionAcknowledgment   true      Character says acknowledgment before analyzing
debugMode                    true      Enable debug logging

VisionIntentDetector

The VisionIntentDetector intelligently detects when the user wants the AI to see something. It's a core SDK component located in src/Components/VisionIntentDetector.ts.

How Detection Works

The detector uses a sophisticated heuristic system that analyzes speech for:

Strong Visual Indicators (High Confidence ~0.9)

  • "look at this", "see this", "what is this"
  • "can you see", "i'm looking at"
  • "identify this", "recognize this"
  • "help me with this", "check this out"

Medium Indicators + Context Clues (Medium Confidence ~0.75)

  • Deictic references ("this", "that", "here") combined with visual context
  • Visual nouns (plant, vase, dog, painting, food, sign, etc.)
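The heuristic can be pictured as a small scoring function. The sketch below is illustrative only: the phrase lists, weights, and function name are simplified stand-ins, not the SDK's actual implementation in VisionIntentDetector.ts.

```typescript
// Illustrative sketch of heuristic vision-intent scoring.
// Phrase lists and weights are simplified examples, not the SDK's real data.
const STRONG_PHRASES = ["look at this", "see this", "what is this", "i'm looking at"];
const DEICTIC_WORDS = ["this", "that", "here"];
const VISUAL_NOUNS = ["plant", "vase", "dog", "painting", "food", "sign"];

function scoreVisionIntent(transcript: string): number {
  const text = transcript.toLowerCase();
  // Strong indicators alone yield high confidence (~0.9)
  if (STRONG_PHRASES.some(p => text.includes(p))) return 0.9;
  // Deictic reference plus a visual noun yields medium confidence (~0.75)
  const hasDeictic = DEICTIC_WORDS.some(w => new RegExp(`\\b${w}\\b`).test(text));
  const hasVisualNoun = VISUAL_NOUNS.some(n => text.includes(n));
  if (hasDeictic && hasVisualNoun) return 0.75;
  return 0;
}

// A capture fires when the score clears the configured threshold (default 0.7)
const shouldCapture = scoreVisionIntent("Can you help me identify this plant?") >= 0.7;
```

With this scheme, "Hey what do you think of this vase I'm looking at" scores 0.9 (strong phrase), while "Tell me a joke" scores 0 and never triggers the camera.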

Example Phrases That Trigger Camera:

"Hey what do you think of this vase I'm looking at"  → Triggered ✓
"Can you help me identify this plant?" → Triggered ✓
"Is this ripe enough to eat?" → Triggered ✓
"What breed of dog is this?" → Triggered ✓
"Look at this sunset" → Triggered ✓
"Help me read this sign" → Triggered ✓

"What's the weather today?" → Not triggered ✗
"Tell me a joke" → Not triggered ✗
"How do I make pasta?" → Not triggered ✗

Configuration

import { VisionIntentDetector } from 'estuary-lens-studio-sdk';

// No external API key required!
const detector = new VisionIntentDetector({
  confidenceThreshold: 0.7, // Lower = more sensitive
  debugLogging: true
});

The detector works without any additional API keys by using heuristic detection. For advanced use cases, you can optionally configure an LLM API for more sophisticated classification.

Using the Component

import { VisionIntentDetectorComponent } from 'estuary-lens-studio-sdk';

@component
export class MyVisionHandler extends BaseScriptComponent {
  onAwake() {
    const detector = VisionIntentDetectorComponent.instance;

    if (detector) {
      detector.onVisionIntent((request, result) => {
        print(`Vision detected: ${result.confidence}`);
      });
    }
  }
}

EstuaryCamera Example

The EstuaryCamera example component (in Examples/EstuaryCamera.ts) handles the actual image capture on Spectacles hardware.

Setup in Lens Studio

  1. Copy Examples/EstuaryCamera.ts to your project
  2. Create a SceneObject in your scene
  3. Add the EstuaryCamera script as a component
  4. Configure settings in the Inspector

Inspector Properties

Property                     Type      Default   Description
debugMode                    boolean   true      Enable debug logging
captureResolution            number    512       Camera resolution (smaller dimension)
enableVisionAcknowledgment   boolean   true      AI says acknowledgment before analyzing

Vision Acknowledgment

When enableVisionAcknowledgment is enabled, the AI character will say a brief phrase (e.g., "Let me take a look!") immediately when the camera triggers. This provides instant feedback while the image is being captured and processed.

Resolution Guidelines

Resolution   Quality     Use Case
256          Low         Fastest transfer, basic recognition
512          Good        Recommended - balanced quality/speed
768          High        Detailed analysis needs
1024         Very High   Maximum detail, slower transfer
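Since captureResolution sets the smaller dimension, the other dimension follows from the frame's aspect ratio. A hypothetical helper (not part of the SDK, and assuming the capture preserves aspect ratio) makes the math concrete:

```typescript
// Hypothetical helper: scale a frame so its smaller dimension equals
// captureResolution, preserving aspect ratio. Not part of the SDK.
function scaledDimensions(
  width: number,
  height: number,
  captureResolution: number
): { width: number; height: number } {
  const scale = captureResolution / Math.min(width, height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale)
  };
}

// A 1920x1080 frame at captureResolution 512 scales to 910x512
const dims = scaledDimensions(1920, 1080, 512);
```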

Manual Capture

You can trigger a capture programmatically:

// Get reference to EstuaryCamera component
const estuaryCamera = /* your EstuaryCamera instance */;

// Trigger manual capture
estuaryCamera.manualCapture("What do you see?");

Integration Flow

Complete Event Flow

// 1. User speaks (handled by EstuaryVoiceConnection)
// "What do you think of this vase?"

// 2. Server sends STT transcript
character.on('transcript', (response) => {
  // VisionIntentDetector analyzes this automatically
});

// 3. VisionIntentDetector detects visual intent
detector.on('visionIntentDetected', (request, result) => {
  print(`Vision intent: ${result.confidence}`);
  // EstuaryCamera receives this via cameraCaptureRequest event
});

// 4. EstuaryCamera captures and sends image
// (automatic when subscribed to cameraCaptureRequest)

// 5. AI responds with image analysis
character.on('botResponse', (response) => {
  // "That's a beautiful ceramic vase with blue patterns..."
});

Event Flow Diagram

User Speaks → STT Transcript → VisionIntentDetector
                                    │
                                    ▼
                     ┌────────────────────────┐
                     │ Vision Intent Detected │
                     │   (confidence > 0.7)   │
                     └───────────┬────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
 Vision Pending            EstuaryCamera       cameraCaptureRequest
 Signal to Server            Component            Event Emitted
         │                       │
         │                       ▼
         │                 Capture Image
         │                       │
         │                       ▼
         │                 Encode Base64
         │                       │
         └───────────┬───────────┘
                     ▼
             sendCameraImage()
                     │
                     ▼
            Server AI Analysis
                     │
                     ▼
           Bot Response with
           Image Description

EstuaryManager Camera Methods

The EstuaryManager provides methods for camera integration:

sendCameraImage

Send a captured image to the server for AI analysis:

import { EstuaryManager } from 'estuary-lens-studio-sdk';

EstuaryManager.instance.sendCameraImage(
  imageBase64,   // Base64-encoded image data
  'image/jpeg',  // MIME type
  requestId,     // Optional: request ID for correlation
  text,          // Optional: context text
  16000          // Optional: TTS sample rate
);

sendVisionPending

Signal that an image is about to be sent (allows server to send acknowledgment):

EstuaryManager.instance.sendVisionPending(
  transcript,  // The user's speech that triggered vision
  requestId    // Optional: request ID for correlation
);

updatePreferences

Configure vision behavior:

EstuaryManager.instance.updatePreferences({
  enableVisionAcknowledgment: true // AI says "Let me take a look!"
});

Practical Examples

Basic Setup

The camera system works automatically when you add both example components to your scene:

  1. Add EstuaryVoiceConnection (from Examples/EstuaryVoiceConnection.ts)
  2. Add EstuaryCamera (from Examples/EstuaryCamera.ts)

That's it! The components integrate automatically:

  • EstuaryVoiceConnection creates a VisionIntentDetector and listens for transcripts
  • When vision intent is detected, it emits a cameraCaptureRequest event
  • EstuaryCamera listens for this event and captures the image
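The decoupling between the two components can be sketched with a minimal event bus. The class, handler signature, and payload shape below are illustrative stand-ins, not the SDK's exact cameraCaptureRequest API:

```typescript
// Minimal event-bus sketch of how the two example components stay decoupled.
// The payload shape and method names here are illustrative, not SDK code.
type CaptureRequest = { transcript: string; confidence: number };
type Handler = (req: CaptureRequest) => void;

class MiniEventBus {
  private handlers: Handler[] = [];
  on(handler: Handler): void {
    this.handlers.push(handler);
  }
  emit(req: CaptureRequest): void {
    this.handlers.forEach(h => h(req));
  }
}

const bus = new MiniEventBus();
const captured: string[] = [];

// "EstuaryCamera" side: subscribe to capture requests
bus.on(req => captured.push(req.transcript));

// "EstuaryVoiceConnection" side: emit once vision intent is detected
bus.emit({ transcript: "What do you think of this vase?", confidence: 0.9 });
```

Because the camera only ever reacts to the event, you can swap in your own capture component without touching the voice connection.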

Accessing Vision Intent Detector

From EstuaryVoiceConnection, you can access the vision intent detector:

// Get reference to EstuaryVoiceConnection (SimpleAutoConnect)
const voiceConnection = /* your EstuaryVoiceConnection instance */;

// Access the vision intent detector
const detector = voiceConnection.getVisionIntentDetector();
if (detector) {
  detector.on('visionIntentDetected', (request, result) => {
    print(`Vision intent: ${result.confidence}`);
    print(`Reason: ${result.reason}`);
  });
}

Manual Camera Capture

Trigger a capture programmatically from EstuaryCamera:

// Get reference to EstuaryCamera component
const estuaryCamera = /* your EstuaryCamera instance */;

// Trigger manual capture with a prompt
estuaryCamera.manualCapture("Describe what you see in detail.");

Requirements & Limitations

Platform Requirements

Requirement   Details
Hardware      Spectacles only (CameraModule is device-specific)
Permissions   Extended Permissions required for development
Internet      Required for AI analysis

Development Notes

CameraModule Limitations
  • CameraModule APIs cannot be called in onAwake() - use OnStartEvent or later
  • In Lens Studio Preview, the camera captures the preview scene content (not a real camera feed)
  • Using CameraModule disables open internet for publicly released Lenses

Testing Workflow

  1. Development: Enable Extended Permissions in Project Settings
  2. Preview: Camera capture works but only captures the preview scene content
  3. Device: Deploy to Spectacles to capture real-world camera feed

Troubleshooting

Camera Not Capturing

  1. Check Lifecycle: Ensure camera is initialized after OnStartEvent
  2. Check Permissions: Extended Permissions must be enabled
  3. Check Connection: Verify EstuaryManager.isConnected is true
  4. Preview vs Device: In Preview, camera captures the preview scene; on Spectacles, it captures the real camera

Vision Intent Not Detecting

  1. Check Threshold: Lower confidenceThreshold for more sensitivity
  2. Enable Debug: Set debugMode = true to see detection logs
  3. Check Connection: Ensure character is connected and receiving transcripts

Image Not Sending

  1. Check Encoding: Verify Base64 encoding is successful
  2. Check Size: Large images may timeout - reduce captureResolution
  3. Check Connection: WebSocket must be connected
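For the size check in step 2, note that Base64 represents every 3 raw bytes as 4 output characters (rounded up for padding), so payloads grow by roughly a third. A quick estimate, using a hypothetical helper:

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (with padding),
// so the wire payload is roughly 33% larger than the raw image.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// e.g. a ~200 KB JPEG becomes a ~267 KB string on the wire
const wireSize = base64Length(200_000);
```

If captures time out, halving captureResolution roughly quarters the pixel count, and with it the encoded payload.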

AI Not Responding to Image

  1. Check Request ID: Ensure request IDs match between pending signal and image
  2. Check Server Logs: Verify image was received and processed
  3. Check Vision Acknowledgment: Enable to get immediate feedback
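The request-ID check in step 1 can be sketched as a small pending-request map. The handler names below are hypothetical, intended only to show the correlation pattern between sendVisionPending and sendCameraImage:

```typescript
// Illustrative pattern for correlating a vision-pending signal with the
// image that follows it. Handler names are hypothetical, not SDK code.
const pendingRequests = new Map<string, string>(); // requestId -> transcript

function onVisionPending(requestId: string, transcript: string): void {
  pendingRequests.set(requestId, transcript);
}

function onImageSent(requestId: string): boolean {
  // The IDs correlate only if a pending signal was recorded first
  return pendingRequests.delete(requestId);
}

onVisionPending("req-1", "What breed of dog is this?");
const matched = onImageSent("req-1"); // true: IDs correlate
const orphan = onImageSent("req-2");  // false: no pending signal was sent
```

An orphaned image (sent without a matching pending signal, or with a mismatched ID) is the usual cause of a silent AI.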

Best Practices

Optimize for Performance

// Use appropriate resolution for your use case
captureResolution: 512 // Good balance of quality/speed

// Enable acknowledgment for user feedback
enableVisionAcknowledgment: true

Handle Edge Cases

// Check camera availability before capture
if (!cameraModule) {
  print("Camera not available on this device");
  return;
}

// Handle capture failures gracefully
try {
  this.captureAndSend();
} catch (error) {
  print(`Capture failed: ${error}`);
  this.notifyUser("Sorry, I couldn't capture that image");
}

Provide User Feedback

// Show visual indicator during capture
this.captureIndicator.enabled = true;

// Disable after sending
manager.on('cameraCaptureRequest', () => {
  // Indicate processing
  this.showProcessingUI();
});

SDK Structure

The camera functionality is split between core and example components:

estuary-lens-studio-sdk/
├── src/
│   └── Components/
│       ├── VisionIntentDetector.ts   ← Core: vision intent detection class
│       └── EstuaryManager.ts         ← Core: sendCameraImage, sendVisionPending
└── Examples/
    ├── EstuaryVoiceConnection.ts     ← Example: voice + vision intent settings
    └── EstuaryCamera.ts              ← Example: camera capture implementation

Settings Distribution:

  • EstuaryVoiceConnection: enableVisionIntentDetection, visionConfidenceThreshold
  • EstuaryCamera: captureResolution, enableVisionAcknowledgment, debugMode

Next Steps