Camera Module

Enable AI vision capabilities by capturing and analyzing images from the device camera.

Overview

The Camera Module allows your AI characters to "see" what the user is looking at through the device camera. This enables powerful multimodal interactions:

  • Visual identification - "What breed of dog is this?"
  • Object analysis - "Is this fruit ripe enough to eat?"
  • Scene understanding - "What do you think of this painting?"
  • Reading assistance - "Can you read this sign for me?"
Camera Module Flow

User: "Hey, what do you think of this vase I'm looking at?"

┌──────────────────┐      ┌──────────────────────────────┐
│ VisionIntent     │ ───→ │ Detects visual intent        │
│ Detector         │      │ (heuristic)                  │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌──────────────────┐      ┌──────────────────────────────┐
│ Vision Pending   │ ←─── │ Signals server that image    │
│ Signal to Server │      │ is about to be sent          │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌──────────────────┐      ┌──────────────────────────────┐
│ EstuaryCamera    │ ───→ │ Captures image using         │
│ (Example)        │      │ Spectacles CameraModule      │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌──────────────────┐      ┌──────────────────────────────┐
│ EstuaryManager   │ ───→ │ Sends Base64 image to        │
│ .sendCameraImage │      │ server for AI analysis       │
└──────────────────┘      └──────────────┬───────────────┘
                                         │
                                         ▼
┌────────────────────────────────────────────────────────┐
│ AI: "That's a beautiful ceramic vase with blue floral  │
│      patterns! It appears to be hand-painted..."       │
└────────────────────────────────────────────────────────┘

How It Works

The Camera Module operates through two complementary systems:

1. Server-Side Detection (Explicit Commands)

For explicit visual requests like:

  • "What am I looking at?"
  • "Describe what you see"
  • "Take a picture and tell me about it"

The server automatically detects these commands and sends a cameraCaptureRequest event to the client.

2. VisionIntentDetector (Natural Language)

For natural language that implies visual context:

  • "Hey what do you think of this vase I'm looking at?"
  • "Can you help me identify this plant?"
  • "Is this ripe enough to eat?"
  • "What breed of dog is this?"

The VisionIntentDetector (in src/Components/VisionIntentDetector.ts) analyzes speech transcripts using smart heuristic detection to understand when the user wants the AI to see something.


Quick Start

Prerequisites

  • Estuary SDK set up with EstuaryVoiceConnection or EstuaryManager
  • EstuaryCredentials configured
  • Spectacles hardware (CameraModule is device-only)
  • Extended Permissions enabled in Project Settings (for development)

Step 1: Add EstuaryCamera Component

The EstuaryCamera example component handles camera capture. Copy it from the SDK's Examples/ folder to your project:

Examples/EstuaryCamera.ts

Then add it to a SceneObject in your scene.

Step 2: Add VisionIntentDetectorComponent

For natural language camera activation, add the VisionIntentDetectorComponent from the core SDK:

import { VisionIntentDetectorComponent } from 'estuary-lens-studio-sdk';

This component is located in src/Components/VisionIntentDetector.ts.

Step 3: Configure Settings

Settings are split between two components:

EstuaryVoiceConnection (vision intent detection):

Setting                       Default   Description
enableVisionIntentDetection   true      Enable natural language camera activation
visionConfidenceThreshold     0.7       Confidence threshold for triggering camera (0-1)

EstuaryCamera (camera capture):

Setting                      Default   Description
captureResolution            512       Image resolution (smaller dimension in pixels)
enableVisionAcknowledgment   true      Character says acknowledgment before analyzing
debugMode                    true      Enable debug logging

VisionIntentDetector

The VisionIntentDetector intelligently detects when the user wants the AI to see something. It's a core SDK component located in src/Components/VisionIntentDetector.ts.

How Detection Works

The detector uses a sophisticated heuristic system that analyzes speech for:

Strong Visual Indicators (High Confidence ~0.9)

  • "look at this", "see this", "what is this"
  • "can you see", "i'm looking at"
  • "identify this", "recognize this"
  • "help me with this", "check this out"

Medium Indicators + Context Clues (Medium Confidence ~0.75)

  • Deictic references ("this", "that", "here") combined with visual context
  • Visual nouns (plant, vase, dog, painting, food, sign, etc.)
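The heuristic can be pictured as a small scoring function. The sketch below is illustrative only: the phrase lists, weights, and function name are simplified stand-ins, not the SDK's actual implementation in VisionIntentDetector.ts.

```typescript
// Illustrative sketch of heuristic vision-intent scoring.
// Phrase lists and weights are simplified examples, not the SDK's real data.
const STRONG_PHRASES = ["look at this", "see this", "what is this", "i'm looking at"];
const DEICTIC_WORDS = ["this", "that", "here"];
const VISUAL_NOUNS = ["plant", "vase", "dog", "painting", "food", "sign"];

function scoreVisionIntent(transcript: string): number {
  const text = transcript.toLowerCase();
  // Strong indicators alone yield high confidence (~0.9)
  if (STRONG_PHRASES.some(p => text.includes(p))) return 0.9;
  // Deictic reference plus a visual noun yields medium confidence (~0.75)
  const hasDeictic = DEICTIC_WORDS.some(w => new RegExp(`\\b${w}\\b`).test(text));
  const hasVisualNoun = VISUAL_NOUNS.some(n => text.includes(n));
  if (hasDeictic && hasVisualNoun) return 0.75;
  return 0;
}

// A capture fires when the score clears the configured threshold (default 0.7)
const shouldCapture = scoreVisionIntent("Can you help me identify this plant?") >= 0.7;
```

With this scheme, "Hey what do you think of this vase I'm looking at" scores 0.9 (strong phrase), while "Tell me a joke" scores 0 and never triggers the camera.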

Example Phrases That Trigger Camera:

"Hey what do you think of this vase I'm looking at"  → Triggered ✓
"Can you help me identify this plant?" → Triggered ✓
"Is this ripe enough to eat?" → Triggered ✓
"What breed of dog is this?" → Triggered ✓
"Look at this sunset" → Triggered ✓
"Help me read this sign" → Triggered ✓

"What's the weather today?" → Not triggered ✗
"Tell me a joke" → Not triggered ✗
"How do I make pasta?" → Not triggered ✗

Configuration

import { VisionIntentDetector } from 'estuary-lens-studio-sdk';

// No external API key required!
const detector = new VisionIntentDetector({
  confidenceThreshold: 0.7, // Lower = more sensitive
  debugLogging: true
});

The detector works without any additional API keys by using heuristic detection. For advanced use cases, you can optionally configure an LLM API for more sophisticated classification.

Using the Component

import { VisionIntentDetectorComponent } from 'estuary-lens-studio-sdk';

@component
export class MyVisionHandler extends BaseScriptComponent {
  onAwake() {
    const detector = VisionIntentDetectorComponent.instance;

    if (detector) {
      detector.onVisionIntent((request, result) => {
        print(`Vision detected: ${result.confidence}`);
      });
    }
  }
}

EstuaryCamera Example

The EstuaryCamera example component (in Examples/EstuaryCamera.ts) handles the actual image capture on Spectacles hardware.

Setup in Lens Studio

  1. Copy Examples/EstuaryCamera.ts to your project
  2. Create a SceneObject in your scene
  3. Add the EstuaryCamera script as a component
  4. Configure settings in the Inspector

Inspector Properties

Property                     Type      Default   Description
debugMode                    boolean   true      Enable debug logging
captureResolution            number    512       Camera resolution (smaller dimension)
enableVisionAcknowledgment   boolean   true      AI says acknowledgment before analyzing

Vision Acknowledgment

When enableVisionAcknowledgment is enabled, the AI character will say a brief phrase (e.g., "Let me take a look!") immediately when the camera triggers. This provides instant feedback while the image is being captured and processed.

Resolution Guidelines

Resolution   Quality     Use Case
256          Low         Fastest transfer, basic recognition
512          Good        Recommended - balanced quality/speed
768          High        Detailed analysis needs
1024         Very High   Maximum detail, slower transfer
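Since captureResolution sets the smaller dimension, the other dimension follows from the frame's aspect ratio. A hypothetical helper (not part of the SDK, and assuming the capture preserves aspect ratio) makes the math concrete:

```typescript
// Hypothetical helper: scale a frame so its smaller dimension equals
// captureResolution, preserving aspect ratio. Not part of the SDK.
function scaledDimensions(
  width: number,
  height: number,
  captureResolution: number
): { width: number; height: number } {
  const scale = captureResolution / Math.min(width, height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale)
  };
}

// A 1920x1080 frame at captureResolution 512 scales to 910x512
const dims = scaledDimensions(1920, 1080, 512);
```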

Manual Capture

You can trigger a capture programmatically:

// Get reference to EstuaryCamera component
const estuaryCamera = /* your EstuaryCamera instance */;

// Trigger manual capture
estuaryCamera.manualCapture("What do you see?");

Integration Flow

Complete Event Flow

// 1. User speaks (handled by EstuaryVoiceConnection)
// "What do you think of this vase?"

// 2. Server sends STT transcript
character.on('transcript', (response) => {
  // VisionIntentDetector analyzes this automatically
});

// 3. VisionIntentDetector detects visual intent
detector.on('visionIntentDetected', (request, result) => {
  print(`Vision intent: ${result.confidence}`);
  // EstuaryCamera receives this via cameraCaptureRequest event
});

// 4. EstuaryCamera captures and sends image
// (automatic when subscribed to cameraCaptureRequest)

// 5. AI responds with image analysis
character.on('botResponse', (response) => {
  // "That's a beautiful ceramic vase with blue patterns..."
});

Event Flow Diagram

User Speaks → STT Transcript → VisionIntentDetector
                                    │
                                    ▼
                     ┌────────────────────────┐
                     │ Vision Intent Detected │
                     │   (confidence > 0.7)   │
                     └───────────┬────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
 Vision Pending            EstuaryCamera       cameraCaptureRequest
 Signal to Server            Component            Event Emitted
         │                       │
         │                       ▼
         │                 Capture Image
         │                       │
         │                       ▼
         │                 Encode Base64
         │                       │
         └───────────┬───────────┘
                     ▼
             sendCameraImage()
                     │
                     ▼
            Server AI Analysis
                     │
                     ▼
           Bot Response with
           Image Description

EstuaryManager Camera Methods

The EstuaryManager provides methods for camera integration:

sendCameraImage

Send a captured image to the server for AI analysis:

import { EstuaryManager } from 'estuary-lens-studio-sdk';

EstuaryManager.instance.sendCameraImage(
  imageBase64,   // Base64-encoded image data
  'image/jpeg',  // MIME type
  requestId,     // Optional: request ID for correlation
  text,          // Optional: context text
  16000          // Optional: TTS sample rate
);

sendVisionPending

Signal that an image is about to be sent (allows server to send acknowledgment):

EstuaryManager.instance.sendVisionPending(
  transcript,  // The user's speech that triggered vision
  requestId    // Optional: request ID for correlation
);

updatePreferences

Configure vision behavior:

EstuaryManager.instance.updatePreferences({
  enableVisionAcknowledgment: true // AI says "Let me take a look!"
});

Practical Examples

Basic Setup

The camera system works automatically when you add both example components to your scene:

  1. Add EstuaryVoiceConnection (from Examples/EstuaryVoiceConnection.ts)
  2. Add EstuaryCamera (from Examples/EstuaryCamera.ts)

That's it! The components integrate automatically:

  • EstuaryVoiceConnection creates a VisionIntentDetector and listens for transcripts
  • When vision intent is detected, it emits a cameraCaptureRequest event
  • EstuaryCamera listens for this event and captures the image
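The decoupling between the two components can be sketched with a minimal event bus. The class, handler signature, and payload shape below are illustrative stand-ins, not the SDK's exact cameraCaptureRequest API:

```typescript
// Minimal event-bus sketch of how the two example components stay decoupled.
// The payload shape and method names here are illustrative, not SDK code.
type CaptureRequest = { transcript: string; confidence: number };
type Handler = (req: CaptureRequest) => void;

class MiniEventBus {
  private handlers: Handler[] = [];
  on(handler: Handler): void {
    this.handlers.push(handler);
  }
  emit(req: CaptureRequest): void {
    this.handlers.forEach(h => h(req));
  }
}

const bus = new MiniEventBus();
const captured: string[] = [];

// "EstuaryCamera" side: subscribe to capture requests
bus.on(req => captured.push(req.transcript));

// "EstuaryVoiceConnection" side: emit once vision intent is detected
bus.emit({ transcript: "What do you think of this vase?", confidence: 0.9 });
```

Because the camera only ever reacts to the event, you can swap in your own capture component without touching the voice connection.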

Accessing Vision Intent Detector

From EstuaryVoiceConnection, you can access the vision intent detector:

// Get reference to EstuaryVoiceConnection (SimpleAutoConnect)
const voiceConnection = /* your EstuaryVoiceConnection instance */;

// Access the vision intent detector
const detector = voiceConnection.getVisionIntentDetector();
if (detector) {
  detector.on('visionIntentDetected', (request, result) => {
    print(`Vision intent: ${result.confidence}`);
    print(`Reason: ${result.reason}`);
  });
}

Manual Camera Capture

Trigger a capture programmatically from EstuaryCamera:

// Get reference to EstuaryCamera component
const estuaryCamera = /* your EstuaryCamera instance */;

// Trigger manual capture with a prompt
estuaryCamera.manualCapture("Describe what you see in detail.");

Requirements & Limitations

Platform Requirements

Requirement   Details
Hardware      Spectacles only (CameraModule is device-specific)
Permissions   Extended Permissions required for development
Internet      Required for AI analysis

Development Notes

CameraModule Limitations
  • CameraModule APIs cannot be called in onAwake() - use OnStartEvent or later
  • In Lens Studio Preview, the camera captures the preview scene content (not a real camera feed)
  • Using CameraModule disables open internet for publicly released Lenses

Testing Workflow

  1. Development: Enable Extended Permissions in Project Settings
  2. Preview: Camera capture works but only captures the preview scene content
  3. Device: Deploy to Spectacles to capture real-world camera feed

Troubleshooting

Camera Not Capturing

  1. Check Lifecycle: Ensure camera is initialized after OnStartEvent
  2. Check Permissions: Extended Permissions must be enabled
  3. Check Connection: Verify EstuaryManager.isConnected is true
  4. Preview vs Device: In Preview, camera captures the preview scene; on Spectacles, it captures the real camera

Vision Intent Not Detecting

  1. Check Threshold: Lower confidenceThreshold for more sensitivity
  2. Enable Debug: Set debugMode = true to see detection logs
  3. Check Connection: Ensure character is connected and receiving transcripts

Image Not Sending

  1. Check Encoding: Verify Base64 encoding is successful
  2. Check Size: Large images may timeout - reduce captureResolution
  3. Check Connection: WebSocket must be connected
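For the size check in step 2, note that Base64 represents every 3 raw bytes as 4 output characters (rounded up for padding), so payloads grow by roughly a third. A quick estimate, using a hypothetical helper:

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (with padding),
// so the wire payload is roughly 33% larger than the raw image.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// e.g. a ~200 KB JPEG becomes a ~267 KB string on the wire
const wireSize = base64Length(200_000);
```

If captures time out, halving captureResolution roughly quarters the pixel count, and with it the encoded payload.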

AI Not Responding to Image

  1. Check Request ID: Ensure request IDs match between pending signal and image
  2. Check Server Logs: Verify image was received and processed
  3. Check Vision Acknowledgment: Enable to get immediate feedback
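The request-ID check in step 1 can be sketched as a small pending-request map. The handler names below are hypothetical, intended only to show the correlation pattern between sendVisionPending and sendCameraImage:

```typescript
// Illustrative pattern for correlating a vision-pending signal with the
// image that follows it. Handler names are hypothetical, not SDK code.
const pendingRequests = new Map<string, string>(); // requestId -> transcript

function onVisionPending(requestId: string, transcript: string): void {
  pendingRequests.set(requestId, transcript);
}

function onImageSent(requestId: string): boolean {
  // The IDs correlate only if a pending signal was recorded first
  return pendingRequests.delete(requestId);
}

onVisionPending("req-1", "What breed of dog is this?");
const matched = onImageSent("req-1"); // true: IDs correlate
const orphan = onImageSent("req-2");  // false: no pending signal was sent
```

An orphaned image (sent without a matching pending signal, or with a mismatched ID) is the usual cause of a silent AI.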

Best Practices

Optimize for Performance

// Use appropriate resolution for your use case
captureResolution: 512 // Good balance of quality/speed

// Enable acknowledgment for user feedback
enableVisionAcknowledgment: true

Handle Edge Cases

// Check camera availability before capture
if (!cameraModule) {
  print("Camera not available on this device");
  return;
}

// Handle capture failures gracefully
try {
  this.captureAndSend();
} catch (error) {
  print(`Capture failed: ${error}`);
  this.notifyUser("Sorry, I couldn't capture that image");
}

Provide User Feedback

// Show visual indicator during capture
this.captureIndicator.enabled = true;

// Disable after sending
manager.on('cameraCaptureRequest', () => {
  // Indicate processing
  this.showProcessingUI();
});

SDK Structure

The camera functionality is split between core and example components:

estuary-lens-studio-sdk/
├── src/
│   └── Components/
│       ├── VisionIntentDetector.ts   ← Core: vision intent detection class
│       └── EstuaryManager.ts         ← Core: sendCameraImage, sendVisionPending
└── Examples/
    ├── EstuaryVoiceConnection.ts     ← Example: voice + vision intent settings
    └── EstuaryCamera.ts              ← Example: camera capture implementation

Settings Distribution:

  • EstuaryVoiceConnection: enableVisionIntentDetection, visionConfidenceThreshold
  • EstuaryCamera: captureResolution, enableVisionAcknowledgment, debugMode

Next Steps