Camera Module
Enable AI vision capabilities by capturing and analyzing images from the device camera.
Overview
The Camera Module allows your AI characters to "see" what the user is looking at through the device camera. This enables powerful multimodal interactions:
- Visual identification - "What breed of dog is this?"
- Object analysis - "Is this fruit ripe enough to eat?"
- Scene understanding - "What do you think of this painting?"
- Reading assistance - "Can you read this sign for me?"
┌──────────────────────────────────────────────────────────────────────┐
│ Camera Module Flow │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ User: "Hey, what do you think of this vase I'm looking at?" │
│ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ VisionIntent │--→ │ Detects visual intent │ │
│ │ Detector │ │ (heuristic) │ │
│ └──────────────────┘ └───────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ Vision Pending │ ←--│ Signals server that image │ │
│ │ Signal to Server │ │ is about to be sent │ │
│ └──────────────────┘ └───────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ EstuaryCamera │--→ │ Captures image using │ │
│ │ (Example) │ │ Spectacles CameraModule │ │
│ └──────────────────┘ └───────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ EstuaryManager │--→ │ Sends Base64 image to │ │
│ │ .sendCameraImage │ │ server for AI analysis │ │
│ └──────────────────┘ └───────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ AI: "That's a beautiful ceramic vase with blue floral │ │
│ │ patterns! It appears to be hand-painted..." │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
How It Works
The Camera Module operates through two complementary systems:
1. Server-Side Detection (Explicit Commands)
For explicit visual requests like:
- "What am I looking at?"
- "Describe what you see"
- "Take a picture and tell me about it"
The server automatically detects these commands and sends a cameraCaptureRequest event to the client.
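On the client, handling that event amounts to subscribing a capture callback. The sketch below uses a minimal `MiniEmitter` stand-in so it is self-contained; the real SDK exposes its own event API on the manager, and `MiniEmitter` is purely illustrative:

```typescript
// Sketch: reacting to a server-sent cameraCaptureRequest event.
// MiniEmitter is a stand-in for the SDK's event API, for illustration only.
type Listener = (payload: { text?: string }) => void;

class MiniEmitter {
  private listeners: Record<string, Listener[]> = {};
  on(event: string, cb: Listener): void {
    (this.listeners[event] ??= []).push(cb);
  }
  emit(event: string, payload: { text?: string }): void {
    (this.listeners[event] ?? []).forEach((cb) => cb(payload));
  }
}

const manager = new MiniEmitter();
let captures = 0;

// When the server detects an explicit visual command, trigger a capture.
manager.on("cameraCaptureRequest", (payload) => {
  captures += 1; // in a real scene, EstuaryCamera would capture an image here
});

manager.emit("cameraCaptureRequest", { text: "What am I looking at?" });
```

In the actual SDK this subscription happens for you when `EstuaryCamera` is in the scene; the sketch only shows the shape of the event handling.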
2. VisionIntentDetector (Natural Language)
For natural language that implies visual context:
- "Hey what do you think of this vase I'm looking at?"
- "Can you help me identify this plant?"
- "Is this ripe enough to eat?"
- "What breed of dog is this?"
The VisionIntentDetector (in src/Components/VisionIntentDetector.ts) analyzes speech transcripts with phrase and context heuristics to detect when the user wants the AI to see something.
Quick Start
Prerequisites
- Estuary SDK set up with `EstuaryVoiceConnection` or `EstuaryManager`
- `EstuaryCredentials` configured
- Spectacles hardware (CameraModule is device-only)
- Extended Permissions enabled in Project Settings (for development)
Step 1: Add EstuaryCamera Component
The EstuaryCamera example component handles camera capture. Copy it from the SDK's Examples/ folder to your project:
Examples/EstuaryCamera.ts
Then add it to a SceneObject in your scene.
Step 2: Add VisionIntentDetectorComponent (Recommended)
For natural language camera activation, add the VisionIntentDetectorComponent from the core SDK:
import { VisionIntentDetectorComponent } from 'estuary-lens-studio-sdk';
This component is located in src/Components/VisionIntentDetector.ts.
Step 3: Configure Settings
Settings are split between two components:
EstuaryVoiceConnection (vision intent detection):
| Setting | Default | Description |
|---|---|---|
| `enableVisionIntentDetection` | true | Enable natural language camera activation |
| `visionConfidenceThreshold` | 0.7 | Confidence threshold for triggering camera (0-1) |
EstuaryCamera (camera capture):
| Setting | Default | Description |
|---|---|---|
| `captureResolution` | 512 | Image resolution (smaller dimension in pixels) |
| `enableVisionAcknowledgment` | true | Character says an acknowledgment before analyzing |
| `debugMode` | true | Enable debug logging |
VisionIntentDetector
The VisionIntentDetector intelligently detects when the user wants the AI to see something. It's a core SDK component located in src/Components/VisionIntentDetector.ts.
How Detection Works
The detector uses a two-tier heuristic system that analyzes speech for:
Strong Visual Indicators (High Confidence ~0.9)
- "look at this", "see this", "what is this"
- "can you see", "i'm looking at"
- "identify this", "recognize this"
- "help me with this", "check this out"
Medium Indicators + Context Clues (Medium Confidence ~0.75)
- Deictic references ("this", "that", "here") combined with visual context
- Visual nouns (plant, vase, dog, painting, food, sign, etc.)
Example Phrases That Trigger Camera:
"Hey what do you think of this vase I'm looking at" → Triggered ✓
"Can you help me identify this plant?" → Triggered ✓
"Is this ripe enough to eat?" → Triggered ✓
"What breed of dog is this?" → Triggered ✓
"Look at this sunset" → Triggered ✓
"Help me read this sign" → Triggered ✓
"What's the weather today?" → Not triggered ✗
"Tell me a joke" → Not triggered ✗
"How do I make pasta?" → Not triggered ✗
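The two confidence tiers above can be sketched as a small scoring function. The phrase lists and weights below are illustrative stand-ins, not the SDK's actual implementation:

```typescript
// Illustrative sketch of tiered vision-intent scoring.
// Phrase lists and weights are hypothetical, not the SDK's real values.
const STRONG_PHRASES = [
  "look at this", "see this", "what is this", "can you see",
  "i'm looking at", "identify this", "recognize this", "check this out",
];
const VISUAL_NOUNS = ["plant", "vase", "dog", "painting", "food", "sign"];
const DEICTICS = ["this", "that", "here"];

function scoreVisionIntent(transcript: string): number {
  const text = transcript.toLowerCase();
  // Strong indicators alone are enough for high confidence.
  if (STRONG_PHRASES.some((p) => text.includes(p))) return 0.9;
  // Medium tier: a deictic word combined with a visual noun.
  const words = text.split(/\s+/);
  const hasDeictic = DEICTICS.some((d) => words.includes(d));
  const hasVisualNoun = VISUAL_NOUNS.some((n) => text.includes(n));
  if (hasDeictic && hasVisualNoun) return 0.75;
  return 0.0;
}
```

With a threshold of 0.7, "Look at this sunset" scores in the strong tier while "What's the weather today?" scores zero and never triggers the camera.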
Configuration
import { VisionIntentDetector } from 'estuary-lens-studio-sdk';
// No external API key required!
const detector = new VisionIntentDetector({
confidenceThreshold: 0.7, // Lower = more sensitive
debugLogging: true
});
The detector works without any additional API keys by using heuristic detection. For advanced use cases, you can optionally configure an LLM API for more sophisticated classification.
Using the Component
import { VisionIntentDetectorComponent } from 'estuary-lens-studio-sdk';
@component
export class MyVisionHandler extends BaseScriptComponent {
onAwake() {
const detector = VisionIntentDetectorComponent.instance;
if (detector) {
detector.onVisionIntent((request, result) => {
print(`Vision detected: ${result.confidence}`);
});
}
}
}
EstuaryCamera Example
The EstuaryCamera example component (in Examples/EstuaryCamera.ts) handles the actual image capture on Spectacles hardware.
Setup in Lens Studio
- Copy `Examples/EstuaryCamera.ts` to your project
- Create a SceneObject in your scene
- Add the `EstuaryCamera` script as a component
- Configure settings in the Inspector
Inspector Properties
| Property | Type | Default | Description |
|---|---|---|---|
| `debugMode` | boolean | true | Enable debug logging |
| `captureResolution` | number | 512 | Camera resolution (smaller dimension) |
| `enableVisionAcknowledgment` | boolean | true | AI says an acknowledgment before analyzing |
Vision Acknowledgment
When enableVisionAcknowledgment is enabled, the AI character will say a brief phrase (e.g., "Let me take a look!") immediately when the camera triggers. This provides instant feedback while the image is being captured and processed.
Resolution Guidelines
| Resolution | Quality | Use Case |
|---|---|---|
| 256 | Low | Fastest transfer, basic recognition |
| 512 | Good | Recommended - balanced quality/speed |
| 768 | High | Detailed analysis needs |
| 1024 | Very High | Maximum detail, slower transfer |
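To see why larger resolutions slow the transfer, a back-of-envelope payload estimate helps. The 4:3 aspect ratio and ~0.3 bytes-per-pixel JPEG compression ratio below are assumptions for illustration, not measured values:

```typescript
// Rough estimate of the Base64 payload for a JPEG capture.
// aspect and jpegBytesPerPixel are ballpark assumptions, not fixed values.
function estimateBase64Bytes(
  smallerDim: number,
  aspect: number = 4 / 3,
  jpegBytesPerPixel: number = 0.3
): number {
  const pixels = smallerDim * Math.round(smallerDim * aspect);
  const jpegBytes = pixels * jpegBytesPerPixel;
  return Math.ceil(jpegBytes / 3) * 4; // Base64 encodes 3 bytes into 4 characters
}
```

Because pixel count grows with the square of the dimension, doubling `captureResolution` from 512 to 1024 roughly quadruples the payload, which is why 512 is the recommended balance.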
Manual Capture
You can trigger a capture programmatically:
// Get reference to EstuaryCamera component
const estuaryCamera = /* your EstuaryCamera instance */;
// Trigger manual capture
estuaryCamera.manualCapture("What do you see?");
Integration Flow
Complete Event Flow
// 1. User speaks (handled by EstuaryVoiceConnection)
// "What do you think of this vase?"
// 2. Server sends STT transcript
character.on('transcript', (response) => {
// VisionIntentDetector analyzes this automatically
});
// 3. VisionIntentDetector detects visual intent
detector.on('visionIntentDetected', (request, result) => {
print(`Vision intent: ${result.confidence}`);
// EstuaryCamera receives this via cameraCaptureRequest event
});
// 4. EstuaryCamera captures and sends image
// (automatic when subscribed to cameraCaptureRequest)
// 5. AI responds with image analysis
character.on('botResponse', (response) => {
// "That's a beautiful ceramic vase with blue patterns..."
});
Event Flow Diagram
User Speaks → STT Transcript → VisionIntentDetector
│
▼
┌────────────────────────┐
│ Vision Intent Detected │
│ (confidence > 0.7) │
└──────────┬─────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
Vision Pending EstuaryCamera cameraCaptureRequest
Signal to Server Component Event Emitted
│ │
│ ▼
│ Capture Image
│ │
│ ▼
│ Encode Base64
│ │
└───────────────┤
▼
sendCameraImage()
│
▼
Server AI Analysis
│
▼
Bot Response with
Image Description
EstuaryManager Camera Methods
The EstuaryManager provides methods for camera integration:
sendCameraImage
Send a captured image to the server for AI analysis:
import { EstuaryManager } from 'estuary-lens-studio-sdk';
EstuaryManager.instance.sendCameraImage(
imageBase64, // Base64-encoded image data
'image/jpeg', // MIME type
requestId, // Optional: request ID for correlation
text, // Optional: context text
16000 // Optional: TTS sample rate
);
sendVisionPending
Signal that an image is about to be sent (allows server to send acknowledgment):
EstuaryManager.instance.sendVisionPending(
transcript, // The user's speech that triggered vision
requestId // Optional: request ID for correlation
);
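The two calls are meant to be paired: sending the same request ID to `sendVisionPending` and `sendCameraImage` lets the server correlate the acknowledgment with the image that follows. A sketch of that pattern, using a hypothetical `CameraTransport` interface as a stand-in for `EstuaryManager` and an illustrative ID format:

```typescript
// Sketch: correlating sendVisionPending and sendCameraImage with one requestId.
// CameraTransport is a stand-in for EstuaryManager, for illustration only.
interface CameraTransport {
  sendVisionPending(transcript: string, requestId?: string): void;
  sendCameraImage(imageBase64: string, mimeType: string, requestId?: string): void;
}

function sendWithCorrelation(
  manager: CameraTransport,
  transcript: string,
  captureBase64: () => string
): string {
  // One requestId for both calls so the server can pair them.
  const requestId = `vision-${Date.now()}-${Math.floor(Math.random() * 1e6)}`;
  manager.sendVisionPending(transcript, requestId); // server may speak an acknowledgment now
  manager.sendCameraImage(captureBase64(), "image/jpeg", requestId);
  return requestId;
}
```

Mismatched IDs between the pending signal and the image are a common reason the AI never responds to a capture (see Troubleshooting below).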
updatePreferences
Configure vision behavior:
EstuaryManager.instance.updatePreferences({
enableVisionAcknowledgment: true // AI says "Let me take a look!"
});
Practical Examples
Basic Setup
The camera system works automatically when you add both example components to your scene:
- Add `EstuaryVoiceConnection` (from `Examples/EstuaryVoiceConnection.ts`)
- Add `EstuaryCamera` (from `Examples/EstuaryCamera.ts`)
That's it! The components integrate automatically:
- `EstuaryVoiceConnection` creates a `VisionIntentDetector` and listens for transcripts
- When vision intent is detected, it emits a `cameraCaptureRequest` event
- `EstuaryCamera` listens for this event and captures the image
Accessing Vision Intent Detector
From EstuaryVoiceConnection, you can access the vision intent detector:
// Get reference to EstuaryVoiceConnection (SimpleAutoConnect)
const voiceConnection = /* your EstuaryVoiceConnection instance */;
// Access the vision intent detector
const detector = voiceConnection.getVisionIntentDetector();
if (detector) {
detector.on('visionIntentDetected', (request, result) => {
print(`Vision intent: ${result.confidence}`);
print(`Reason: ${result.reason}`);
});
}
Manual Camera Capture
Trigger a capture programmatically from EstuaryCamera:
// Get reference to EstuaryCamera component
const estuaryCamera = /* your EstuaryCamera instance */;
// Trigger manual capture with a prompt
estuaryCamera.manualCapture("Describe what you see in detail.");
Requirements & Limitations
Platform Requirements
| Requirement | Details |
|---|---|
| Hardware | Spectacles only (CameraModule is device-specific) |
| Permissions | Extended Permissions required for development |
| Internet | Required for AI analysis |
Development Notes
- CameraModule APIs cannot be called in `onAwake()` - use `OnStartEvent` or later
- In Lens Studio Preview, the camera captures the preview scene content (not a real camera feed)
- Using CameraModule disables open internet for publicly released Lenses
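The lifecycle rule in the first note can be sketched as "register in `onAwake`, run on start." The `StartEventStub` class below is a self-contained stand-in for Lens Studio's `OnStartEvent`, for illustration only:

```typescript
// Why OnStartEvent matters: CameraModule calls made in onAwake() fail, so this
// sketch only registers a callback in onAwake and runs it when the start event fires.
// StartEventStub is a stand-in for Lens Studio's createEvent("OnStartEvent").
type Handler = () => void;

class StartEventStub {
  private handlers: Handler[] = [];
  bind(cb: Handler): void { this.handlers.push(cb); }
  fire(): void { for (const h of this.handlers) h(); }
}

let cameraInitialized = false;

function onAwake(startEvent: StartEventStub): void {
  // Do NOT touch CameraModule here; just defer the work to the start event.
  startEvent.bind(() => {
    cameraInitialized = true; // safe point to begin camera setup
  });
}

const startEvent = new StartEventStub();
onAwake(startEvent);
// cameraInitialized is still false here, before the scene "starts"
startEvent.fire();
```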
Testing Workflow
- Development: Enable Extended Permissions in Project Settings
- Preview: Camera capture works but only captures the preview scene content
- Device: Deploy to Spectacles to capture real-world camera feed
Troubleshooting
Camera Not Capturing
- Check Lifecycle: Ensure the camera is initialized after `OnStartEvent`
- Check Permissions: Extended Permissions must be enabled
- Check Connection: Verify `EstuaryManager.isConnected` is true
- Preview vs Device: In Preview, the camera captures the preview scene; on Spectacles, it captures the real camera feed
Vision Intent Not Detecting
- Check Threshold: Lower `confidenceThreshold` for more sensitivity
- Enable Debug: Set `debugMode = true` to see detection logs
- Check Connection: Ensure the character is connected and receiving transcripts
Image Not Sending
- Check Encoding: Verify Base64 encoding is successful
- Check Size: Large images may time out - reduce `captureResolution`
- Check Connection: The WebSocket must be connected
AI Not Responding to Image
- Check Request ID: Ensure request IDs match between the pending signal and the image
- Check Server Logs: Verify the image was received and processed
- Check Vision Acknowledgment: Enable it to get immediate feedback
Best Practices
Optimize for Performance
// Use appropriate resolution for your use case
captureResolution: 512 // Good balance of quality/speed
// Enable acknowledgment for user feedback
enableVisionAcknowledgment: true
Handle Edge Cases
// Check camera availability before capture
if (!cameraModule) {
print("Camera not available on this device");
return;
}
// Handle capture failures gracefully
try {
this.captureAndSend();
} catch (error) {
print(`Capture failed: ${error}`);
this.notifyUser("Sorry, I couldn't capture that image");
}
Provide User Feedback
// Show visual indicator during capture
this.captureIndicator.enabled = true;
// Disable after sending
manager.on('cameraCaptureRequest', () => {
// Indicate processing
this.showProcessingUI();
});
SDK Structure
The camera functionality is split between core and example components:
estuary-lens-studio-sdk/
├── src/
│ └── Components/
│ ├── VisionIntentDetector.ts ← Core: vision intent detection class
│ └── EstuaryManager.ts ← Core: sendCameraImage, sendVisionPending
└── Examples/
├── EstuaryVoiceConnection.ts ← Example: voice + vision intent settings
└── EstuaryCamera.ts ← Example: camera capture implementation
Settings Distribution:
- `EstuaryVoiceConnection`: `enableVisionIntentDetection`, `visionConfidenceThreshold`
- `EstuaryCamera`: `captureResolution`, `enableVisionAcknowledgment`, `debugMode`
Next Steps
- API Reference: Camera Module - Complete API documentation
- Voice Connection - Audio setup for transcripts
- Action System - Trigger actions from AI responses