Aeyez
A camera that narrates what you can’t see.
THE TEAM
Michael Yao, Terence Zhang, Alan Lu
THE PROBLEM
Blind and visually impaired people have limited access to real-time information about their surroundings.
2.2B
people with vision impairment worldwide
$3,000+
cost of a high-end AI-powered vision device
56%
currently unemployed
INTRODUCING AEYEZ
An always-on AI companion that watches, understands, and narrates the world for you.
Camera watches environment
SPAZ model analyzes frame
ElevenLabs text to speech
History saved with location + time
Live Demo
THE INTELLIGENCE
The SPAZ Model
Our Foundation.
Precision Processing
Describes complex, cluttered environments precisely
Temporal Analysis
Detects subtle changes between frames
Spatial Awareness
Understands spatial relationships (left / right / near / far)
Visual Reasoning
Answers open-ended questions about visual scenes
Adaptive Performance
Performs reliably in varied lighting & camera angles
Field Tested
Extremely high accuracy on real-world tasks
MMMU-PRO BENCH
SPAZ Scores 88% Accuracy
System-level spatial reasoning beats strong multimodal baselines.
AEYEZ • OURS
88%
#1 in comparison set
+4.8 pts vs GPT-5.5
+6.8 pts vs GPT-5.4
MMMU-Pro accuracy (higher is better)
SPAZ (ours): 88.0%
GPT-5.5: 83.2%
GPT-5.4: 81.2%
Gemini 3 Pro: 81.0%
GPT-5.4 mini: 76.6%
GPT-5 mini: 67.5%
GPT-5.4 nano: 66.0%
Qwen3-VL 32B: 65.3%
Qwen2.5-VL 72B: 51.1%
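The lead over each baseline can be recomputed from the chart values above. The following is a minimal Python sketch; the `scores` dictionary and the `margin` helper are illustrative names, not part of Aeyez.

```python
# Accuracy values copied from the MMMU-Pro chart above (percent).
scores = {
    "SPAZ (ours)": 88.0,
    "GPT-5.5": 83.2,
    "GPT-5.4": 81.2,
    "Gemini 3 Pro": 81.0,
}


def margin(ours: str, other: str) -> float:
    """Return the lead of `ours` over `other` in percentage points."""
    # Round to one decimal to hide binary floating-point noise.
    return round(scores[ours] - scores[other], 1)


for name in scores:
    if name != "SPAZ (ours)":
        print(f"+{margin('SPAZ (ours)', name):.1f} pts vs {name}")
```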
Aeyez = full stack: vision + SPAZ Safe Mode + OSHA grounding + K2V2 summary.
Sources: public MMMU-Pro benchmark pages for OpenAI / Google / Qwen; Aeyez = internal benchmark result.
Hard test set: Selected difficult MMMU-Pro items mirroring Aeyez use cases: Mechanical Engineering, Architecture & Engineering, load-path and spatial route reasoning.
Overhead hazards on crane
Fall exposure into pit
Walking hazards on pathway
MODEL BENCHMARK
SPAZ vs GPT-5.5
Visual scene understanding — construction site
SPAZ
OUR MODEL
Safest now: Stop or step back slightly.
Safest after: Right lane if the worker has cleared
GPT-5.5
BASELINE
Proceed left: it appears less obstructed and away from the worker on the right.
WHY AEYEZ?
AEYEZ
Scene Awareness
Continuous, real-time
Memory / Context
Full visual history per user
Interaction
Natural voice conversation
Location Intel
GPS + named places + visual context
Guidance
Active AI guidance (Safe Mode)
Roadmap
CURRENT PHASE
Step 1
PHASE 02
Step 2
PHASE 03
Step 3
Thank You.
VIDEO UNDERSTANDING
From Video to Chronological Evidence
Aeyez samples key moments, detects meaningful scene changes, preserves hazard candidates, and passes ordered evidence into reasoning.
Raw video
first-person stream
Key frames
sample + scene changes
Hazard candidates
preserve risky moments
Ordered evidence
time-aware context
Reasoning
motion + risk
Temporal intelligence
Compares before and after: workers move, carried objects cross paths, materials shift, routes open or close.
Meaningful scene changes
Extra attention goes to turns, close passes, tool motion, blocked exits, glare, and emerging hazards.
Daily evidence log
Important moments are saved as timestamped evidence for K2V2 summary and risk memory.
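The sampling step above can be illustrated with a small sketch. It assumes frames are already reduced to flat lists of grayscale intensities in [0, 1]; the function names, the sampling interval, and the change threshold are hypothetical and do not describe the actual Aeyez pipeline.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of grayscale intensities in [0, 1].
Frame = Sequence[float]


def frame_change(a: Frame, b: Frame) -> float:
    """Mean absolute intensity difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: List[Frame], every: int = 4,
                      threshold: float = 0.2) -> List[int]:
    """Return key-frame indices in chronological order.

    A frame is kept if it falls on the periodic sampling grid or if it
    differs from the last kept frame by more than `threshold`.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the reference
    for i in range(1, len(frames)):
        periodic = i % every == 0
        changed = frame_change(frames[kept[-1]], frames[i]) > threshold
        if periodic or changed:
            kept.append(i)
    return kept
```

Because indices are appended in stream order, the output is already chronological evidence that a later reasoning step could consume.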
SPAZ ENGINE
Translator + Reasoner Feedback Loop
SPAZ separates perception from reasoning so the system can observe first, question missing evidence, then verify safety-critical details.
Translator Agent
Extracts structured visual evidence: objects, routes, OCR, crops, and spatial cues.
Improved SIR
Stores evidence as structured scene representation, not just a caption.
Text Reasoner
Evaluates evidence, asks for missing visual cues, and makes the final decision.
Key design
Perception and reasoning stay separate; feedback improves later visual searches without contaminating the first neutral observation.
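To make the idea of structured evidence concrete, here is a minimal sketch of what such a record and a reasoner-side gap check might look like. The `SceneEvidence` fields mirror the evidence types listed above, but the schema and the `missing_cues` helper are assumptions, not the actual SIR format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneEvidence:
    """Hypothetical structured scene record (not the real SIR schema)."""
    objects: List[str] = field(default_factory=list)
    routes: List[str] = field(default_factory=list)
    ocr_text: List[str] = field(default_factory=list)
    spatial_cues: List[str] = field(default_factory=list)


def missing_cues(evidence: SceneEvidence) -> List[str]:
    """Reasoner-side check: name every evidence field that is still empty."""
    return [name for name, value in vars(evidence).items() if not value]


evidence = SceneEvidence(objects=["worker", "ladder"], routes=["left aisle"])
print(missing_cues(evidence))
```

Keeping the record as typed fields rather than a caption lets the reasoner ask for a specific missing field instead of re-reading free text.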
SAFE MODE
Immediate Mobility Guidance
Safe Mode changes the task into: “what should the user do in the next few seconds?”
Stop / Wait
Use when evidence is unclear or a dynamic obstruction is close.
Proceed with controls
Use when a route is feasible and verified safer.
Rescan / Fallback
Use when glare, blur, occlusion, or missing evidence makes navigation unsafe.
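The three outcomes above can be read as a small decision rule. The sketch below is one possible ordering of the checks; the boolean inputs and the precedence (degraded view first, then unclear evidence or a close obstruction, then a verified route) are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    STOP_WAIT = "stop / wait"
    PROCEED_WITH_CONTROLS = "proceed with controls"
    RESCAN_FALLBACK = "rescan / fallback"


def choose_action(view_degraded: bool, evidence_clear: bool,
                  obstruction_close: bool, route_verified_safer: bool) -> Action:
    """Pick the next few seconds' action from four hypothetical flags."""
    # Glare, blur, or occlusion: the view itself cannot be trusted.
    if view_degraded:
        return Action.RESCAN_FALLBACK
    # Unclear evidence or a nearby dynamic obstruction: hold position.
    if not evidence_clear or obstruction_close:
        return Action.STOP_WAIT
    # Only a route that is feasible and verified safer allows movement.
    if route_verified_safer:
        return Action.PROCEED_WITH_CONTROLS
    # Default to the conservative choice.
    return Action.STOP_WAIT
```

Note that every branch other than a verified route falls back to a non-moving action, which matches the conservative framing of Safe Mode.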
EGOCENTRIC ACTION MODEL
First-Person Action Formula
Safe Mode models action as a combination of hands, objects, contact, motion, and scene context.
Action = hand pose + active object + contact target + temporal motion + scene context
Hand pose
What are your hands or others’ hands doing?
Active object
What is being held, carried, pushed, or used?
Contact target
What is the tool or material touching?
Temporal motion
How did it move across frames?
Scene context
Why is that action risky here?
Example
A worker carrying a long board is treated as a moving route conflict, not just “a person holding something.”
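The five-part formula maps naturally onto a record with one field per component. The sketch below encodes the long-board example this way; the class name, field values, and `describe` method are illustrative assumptions rather than the model's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgocentricAction:
    """One field per term of the action formula (hypothetical schema)."""
    hand_pose: str
    active_object: str
    contact_target: str
    temporal_motion: str
    scene_context: str

    def describe(self) -> str:
        """Combine all five components into one readable sentence."""
        return (f"{self.hand_pose} {self.active_object}, "
                f"touching {self.contact_target}, "
                f"{self.temporal_motion}, in {self.scene_context}")


board = EgocentricAction(
    hand_pose="two-handed carry of",
    active_object="a long board",
    contact_target="nothing yet",
    temporal_motion="swinging across the walkway over the last frames",
    scene_context="a narrow corridor shared with the user",
)
print(board.describe())
```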
SHORT-HORIZON MOTION PREDICTION
Reasoning Over the Next Few Seconds
With multiple frames, Aeyez compares motion and asks whether waiting is safer than passing now.
4-frame temporal window: t-3, t-2, t-1, t
Dynamic obstruction
Person, cart, worker, tool, or carried material may block now but clear seconds later.
Future conflict
Checks whether the user route intersects a moving object path.
Wait vs proceed
Separates “unsafe now” from “possibly safe after clearance.”
Safe Mode asks
Will the worker move away? Will the bucket path cross me? Will a route open after the obstruction clears?
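One simple way to reason over a 4-frame window is constant-velocity extrapolation. The sketch below is a stand-in technique, not the method Aeyez uses: it assumes 2D ground-plane positions in meters, one entry per frame, and a hypothetical conflict radius and horizon measured in frames.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def predict(track: List[Point], steps: int) -> Point:
    """Extrapolate `steps` frames ahead using the window's average velocity."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    n = len(track) - 1  # number of frame intervals in the window
    vx, vy = (x1 - x0) / n, (y1 - y0) / n
    return (x1 + vx * steps, y1 + vy * steps)


def wait_or_proceed(track: List[Point], route_point: Point,
                    radius: float = 1.0, horizon: int = 3) -> str:
    """Separate 'unsafe now' from 'possibly safe after clearance'."""
    def blocks(p: Point) -> bool:
        return math.dist(p, route_point) <= radius

    blocked_now = blocks(track[-1])
    future = [blocks(predict(track, s)) for s in range(1, horizon + 1)]
    if not blocked_now and not any(future):
        return "proceed"  # no conflict now or within the horizon
    if not future[-1]:
        return "wait"     # conflict exists but is predicted to clear
    return "stop"         # still in conflict at the end of the horizon
```

For example, a worker tracked at `[(0, 0), (1, 0), (2, 0), (3, 0)]` who currently stands on the route point `(3, 0)` is predicted to move on, so the sketch returns `"wait"` rather than `"stop"`.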
ROUTE CONFLICT DETECTION
Candidate Route Safety
Aeyez treats routes as future motion corridors and rejects paths that cross active danger zones.
Ego-centered route map (diagram). Labels: USER, edge, load, clear.
Hazard types: collision path, line-of-fire, pinch/crush, falling-object, open edge, blocked path
Safe next action options
Stop • Wait • Step back • Keep left/right • Move forward slightly, then turn • Rescan
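Treating a route as a corridor and rejecting it when it crosses a danger zone can be sketched with simple geometry. The example below assumes routes are sampled as waypoints in a user-centered frame and danger zones are circles; the names, the corridor half-width, and the waypoint-only check are simplifying assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Zone = Tuple[Point, float, str]  # (center, radius in meters, hazard label)


def route_conflicts(route: List[Point], zones: List[Zone],
                    half_width: float = 0.5) -> List[str]:
    """Return labels of danger zones that the route corridor touches.

    Only sampled waypoints are tested, so routes should be sampled
    densely relative to the zone radii.
    """
    hits = []
    for center, radius, label in zones:
        if any(math.dist(p, center) <= radius + half_width for p in route):
            hits.append(label)
    return hits


def safe_routes(routes: Dict[str, List[Point]], zones: List[Zone]) -> List[str]:
    """Keep only the candidate routes with no danger-zone conflict."""
    return [name for name, pts in routes.items()
            if not route_conflicts(pts, zones)]


zones = [((0.0, 2.0), 1.0, "falling-object")]
routes = {
    "forward": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "keep right": [(0, 0), (2, 1), (2, 2), (2, 3)],
}
print(safe_routes(routes, zones))
```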
SPAZ SAFE MODE RUNTIME
Safe Mode Runtime Loop
The first Translator pass stays neutral. The Reasoner sends back targeted questions so later rounds search harder for safety-critical details.
1. Neutral Translator
what is visible?
2. Reasoner Review
what is missing?
3. Targeted Search
find hazards + routes
4. Safety Report
terminate_and_report_safety
Safety Evidence Block
observed hazards • precursor conditions • missing controls • exposed people/assets • safest immediate actions
Route Memory Block
current position • passable paths • high-risk paths • route now • route after clear • no-go zones • fallback
Core advantage
First observe, then question, then verify — without biasing the initial visual evidence.
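The four-step loop can be sketched as a small driver that takes the two agents as plain callables. Everything here is hypothetical scaffolding: the `translate` and `review` signatures, the dictionary-based evidence format, and the round limit are assumptions, and the real agents would be model calls rather than Python functions.

```python
import copy
from typing import Callable, Dict, List

Evidence = Dict[str, List[str]]


def run_safe_mode(translate: Callable[[List[str]], Evidence],
                  review: Callable[[Evidence], List[str]],
                  max_rounds: int = 3) -> Dict[str, object]:
    """Observe neutrally, let the reasoner ask, then search with targets."""
    # 1. Neutral Translator: no questions are passed on the first round.
    evidence = translate([])
    first_pass = copy.deepcopy(evidence)  # kept untouched by later rounds
    # 2. Reasoner Review: which safety-critical details are missing?
    questions = review(evidence)
    rounds = 1
    while questions and rounds < max_rounds:
        # 3. Targeted Search: merge new findings without duplicates.
        for key, items in translate(questions).items():
            bucket = evidence.setdefault(key, [])
            for item in items:
                if item not in bucket:
                    bucket.append(item)
        rounds += 1
        questions = review(evidence)
    # 4. Safety Report: return everything needed for the final report.
    return {"first_pass": first_pass, "evidence": evidence,
            "open_questions": questions, "rounds": rounds}
```

Storing `first_pass` separately is one way to keep the initial observation free of anything the later, question-driven rounds add.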
OSHA + MECHANICS HAZARD LENS
OSHA + Mechanics Lens
Aeyez translates what it sees into credible incident mechanisms: how a hazard could physically unfold.
Struck-by
falling / swinging / carried objects
Falls
open edge / shaft / ladder / platform
Caught-in / between
pinch, crush, shear, equipment gaps
Electrocution
exposed wire, wet floor, panels
Mechanics model checks load paths, supports, stored energy, swing zones, falling-object paths, pinch points, and exposed routes.
Safety verifier
Every route is screened for “feasible but unsafe” before any next-step advice is allowed.
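The four categories above match OSHA's construction "Focus Four" hazards. As a rough illustration of the lens and the verifier, the sketch below tags text observations with those categories using the cue words from this slide and flags routes that are feasible but unsafe. Keyword matching is a deliberately crude stand-in; the function names and return strings are assumptions.

```python
from typing import List

# Cue words taken from the four hazard categories listed above.
HAZARD_CUES = {
    "struck-by": ["falling", "swinging", "carried"],
    "falls": ["open edge", "shaft", "ladder", "platform"],
    "caught-in/between": ["pinch", "crush", "shear", "equipment gap"],
    "electrocution": ["exposed wire", "wet floor", "panel"],
}


def classify(observation: str) -> List[str]:
    """Return every hazard category whose cue appears in the observation."""
    text = observation.lower()
    return [category for category, cues in HAZARD_CUES.items()
            if any(cue in text for cue in cues)]


def verify_route(feasible: bool, observations: List[str]) -> str:
    """Screen a route for the 'feasible but unsafe' case."""
    if not feasible:
        return "infeasible"
    hazards = sorted({c for obs in observations for c in classify(obs)})
    if hazards:
        return "feasible but unsafe: " + ", ".join(hazards)
    return "feasible and clear"
```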
OUTPUT LAYERS
Outputs + K2V2 Summary
Aeyez has two output modes: seconds-level Safe Mode and end-of-day summary. K2V2 is the final layer that makes long evidence readable.
Immediate Safe Mode
Current scene • dynamic-clear scene • route candidates • risk scores • no-go zones • safest next action
Daily Risk Memory
Timestamped hazards • repeated locations • risk clusters • unsafe patterns • prevention advice
K2V2 Final Summary
Compresses evidence into a concise, user-facing report: what happened, what was dangerous, and what to do next.
Translator + Reasoner find the risk → Event memory stores the evidence → K2V2 produces the final important output summary
Why K2V2 matters
It turns long, messy multi-frame evidence into a clean report that teammates and users can understand quickly.
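The daily risk memory described above depends on ordering events in time and spotting repeated locations. The sketch below shows only that bookkeeping step, not K2V2 itself; the `HazardEvent` fields, the sample events, and the repeat threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HazardEvent:
    # ISO 8601 strings sort chronologically when they share one UTC offset.
    timestamp: str
    location: str
    hazard: str


def daily_summary(events: List[HazardEvent],
                  repeat_threshold: int = 2) -> Dict[str, List[str]]:
    """Order events by time and flag locations seen at least N times."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    counts = Counter(e.location for e in ordered)
    return {
        "timeline": [f"{e.timestamp} {e.location}: {e.hazard}" for e in ordered],
        "repeated_locations": [loc for loc, n in counts.items()
                               if n >= repeat_threshold],
    }


# Made-up sample events for illustration only.
events = [
    HazardEvent("2025-01-01T15:00:00", "stairwell B", "open edge"),
    HazardEvent("2025-01-01T09:00:00", "loading dock", "swinging load"),
    HazardEvent("2025-01-01T11:30:00", "stairwell B", "missing guardrail"),
]
summary = daily_summary(events)
print(summary["repeated_locations"])
```

A summarizer could then be given this compact structure instead of the raw multi-frame evidence.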