1 of 25

Aeyez

A camera that narrates what you can’t see.

2 of 25

THE TEAM

Michael Yao, Terence Zhang, Alan Lu

3 of 25

THE PROBLEM

Blind and visually impaired people often lack real-time information about the world around them.

2.2B

people with vision impairment worldwide

$3,000+

cost of a high-end AI-powered vision device

56%

currently unemployed

4 of 25

INTRODUCING AEYEZ

An always-on AI companion that watches, understands, and narrates the world for you.

Camera watches environment

SPAZ model analyzes frame

ElevenLabs text to speech

History saved with location + time

5 of 25

Live Demo

6 of 25

THE INTELLIGENCE

The SPAZ Model

Our Foundation.

Precision Processing

Describes complex, cluttered environments precisely

Temporal Analysis

Detects subtle changes between frames

Spatial Awareness

Understands spatial relationships (left / right / near / far)

Visual Reasoning

Answers open-ended questions about visual scenes

Adaptive Performance

Performs reliably in varied lighting & camera angles

Field Tested

Extremely high accuracy on real-world tasks

7 of 25

8 of 25

9 of 25

MMMU-PRO BENCH

SPAZ Scores 88% Accuracy

System-level spatial reasoning beats strong multimodal baselines.

AEYEZ · OURS

88%

#1 in comparison set

+4.8 pts vs GPT-5.5

+6.8 pts vs GPT-5.4

MMMU-Pro accuracy (higher is better)

SPAZ (ours)

88.0%

GPT-5.5

83.2%

GPT-5.4

81.2%

Gemini 3 Pro

81.0%

GPT-5.4 mini

76.6%

GPT-5 mini

67.5%

GPT-5.4 nano

66.0%

Qwen3-VL 32B

65.3%

Qwen2.5-VL 72B

51.1%

Aeyez = full stack: vision + SPAZ Safe Mode + OSHA grounding + K2V2 summary.

Sources: public MMMU-Pro benchmark pages for OpenAI / Google / Qwen; Aeyez = internal benchmark result.

Hard test set: selected difficult MMMU-Pro items mirroring Aeyez use cases (Mechanical Engineering, Architecture & Engineering, load-path and spatial route reasoning).

10 of 25

Overhead hazards on crane

Fall exposure into pit

Walking hazards on pathway

11 of 25

MODEL BENCHMARK

SPAZ vs GPT-5.5

Visual scene understanding — construction site

SPAZ

OUR MODEL

Safest now: Stop or step back slightly.

Safest after: Right lane if the worker has cleared

GPT-5.5

BASELINE

Proceed left: it appears less obstructed and away from the worker on the right.

12 of 25

WHY AEYEZ?

13 of 25

WHY AEYEZ?

AEYEZ

Scene Awareness

Continuous, real-time

Memory / Context

Full visual history per user

Interaction

Natural voice conversation

Location Intel

GPS + named places + visual context

Guidance

Active AI guidance (Safe Mode)

14 of 25

Roadmap

CURRENT PHASE

Step 1

  • Public website beta launch
  • Live trial and testing in a controlled environment
  • Building interactive text-to-speech tools

PHASE 02

Step 2

  • Open access to a broader user base
  • New features driven by beta feedback
  • Stronger vision models with lower latency

PHASE 03

Step 3

  • Partnerships with accessibility organizations
  • Integration with Meta Ray-Ban & smart glasses
  • Always-on ambient awareness

15 of 25

Thank You.

16 of 25

17 of 25

VIDEO UNDERSTANDING

From Video to Chronological Evidence

Aeyez samples key moments, detects meaningful scene changes, preserves hazard candidates, and passes ordered evidence into reasoning.

Raw video

first-person stream

Key frames

sample + scene changes

Hazard candidates

preserve risky moments

Ordered evidence

time-aware context

Reasoning

motion + risk

Temporal intelligence

Compares before and after: workers move, carried objects cross paths, materials shift, routes open or close.

Meaningful scene changes

Extra attention goes to turns, close passes, tool motion, blocked exits, glare, and emerging hazards.

Daily evidence log

Important moments are saved as timestamped evidence for K2V2 summary and risk memory.
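
A minimal sketch of the evidence-selection step described on this slide, assuming frames arrive in time order as flattened grayscale values. The `Frame` type, the `hazard` flag, and both thresholds are hypothetical names chosen for illustration, not the Aeyez implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    t: float                  # timestamp in seconds
    pixels: Sequence[float]   # flattened grayscale values in [0, 1]
    hazard: bool = False      # hypothetical flag from an upstream detector


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_evidence(frames: List[Frame], every_n: int = 5,
                    change_threshold: float = 0.2) -> List[Frame]:
    """Keep periodic samples, scene changes, and hazard candidates.

    Input order is preserved, so the result is chronological evidence.
    """
    kept: List[Frame] = []
    last_kept = None
    for i, frame in enumerate(frames):
        periodic = i % every_n == 0
        changed = (last_kept is not None and
                   mean_abs_diff(frame.pixels, last_kept.pixels) > change_threshold)
        if periodic or changed or frame.hazard:
            kept.append(frame)
            last_kept = frame
    return kept
```

Comparing each frame against the last kept frame, rather than the previous raw frame, is one way to catch slow drift as well as abrupt changes; a production system would likely use a learned change detector instead of pixel differences.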

18 of 25

SPAZ ENGINE

Translator + Reasoner Feedback Loop

SPAZ separates perception from reasoning so the system can observe first, question missing evidence, then verify safety-critical details.

Translator Agent

Extracts structured visual evidence: objects, routes, OCR, crops, and spatial cues.

Improved SIR

Stores evidence as structured scene representation, not just a caption.

Text Reasoner

Evaluates evidence, asks for missing visual cues, and makes the final decision.

Key design

Perception and reasoning stay separate; feedback improves later visual searches without contaminating the first neutral observation.
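
One way to picture a structured scene representation and the Reasoner's request for missing cues is sketched below. The field names and the `missing_cues` helper are assumptions made for illustration; the slide does not specify the actual SIR schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneEvidence:
    """Hypothetical structured scene record produced by the Translator."""
    objects: List[str] = field(default_factory=list)
    routes: List[str] = field(default_factory=list)
    ocr_text: List[str] = field(default_factory=list)
    # object name -> coarse position, e.g. "right, near"
    spatial_cues: Dict[str, str] = field(default_factory=dict)


def missing_cues(evidence: SceneEvidence) -> List[str]:
    """Reasoner-side check: list what still has to be looked for."""
    gaps: List[str] = []
    if not evidence.routes:
        gaps.append("routes")
    for obj in evidence.objects:
        if obj not in evidence.spatial_cues:
            gaps.append(f"position of {obj}")
    return gaps
```

Because the record is structured rather than a free-text caption, the Reasoner can name exactly which field is empty and send that back as a targeted question.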

19 of 25

SAFE MODE

Immediate Mobility Guidance

Safe Mode reframes the task as: “what should the user do in the next few seconds?”

Stop / Wait

Use when evidence is unclear or a dynamic obstruction is close.

Proceed with controls

Use when a route is feasible and verified safer.

Rescan / Fallback

Use when glare, blur, occlusion, or missing evidence makes navigation unsafe.
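
The three outcomes above can be read as a small decision rule. The sketch below is one possible ordering, assuming four boolean inputs that an upstream stage would supply; the input names and the choice to default to caution are assumptions, not documented Aeyez behavior.

```python
from enum import Enum


class Action(Enum):
    STOP_WAIT = "stop / wait"
    PROCEED = "proceed with controls"
    RESCAN = "rescan / fallback"


def choose_action(image_usable: bool, evidence_clear: bool,
                  dynamic_obstruction_close: bool,
                  route_verified_safer: bool) -> Action:
    """Pick one of the three Safe Mode outcomes from coarse scene flags."""
    if not image_usable:
        # Glare, blur, occlusion, or missing evidence: look again first.
        return Action.RESCAN
    if not evidence_clear or dynamic_obstruction_close:
        return Action.STOP_WAIT
    if route_verified_safer:
        return Action.PROCEED
    # Assumption: with no verified route, fall back to the cautious option.
    return Action.STOP_WAIT
```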

20 of 25

EGOCENTRIC ACTION MODEL

First-Person Action Formula

Safe Mode models action as a combination of hands, objects, contact, motion, and scene context.

Action = hand pose + active object + contact target + temporal motion + scene context

Hand pose

What are your hands or others’ hands doing?

Active object

What is being held, carried, pushed, or used?

Contact target

What is the tool or material touching?

Temporal motion

How did it move across frames?

Scene context

Why is that action risky here?

Example

A worker carrying a long board is treated as a moving route conflict, not just “a person holding something.”
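
The action formula maps naturally onto a record with one field per term. The sketch below assumes plain string labels for each term and a hypothetical `BULKY_OBJECTS` set; it only illustrates how the board example could be flagged, not how Safe Mode actually classifies actions.

```python
from dataclasses import dataclass

# Hypothetical list of carried objects that can sweep across a route.
BULKY_OBJECTS = {"long board", "pipe", "ladder"}


@dataclass(frozen=True)
class EgocentricAction:
    hand_pose: str         # what the hands are doing
    active_object: str     # what is held, carried, pushed, or used
    contact_target: str    # what the tool or material touches
    temporal_motion: str   # how it moved across frames
    scene_context: str     # why the action is risky here


def is_moving_route_conflict(action: EgocentricAction) -> bool:
    """Flag a bulky carried object that is in motion as a route conflict."""
    return (action.active_object in BULKY_OBJECTS
            and action.temporal_motion != "static")
```

Keeping the five terms separate lets a later stage reason about the board and its motion, instead of collapsing the scene into a single caption.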

21 of 25

SHORT-HORIZON MOTION PREDICTION

Reasoning Over the Next Few Seconds

With multiple frames, Aeyez compares motion and asks whether waiting is safer than passing now.

t-3

t-2

t-1

t

4-frame temporal window

Dynamic obstruction

Person, cart, worker, tool, or carried material may block now but clear seconds later.

Future conflict

Checks whether the user route intersects a moving object path.

Wait vs proceed

Separates “unsafe now” from “possibly safe after clearance.”

Safe Mode asks

Will the worker move away? Will the bucket path cross me? Will a route open after the obstruction clears?
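
A minimal sketch of the wait-versus-proceed check over a 4-frame window, assuming each tracked object is a list of ground-plane `(x, y)` positions for t-3 through t and the user route is a list of waypoints. Constant-velocity extrapolation, the `radius`, and the three return labels are simplifying assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def predict_path(track: List[Point], steps: int = 3) -> List[Point]:
    """Extrapolate future positions at the average velocity of the track."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    n = len(track) - 1
    vx, vy = (x1 - x0) / n, (y1 - y0) / n
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, steps + 1)]


def blocks_route(point: Point, route: List[Point], radius: float = 0.5) -> bool:
    """True if the point lies within `radius` of any route waypoint."""
    return any(math.dist(point, waypoint) <= radius for waypoint in route)


def wait_or_proceed(track: List[Point], route: List[Point],
                    steps: int = 3, radius: float = 0.5) -> str:
    future = predict_path(track, steps)
    blocked_now = blocks_route(track[-1], route, radius)
    blocked_later = any(blocks_route(p, route, radius) for p in future)
    if blocked_now and not blocks_route(future[-1], route, radius):
        return "wait"      # unsafe now, possibly safe after clearance
    if blocked_now or blocked_later:
        return "stop"      # blocked and not clearing, or a future conflict
    return "proceed"
```

With a worker crossing the route and moving away, the function returns "wait"; with a worker approaching the route, it returns "stop" even though the path is clear at time t.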

22 of 25

ROUTE CONFLICT DETECTION

Candidate Route Safety

Aeyez treats routes as future motion corridors and rejects paths that cross active danger zones.

Ego-centered route map

USER

edge

load

clear

collision path

line-of-fire

pinch/crush

falling-object

open edge

blocked path

Safe next action options

Stop • Wait • Step back • Keep left/right • Move forward slightly, then turn • Rescan
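
Treating a route as a corridor and a danger zone as a circle, route rejection reduces to a segment-to-point distance test. The sketch below assumes 2D ego-centered coordinates in meters; `Zone`, `half_width`, and the candidate names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Zone:
    kind: str       # e.g. "falling-object", "open edge", "pinch/crush"
    center: Point
    radius: float


def dist_point_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def route_conflicts(route: List[Point], zones: List[Zone],
                    half_width: float = 0.4) -> List[str]:
    """Return the kinds of danger zones that the route corridor touches."""
    hits = []
    for zone in zones:
        for a, b in zip(route, route[1:]):
            if dist_point_segment(zone.center, a, b) <= zone.radius + half_width:
                hits.append(zone.kind)
                break
    return hits


def safe_routes(candidates: Dict[str, List[Point]], zones: List[Zone]) -> List[str]:
    """Keep only the candidate routes that cross no danger zone."""
    return [name for name, route in candidates.items()
            if not route_conflicts(route, zones)]
```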

23 of 25

SPAZ SAFE MODE RUNTIME

Safe Mode Runtime Loop

The first Translator pass stays neutral. The Reasoner sends back targeted questions so later rounds search harder for safety-critical details.

1. Neutral Translator

what is visible?

2. Reasoner Review

what is missing?

3. Targeted Search

find hazards + routes

4. Safety Report

terminate_and_report_safety

Safety Evidence Block

observed hazards • precursor conditions • missing controls • exposed people/assets • safest immediate actions

Route Memory Block

current position • passable paths • high-risk paths • route now • route after clear • no-go zones • fallback

Core advantage

First observe, then question, then verify — without biasing the initial visual evidence.
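
The four-step loop can be sketched as a driver that calls a Translator and a Reasoner in turn. Only the step name `terminate_and_report_safety` comes from the slide; its signature, the `translate` and `review` callables, the dictionary-based evidence, and `max_rounds` are assumptions for illustration.

```python
from typing import Callable, Dict, List

Evidence = Dict[str, object]


def terminate_and_report_safety(first_observation: Evidence, evidence: Evidence,
                                unresolved: List[str], rounds: int) -> Dict[str, object]:
    """Bundle the final report (hypothetical report shape)."""
    return {
        "first_observation": first_observation,
        "evidence": evidence,
        "unresolved": unresolved,
        "rounds": rounds,
    }


def run_safe_mode(frame: object,
                  translate: Callable[[object, List[str]], Evidence],
                  review: Callable[[Evidence], List[str]],
                  max_rounds: int = 3) -> Dict[str, object]:
    # 1. Neutral Translator: no questions, so the first pass is unbiased.
    evidence = translate(frame, [])
    first_observation = dict(evidence)   # copy so later rounds cannot alter it
    rounds = 1
    # 2. Reasoner Review: what is missing?
    questions = review(evidence)
    while questions and rounds < max_rounds:
        # 3. Targeted Search: look again, guided by the Reasoner's questions.
        evidence.update(translate(frame, questions))
        rounds += 1
        questions = review(evidence)
    # 4. Safety Report.
    return terminate_and_report_safety(first_observation, evidence, questions, rounds)
```

Copying the first observation before any targeted search is what keeps the neutral pass separate from the question-driven rounds in this sketch.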

24 of 25

OSHA + MECHANICS HAZARD LENS

OSHA + Mechanics Lens

Aeyez translates what it sees into credible incident mechanisms: how a hazard could physically unfold.

Struck-by

falling / swinging / carried objects

Falls

open edge / shaft / ladder / platform

Caught-in / between

pinch, crush, shear, equipment gaps

Electrocution

exposed wire, wet floor, panels

Mechanics model checks load paths, supports, stored energy, swing zones, falling-object paths, pinch points, and exposed routes.

Safety verifier

Every route is screened for “feasible but unsafe” before any next-step advice is allowed.
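
As a rough illustration, the four hazard families and their cues from this slide can be held in a lookup table, with the verifier blocking advice for any route that carries a matching cue. Exact string matching and the `advice_allowed` helper are simplifying assumptions; the real mechanics model is not described in enough detail to reproduce.

```python
from typing import Dict, List, Set

# Cue lists taken from the slide; matching by exact string is an assumption.
HAZARD_LENS: Dict[str, Set[str]] = {
    "Struck-by": {"falling object", "swinging object", "carried object"},
    "Falls": {"open edge", "shaft", "ladder", "platform"},
    "Caught-in / between": {"pinch", "crush", "shear", "equipment gap"},
    "Electrocution": {"exposed wire", "wet floor", "panel"},
}


def mechanisms(cues: List[str]) -> List[str]:
    """Return the hazard families whose cues appear, in alphabetical order."""
    found = {family for family, keywords in HAZARD_LENS.items()
             if any(cue in keywords for cue in cues)}
    return sorted(found)


def advice_allowed(route_feasible: bool, cues_on_route: List[str]) -> bool:
    """Screen out routes that are feasible but unsafe."""
    return route_feasible and not mechanisms(cues_on_route)
```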

25 of 25

OUTPUT LAYERS

Outputs + K2V2 Summary

Aeyez has two output modes: seconds-level Safe Mode and end-of-day summary. K2V2 is the final layer that makes long evidence readable.

Immediate Safe Mode

Current scene • dynamic-clear scene • route candidates • risk scores • no-go zones • safest next action

Daily Risk Memory

Timestamped hazards • repeated locations • risk clusters • unsafe patterns • prevention advice

K2V2 Final Summary

Compresses evidence into a concise, user-facing report: what happened, what was dangerous, and what to do next.

Translator + Reasoner find the risk → Event memory stores the evidence → K2V2 produces the final summary

Why K2V2 matters

It turns long, messy multi-frame evidence into a clean report that teammates and users can understand quickly.
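
Before any summarization, the Daily Risk Memory has to be assembled from timestamped events. The sketch below shows one possible aggregation, assuming events carry an `HH:MM` time string, a location label, and a hazard label; the `RiskEvent` type and `min_repeats` are hypothetical, and the K2V2 summarization step itself is not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RiskEvent:
    time: str       # "HH:MM", zero-padded so string order is time order
    location: str
    hazard: str


def daily_risk_memory(events: List[RiskEvent], min_repeats: int = 2) -> Dict[str, List[str]]:
    """Build a chronological timeline and list locations with repeated hazards."""
    counts = Counter(event.location for event in events)
    repeated = sorted(loc for loc, n in counts.items() if n >= min_repeats)
    timeline = [f"{e.time} {e.location}: {e.hazard}"
                for e in sorted(events, key=lambda e: e.time)]
    return {"timeline": timeline, "repeated_locations": repeated}
```

The resulting dictionary is the kind of compact, ordered input a summarizer could turn into a short end-of-day report.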