Insight
The world, narrated.
AI-powered spatial awareness for the visually impaired.
Navigation
Spatial Awareness
Independence
Problem
Solution
Tech Stack
Memory
They navigate. But they can't perceive.
Real scenarios where vision changes everything
🛍️
In a Mall
A cane or guide dog gets you through the crowd. But which shop is on your left? Is there a café nearby? Is this the exit or a dead end? Navigation helps. Awareness doesn't come with it.
🏠
At Home
Muscle memory and organisation work well — until something moves. Is this the ketchup or the ice cream? Which can of soup is this? Some things, touch simply cannot identify. Only vision can.
🐕
Guide Dog Disobedience
Guide dogs practise intelligent disobedience — they stop when they sense danger. But they can't explain why. The person is left confused, frozen, unsure. Insight narrates the reason in real time.
🍽️
At a Restaurant
The waiter hands you a menu. There is no Braille version. Insight reads it aloud — items, descriptions, prices — so you can order like everyone else.
🚦
On the Road
"The pedestrian signal has turned white." Real-time visual context — traffic lights, crossings, signage — narrated the moment it changes. No guessing.
Every alternative is missing something critical.
Competitive Analysis — why Insight fills the gap
              Camera            GPS Navigation    Conversational
              (see the world)   (routing)         (ask questions)
Lazarillo          ✗                 ✓                  ✗
BlindSquare        ✗                 ✓                  ✗
Be My Eyes         ✓                 ✗                  ✗
Insight ✦          ✓                 ✓                  ✓
Insight does all three.
In one app. In one API call.
See
Live camera feed. Insight narrates what it sees — shops, labels, objects, hazards — the moment they appear.
Navigate
Google Maps Places + Directions API provides turn-by-turn routing with full awareness of real-world businesses nearby.
Converse
"What's in my hand?" "Is the café busy?" "Why did we stop?" — Insight answers naturally via a persistent conversational agent.
Architecture
SwiftUI — four purpose-built engines, each owning its domain
NarratorEngine
Orchestrates all narration. Receives scene data from CameraManager, calls Claude Haiku, speaks output via AVFoundation TTS. Fires haptic on speech start.
NavigationEngine
Owns GPS routing. Queries Google Maps Directions API for turn-by-turn steps, Google Maps Places API for nearby POIs. Triggers haptic on direction changes.
CameraManager
Manages AVCaptureSession. Captures frames on a defined interval, encodes to base64 JPEG, feeds directly into the Claude Haiku API call alongside the conversation context.
LocationManager
Wraps CoreLocation. Feeds real-time coordinates to NavigationEngine and enriches each Claude prompt with current GPS context for spatially-aware narration.
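The CameraManager path above can be sketched as follows. This is an illustrative shape, not the app's actual code: the class name `FrameSampler`, the callback, and the sampling interval are assumptions, while the delegate method and JPEG encoding are standard AVFoundation/UIKit APIs.

```swift
import AVFoundation
import UIKit

// Sketch of the capture path: the delegate keeps the most recent frame;
// a timer samples it on a fixed interval, encodes it as base64 JPEG, and
// hands it to a callback that builds the Claude request.
final class FrameSampler: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    private let ciContext = CIContext()
    private var latestFrame: UIImage?
    var onFrame: ((String) -> Void)?   // receives base64-encoded JPEG

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
        if let cgImage = ciContext.createCGImage(ciImage, from: ciImage.extent) {
            latestFrame = UIImage(cgImage: cgImage)
        }
    }

    func startSampling(every interval: TimeInterval) {
        Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { [weak self] _ in
            guard let self,
                  let jpeg = self.latestFrame?.jpegData(compressionQuality: 0.5) else { return }
            self.onFrame?(jpeg.base64EncodedString())
        }
    }
}
```

Sampling the latest frame on a timer, rather than sending every frame, keeps the request rate bounded regardless of the camera's frame rate.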
One model. One call. Two jobs.
CameraManager + NarratorEngine — powered by claude-haiku
INPUTS
📸 Camera frame
base64 JPEG from AVCaptureSession
💬 Last 6 messages
rolling short-term memory array
📍 GPS context
current coordinates from LocationManager
🗣️ User utterance
what the user just asked (if anything)
claude-haiku
one API call
OUTPUT
🔊 Spoken narration
📳 Haptic feedback
💾 Stored in memory
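Assembling those four inputs into the single call can be sketched like this. The content-block field names follow the Anthropic Messages API; the function name, the shape of the memory array, and the prompt wording are assumptions.

```swift
import Foundation

// Illustrative sketch: fold the rolling memory, current frame, GPS
// context, and optional user utterance into one request body.
func buildRequestBody(frameB64: String,
                      memory: [[String: Any]],   // last 6 messages (role + content)
                      gps: String,               // e.g. "51.5074, -0.1278"
                      utterance: String?) -> [String: Any] {
    var messages = memory
    messages.append([
        "role": "user",
        "content": [
            ["type": "image",
             "source": ["type": "base64",
                        "media_type": "image/jpeg",
                        "data": frameB64]],
            ["type": "text",
             "text": "Current GPS: \(gps). " + (utterance ?? "Describe the scene briefly.")]
        ]
    ])
    return ["model": "claude-haiku",   // model name as used in this deck
            "max_tokens": 300,
            "messages": messages]
}
```

Because the image, the memory, and the question travel together, one response covers both perception and conversation.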
Why not YOLO?
A deliberate architectural decision
YOLO / On-device CV
✗ Outputs bounding boxes and class labels — not natural language
✗ Needs a separate NLP model to turn detections into speech
✗ Two models = two inference steps = more latency, more complexity
✗ Can't answer conversational questions like "what's in my hand?"
✗ Can't read text — menus, labels, signs are invisible to it
claude-haiku ✓
✓ Describes scenes in natural language, ready to speak aloud
✓ Handles CV + conversation in a single API call
✓ Can read text: menus, labels, road signs, packaging
✓ Remembers context — doesn't repeat what it just said
✓ Fast enough for real-time use with rolling frame capture
NavigationEngine
Google Maps as the GPS backbone
Google Maps Places API
→ Queries nearby points of interest by category and proximity
→ Feeds real business names into narration — not just coordinates
→ "There is a Starbucks 40 metres to your right" — not just a dot on a map
→ Results enriched with GPS context from LocationManager
Google Maps Directions API
→ Turn-by-turn walking directions from current position to destination
→ Step instructions passed into NarratorEngine for spoken output
→ Each direction change triggers a haptic pulse before speaking
→ Re-routes automatically if the user deviates from the path
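A walking-directions request can be sketched as below. The endpoint and query parameters are the public Google Maps Directions API; the function name and how the result is threaded into NarratorEngine are assumptions.

```swift
import CoreLocation
import Foundation

// Sketch: build the Directions API URL for a walking route from the
// user's current coordinates (via LocationManager) to a destination.
func directionsURL(from origin: CLLocationCoordinate2D,
                   to destination: String,
                   apiKey: String) -> URL? {
    var components = URLComponents(string: "https://maps.googleapis.com/maps/api/directions/json")!
    components.queryItems = [
        URLQueryItem(name: "origin", value: "\(origin.latitude),\(origin.longitude)"),
        URLQueryItem(name: "destination", value: destination),
        URLQueryItem(name: "mode", value: "walking"),
        URLQueryItem(name: "key", value: apiKey)
    ]
    return components.url
}
```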
Memory System
Short-term context + long-term identity — no login required
Short-Term Memory
Rolling array of 6 responses
The last 6 model outputs are stored in a Swift array in NarratorEngine
Prevents repetition
Each API call includes this array — Haiku won't repeat what it just told you
Conversational continuity
"What did you just say about the café?" — Insight can answer because it remembers
Lightweight by design
No database write needed. Lives in memory. Resets when the session ends.
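The rolling array described above amounts to a few lines of Swift. This is a minimal sketch with illustrative names; appending past six entries drops the oldest, and nothing is persisted.

```swift
// Sketch of the short-term memory buffer held inside NarratorEngine.
struct RollingMemory {
    private(set) var responses: [String] = []
    let capacity = 6

    mutating func remember(_ response: String) {
        responses.append(response)
        if responses.count > capacity {
            responses.removeFirst()   // evict the oldest entry
        }
    }
}
```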
Long-Term Memory
No login. No account.
Zero auth friction — designed for users who may struggle with login flows
iOS Keychain device ID
On first launch, a UUID is generated and stored in iOS Keychain — persistent, silent, secure
Supabase / Postgres backend
Device ID is the primary key. User preferences and history are stored against it
Returns on every session
Open the app, and Insight already knows your preferred routes, saved places, and past context
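First-launch device ID creation can be sketched with Keychain Services. The service and account strings are illustrative; the `SecItemCopyMatching`/`SecItemAdd` calls are the standard Security framework API.

```swift
import Foundation
import Security

// Sketch: return the stored device UUID, or generate and persist one on
// first launch. The Keychain survives app reinstalls, so no login is needed.
func deviceID() -> String {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: "com.insight.deviceid",   // illustrative
        kSecAttrAccount as String: "device",
        kSecReturnData as String: true
    ]
    var item: CFTypeRef?
    if SecItemCopyMatching(query as CFDictionary, &item) == errSecSuccess,
       let data = item as? Data,
       let id = String(data: data, encoding: .utf8) {
        return id
    }
    let newID = UUID().uuidString
    var add = query
    add[kSecReturnData as String] = nil
    add[kSecValueData as String] = Data(newID.utf8)
    SecItemAdd(add as CFDictionary, nil)
    return newID
}
```

The returned string would then serve as the Supabase primary key for preferences and history.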
Haptic Feedback
Touch as stimulus — a second channel of communication
📳 Narration begins
Medium Impact
Every time NarratorEngine begins speaking, a haptic fires first. The user feels it before they hear it — a reliable signal that information is incoming.
📳 Navigation direction change
Heavy Impact
Before announcing a turn — "Turn left in 10 metres" — a strong haptic pulse precedes the voice. Reinforces the instruction through two senses simultaneously.
📳 Hazard detected
Notification (Warning)
When Insight detects something urgent in the scene — a vehicle, a step, an obstacle — a warning haptic fires immediately, even before speech starts.
📳 Destination reached
Notification (Success)
A distinct success pattern confirms arrival. No ambiguity — the user knows they are exactly where they intended to be.
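The four events above map directly onto UIKit's feedback generators. The enum and function names are illustrative; the generator classes and styles are standard UIKit.

```swift
import UIKit

// Sketch: one haptic per event type, matching the patterns listed above.
enum HapticEvent {
    case narrationStart, directionChange, hazard, arrival
}

func fireHaptic(for event: HapticEvent) {
    switch event {
    case .narrationStart:
        UIImpactFeedbackGenerator(style: .medium).impactOccurred()
    case .directionChange:
        UIImpactFeedbackGenerator(style: .heavy).impactOccurred()
    case .hazard:
        UINotificationFeedbackGenerator().notificationOccurred(.warning)
    case .arrival:
        UINotificationFeedbackGenerator().notificationOccurred(.success)
    }
}
```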
Intelligent scene narration via claude-haiku
Navigation powered by Google Maps APIs
Short + long-term memory, no login needed
iOS-native SwiftUI, four purpose-built engines
GPS-enriched, spatially-aware AI responses
Haptic stimulus on every key moment
The world, finally described
Insight gives blind people the freedom to know their world.