1 of 11

Insight

The world, narrated.

AI-powered spatial awareness for the visually impaired.

Navigation

Spatial Awareness

Independence


2 of 11

Problem

Solution

Tech Stack

Memory

They navigate. But they can't perceive.

Real scenarios where vision changes everything

🛍️

In a Mall

A cane or guide dog gets you through the crowd. But which shop is on your left? Is there a café nearby? Is this the exit or a dead end? Navigation helps. Awareness doesn't come with it.

🏠

At Home

Muscle memory and organisation work well — until something moves. Is this the ketchup or the ice cream? Which can of soup is this? Some things, touch simply cannot identify. Only vision can.

🐕

Guide Dog Disobedience

Guide dogs practise intelligent disobedience — they stop when they sense danger. But they can't explain why. The person is left confused, frozen, unsure. Insight narrates the reason in real time.

🍽️

At a Restaurant

The waiter hands you a menu. There is no Braille version. Insight reads it aloud — items, descriptions, prices — so you can order like everyone else.

🚦

On the Road

"The pedestrian signal has turned white." Real-time visual context — traffic lights, crossings, signage — narrated the moment it changes. No guessing.

3 of 11


Every alternative is missing something critical.

Competitive Analysis — why Insight fills the gap

Camera (see the world) · GPS Navigation (routing) · Conversational (ask questions)

Lazarillo: camera ✗ · GPS navigation ✓ · conversational ✗

BlindSquare: camera ✗ · GPS navigation ✓ · conversational ✗

Be My Eyes: camera ✓ (sighted volunteers) · GPS navigation ✗ · conversational ✓ (human, on demand)

Insight ✦: camera ✓ · GPS navigation ✓ · conversational ✓

4 of 11


Insight does all three.

In one app. In one API call.

See

Live camera feed. Insight narrates what it sees — shops, labels, objects, hazards — the moment they appear.

Navigate

Google Maps Places + Directions API provides turn-by-turn routing with full awareness of real-world businesses nearby.

Converse

"What's in my hand?" "Is the café busy?" "Why did we stop?" — Insight answers naturally via a persistent conversational agent.

5 of 11


Architecture

SwiftUI — four purpose-built engines, each owning its domain

NarratorEngine

Orchestrates all narration. Receives scene data from CameraManager, calls Claude Haiku, speaks output via AVFoundation TTS. Fires haptic on speech start.

NavigationEngine

Owns GPS routing. Queries Google Maps Directions API for turn-by-turn steps, Google Maps Places API for nearby POIs. Triggers haptic on direction changes.

CameraManager

Manages AVCaptureSession. Captures frames on a defined interval, encodes to base64 JPEG, feeds directly into the Claude Haiku API call alongside the conversation context.

LocationManager

Wraps CoreLocation. Feeds real-time coordinates to NavigationEngine and enriches each Claude prompt with current GPS context for spatially-aware narration.
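As a rough sketch of CameraManager's capture-and-encode step (class shape, `onFrame` callback, and the 3-second interval are illustrative assumptions, not the app's actual code): a frame arriving from the AVCaptureVideoDataOutput delegate is throttled, converted to JPEG, and base64-encoded for the Claude request.

```swift
import AVFoundation
import UIKit

final class CameraManager: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    var onFrame: ((String) -> Void)?            // receives the base64 JPEG
    private var lastCapture = Date.distantPast
    private let interval: TimeInterval = 3      // assumed capture interval

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Throttle: only encode one frame per interval.
        guard Date().timeIntervalSince(lastCapture) >= interval,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        lastCapture = Date()

        // Pixel buffer → CGImage → compressed JPEG → base64 string.
        let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
        guard let cgImage = CIContext().createCGImage(ciImage, from: ciImage.extent),
              let jpeg = UIImage(cgImage: cgImage).jpegData(compressionQuality: 0.5)
        else { return }

        onFrame?(jpeg.base64EncodedString())
    }
}
```

The compression quality trades upload latency against how much scene detail the model receives.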

6 of 11


One model. One call. Two jobs.

CameraManager + NarratorEngine — powered by claude-haiku

INPUTS

📸 Camera frame

base64 JPEG from AVCaptureSession

💬 Last 6 messages

rolling short-term memory array

📍 GPS context

current coordinates from LocationManager

🗣️ User utterance

what the user just asked (if anything)

claude-haiku — one API call

OUTPUT

🔊 Spoken narration

📳 Haptic feedback

💾 Stored in memory
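The single call above could be assembled roughly like this. The request shape follows Anthropic's public Messages API; the model id, system prompt, and function names are assumptions for illustration, not the app's actual code.

```swift
import Foundation

func makeNarrationRequest(frameBase64: String,
                          recentMessages: [[String: Any]],   // rolling last-6 array
                          gps: (lat: Double, lon: Double),
                          utterance: String?) -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
    request.httpMethod = "POST"
    request.setValue("YOUR_API_KEY", forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    // Current turn: camera frame + GPS context + optional user question.
    let text = "GPS: \(gps.lat), \(gps.lon). "
        + (utterance ?? "Briefly describe anything new or important in this scene.")
    let currentTurn: [String: Any] = [
        "role": "user",
        "content": [
            ["type": "image",
             "source": ["type": "base64", "media_type": "image/jpeg", "data": frameBase64]],
            ["type": "text", "text": text]
        ]
    ]

    let body: [String: Any] = [
        "model": "claude-3-5-haiku-latest",    // assumed model id
        "max_tokens": 200,
        "system": "You narrate scenes for a blind user. Be brief and concrete.",
        "messages": recentMessages + [currentTurn]   // short-term memory + new turn
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)
    return request
}
```

Because the frame, the memory array, the GPS context, and the utterance all travel in one request, vision and conversation really do cost a single round trip.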

7 of 11


Why not YOLO?

A deliberate architectural decision

YOLO / On-device CV

✗ Outputs bounding boxes and class labels — not natural language

✗ Needs a separate NLP model to turn detections into speech

✗ Two models = two inference steps = more latency, more complexity

✗ Can't answer conversational questions like "what's in my hand?"

✗ Can't read text — menus, labels, signs are invisible to it

claude-haiku ✓

✓ Describes scenes in natural language, ready to speak aloud

✓ Handles CV + conversation in a single API call

✓ Can read text: menus, labels, road signs, packaging

✓ Remembers context — doesn't repeat what it just said

✓ Fast enough for real-time use with rolling frame capture

8 of 11


NavigationEngine

Google Maps as the GPS backbone

Google Maps Places API

→ Queries nearby points of interest by category and proximity

→ Feeds real business names into narration — not just coordinates

→ "There is a Starbucks 40 metres to your right" — not just a dot on a map

→ Results enriched with GPS context from LocationManager

Google Maps Directions API

→ Turn-by-turn walking directions from current position to destination

→ Step instructions passed into NarratorEngine for spoken output

→ Each direction change triggers a haptic pulse before speaking

→ Re-routes automatically if the user deviates from the path
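The two requests above can be sketched against Google's public web-service endpoints. The URLs and query parameters follow the documented Places Nearby Search and Directions APIs; the helper functions, the 100 m radius, and key handling are illustrative assumptions.

```swift
import Foundation

let key = "YOUR_MAPS_API_KEY"

// Places API Nearby Search: POIs of a given type near the user.
func nearbySearchURL(lat: Double, lon: Double, type: String) -> URL {
    URL(string: "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
        + "?location=\(lat),\(lon)&radius=100&type=\(type)&key=\(key)")!
}

// Directions API: turn-by-turn walking steps to a destination.
func directionsURL(from origin: (lat: Double, lon: Double), to dest: String) -> URL {
    let destination = dest.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)!
    return URL(string: "https://maps.googleapis.com/maps/api/directions/json"
        + "?origin=\(origin.lat),\(origin.lon)&destination=\(destination)"
        + "&mode=walking&key=\(key)")!
}
```

The Places response carries business names (e.g. "Starbucks") that NarratorEngine can speak directly; the Directions response carries per-step instructions and distances for the turn-by-turn narration.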

9 of 11


Memory System

Short-term context + long-term identity — no login required

Short-Term Memory

Rolling array of 6 responses

The last 6 model outputs are stored in a Swift array in NarratorEngine

Prevents repetition

Each API call includes this array — Haiku won't repeat what it just told you

Conversational continuity

"What did you just say about the café?" — Insight can answer because it remembers

Lightweight by design

No database write needed. Lives in memory. Resets when the session ends.
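A minimal sketch of that rolling array (type and property names are illustrative; the real NarratorEngine may differ): append each model output and drop the oldest once six are stored.

```swift
// Rolling short-term memory: keeps only the last `capacity` responses.
struct ShortTermMemory {
    private(set) var responses: [String] = []
    let capacity = 6

    mutating func append(_ response: String) {
        responses.append(response)
        if responses.count > capacity {
            responses.removeFirst(responses.count - capacity)   // drop oldest
        }
    }
}
```

Each API call simply serialises `responses` into the `messages` array, so "memory" costs nothing beyond a few kilobytes of request payload.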

Long-Term Memory

No login. No account.

Zero auth friction — designed for users who may struggle with login flows

iOS Keychain device ID

On first launch, a UUID is generated and stored in iOS Keychain — persistent, silent, secure

Supabase / Postgres backend

Device ID is the primary key. User preferences and history are stored against it

Returns on every session

Open the app, and Insight already knows your preferred routes, saved places, and past context
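The silent device-ID scheme could look roughly like this, using Apple's Keychain Services API (the service and account strings are assumptions): try to read a stored ID, and only generate and persist a new UUID on first launch.

```swift
import Foundation
import Security

func deviceID() -> String {
    var query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: "com.insight.deviceid",   // assumed identifiers
        kSecAttrAccount as String: "device",
        kSecReturnData as String: true
    ]
    var item: CFTypeRef?
    if SecItemCopyMatching(query as CFDictionary, &item) == errSecSuccess,
       let data = item as? Data, let id = String(data: data, encoding: .utf8) {
        return id    // returning user: same ID as every previous session
    }

    // First launch: create a UUID and store it silently in the Keychain.
    let id = UUID().uuidString
    query[kSecReturnData as String] = nil
    query[kSecValueData as String] = id.data(using: .utf8)!
    SecItemAdd(query as CFDictionary, nil)
    return id
}
```

That string then serves as the primary key for the Supabase/Postgres rows holding preferences and history, so identity survives reinstalls of user data without any login UI.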

10 of 11


Haptic Feedback

Touch as stimulus — a second channel of communication

📳 Narration begins

Medium Impact

Every time NarratorEngine begins speaking, a haptic fires first. The user feels it before they hear it — a reliable signal that information is incoming.

📳 Navigation direction change

Heavy Impact

Before announcing a turn — "Turn left in 10 metres" — a strong haptic pulse precedes the voice. Reinforces the instruction through two senses simultaneously.

📳 Hazard detected

Notification (Warning)

When Insight detects something urgent in the scene — a vehicle, a step, an obstacle — a warning haptic fires immediately, even before speech starts.

📳 Destination reached

Notification (Success)

A distinct success pattern confirms arrival. No ambiguity — the user knows they are exactly where they intended to be.
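The four events map naturally onto UIKit's feedback generators; a hedged sketch (the enum and its wiring are illustrative, though the generator classes and styles are the standard UIKit API):

```swift
import UIKit

enum HapticEvent {
    case narrationBegins, directionChange, hazard, destinationReached

    func fire() {
        switch self {
        case .narrationBegins:
            UIImpactFeedbackGenerator(style: .medium).impactOccurred()
        case .directionChange:
            UIImpactFeedbackGenerator(style: .heavy).impactOccurred()
        case .hazard:
            UINotificationFeedbackGenerator().notificationOccurred(.warning)
        case .destinationReached:
            UINotificationFeedbackGenerator().notificationOccurred(.success)
        }
    }
}
```

Keeping the mapping in one place guarantees each event always feels the same, which is what makes the haptic channel learnable without vision.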

11 of 11

Intelligent scene narration via claude-haiku

Navigation powered by Google Maps APIs

Short + long-term memory, no login needed

iOS-native SwiftUI, four purpose-built engines

GPS-enriched, spatially-aware AI responses

Haptic stimulus on every key moment

The world, finally described

Insight gives blind people the freedom to know their world.