1 of 13

Team Duck Typing

Quinn Farquharson, Robert Horvath, Linus Kim, John Lyle

December 2024

Vision-Based Agent Selection and Control

Detecting user gaze with eye tracking to select and control multiple agents using a single controller

2 of 13

Project Overview

High level objective:

A system where a user can control multiple agents via a single controller, using the user's detected gaze to determine the agent of interest

Functionality (requirements) breakdown:

  1. Visualization and interfaces
  2. Agent setup & controls
  3. Agent selection
  4. Gaze detection

3 of 13

Program flow/Approach

[Flow diagram: OpenCV + Dlib → Numpy → FilterPy → Numpy → Tkinter + Turtle]
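Reading the diagram left to right, a minimal sketch of the loop it implies is shown below; estimate_gaze and select_agent are hypothetical stand-ins for the project's actual functions, and the agent positions are made up for illustration.

import cv2
import numpy as np

def estimate_gaze(frame):
    # Placeholder: the real pipeline uses Dlib landmarks (slide 8).
    h, w = frame.shape[:2]
    return np.array([w / 2.0, h / 2.0])

def select_agent(gaze_xy, agent_positions):
    # Placeholder: pick the agent nearest the gaze point.
    return int(np.argmin(np.linalg.norm(agent_positions - gaze_xy, axis=1)))

agents = np.array([[100.0, 100.0], [500.0, 300.0]])
cap = cv2.VideoCapture(0)            # webcam input (OpenCV)
ok, frame = cap.read()
if ok:
    print("selected agent:", select_agent(estimate_gaze(frame), agents))
cap.release()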

4 of 13

Visualization & Interfaces

Key Features:

  • Single-Window Interface
  • Interactive Agent Control
  • Dynamic Visual Feedback
  • Customizable Controls
  • Instruction Window
  • Approachable for new users (see the sketch below)
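A minimal sketch of what the single-window interface could look like, assuming Tkinter's Canvas with turtle embedded via TurtleScreen/RawTurtle; the widget layout and label text are illustrative, not the project's actual code.

import tkinter as tk
import turtle

root = tk.Tk()
root.title("Vision-Based Agent Selection")

# Agent canvas and instruction panel share a single window.
canvas = tk.Canvas(root, width=800, height=600, bg="white")
canvas.pack(side=tk.LEFT)
instructions = tk.Label(root, text="Arrow keys: steer\n+/-: speed",
                        justify=tk.LEFT)
instructions.pack(side=tk.RIGHT, padx=10)

# Embed turtle graphics inside the Tkinter canvas.
screen = turtle.TurtleScreen(canvas)
agent = turtle.RawTurtle(screen)
agent.shape("turtle")

root.mainloop()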

5 of 13

Agent setup & controls (turtle)

Key Features:

  • Canvas coordinates to match screen coordinates
  • Arrow keys to steer; (+) and (-) keys to speed up and slow down
  • Boundaries declared
  • Separate “agent state” function
  • Returns the agent's location when called (see the sketch below)
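A minimal sketch of one such agent under these assumptions (standalone turtle, illustrative key bindings and boundary value):

import turtle

screen = turtle.Screen()
screen.setup(width=800, height=600)   # size the canvas to match the display
BOUND = 380                           # declared boundary

agent = turtle.Turtle(shape="turtle")
speed = [5]                           # mutable so key callbacks can change it

def forward():
    agent.forward(speed[0])
    x, y = agent.position()           # clamp the agent inside the boundary
    agent.goto(max(-BOUND, min(BOUND, x)), max(-BOUND, min(BOUND, y)))

def faster():
    speed[0] += 1

def slower():
    speed[0] = max(1, speed[0] - 1)

def agent_state():
    """Return the agent's location (and heading) when called."""
    return agent.position(), agent.heading()

screen.onkey(forward, "Up")
screen.onkey(lambda: agent.left(15), "Left")
screen.onkey(lambda: agent.right(15), "Right")
screen.onkey(faster, "plus")
screen.onkey(slower, "minus")
screen.listen()
turtle.mainloop()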

6 of 13

Agent Selection

Two modes (both sketched below):

  • Velocity-based
  • Position-based
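A minimal sketch of how the two modes could score agents, assuming each agent exposes a position and a velocity; the nearest-point and cosine-similarity rules here are illustrative choices, not necessarily the team's exact method.

import numpy as np

def select_by_position(gaze_xy, positions):
    """Position-based: choose the agent closest to the gaze point."""
    dists = np.linalg.norm(positions - gaze_xy, axis=1)
    return int(np.argmin(dists))

def select_by_velocity(gaze_velocity, velocities):
    """Velocity-based: choose the agent whose motion best matches the
    gaze point's motion (cosine similarity)."""
    g = gaze_velocity / (np.linalg.norm(gaze_velocity) + 1e-9)
    v = velocities / (np.linalg.norm(velocities, axis=1, keepdims=True) + 1e-9)
    return int(np.argmax(v @ g))

positions = np.array([[100.0, 50.0], [400.0, 300.0]])
velocities = np.array([[1.0, 0.0], [0.0, 1.0]])
print(select_by_position(np.array([390.0, 310.0]), positions))   # -> 1
print(select_by_velocity(np.array([0.9, 0.1]), velocities))      # -> 0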

7 of 13

Gaze Detection

Split into four sections (the first two are sketched in code below):

  • Initialization
    1. Pull display configuration
    2. Pull webcam configuration

  • Data (webcam) input
    • Take snapshots during calibration

  • Processing and transformation
    • Process images and create generalized transform

  • Application
    • Apply transform to requests
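A minimal sketch of the initialization and data-input steps, assuming the screeninfo and OpenCV packages named on the next slide:

import cv2
import screeninfo

# 1. Pull display configuration.
monitor = screeninfo.get_monitors()[0]
screen_w, screen_h = monitor.width, monitor.height

# 2. Pull webcam configuration.
cap = cv2.VideoCapture(0)
cam_w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
cam_h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

# Data input: take a snapshot for one calibration target.
ok, snapshot = cap.read()
if ok:
    print(f"display {screen_w}x{screen_h}, camera {cam_w:.0f}x{cam_h:.0f}")
cap.release()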

8 of 13

Gaze Detection

Technical Overview:

  • Screeninfo and OpenCV interface with the hardware
  • Dlib converts images to histograms of oriented gradients (HOG)
  • Dlib uses a pre-trained linear support vector machine (SVM) classifier
    • Uses fitted hyperplanes to separate HOG outputs into predefined groups
    • Identifies the locations of facial features
  • Transform outputs for subsequent requests with linalg (see the sketch below)
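A minimal sketch of the detection and transform steps; the landmark-model path and the affine least-squares fit are assumptions, not the team's exact code.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()           # HOG + linear SVM
predictor = dlib.shape_predictor(
    "shape_predictor_68_face_landmarks.dat")          # pre-trained landmarks

def eye_feature(gray):
    """Return a rough eye-region feature point from one grayscale frame."""
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(36, 48)]  # eyes
    return np.mean(pts, axis=0)

def fit_transform(features, screen_points):
    """Fit an affine map from eye features to screen coordinates with
    numpy.linalg (least squares over the calibration pairs)."""
    A = np.hstack([features, np.ones((len(features), 1))])
    M, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return M  # apply with: np.array([fx, fy, 1.0]) @ M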

9 of 13

Gaze Detection

10 of 13

Demo

11 of 13

Next Steps

  • Split into ROS2 package(s)
  • Collect data for a binary classifier
  • Expand use cases outside of turtles

12 of 13

Resources and Links

Note: all images and text are hyperlinked to their resources

13 of 13

Presentation Requirements

Each team will make a final presentation to the class that summarizes their:

  • Project objectives
  • Project requirements
  • Approach
    • What packages were used?
    • What algorithms or capabilities were created?
    • How was effort delegated?
  • Project results
    • This can take multiple forms, including a demo, video, graphs, student participation, or other substantial results
  • Project links
    • Where people can access/download functional code
    • Documentation on how to use the code.
  • Basics
    • Presentations must be
      • (Graduate Students) between 8 and 10 minutes
      • (Undergraduate Students) between 8 and 10 minutes
    • Q&A will happen after the presentation(s)
    • The next team sets up during the previous team's Q&A
    • Presentations are during the class periods on the course schedule.
    • The material you must include is outlined above.
    • All team members need to play a role in the presentation