1 of 13

Team Duck Typing

Quinn Farquharson, Robert Horvath, Linus Kim, John Lyle

December 2024

Vision-Based Agent Selection and Control

Detecting user gaze with eye tracking to select and control multiple agents using a single controller

2 of 13

Project Overview

High level objective:

A system where a user can control multiple agents via a single controller, using the user's detected gaze to determine the agent of interest

Functionality (requirements) breakdown:

  1. Visualization and interfaces
  2. Agent setup & controls
  3. Agent selection
  4. Gaze detection

3 of 13

Program flow/Approach

[Flow diagram: OpenCV + Dlib → Numpy → FilterPy → Numpy → Tkinter + Turtle]
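Reading the diagram left to right, a minimal sketch of the loop it implies is shown below; estimate_gaze and select_agent are hypothetical stand-ins for the project's actual functions, and the agent positions are made up for illustration.

import cv2
import numpy as np

def estimate_gaze(frame):
    # Placeholder: the real pipeline uses Dlib landmarks (slide 8).
    h, w = frame.shape[:2]
    return np.array([w / 2.0, h / 2.0])

def select_agent(gaze_xy, agent_positions):
    # Placeholder: pick the agent nearest the gaze point.
    return int(np.argmin(np.linalg.norm(agent_positions - gaze_xy, axis=1)))

agents = np.array([[100.0, 100.0], [500.0, 300.0]])
cap = cv2.VideoCapture(0)            # webcam input (OpenCV)
ok, frame = cap.read()
if ok:
    print("selected agent:", select_agent(estimate_gaze(frame), agents))
cap.release()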

4 of 13

Visualization & Interfaces

Key Features:

  • Single-Window Interface
  • Interactive Agent Control
  • Dynamic Visual Feedback
  • Customizable Controls
  • Instruction Window
  • Approachable for new users (see the sketch below)
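A minimal sketch of what the single-window interface could look like, assuming Tkinter's Canvas with turtle embedded via TurtleScreen/RawTurtle; the widget layout and label text are illustrative, not the project's actual code.

import tkinter as tk
import turtle

root = tk.Tk()
root.title("Vision-Based Agent Selection")

# Agent canvas and instruction panel share a single window.
canvas = tk.Canvas(root, width=800, height=600, bg="white")
canvas.pack(side=tk.LEFT)
instructions = tk.Label(root, text="Arrow keys: steer\n+/-: speed",
                        justify=tk.LEFT)
instructions.pack(side=tk.RIGHT, padx=10)

# Embed turtle graphics inside the Tkinter canvas.
screen = turtle.TurtleScreen(canvas)
agent = turtle.RawTurtle(screen)
agent.shape("turtle")

root.mainloop()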

5 of 13

Agent setup & controls (turtle)

Key Features:

  • Canvas coordinates to match screen coordinates
  • Arrow keys to steer; (+) and (-) keys to speed up and slow down
  • Boundaries declared
  • Separate “agent state” function
  • Returns the agent's location when called (see the sketch below)
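A minimal sketch of one such agent under these assumptions (standalone turtle, illustrative key bindings and boundary value):

import turtle

screen = turtle.Screen()
screen.setup(width=800, height=600)   # size the canvas to match the display
BOUND = 380                           # declared boundary

agent = turtle.Turtle(shape="turtle")
speed = [5]                           # mutable so key callbacks can change it

def forward():
    agent.forward(speed[0])
    x, y = agent.position()           # clamp the agent inside the boundary
    agent.goto(max(-BOUND, min(BOUND, x)), max(-BOUND, min(BOUND, y)))

def faster():
    speed[0] += 1

def slower():
    speed[0] = max(1, speed[0] - 1)

def agent_state():
    """Return the agent's location (and heading) when called."""
    return agent.position(), agent.heading()

screen.onkey(forward, "Up")
screen.onkey(lambda: agent.left(15), "Left")
screen.onkey(lambda: agent.right(15), "Right")
screen.onkey(faster, "plus")
screen.onkey(slower, "minus")
screen.listen()
turtle.mainloop()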

6 of 13

Agent Selection

Two modes (both sketched below):

  • Velocity-based
  • Position-based
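A minimal sketch of how the two modes could score agents, assuming each agent exposes a position and a velocity; the nearest-point and cosine-similarity rules here are illustrative choices, not necessarily the team's exact method.

import numpy as np

def select_by_position(gaze_xy, positions):
    """Position-based: choose the agent closest to the gaze point."""
    dists = np.linalg.norm(positions - gaze_xy, axis=1)
    return int(np.argmin(dists))

def select_by_velocity(gaze_velocity, velocities):
    """Velocity-based: choose the agent whose motion best matches the
    gaze point's motion (cosine similarity)."""
    g = gaze_velocity / (np.linalg.norm(gaze_velocity) + 1e-9)
    v = velocities / (np.linalg.norm(velocities, axis=1, keepdims=True) + 1e-9)
    return int(np.argmax(v @ g))

positions = np.array([[100.0, 50.0], [400.0, 300.0]])
velocities = np.array([[1.0, 0.0], [0.0, 1.0]])
print(select_by_position(np.array([390.0, 310.0]), positions))   # -> 1
print(select_by_velocity(np.array([0.9, 0.1]), velocities))      # -> 0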

7 of 13

Gaze Detection

Split into four sections (the first two are sketched in code below):

  • Initialization
    1. Pull display configuration
    2. Pull webcam configuration

  • Data (webcam) input
    • Take snapshots during calibration

  • Processing and transformation
    • Process images and create generalized transform

  • Application
    • Apply transform to requests
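A minimal sketch of the initialization and data-input steps, assuming the screeninfo and OpenCV packages named on the next slide:

import cv2
import screeninfo

# 1. Pull display configuration.
monitor = screeninfo.get_monitors()[0]
screen_w, screen_h = monitor.width, monitor.height

# 2. Pull webcam configuration.
cap = cv2.VideoCapture(0)
cam_w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
cam_h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

# Data input: take a snapshot for one calibration target.
ok, snapshot = cap.read()
if ok:
    print(f"display {screen_w}x{screen_h}, camera {cam_w:.0f}x{cam_h:.0f}")
cap.release()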

8 of 13

Gaze Detection

Technical Overview:

  • Screeninfo and OpenCV interface with the hardware
  • Dlib converts images to histograms of oriented gradients (HOG)
  • Dlib uses a pre-trained linear support vector machine (SVM) classifier
    • Uses fitted hyperplanes to separate HOG outputs into predefined groups
    • Identifies the locations of facial features
  • Transform outputs for subsequent requests with linalg (see the sketch below)
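A minimal sketch of the detection and transform steps; the landmark-model path and the affine least-squares fit are assumptions, not the team's exact code.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()           # HOG + linear SVM
predictor = dlib.shape_predictor(
    "shape_predictor_68_face_landmarks.dat")          # pre-trained landmarks

def eye_feature(gray):
    """Return a rough eye-region feature point from one grayscale frame."""
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(36, 48)]  # eyes
    return np.mean(pts, axis=0)

def fit_transform(features, screen_points):
    """Fit an affine map from eye features to screen coordinates with
    numpy.linalg (least squares over the calibration pairs)."""
    A = np.hstack([features, np.ones((len(features), 1))])
    M, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return M  # apply with: np.array([fx, fy, 1.0]) @ M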

9 of 13

Gaze Detection

10 of 13

Demo

11 of 13

Next Steps

  • Split into ROS2 package(s)
  • Collect data for a binary classifier
  • Expand use cases outside of turtles

12 of 13

Resources and Links

Note: all images and text are hyperlinked to their resources

13 of 13

Presentation Requirements

Each team will make a final presentation to the class that summarizes their:

  • Project objectives
  • Project requirements
  • Approach
    • What packages were used?
    • What algorithms or capabilities were created?
    • How was effort delegated?
  • Project results
    • This can take multiple forms, including a demo, video, graphs, student participation, or other substantial results
  • Project links
    • Where people can access/download functional code
    • Documentation on how to use the code.
  • Basics
    • Presentations must be
      • (Graduate Students) between 8 and 10 minutes
      • (Undergraduate Students) between 8 and 10 minutes
    • Q&A will happen after the presentation(s)
    • The next team sets up during the previous team's Q&A
    • Presentations are during the class periods on the course schedule.
    • The material you must include is outlined above.
    • All team members need to play a role in the presentation