1 of 19

Sign Language Detection

Henry Hong, Alan Wang, Steven Zeng

2 of 19

Meet The Team

Alan Wang

4th Year

EECS

Domain Expansion:

Unlimited Void

Henry Hong

2nd Year

Data Science + CogSci

Domain Emphasis: Applied Math + Cognition

Steven Zeng

3rd Year

EEP + Data Science

Domain Emphasis:

Ecology + Environment

3 of 19

Agenda

  1. Context
  2. Comp Vision
  3. CNN
  4. Random Forest
  5. Next Steps

4 of 19

Computer Vision

& ASL Classification

5 of 19

Why Comp Vision and ASL?

  1. Computer vision is an exciting field, and improving accessibility for people who need it is a worthwhile goal
  2. ASL fingerspelling is a well-scoped way to test a classification model
  3. We chose to focus on ASL because hearing loss is becoming more common in our generation, even at early ages
    1. 11.5+ million people in the US experience hearing impairment
    2. There is a shortage of ASL interpreters
    3. The deaf community faces employment barriers
    4. Improving accessibility helps address all of the above

6 of 19

Our Dataset

  • Based on the Sign Language MNIST dataset, which follows the format of the classic MNIST database (Modified National Institute of Standards and Technology database)
  • ImageDataGenerator (Keras) for data augmentation
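ImageDataGenerator is Keras's utility for on-the-fly augmentation. As a minimal numpy sketch of the same idea, using only small random pixel shifts (the function name and shift range here are illustrative, not the exact settings we used):

```python
import numpy as np

def augment_shift(img, max_shift=2, rng=None):
    """Randomly shift a 2-D grayscale image by up to max_shift pixels,
    padding the exposed border with zeros (black)."""
    rng = rng or np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    h, w = img.shape
    # Destination and source slices that stay inside the image bounds.
    dst = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
    src = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
    out[dst] = img[src]
    return out

img = np.arange(784, dtype=np.float32).reshape(28, 28)
aug = augment_shift(img, rng=np.random.default_rng(0))
```

Shifts (rather than flips) are the safer augmentation for ASL, since many letter signs are chiral and a mirrored hand would change the label.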

7 of 19

Processing

  • Draw a bounding box around the hand
  • Resize to 28x28 for compatibility with the MNIST-format data
  • Normalize pixel values, then feed the model!
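The steps above can be sketched in plain numpy: crop the bounding box, downsample to 28x28 with nearest-neighbor index mapping, and scale pixels to [0, 1]. The function and argument names are illustrative, not our exact code:

```python
import numpy as np

def preprocess(frame, box):
    """Crop the hand bounding box, resize to 28x28 (nearest neighbor),
    and normalize pixel values to [0, 1]. box is (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    crop = frame[y0:y1, x0:x1].astype(np.float32)
    h, w = crop.shape
    rows = np.arange(28) * h // 28   # source row for each output row
    cols = np.arange(28) * w // 28   # source column for each output column
    small = crop[np.ix_(rows, cols)]
    return small / 255.0

frame = np.random.default_rng(1).integers(0, 256, size=(120, 160))
x = preprocess(frame, (10, 110, 20, 140))
```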

8 of 19

Convolutional Neural Networks

Deep Learning Model for Processing Grid-like Data:

  1. Kernels convolve across the input images, enabling the model to learn features
  2. After convolution, a nonlinear activation function is applied to the feature map
  3. Pooling layers reduce the size of feature maps, helping to prevent overfitting
  4. After multiple epochs of training, the network uses fully connected layers to combine learned features and make classifications
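The first three steps above can be sketched in numpy with a single hand-picked kernel (in practice a framework learns many kernels; the edge kernel here is just for illustration):

```python
import numpy as np

def conv2d(img, kernel):
    """Step 1: valid convolution of a 2-D image with a small kernel."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Step 2: nonlinear activation applied to the feature map."""
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """Step 3: 2x2 max pooling to shrink the feature map."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.random.default_rng(0).random((28, 28))
edge = np.array([[1.0, 0.0, -1.0]] * 3)          # simple vertical-edge kernel
fmap = maxpool2x2(relu(conv2d(img, edge)))        # 28x28 -> 26x26 -> 13x13
```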

9 of 19

CNN Results

10 of 19

Convolutional Neural Net Demo

11 of 19

The Problem

  • Almost always predicts P or F
  • Why?
    • Data Problems:
      • Testing environment varies from the data we collected
      • Low resolution image training data
    • Model Problems:
      • Varying aspect ratios depending on hand size
      • Grayscale conversion and size standardization discard detail
      • May have overfit to precise pixel patterns
      • With limited data, CNNs tend to overfit

12 of 19

New Data!

13 of 19

Processing

  • Utilize Mediapipe to generate 21 key points and track their (x, y, z) coordinates
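MediaPipe Hands returns 21 landmarks per detected hand. Assuming that output, the per-frame feature vector fed to the classifier can be built as below; the wrist-relative translation is an illustrative normalization choice, and the names are hypothetical:

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Flatten 21 (x, y, z) hand landmarks into a 63-dim feature vector,
    translating so the wrist (landmark 0) is the origin, which reduces
    sensitivity to where the hand sits in the frame."""
    pts = np.asarray(landmarks, dtype=np.float32)   # shape (21, 3)
    pts = pts - pts[0]                              # wrist-relative coordinates
    return pts.ravel()                              # shape (63,)

fake_landmarks = np.random.default_rng(2).random((21, 3))  # stand-in data
feat = landmarks_to_features(fake_landmarks)
```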

14 of 19

New Model: Random Forest
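A minimal sketch of this model with scikit-learn, assuming 63-dimensional landmark vectors as input. The data here is synthetic and the hyperparameters are illustrative, not the ones we tuned:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 landmark feature vectors across 4 letter classes.
X = rng.random((200, 63))
y = rng.integers(0, 4, size=200)
X[np.arange(200), y] += 1.0   # make classes separable along one feature each

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

Each tree in the forest sees a bootstrap sample and random feature subsets, which is the ensemble behavior that makes it more robust to overfitting on small datasets than a CNN.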

15 of 19

Random Forest Demo

16 of 19

What Went Better

  • Landmark points are more precise than our bounding box
  • Its ensemble nature makes it less prone to overfitting than the CNN
  • Reduced model complexity made it easier to train on the limited data we generated

17 of 19

Next Steps

18 of 19

Next Steps

  • Motion! Much of ASL involves motion-based signs, so the model would have to interpret not just one frame at a time but sequences of frames
  • More data
    • The current model works very well from a front-facing angle, but real-life scenarios call for interpreting ASL from a variety of camera angles
  • Incorporate multiple models at a time for better detection efficiency?
  • Improve domain adaptation, so the model holds up across different lighting, environments, and backgrounds

19 of 19

Thank you!