1 of 19

Sign Language Detection

Henry Hong, Alan Wang, Steven Zeng

2 of 19

Meet The Team

Alan Wang

4th Year

EECS

Domain Expansion:

Unlimited Void

Henry Hong

2nd Year

Data Science + CogSci

Domain Emphasis: Applied Math + Cognition

Steven Zeng

3rd Year

EEP + Data Science

Domain Emphasis:

Ecology + Environment

3 of 19

Agenda

  1. Context
  2. Comp Vision
  3. CNN
  4. Random Forest
  5. Next Steps

4 of 19

Computer Vision

& ASL Classification

5 of 19

Why Comp Vision and ASL?

  1. Computer vision is an exciting field, and improving accessibility for people who need it is a worthwhile goal
  2. ASL fingerspelling is a well-scoped way to test a classification model
  3. We chose to focus on ASL because hearing loss is becoming more common in our generation, even at early ages
    1. 11.5+ million people in the US experience hearing impairment
    2. There is a shortage of ASL interpreters
    3. The deaf community faces employment barriers
    4. Improving accessibility helps address all of the above

6 of 19

Our Dataset

  • Based on the Sign Language MNIST dataset, which follows the format of the classic MNIST database (Modified National Institute of Standards and Technology database)
  • ImageDataGenerator (Keras) for data augmentation
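ImageDataGenerator is Keras's utility for on-the-fly augmentation. As a minimal numpy sketch of the same idea, using only small random pixel shifts (the function name and shift range here are illustrative, not the exact settings we used):

```python
import numpy as np

def augment_shift(img, max_shift=2, rng=None):
    """Randomly shift a 2-D grayscale image by up to max_shift pixels,
    padding the exposed border with zeros (black)."""
    rng = rng or np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    h, w = img.shape
    # Destination and source slices that stay inside the image bounds.
    dst = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
    src = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
    out[dst] = img[src]
    return out

img = np.arange(784, dtype=np.float32).reshape(28, 28)
aug = augment_shift(img, rng=np.random.default_rng(0))
```

Shifts (rather than flips) are the safer augmentation for ASL, since many letter signs are chiral and a mirrored hand would change the label.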

7 of 19

Processing

  • Draw a bounding box around the hand
  • Resize to 28x28 for compatibility with the MNIST-format data
  • Normalize pixel values, then feed the model!
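The steps above can be sketched in plain numpy: crop the bounding box, downsample to 28x28 with nearest-neighbor index mapping, and scale pixels to [0, 1]. The function and argument names are illustrative, not our exact code:

```python
import numpy as np

def preprocess(frame, box):
    """Crop the hand bounding box, resize to 28x28 (nearest neighbor),
    and normalize pixel values to [0, 1]. box is (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    crop = frame[y0:y1, x0:x1].astype(np.float32)
    h, w = crop.shape
    rows = np.arange(28) * h // 28   # source row for each output row
    cols = np.arange(28) * w // 28   # source column for each output column
    small = crop[np.ix_(rows, cols)]
    return small / 255.0

frame = np.random.default_rng(1).integers(0, 256, size=(120, 160))
x = preprocess(frame, (10, 110, 20, 140))
```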

8 of 19

Convolutional Neural Networks

Deep Learning Model for Processing Grid-like Data:

  1. Kernels convolve across the input images, enabling the model to learn features
  2. After convolution, a nonlinear activation function is applied to the feature map
  3. Pooling layers reduce the size of feature maps, helping to prevent overfitting
  4. After multiple epochs of training, the network uses fully connected layers to combine learned features and make classifications
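The first three steps above can be sketched in numpy with a single hand-picked kernel (in practice a framework learns many kernels; the edge kernel here is just for illustration):

```python
import numpy as np

def conv2d(img, kernel):
    """Step 1: valid convolution of a 2-D image with a small kernel."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Step 2: nonlinear activation applied to the feature map."""
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """Step 3: 2x2 max pooling to shrink the feature map."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.random.default_rng(0).random((28, 28))
edge = np.array([[1.0, 0.0, -1.0]] * 3)          # simple vertical-edge kernel
fmap = maxpool2x2(relu(conv2d(img, edge)))        # 28x28 -> 26x26 -> 13x13
```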

9 of 19

CNN Results

10 of 19

Convolutional Neural Net Demo

11 of 19

The Problem

  • Almost always predicts P or F
  • Why?
    • Data Problems:
      • Testing environment varies from the data we collected
      • Low resolution image training data
    • Model Problems:
      • Varying aspect ratios depending on hand size
      • Grayscale conversion and size standardization discard detail
      • May have overfit to precise pixel patterns
      • With limited data, CNNs tend to overfit

12 of 19

New Data!

13 of 19

Processing

  • Utilize Mediapipe to generate 21 key points and track their (x, y, z) coordinates
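MediaPipe Hands returns 21 landmarks per detected hand. Assuming that output, the per-frame feature vector fed to the classifier can be built as below; the wrist-relative translation is an illustrative normalization choice, and the names are hypothetical:

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Flatten 21 (x, y, z) hand landmarks into a 63-dim feature vector,
    translating so the wrist (landmark 0) is the origin, which reduces
    sensitivity to where the hand sits in the frame."""
    pts = np.asarray(landmarks, dtype=np.float32)   # shape (21, 3)
    pts = pts - pts[0]                              # wrist-relative coordinates
    return pts.ravel()                              # shape (63,)

fake_landmarks = np.random.default_rng(2).random((21, 3))  # stand-in data
feat = landmarks_to_features(fake_landmarks)
```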

14 of 19

New Model: Random Forest
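A minimal sketch of this model with scikit-learn, assuming 63-dimensional landmark vectors as input. The data here is synthetic and the hyperparameters are illustrative, not the ones we tuned:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 landmark feature vectors across 4 letter classes.
X = rng.random((200, 63))
y = rng.integers(0, 4, size=200)
X[np.arange(200), y] += 1.0   # make classes separable along one feature each

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

Each tree in the forest sees a bootstrap sample and random feature subsets, which is the ensemble behavior that makes it more robust to overfitting on small datasets than a CNN.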

15 of 19

Random Forest Demo

16 of 19

What Went Better

  • Landmark points are more precise than our bounding box
  • Its ensemble nature makes it less prone to overfitting than the CNN
  • Reduced model complexity made it easier to train on the limited data we generated

17 of 19

Next Steps

18 of 19

Next Steps

  • Motion! Much of ASL involves motion-based signs, so the model would have to interpret not just one frame at a time but sequences of frames
  • More data
    • The current model works very well from a front-facing angle, but real-life scenarios call for interpreting ASL from a variety of camera angles
  • Incorporate multiple models at a time for better detection efficiency?
  • Improve domain adaptation, so the model holds up across different lighting, environments, and backgrounds

19 of 19

Thank you!