1 of 25

ML Problems from CS 441 Students

CS 441 - April 27, 2023

2 of 25

[Problem] by [name]

  1. What you are solving and why – use visual examples if possible
  2. What are the main challenges (ML and otherwise)
  3. Specific questions that you would like addressed/discussed in lecture

This can be in 1-3 slides. Please be present in class on April 27 to talk through this and answer clarification questions. Practice explaining – it should take about 5 minutes for the initial explanation

Template: do not change – copy and paste

3 of 25

Identifying CRACKS in concrete bridges with Radar sensing and Machine Learning

A radar sends an EM wave and receives a reflected signal that contains information. →

Is this signal from part 1, 2, or 3?

[Figure: reflected radar signal; x-axis: Time (ns)]

by Ishfaq Aziz

4 of 25

Each scan-signal is a datapoint which corresponds to one class

Problem:

  • With a given scan-signal, can we predict which class it refers to?

Dataset:

  • Real data from 5 different bridges with labelled ground truth

  • Total data = 500,000 signals
  • Class-1 : 71 %
  • Class-2 : 24 %
  • Class-3 : 5 %

→ Imbalanced dataset

Target:

Train on data from 4 bridges and test on data from the 5th (zero-shot)
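The train-on-4, test-on-1 target can be evaluated for every choice of held-out bridge. The following is a minimal sketch of such a split, assuming each signal carries a bridge identifier; the function name `leave_one_bridge_out` and the `bridge_ids` list are hypothetical and not part of the original slides.

```python
from collections import defaultdict


def leave_one_bridge_out(bridge_ids):
    """Yield (held_out_bridge, train_idx, test_idx) once per bridge.

    bridge_ids[i] is the (hypothetical) identifier of the bridge on which
    signal i was recorded. No signal from the held-out bridge is ever
    placed in the training indices.
    """
    by_bridge = defaultdict(list)
    for i, bridge in enumerate(bridge_ids):
        by_bridge[bridge].append(i)

    for held_out in sorted(by_bridge):
        test_idx = by_bridge[held_out]
        train_idx = sorted(
            i for bridge, idx in by_bridge.items() if bridge != held_out for i in idx
        )
        yield held_out, train_idx, test_idx
```

Splitting by bridge rather than by signal keeps neighbouring scans of the same bridge from appearing in both the training and test sets, which would otherwise make the zero-shot score look better than it is.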

5 of 25

Even with the class-balancing techniques below, models remain biased towards the majority class (Class-1) →

Models tried:

  • Random Forest & Logistic regression with features (Freqs., PCA, etc.) as input.
  • 1-D CNN with raw signal as input.

Techniques tried for class balancing:

  • Oversampling minor class
  • Undersampling major class
  • Weighted loss function

Input (signal/features) → Output (class probabilities)

6 of 25

Ideas from class

7 of 25

More ideas

  • Amplitude and maybe phase of frequencies over time seem most relevant – use a 1D Discrete Cosine Transform (DCT) as the raw representation
  • Normalize for each bridge
    • Subtract median features from raw features
    • Represent the probability of seeing a set of DCT values at a particular time
  • Reformulate as detection of depth of crack
  • Clustering or mixture modeling may help
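The DCT representation and the per-bridge median subtraction can be sketched as follows. This is a plain, unnormalized DCT-II written out for clarity, and the grouping by `bridge_ids` is a hypothetical data layout; neither is claimed to be what was used in practice.

```python
import math
from statistics import median


def dct2(signal):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def subtract_bridge_median(features, bridge_ids):
    """Subtract, for each bridge, the per-coefficient median of its own signals.

    features[i] is the feature vector (e.g. DCT coefficients) of signal i,
    bridge_ids[i] is the (hypothetical) identifier of the bridge it came from.
    """
    by_bridge = {}
    for row, bridge in zip(features, bridge_ids):
        by_bridge.setdefault(bridge, []).append(row)

    medians = {
        bridge: [median(col) for col in zip(*rows)]
        for bridge, rows in by_bridge.items()
    }
    return [
        [v - m for v, m in zip(row, medians[bridge])]
        for row, bridge in zip(features, bridge_ids)
    ]
```

The direct sum costs O(N²) per signal; for 500,000 signals a library routine such as `scipy.fft.dct` would be the practical choice. Note that the median of the test bridge can be computed from its unlabelled signals, so this normalization does not require labels from the 5th bridge.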

8 of 25

9 of 25

Recruiting Currently

  1. Post a job
  2. Manually filter 600-800 applicants
  3. Interview top candidates
  4. Extend job offer

10 of 25

The Solution

Hatch is an AI recruitment platform that allows recruiters to rank hundreds of applicants in seconds.

11 of 25

Link Search: candidate github, linkedin, personal portfolio

12 of 25

What we have considered and what we are exploring:

  • Using the Word2Vec model with either SkipGram or CBOW.
    • Which is the better choice for resume parsing?
  • Then apply the model to the resumes and job descriptions:
    • How do we scale the resultant vectors equally?
  • And apply cosine similarity between the “job vector” and “student vector” to rank the resulting resume vectors:
    • Is cosine similarity the best similarity measure? What other comparison metrics can we use?
  • How do we build the dataset? We have some beta testers (startup founders, recruiters), but what sort of data should we collect from them? (ranked resumes as truth labels?)
  • Are there any foundation models we can start with that have comparable runtimes at a smaller scale?
  • How do we eliminate recruiter bias?
  • Is there anything we’re missing when considering this problem?
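The ranking step described above can be sketched as follows, assuming each document has already been turned into one vector by averaging its word vectors. The averaging, the function names, and the dictionary of resume vectors are illustrative assumptions, not the team's actual pipeline.

```python
import math


def mean_vector(word_vectors):
    """Average word vectors into a single document vector (simple pooling)."""
    n = len(word_vectors)
    return [sum(col) / n for col in zip(*word_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_vec, resume_vecs):
    """Return (resume_id, score) pairs sorted from most to least similar.

    resume_vecs maps a (hypothetical) resume id to its document vector.
    """
    scored = [(rid, cosine_similarity(job_vec, vec)) for rid, vec in resume_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Cosine similarity depends only on direction, so multiplying a vector by any positive constant leaves the score unchanged. That removes the need to scale the job and resume vectors to the same length before comparing them, although it does not address differences in document length that affect the averaged direction.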

13 of 25

Ideas from class

14 of 25

More ideas

  • Sounds like you have a resume parser, but just in case https://github.com/hxu296/nlp-resume-parser
    • Once you know what works, you can use it directly or to inform an approach
  • BERT may be a good starting point for representation: inputs are JSON for the job description and JSON for the resume
  • Building dataset – very tricky.
    • One approach is to build a useful non-automated approach that you provide for free and gather ratings of candidate quality and job fit
    • Can use RL during deployment
    • May want to maintain both general model and client-specific model
    • Can predict whether the person is selected for interview, offered, hired – independent recruiters can offer a lot of data
  • Bias – consider providing separate scores for different subgroups, various ideas in Model Cards are relevant

15 of 25

Vessel Route Prediction by Marcus Kornmann

https://www.vesselfinder.com/

  • Automatic Identification System (AIS)
  • Vessels send dynamic and static information at fixed intervals
  • Position, vessel specifications (length, width, …), status

16 of 25

Vessel Route Prediction

Goal

  • Detect areas of high traffic
  • Predict the route of a vessel given its previous movement
  • Use information for route planning for an autonomous sailing boat

17 of 25

Vessel Route Prediction

Challenges

  • Size of dataset: > 6 million data points per day
    • How to efficiently read large datasets (even using data for a specific region requires reading all the data once)?
  • Very unreliable data:
    • Some of the information is added manually => high error rates (e.g. ship length / width)
    • Some specifications are ambiguous (e.g. status)
    • Outlier (unreasonable) values for vessel movement
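One way to cope with the data volume is to stream the records and keep only those inside a region of interest, so memory use stays constant even though every row is still read once. The sketch below assumes CSV input with columns named `lat` and `lon`; the column names and the sample rows are hypothetical.

```python
import csv
import io


def stream_region(lines, lat_range, lon_range, lat_col="lat", lon_col="lon"):
    """Yield rows whose position lies inside the given bounding box.

    lines is any iterable of CSV text lines (an open file works). Rows with
    missing or non-numeric coordinates are skipped, which also drops some
    of the unreliable records.
    """
    for row in csv.DictReader(lines):
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (TypeError, ValueError):
            continue
        if lat_range[0] <= lat <= lat_range[1] and lon_range[0] <= lon <= lon_range[1]:
            yield row


# Usage with an in-memory sample; a real run would pass an open file instead.
sample = io.StringIO("mmsi,lat,lon\n1,54.5,10.2\n2,bad,10.0\n3,60.0,10.0\n4,54.9,10.9\n")
kept = list(stream_region(sample, (54.0, 55.0), (10.0, 11.0)))
```

Writing the filtered rows to a per-region file once would let later experiments avoid rereading the full daily dump.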

Approach

  • Clean data, divide tracks into subtracks, perform line-reduction (define tracks by turning points), perform clustering
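The line-reduction step can be illustrated with the Ramer-Douglas-Peucker algorithm, which keeps only the points where a track deviates from a straight line by more than a tolerance. This is one common choice and is an assumption here; the slides do not name the method used. Coordinates are treated as planar (x, y) pairs for simplicity.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def reduce_track(points, tolerance):
    """Ramer-Douglas-Peucker: keep the endpoints plus significant turning points."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i_max - 1] <= tolerance:
        return [points[0], points[-1]]
    left = reduce_track(points[: i_max + 1], tolerance)
    right = reduce_track(points[i_max:], tolerance)
    return left[:-1] + right  # the split point appears in both halves
```

For example, a track that runs east and then turns north collapses to its start point, the corner, and its end point. Very long tracks may need an iterative version to avoid deep recursion, and real latitude/longitude data would need a suitable projection before distances are meaningful.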

18 of 25

Ideas from class

19 of 25

More ideas

  • Track flow of traffic instead of points?
    • Model density at each tile, and probability of transitioning between tiles
    • Potentially, consider longer history, e.g. next tile given previous three tiles
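The tile-based idea can be sketched as a simple Markov model: map positions to grid tiles, count how often each tile is visited, and estimate the probability of the next tile given the previous one or more tiles. The tile size, function names, and track layout below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def to_tile(lat, lon, tile_deg):
    """Map a position to a grid tile index; tile_deg is the tile size in degrees."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))


def tile_sequence(track, tile_deg):
    """Convert (lat, lon) points to tiles, collapsing consecutive repeats."""
    tiles = []
    for lat, lon in track:
        tile = to_tile(lat, lon, tile_deg)
        if not tiles or tiles[-1] != tile:
            tiles.append(tile)
    return tiles


def tile_density(tile_tracks):
    """Fraction of all tile visits that fall on each tile."""
    counts = Counter(tile for tiles in tile_tracks for tile in tiles)
    total = sum(counts.values())
    return {tile: c / total for tile, c in counts.items()}


def fit_transitions(tile_tracks, order=1):
    """Estimate P(next tile | previous `order` tiles) from observed tracks."""
    counts = defaultdict(Counter)
    for tiles in tile_tracks:
        for i in range(order, len(tiles)):
            history = tuple(tiles[i - order : i])
            counts[history][tiles[i]] += 1
    return {
        history: {tile: c / sum(nxt.values()) for tile, c in nxt.items()}
        for history, nxt in counts.items()
    }
```

Setting `order=3` gives the "next tile given previous three tiles" variant. Longer histories capture direction of travel better but need far more data per history, so many histories will be seen only once or not at all.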

20 of 25

Improving Crystallizer Effluent Particle Size Distribution Predictions with Hybrid Mechanistic/ML Process Models

by Sam Aguiar

21 of 25

Can we combine mechanistic models with ML to help better predict changes to crystallizer effluent under non-ideal conditions?

Available Data

Currently accounted for in Model

  • Ion concentrations
  • Particle Size Distribution in reactor

Partially included in Model

  • Flow rates or Mixing Conditions
  • Temperature

Not Included in Model

  • Organic additives
  • Solids content
  • Coprecipitation
  • Fluid dynamics

22 of 25

How can we implement a Hybrid Model?

23 of 25

Ideas from class

24 of 25

More ideas

  • May have some relationship to using RL to solve for kinematics e.g. this

25 of 25

Coming up

  • Last class on Tuesday
    • Summarizing key ideas from semester
    • Where to learn more
    • Future of machine learning

  • Final project due next Wed

  • Final exam on May 9