1 of 10

Using Vector Embeddings to Predict Bike Races

2 of 10

Outline

  1. What are vector embeddings?
  2. Application to road cycling

3 of 10

What are vector embeddings?

4 of 10

Intro to Vector Embeddings

  • We often deal with non-numeric data
    • E.g. words, movies, songs
  • How do we represent these so we can model them?
    • Need to convert them to numbers
  • Solution: give each item a vector of k numbers (an embedding)
  • Similar vectors imply similar items
    • Similar = high dot product (see the sketch below)
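
A minimal sketch of the idea in Python; the items and vector values below are invented purely for illustration:

    import numpy as np

    # Toy 3-dimensional embeddings (k = 3); items and values are made up.
    embeddings = {
        "mountain_bike": np.array([0.9, 0.1, 0.3]),
        "road_bike":     np.array([0.8, 0.2, 0.4]),
        "toaster":       np.array([0.1, 0.9, 0.0]),
    }

    def similarity(a, b):
        # Similarity between two items = dot product of their embeddings.
        return float(np.dot(embeddings[a], embeddings[b]))

    print(similarity("mountain_bike", "road_bike"))  # high: similar items
    print(similarity("mountain_bike", "toaster"))    # low: dissimilar items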

5 of 10

Example 1: Word2Vec

  • Q: How do we use text in a model?
  • A: Convert words to vectors
  • How do we define similarity?
  • Make the probability of two words appearing together in a sentence increase with their dot product (a softmax over dot products; see the sketch below)
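
A stripped-down sketch of how a skip-gram style objective turns dot products into co-occurrence probabilities; the toy vocabulary and random vectors are assumptions, and real implementations train these tables on large corpora (typically with negative sampling):

    import numpy as np

    vocab = ["bike", "race", "mountain", "sprint"]
    k = 8                                   # embedding dimension
    rng = np.random.default_rng(0)

    # Skip-gram keeps two tables: one for center words, one for context words.
    center = rng.normal(size=(len(vocab), k))
    context = rng.normal(size=(len(vocab), k))

    def p_context_given_center(center_word, context_word):
        # Softmax over dot products: P(context | center) ∝ exp(u_context · v_center)
        v = center[vocab.index(center_word)]
        scores = context @ v                # dot product with every context vector
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[vocab.index(context_word)]

    print(p_context_given_center("bike", "race"))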

6 of 10

Example 2: Netflix

  • Q: Which movie should I recommend to a user?
  • A: Convert movies and users into vectors
  • Make the probability of a user watching a given movie increase with the user-movie dot product (see the sketch below)
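
A rough sketch of the recommendation step, assuming user and movie embeddings have already been learned from viewing history (the random vectors here only make the example runnable):

    import numpy as np

    k = 16
    rng = np.random.default_rng(0)

    # In a real system these would be learned from watch history; random here.
    user_vecs = rng.normal(size=(100, k))    # one embedding per user
    movie_vecs = rng.normal(size=(50, k))    # one embedding per movie

    def recommend(user_id, top_n=5):
        # Score every movie by its dot product with the user's embedding.
        scores = movie_vecs @ user_vecs[user_id]
        return np.argsort(scores)[::-1][:top_n]  # highest-scoring movie ids first

    print(recommend(user_id=0))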

7 of 10

Example 3: CLIP

  • Q: How do we connect images and their captions?
  • A: Convert images and captions into vectors
  • Train so that real image-caption pairs have a high dot product (see the sketch below)
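
A rough numpy sketch of the symmetric contrastive objective behind CLIP, assuming the image and caption encoders have already produced a batch of embeddings (the temperature value here is an assumption):

    import numpy as np

    def clip_style_loss(image_embs, caption_embs, temperature=0.07):
        # Matching image-caption pairs sit on the diagonal and should get
        # the highest dot products in their row and column.
        image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
        caption_embs = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)

        logits = image_embs @ caption_embs.T / temperature  # (batch, batch) similarities
        labels = np.arange(len(logits))                      # image i matches caption i

        def cross_entropy(l):
            log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
            return -log_probs[labels, labels].mean()

        # Symmetric: images -> captions and captions -> images.
        return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

    rng = np.random.default_rng(0)
    print(clip_style_loss(rng.normal(size=(4, 32)), rng.normal(size=(4, 32))))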

8 of 10

Vector embeddings for cycling race prediction

9 of 10

Predicting Cycling Races

  • Different races suit different riders
    • E.g. flat vs. mountainous terrain suits different riders’ strengths
  • The current literature hand-codes features of races that are known to be similar
    • Problem: narrow scope, hard to scale

10 of 10

My Idea

  • What if each rider and each race had their own embedding?
  • If a rider does well at a race, their embeddings should have a high dot product
  • Scraped historical race results from CQRanking.com
  • Minimize the MSE between the rider-race dot product and the number of ranking points the rider earned in that race (see the training sketch below)
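
A minimal training sketch of this idea; the (rider, race, points) triples, embedding dimension, and learning rate below are assumptions chosen to make the example runnable, not the actual pipeline or scraped data:

    import numpy as np

    k = 32                                  # embedding dimension (assumed)
    lr = 0.001                              # learning rate (assumed)
    rng = np.random.default_rng(0)

    # Assumed format of the scraped results: (rider_id, race_id, ranking_points).
    results = [(0, 0, 120.0), (1, 0, 80.0), (0, 1, 10.0), (1, 1, 200.0)]  # toy data
    n_riders = 1 + max(r for r, _, _ in results)
    n_races = 1 + max(c for _, c, _ in results)

    riders = rng.normal(scale=0.1, size=(n_riders, k))  # one embedding per rider
    races = rng.normal(scale=0.1, size=(n_races, k))    # one embedding per race

    # SGD on the MSE between the rider-race dot product and the points earned.
    for epoch in range(2000):
        for rider, race, points in results:
            err = riders[rider] @ races[race] - points
            grad_rider = err * races[race]
            grad_race = err * riders[rider]
            riders[rider] -= lr * grad_rider
            races[race] -= lr * grad_race

    print(riders[0] @ races[0])  # should end up close to 120 after training

Under this setup, predicting a future race would amount to ranking riders by their dot product with that race's embedding.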