MIT Smart Confessions
What are MIT Confessions?
Problem Statement
Three problems that we wanted to tackle:
Data & Challenges
Data From MIT Confessions Pages
Observations from the Data
Graphs of the Data: Data is extremely skewed (4,555 examples in total)
Graphs of the Data: Highest Reaction Counts Happen Below 200 Characters
Machine Learning Models
Two Models: Predictor and Generator
Bucket Classifier
Text Input
Bucket Classification Model
Buckets Probabilities
Text
Embedding (32)
Conv1D
ReLU
MaxPooling1D
Dropout (0.1)
Flatten
Dense (64)
ReLU
Dropout (0.1)
Dense (BC)
Softmax
The text embedding converts each word index into a vector of width 32.
BC is the number of buckets, which varies depending on the reaction type.
The text input is a list of integers, where each integer represents a word.
LSTM Generator
Text Input
LSTM Classification Model
One-Hot Vector for Word Index
Text
Embedding (300)
LSTM (300)
Dense (WC)
Softmax
Text embedding converts each word index into a vector of width 300.
WC is the vocabulary size, so the output here is a one-hot-style vector over all words. Ideally, we would output a text embedding vector instead.
The text input is a list of integers, where each integer represents a word.
The LSTM layer has an output dimensionality of 300.
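The generator described above can be sketched in Keras as follows. The embedding width (300), LSTM size (300), and WC-way softmax come from the slides; the vocabulary size, seed length, and greedy sampling step are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 8_000  # assumed vocabulary size (WC)

model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 300),               # word index -> 300-d vector
    layers.LSTM(300),                                # 300-d output space
    layers.Dense(VOCAB_SIZE, activation="softmax"),  # distribution over all WC words
])

# Feed a seed sequence of word indices; pick the next word from the softmax.
seed = np.random.randint(0, VOCAB_SIZE, size=(1, 10))
probs = model.predict(seed)[0]     # shape (VOCAB_SIZE,)
next_word = int(np.argmax(probs))  # greedy choice; sampling would add variety
```

In practice the predicted word is appended to the seed and the model is called again, word by word, to grow a confession.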
Training Results
Results on Bucket Classifier
Results on LSTM Generator
Some Results
Bucket Classifier Results
These text inputs were taken from the current MIT confessions page. The output is the highest predicted bucket. The model can definitely be improved.
LSTM Generator Results
NOTE: The model may have been re-trained and modified after these confessions were generated, so it may not produce the same outputs from these seeds.
LSTM Generator Results: It is Sometimes Sensible
Future Work & Extensions
Thanks
Project Mentor: Yaakov Helman
Industry Mentor: Charles Tam
Github Website: mit-smart-confessions-website
Github API: mit-smart-confessions-api