ADHD, Asperger's, Depression, OCD, and PTSD Classification
Kefan Yu, Xinyu Li, Yixiang Cheng, Zezhen Liu
Background
Intro to Project
EDA & Objective
Cleaning Process
Model: Naive Bayes
Model: Doc2Vec
Model: BERT
Demo
Conclusion
Background
Many people look to Reddit communities for answers and solutions across various categories.
In the mental-health communities specifically, users often try to put a name to mental disorders such as ADHD or OCD based on other Redditors' posts.
ADHD?
OCD?
Therefore, we decided to build a classification model on top of these specific communities. When people are unsure whether they have such a condition, they no longer need to spend hours browsing thousands of posts.
In this project, we focus on ADHD, Asperger's, Depression, OCD, and PTSD.
Data Gathering
ADHD
OCD
Aspergers
Depression
PTSD
EDA
Wordcloud
ADHD
OCD
Aspergers
Depression
PTSD
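One word cloud is drawn per subreddit. A minimal sketch with the wordcloud package, assuming the posts of one subreddit are gathered in a list of strings named posts (the variable name is our placeholder):

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Concatenate all posts of one class (e.g. ADHD) into a single string.
text = " ".join(posts)
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()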
Objective
We classify each user's description into one of five categories: ADHD, Asperger's, Depression, OCD, or PTSD.
Cleaning Process
The dataset we use is reddit_mental_health_post from Hugging Face.
Missing Values
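A minimal cleaning sketch, assuming the Hugging Face dataset ID matches the slide's name and the columns are title, body, and subreddit (both are assumptions; verify against the actual dataset card):

from datasets import load_dataset

# Dataset ID taken from the slide; check the exact ID on the Hub.
df = load_dataset("reddit_mental_health_post", split="train").to_pandas()

# Drop rows whose title or body is missing: empty text gives the
# classifier nothing to learn from.
df = df.dropna(subset=["title", "body"])

# Merge title and body into one text field for modeling.
df["text"] = df["title"].str.strip() + " " + df["body"].str.strip()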
Naive Bayes
Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
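A minimal baseline of this kind in scikit-learn: bag-of-words counts feeding a multinomial Naive Bayes. The 75/25 split is illustrative, and df is the cleaned DataFrame sketched in the cleaning section:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["subreddit"], test_size=0.25, random_state=42
)

# CountVectorizer builds word counts; MultinomialNB applies Bayes'
# theorem under the naive feature-independence assumption.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))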
Baseline Result
Doc2Vec
Doc2Vec + Classifier
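A sketch of this two-stage setup with gensim: learn one vector per post with Doc2Vec, then train an ordinary classifier on those vectors. Logistic regression here stands in for whichever classifier was actually used, and the hyperparameters are illustrative:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from sklearn.linear_model import LogisticRegression

# Tag each tokenized post with its row index so its vector can be
# looked up after training.
docs = [
    TaggedDocument(simple_preprocess(text), [i])
    for i, text in enumerate(df["text"])
]
d2v = Doc2Vec(docs, vector_size=100, min_count=2, epochs=20)

# Train a downstream classifier on the learned document vectors.
X = [d2v.dv[i] for i in range(len(docs))]
clf = LogisticRegression(max_iter=1000).fit(X, df["subreddit"])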
Results
BERT Fine-Tuning
Data Pre-Processing for BERT
BERT's input is capped at 512 tokens and self-attention gets expensive as sequences grow, so we don't want the text to be too long.
For example:
“<cls>Benefits for autistic people in the UK&What, if any, benefits are we entitled to?<sep>PTSD<sep>”
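A tiny helper that assembles a row in this format. The character cap and the function name are our placeholders for however the trimming was actually done:

MAX_CHARS = 1000  # illustrative budget for keeping posts short

def make_row(title: str, body: str, label: str) -> str:
    # Join title and body with '&', trim, and wrap with the
    # special markers shown on the slide.
    text = f"{title}&{body}"[:MAX_CHARS]
    return f"<cls>{text}<sep>{label}<sep>"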
BERT Tokenizer
Thus, the pure-Python BERT tokenizer is very slow on posts of this length.
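If tokenization is the bottleneck, the Rust-backed fast tokenizer in transformers is a drop-in replacement; a sketch:

from transformers import AutoTokenizer

# use_fast=True selects the Rust-backed tokenizer, which is far
# faster than the pure-Python BertTokenizer on large corpora.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=True)
encoded = tokenizer(
    df["text"].tolist(),
    truncation=True,
    max_length=128,
    padding="max_length",
)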
BERT Fine-Tuning
We keep the pretrained bert.encoder, take its final hidden states, select the [CLS] position (hidden_states[:, 0, :], a 768-dimensional vector), and feed it into a new nn.Linear(768, 5) head that scores the five classes.
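A minimal PyTorch sketch of this head (the class name is ours; dropout and other refinements are omitted):

import torch.nn as nn
from transformers import BertModel

class BertClassifier(nn.Module):
    def __init__(self, num_labels: int = 5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-cased")
        self.head = nn.Linear(768, num_labels)  # 768 = bert-base hidden size

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0, :]  # [CLS] token, position 0
        return self.head(cls)  # logits over the 5 classes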
BERT Fine-Tuning: V1
model_name: bert-base-cased
--train_file ftr.csv (50% of the data)
--validation_file fva.csv (25%)
--test_file fte.csv (25%)
--do_train
--do_predict
--max_seq_length: 128
--train_batch_size: 32
--learning_rate: 2e-5
--num_train_epochs: 3 (this is not a big dataset)
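The same hyperparameters expressed through the transformers Trainer API (a sketch; train_dataset and eval_dataset are assumed to be tokenized versions of ftr.csv and fva.csv with max length 128):

from transformers import (
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = BertForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5
)
args = TrainingArguments(
    output_dir="bert-v1",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # tokenized ftr.csv (assumed)
    eval_dataset=eval_dataset,    # tokenized fva.csv (assumed)
)
trainer.train()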
BERT Fine-Tuning: V1 Results
We got an accuracy of 75% and an F1 score of 73%.
BERT Fine-Tuning: V2
model_name: bert-base-cased
--train_file ftr.csv (50% of the data)
--validation_file fva.csv (25%)
--test_file fte.csv (25%)
--do_train
--do_predict
--max_seq_length: 256
--train_batch_size: 32
--learning_rate: 2e-5
--num_train_epochs: 3 (with a bigger budget we could train for 20+)
“Money is all you need”
BERT Fine-Tuning: V2 Results
We still got an accuracy of 75%, with an F1 score of 74%.
[Figures: side-by-side results for V1 and V2]
Conclusion & Future Work
Any Questions?
Thank You!