1 of 29

ADHD, Aspergers, Depression, OCD, and PTSD Classification

Kefan Yu, Xinyu Li, Yixiang Cheng, Zezhen Liu

2 of 29

Background

Intro to Project

EDA & Objective

Cleaning Process

Model: Naive Bayes

Model: Doc2Vec

Model: BERT

Demo

Conclusion

3 of 29

Background

Many people look for answers and solutions in Reddit communities covering a wide range of topics.

In the mental health communities specifically, people often try to work out the name of a mental disorder, such as ADHD or OCD, from other Reddit users' posts.


4 of 29

Therefore, we decided to build a classification model based on these subreddits. When people are uncertain about which of these disorders their experience resembles, they no longer need to spend time reading through thousands of posts.

In this project, we are going to focus on ADHD, Aspergers, Depression, OCD, and PTSD.

5 of 29

Data Gathering

ADHD

OCD

Aspergers

Depression

PTSD

6 of 29

EDA

7 of 29

Wordcloud

8 of 29

One word cloud per subreddit: ADHD, Aspergers, Depression, OCD, PTSD
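A minimal sketch of how one such panel can be generated with the wordcloud package (the per-subreddit CSV file and the column name are assumptions, not the project's actual layout):

    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    df = pd.read_csv("adhd.csv")                      # assumed per-subreddit CSV
    text = " ".join(df["body"].dropna().astype(str))  # pool all post bodies

    wc = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.title("ADHD")
    plt.show()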

9 of 29

Objective

We classify each post into one of five categories, ADHD, Aspergers, Depression, OCD, or PTSD, based on the user's description.

10 of 29

Cleaning Process

The dataset we use is reddit_mental_health_post from Hugging Face

  • It contains several CSV files, one per mental disorder, with roughly 30,000 data points each

  1. Reduce dimensions (keep only the columns we need)
  2. Merge title and body into one document
  3. Merge the five files into one
  4. Drop missing values (see the pandas sketch below)
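A minimal pandas sketch of these steps (the file names and column names are assumptions):

    import pandas as pd

    # Assumed layout: one CSV per subreddit, each with 'title' and 'body' columns.
    files = {"adhd": "adhd.csv", "aspergers": "aspergers.csv",
             "depression": "depression.csv", "ocd": "ocd.csv", "ptsd": "ptsd.csv"}

    frames = []
    for label, path in files.items():
        df = pd.read_csv(path, usecols=["title", "body"])   # reduce dimensions
        df = df.dropna(subset=["title", "body"])            # drop missing values
        df["document"] = df["title"] + " " + df["body"]     # merge title and body
        df["label"] = label
        frames.append(df[["document", "label"]])

    data = pd.concat(frames, ignore_index=True)             # merge five files into one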

11 of 29

Missing Value

12 of 29

Naive Bayes

13 of 29

Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
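Written out, the classifier picks the class that maximizes the class prior times the product of per-word likelihoods, with the naive assumption that words are independent given the class:

    \hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)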

  1. Data pre-processing: removing punctuation, removing stopwords, tokenization, stemming, lemmatization
  2. Bag-of-words (BOW) features
  3. Train a Multinomial NB classification model (see the sketch below).
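A minimal scikit-learn sketch of this pipeline, reusing the data frame from the cleaning sketch (CountVectorizer covers tokenization, punctuation, and stopword removal; stemming and lemmatization would need a custom preprocessor, e.g. from NLTK):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        data["document"], data["label"], test_size=0.25, random_state=42)

    model = make_pipeline(
        CountVectorizer(lowercase=True, stop_words="english"),  # BOW features
        MultinomialNB())
    model.fit(X_train, y_train)
    print("accuracy:", model.score(X_test, y_test))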

14 of 29

Baseline Result

15 of 29

Doc2Vec

16 of 29

Doc2Vec + Classifier

  • Data pre-processing
  • Use Doc2Vec to obtain a dense vector for each document, trained to predict the words that document contains.
  • Use a logistic regression classifier to assign the text to the ADHD, Aspergers, Depression, OCD, or PTSD category (see the sketch below).
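A minimal gensim sketch of this pipeline (the hyperparameters are assumptions; data is the frame from the cleaning sketch):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.linear_model import LogisticRegression

    # Tag each tokenized document with its index.
    tagged = [TaggedDocument(words=doc.lower().split(), tags=[i])
              for i, doc in enumerate(data["document"])]

    d2v = Doc2Vec(vector_size=100, window=5, min_count=2, epochs=20)
    d2v.build_vocab(tagged)
    d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)

    X = [d2v.dv[i] for i in range(len(tagged))]       # dense document vectors
    clf = LogisticRegression(max_iter=1000).fit(X, data["label"])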

17 of 29

Results

18 of 29

BERT Fine tuning

19 of 29

Data pre-processing for BERT

  • bert-base-cased gives 768 features for each token
  • Each token representation carries information from the entire sentence

Therefore, we don't want the text to be too long.

For example:

<cls>Benefits for autistic people in the UK&What, if any, benefits are we entitled to?<sep>PTSD<sep>
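A sketch of how such an input can be built and tokenized with Hugging Face transformers (the title/body split here mirrors the example above; the tokenizer inserts [CLS] and [SEP] itself, and truncation enforces the length limit):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    title = "Benefits for autistic people in the UK"
    body = "What, if any, benefits are we entitled to?"

    # Passing two segments yields [CLS] title [SEP] body [SEP].
    enc = tokenizer(title, body, max_length=128, truncation=True)
    print(tokenizer.convert_ids_to_tokens(enc["input_ids"])[:8])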

20 of 29

BERT tokenizer

  • The BERT tokenizer splits text into subword features
  • The pure-Python BERT tokenizer uses a double for loop

Thus, the BERT tokenizer is very slow (a workaround is sketched below).
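One common fix (an assumption on our part, not something the slides adopt) is the Rust-backed fast tokenizer, which avoids the Python loops when encoding in batches:

    from transformers import BertTokenizerFast

    fast_tok = BertTokenizerFast.from_pretrained("bert-base-cased")
    # Batch encoding runs in compiled Rust rather than a Python double for loop.
    batch = fast_tok(list(data["document"]), max_length=128,
                     truncation=True, padding="max_length")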

21 of 29

BERT Fine tuning

  • Data pre-processing
  • Use the BERT encoder to obtain a dense representation of each document.
  • Attach a linear classification head to assign the text to the ADHD, Aspergers, Depression, OCD, or PTSD category.

We copy the pretrained BERT encoder (plus its hidden layers), take the encoder output at the [CLS] position (bert.encoder output[:, 0, :], a 768-dimensional vector), and feed it into an nn.Linear(768, 5) classification head, as sketched below.
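A minimal PyTorch sketch of this architecture: BERT plus one linear layer over the [CLS] position.

    import torch.nn as nn
    from transformers import BertModel

    class BertClassifier(nn.Module):
        def __init__(self, num_labels=5):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-cased")
            self.head = nn.Linear(768, num_labels)  # 768 hidden units -> 5 classes

        def forward(self, input_ids, attention_mask):
            out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0, :]    # encoder output at [CLS]
            return self.head(cls)                   # class logits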

22 of 29

BERT Fine tuning—V1

model_name: bert-base-cased

--train_file: ftr.csv (50% of the data)

--validation_file: fva.csv (25%)

--test_file: fte.csv (25%)

--do_train

--do_predict

--max_seq_length: 128

--train_batch_size: 32

--learning_rate: 2e-5

--num_train_epochs: 3 (this is not a big dataset)
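These flags map onto the Hugging Face Trainer roughly as follows (a sketch assuming a Trainer-based script; max_seq_length is applied at tokenization time rather than here):

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="bert_v1",
        per_device_train_batch_size=32,  # --train_batch_size 32
        learning_rate=2e-5,              # --learning_rate 2e-5
        num_train_epochs=3,              # small dataset, so few epochs
    )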

23 of 29

BERT Fine tuning—V1

We got an accuracy of 75% and an F1 score of 73%.

24 of 29

BERT Fine tuning—V2

model_name: bert-base-cased

--train_file: ftr.csv (50% of the data)

--validation_file: fva.csv (25%)

--test_file: fte.csv (25%)

--do_train

--do_predict

--max_seq_length: 256

--train_batch_size: 32

--learning_rate: 2e-5

--num_train_epochs: 3 (with more budget we could go to 20+)

<<Money is all you need>>

25 of 29

BERT Fine tuning—V2

We still got an accuracy of 75%, with the F1 score improving slightly to 74%.

V1 vs. V2 results

26 of 29

27 of 29

Conclusion & Future work

  1. We achieved good accuracy in predicting OCD, PTSD, Depression, ADHD, and Aspergers.
  2. Use more computational resources for larger datasets in the future.
  3. Use professionally labelled datasets from academic sources to improve accuracy.
  4. Try other models from the BERT family.

28 of 29

Any Questions?

29 of 29

Thank You!