Project --> Advanced Fake News Detection System
Timeline: 2-3 weeks

To connect the webpage to the model we chose Flask as the web development framework, deployed on an Amazon Web Services (AWS) EC2 instance. Input received from the website is passed to the EC2 instance, which holds all required files: the machine learning models, the Flask files, and the front-end web server. Once the trained models are dumped from the machine learning pipeline, only those dumped models are used in the Flask app.
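A minimal sketch of this dump-and-reload flow, assuming scikit-learn models serialized with joblib (the serializer, file name, and toy corpus are assumptions; the plan does not name them):

```python
# Train a small text pipeline, dump it to disk, and reload it the way the
# Flask app would. joblib and the toy data are placeholders.
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Train a tiny vectorizer + classifier pipeline, then dump it.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(["real verified report", "fake clickbait hoax"], [0, 1])
joblib.dump(pipe, "model.joblib")

# The Flask view would load only this dumped artifact and call predict()
# on the text received from the webpage.
loaded = joblib.load("model.joblib")
print(loaded.predict(["shocking clickbait hoax"])[0])
```

In the deployed app, `joblib.load` would run once at startup so each request only pays for `predict`.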
Selected base paper: https://drive.google.com/file/d/1LS-0QMCthAGMeHBbSoRCSmv9RAgcD3qv/view?usp=share_link
Webpage --> Streamlit
Project Phase | Tasks | Tools | ETA (Tentative)
Information Retrieval | 1. Data collection with a web crawler using BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/); store the collected datasets as train and test data. | Streamlit framework + Beautiful Soup | 12 hrs
Workflow: https://docs.google.com/presentation/d/1E2xFxU36X9gomczbGzZ0IJ99HE9R3kT-/edit?usp=share_link&ouid=102380624136599057163&rtpof=true&sd=true
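The collection step can be sketched with BeautifulSoup; the `<h2>` tag and the sample HTML below are placeholders, since the plan does not name the target sites:

```python
# Parse headlines out of a fetched page. In the real crawler the HTML would
# come from requests.get(url).text; a literal string stands in for it here.
from bs4 import BeautifulSoup

def extract_headlines(html: str) -> list[str]:
    """Return the text of every <h2> element (tag choice is an assumption)."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

page = "<html><body><h2>Story A</h2><h2>Story B</h2></body></html>"
print(extract_headlines(page))
```

The rows collected this way would then be split and stored as the train and test data mentioned above.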
NLTK library: PorterStemmer, stopwords, WordCloud
Data Preprocessing | NLTK | 6 hrs
1. Stop-word removal: stop words are very common and carry little useful information (use the Natural Language Toolkit (NLTK) library to remove them).
2. Punctuation removal (tokenization)
3. Stemming (tokenization)
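The three preprocessing steps might look like this with NLTK's PorterStemmer; the inline stop-word set is a small placeholder for NLTK's full stopwords corpus, which would need `nltk.download("stopwords")`:

```python
# Punctuation removal, stop-word removal, and stemming in one pass.
import string
from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}  # placeholder subset
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    # Punctuation removal during tokenization
    tokens = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    # Stop-word removal, then stemming
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The running of the bulls is dangerous!"))
```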
NLP | Word vector representation --> used for feature selection; before a machine learning model can be used, the text must be translated into numbers. | 12-18 hrs
1. TfidfVectorizer: TF-IDF uses word frequencies to identify words that are important in a document (frequent in that document but rare across the corpus). The TF-IDF vectorizer tokenizes documents, learns the vocabulary and the inverse document frequency weightings, and lets you encode new documents.

2. CountVectorizer: creates an encoded vector whose length is the full vocabulary size and whose entries hold the frequency with which each word appears in the document.
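Both vectorizers are available in scikit-learn; a toy two-document corpus (made up here) shows the learned vocabulary, the count encoding, and encoding a new document:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["breaking news breaking story", "verified news report"]

# CountVectorizer: one column per vocabulary word, raw counts per document.
count_vec = CountVectorizer().fit(docs)
print(sorted(count_vec.vocabulary_))
print(count_vec.transform(docs).toarray())

# TfidfVectorizer: same vocabulary, but counts reweighted by rarity; new
# documents can be encoded with the learned vocabulary and IDF weights.
tfidf_vec = TfidfVectorizer().fit(docs)
print(tfidf_vec.transform(["breaking report"]).toarray().round(2))
```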
Model Building & Training | Algorithms (train each model on the training data); aggregation of the model outputs is considered | 24 hrs
1. Logistic Regression
2. Decision Tree
3. Random Forest
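One common way to aggregate the three models' outputs is majority voting, e.g. scikit-learn's VotingClassifier (the toy features and label convention below are made up; real inputs would be the TF-IDF/count vectors):

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_train = [1, 0, 1, 0]  # 1 = FAKE, 0 = REAL (label convention assumed)

# Hard voting: each model predicts a class and the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(ensemble.predict([[1, 1]])[0])
```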
Predicting the Output | Based on classification metrics | 6 hrs
- Confusion matrix
- Accuracy
- Precision
- Recall
- F1 score
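All five metrics come straight from scikit-learn; the labels and predictions below are made up for illustration:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1]  # 1 = FAKE, 0 = REAL
y_pred = [1, 0, 0, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
```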
Mitigate the Impact of Fake News | If the output is FAKE --> auto-populate the spreader's inbox with official news using email bots or email-filtering tools | 8-12 hrs
Note: still need to figure out how to identify the original source of the news.
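The email-bot step could be sketched with the standard library's `smtplib` and `email.message`; the sender address, SMTP server, and message text are all placeholders, not decisions from the plan:

```python
import smtplib
from email.message import EmailMessage

def build_official_news_message(spreader_email: str, official_text: str) -> EmailMessage:
    """Compose the counter-message carrying the official coverage."""
    msg = EmailMessage()
    msg["Subject"] = "Verified coverage of the story you shared"
    msg["From"] = "factcheck-bot@example.org"  # placeholder sender
    msg["To"] = spreader_email
    msg.set_content(official_text)
    return msg

def send_official_news(msg: EmailMessage) -> None:
    # Placeholder SMTP relay; a real deployment needs its own server,
    # credentials, and consent/anti-spam safeguards.
    with smtplib.SMTP("smtp.example.org", 587) as server:
        server.starttls()
        server.send_message(msg)
```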