Project --> Advanced Fake News Detection System
Timeline: 2-3 weeks

To connect the webpage to the model we chose Flask as the web development framework, deployed on an Amazon Web Services (AWS) EC2 instance. Input received from the website is passed to the EC2 instance, which holds all required files: the machine learning models, the Flask files, and the front-end web server. Once the trained models are dumped from the machine learning pipeline, only those dumped models are used in the Flask app.
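A minimal sketch of this dump-and-reload flow, assuming scikit-learn models serialized with joblib (the serializer, file name, and toy corpus are assumptions; the plan does not name them):

```python
# Train a small text pipeline, dump it to disk, and reload it the way the
# Flask app would. joblib and the toy data are placeholders.
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Train a tiny vectorizer + classifier pipeline, then dump it.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(["real verified report", "fake clickbait hoax"], [0, 1])
joblib.dump(pipe, "model.joblib")

# The Flask view would load only this dumped artifact and call predict()
# on the text received from the webpage.
loaded = joblib.load("model.joblib")
print(loaded.predict(["shocking clickbait hoax"])[0])
```

In the deployed app, `joblib.load` would run once at startup so each request only pays for `predict`.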
Selected base paper: https://drive.google.com/file/d/1LS-0QMCthAGMeHBbSoRCSmv9RAgcD3qv/view?usp=share_link
Webpage --> Streamlit
Project Phase | Tasks | Tools | ETA (Tentative)
Information Retrieval | 1. Data collection with a web crawler using BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/); store the collected datasets as train and test data. | Streamlit framework + Beautiful Soup | 12 hrs
Workflow: https://docs.google.com/presentation/d/1E2xFxU36X9gomczbGzZ0IJ99HE9R3kT-/edit?usp=share_link&ouid=102380624136599057163&rtpof=true&sd=true
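The collection step can be sketched with BeautifulSoup; the `<h2>` tag and the sample HTML below are placeholders, since the plan does not name the target sites:

```python
# Parse headlines out of a fetched page. In the real crawler the HTML would
# come from requests.get(url).text; a literal string stands in for it here.
from bs4 import BeautifulSoup

def extract_headlines(html: str) -> list[str]:
    """Return the text of every <h2> element (tag choice is an assumption)."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

page = "<html><body><h2>Story A</h2><h2>Story B</h2></body></html>"
print(extract_headlines(page))
```

The rows collected this way would then be split and stored as the train and test data mentioned above.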
NLTK library: PorterStemmer, stopwords, WordCloud
Data Preprocessing | NLTK | 6 hrs
1. Stop-word removal: stop words are very common and carry little useful information (use the Natural Language Toolkit (NLTK) library to remove them).
2. Punctuation removal (tokenization)
3. Stemming (tokenization)
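The three preprocessing steps might look like this with NLTK's PorterStemmer; the inline stop-word set is a small placeholder for NLTK's full stopwords corpus, which would need `nltk.download("stopwords")`:

```python
# Punctuation removal, stop-word removal, and stemming in one pass.
import string
from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}  # placeholder subset
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    # Punctuation removal during tokenization
    tokens = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    # Stop-word removal, then stemming
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The running of the bulls is dangerous!"))
```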
NLP | Word vector representation --> used for feature selection; before a machine learning model can be used, the text must be translated into numbers. | 12-18 hrs
1. TfidfVectorizer: TF-IDF uses word frequencies to identify words that are important in a document (frequent in that document but rare across the corpus). The TF-IDF vectorizer tokenizes documents, learns the vocabulary and the inverse document frequency weightings, and lets you encode new documents.

2. CountVectorizer: creates an encoded vector whose length is the full vocabulary size and whose entries hold the frequency with which each word appears in the document.
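Both vectorizers are available in scikit-learn; a toy two-document corpus (made up here) shows the learned vocabulary, the count encoding, and encoding a new document:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["breaking news breaking story", "verified news report"]

# CountVectorizer: one column per vocabulary word, raw counts per document.
count_vec = CountVectorizer().fit(docs)
print(sorted(count_vec.vocabulary_))
print(count_vec.transform(docs).toarray())

# TfidfVectorizer: same vocabulary, but counts reweighted by rarity; new
# documents can be encoded with the learned vocabulary and IDF weights.
tfidf_vec = TfidfVectorizer().fit(docs)
print(tfidf_vec.transform(["breaking report"]).toarray().round(2))
```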
Model Building & Training | Algorithms (train each model on the training data); aggregation of the model outputs is considered | 24 hrs
1. Logistic Regression
2. Decision Tree
3. Random Forest
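One common way to aggregate the three models' outputs is majority voting, e.g. scikit-learn's VotingClassifier (the toy features and label convention below are made up; real inputs would be the TF-IDF/count vectors):

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_train = [1, 0, 1, 0]  # 1 = FAKE, 0 = REAL (label convention assumed)

# Hard voting: each model predicts a class and the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(ensemble.predict([[1, 1]])[0])
```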
Predicting the Output | Based on classification metrics | 6 hrs
- Confusion matrix
- Accuracy
- Precision
- Recall
- F1 score
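All five metrics come straight from scikit-learn; the labels and predictions below are made up for illustration:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1]  # 1 = FAKE, 0 = REAL
y_pred = [1, 0, 0, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
```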
Mitigate the Impact of Fake News | If the output is FAKE --> auto-populate the spreader's inbox with official news using email bots or email-filtering tools | 8-12 hrs
Note: still need to figure out how to identify the original source of the news.
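The email-bot step could be sketched with the standard library's `smtplib` and `email.message`; the sender address, SMTP server, and message text are all placeholders, not decisions from the plan:

```python
import smtplib
from email.message import EmailMessage

def build_official_news_message(spreader_email: str, official_text: str) -> EmailMessage:
    """Compose the counter-message carrying the official coverage."""
    msg = EmailMessage()
    msg["Subject"] = "Verified coverage of the story you shared"
    msg["From"] = "factcheck-bot@example.org"  # placeholder sender
    msg["To"] = spreader_email
    msg.set_content(official_text)
    return msg

def send_official_news(msg: EmailMessage) -> None:
    # Placeholder SMTP relay; a real deployment needs its own server,
    # credentials, and consent/anti-spam safeguards.
    with smtplib.SMTP("smtp.example.org", 587) as server:
        server.starttls()
        server.send_message(msg)
```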