AI-DL Session 3 - Q&A Report

	A	B	C	D	E	F	G
1	Question Report
2	Report Generated:	Jun 14, 2020 1:03 AM
3	Topic	Webinar ID	# Question	Actual Start Time	Actual Duration (minutes)
4	Artificial Intelligence and Deep learning Course by IIT Roorkee	962 3602 6138	175	Jun 13, 2020 7:33 PM	217
5	Question Details
6	#	Question	Answer(s)	Asker Name	Asker Email
7	1	Are you going to cover multi-task networks or multi-task loss?	I did not get your question. Could you please elaborate? We will cover RMSE as the cost function in this session.	Arpit vijaywargiya	arpitvw16@gmail.com
8	2	Why do we need a box plot when we have a histogram for the same visualisation?	live answered	Sugandhita	sugandhitap@gmail.com	A box plot is another way of looking at data, especially on an interval scale.
9	3	In object detection we have a multi-task network, which detects the object and gives a bounding box,		Arpit vijaywargiya	arpitvw16@gmail.com
10	4	so there will be 2 losses: one for classification and the other for the bounding box		Arpit vijaywargiya	arpitvw16@gmail.com
11	5	What information do we get out from skewness and modality?	Machine learning algorithms cannot learn properly if the data is tail-heavy or skewed. We will discuss the reason in the Training Models chapter.	Satyabrat Sabat	satya.jin@gmail.com
12	6	Q2) Generally we use the Z-value (depends on median) for the P-value; can we derive the same thing with the mode, as it is robust to outliers?		Arpit vijaywargiya	arpitvw16@gmail.com
13	7	what is the impact of non distributed feature on model creation?	live answered	sudhir shetty	sudhir.m.shetty@gmail.com
14	8	*(depends on mean)		Arpit vijaywargiya	arpitvw16@gmail.com
15	9	*derive things from median		Arpit vijaywargiya	arpitvw16@gmail.com
16	10	Hi, may I know which books we are referring to for the current diagrams, please? This is something I would like to read in detail too. I purchased the O'Reilly one on Hands-On ML, which was a great help in connecting the last two classes, but these histograms etc. are not there.	These are general concepts; we did not follow any particular book for them.	Rajiv	krajiv.2018@gmail.com
17	11	How is variance different from SD?	Standard deviation and variance both show spread. Standard deviation is the square root of variance.	Rajeev	rajeev213149@gmail.com
18	12	error minimise		Chinmay Athavale	chinmayat@gmail.com
19	13	same unit as data		sudhir shetty	sudhir.m.shetty@gmail.com
20	14	otherwise the data will get cancelled		Dr. santoshkumar	ksantosh.11@gmail.com
21	15	because there are both + and - differences		409992	supriya.bms@gmail.com
22	16	because the difference can be negative and positive		Shalini Gupta	gupt.shalu1993@gmail.com
23	17	so that difference won't cancel		kunal upadhyay	kupadhy@gmail.com
24	18	ok, thank you		Rajiv	krajiv.2018@gmail.com
25	19	What is standard deviation? What is the difference between standard deviation and variance?	Standard deviation and variance both show spread. Standard deviation is the square root of variance	Anantpadmanabh Divanji	apgd14@gmail.com
26	20	Squaring adds more weight to the larger differences		Anmol Khopade	anmolck@gmail.com
27	21	This is important when points further than the mean are important		Anmol Khopade	anmolck@gmail.com
28	22	What is the difference between bias and variance?	The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting). Whereas, the variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).	Vikas Bhartiya	ghivikas@gmail.com
29	23	Is mean and median being equal or closer to each other is the way of determining a distribution is normal or not?	Normal distributions are symmetric, unimodal, and asymptotic, and the mean, median, and mode are all equal. So a mean close to the median is consistent with normality, but it does not prove it, because other symmetric distributions share that property.	Puneet Rsstogi	puneetrstg@gmail.com
30	24	Why is the normal distribution important?	Many physical phenomena follow a normal distribution, and we have built understanding and modelling capabilities for it.	Rajeev	rajeev213149@gmail.com
31	25	does variation of mode vs median tell us anything	A difference in mode and median typically means there is some skewing in the data. It calls for checking for outliers, long or fat tails, etc	Anmol Khopade	anmolck@gmail.com
32	26	How do we know if a dataset follows a normal distribution?	For a normal distribution the mean, median and mode are equal. As a first check for normality we can look at the skewness of the data.	Nini Nursiah	nursiah.neelesh28@gmail.com
33	27	What is z-value?	A z-score gives you an idea of how far from the mean a data point is. It's a measure of how many standard deviations below or above the population mean a raw score is.	VED	parmarvedpro5@gmail.com
34	28	Why do we take root of it?	Taking the root puts the error in the same unit as the observations.	Senthilnathan Ramaswami	senthilnathan.ramaswami@servicenow.com
35	29	When do we use RMSE and MSE?	Hi Srini, this is a good question. Can you please post it on the forum, and we will explain it in detail.	Srini Boddu	siliconfish@yahoo.com
36	30	What is the relation between variance and z-value?	z = (x - mean) / (standard deviation), where the standard deviation is the square root of the variance	VED	parmarvedpro5@gmail.com
37	31	How many lines will the algorithm try, as there are infinite ways to assign w0 and w1?	We will learn this in upcoming chapters while discussing gradient descent	Nini Nursiah	nursiah.neelesh28@gmail.com
38	32	What is W0 and W1 here ?	These are the coefficients of the line (typically W0 is the intercept and W1 is the slope)	Nitin Nigam	nknigam@gmail.com
39	33	For different lines drawing, are we using different data?	No, the algorithm tries to find the line which best fits the available data. The best fitting line is the one with lowest RMSE.	Veeru(VeeraNancharaiah Javvaji)	jsriveeru@gmail.com
40	34	I meant in which scenario, do you choose MSE and in which one you use RMSE?	I think it is already answered	Srini Boddu	siliconfish@yahoo.com
41	35	i didn’t get why pam score is better ?		Anonymous Attendee
42	36	what does negative sign indicate in output of describe method ?		jia sharma	jiavidhi.sharma@gmail.com
43	37	what are the x axis and y axis in hist plot	X is the value and Y axis has the frequency of the value	Rajeev	rajeev213149@gmail.com
44	38	More towards left - near 34		Rajiv	krajiv.2018@gmail.com
45	39	what does bins represent?		Venkata Pradyumna	ivpradyumna@gmail.com
46	40	Latitude bimodal and longitude bimodal—> LA and SF?	Yes, more dense population in certain areas.	Domenico Fioravanti	nicodom@gmail.com
47	41	right skewed		Dr. santoshkumar	ksantosh.11@gmail.com
48	42	one outlier and skewed		Sourav Ghosh	souravghosh@hotmail.com
49	43	How will the peak help in observation? Is there anything you see in the peak value?	The peak, skew etc. are ways for us to understand the data better. Having an intuitive sense of what the data is and what its effect will be goes a long way in building a good model.	Rajiv	krajiv.2018@gmail.com
50	44	LA = 34.0522° N, 118.2437° W SF =37.7749° N, 122.4194° W (yes I confirm)		Domenico Fioravanti	nicodom@gmail.com
51	45	What does capping at 15 mean?	It means the income was capped at 15: all values greater than 15 were set to 15. Hope it answers your question.	Aswin Sabaaree	aswin.sabaaree@innovatia.net
52	46	what is the reason for spike?		Sugandhita	sugandhitap@gmail.com
53	47	Does X have to be normally distributed, or Y, or both?	Typically Y is. X usually is not.	Sourav Ghosh	souravghosh@hotmail.com
54	48	Capped means that values greater than the cap are converted to the cap value, e.g. median_house_value > 500000 will be converted to 500000		DIvya Pathak	dev.feb88@gmail.com
55	49	why is it easier to perform ML on normal data?	We will learn this in the upcoming chapters	Nini Nursiah	nursiah.neelesh28@gmail.com
56	50	How was the 80/20 ratio chosen? Is there a reason?	It depends on the size of the data; we commonly use either 80:20 or 70:30.	Domenico Fioravanti	nicodom@gmail.com
57	51	Why are outliers removed if they are good in number? Is it not a problem of data collection?	Outliers are often removed to make modelling easier. Outliers are a real thing in most data collection exercises.	Krishna Mohan	kmiitan96@gmail.com
58	52	How do we convert attributes to a bell-shaped curve?	Not all attributes can be converted to a bell-shaped curve.	Srihari M	srihariblr12@gmail.com
59	53	How to decide the percentages taken for the training set vs. the test set? Like you took 80% for training and 20% for the test set. What are the various things I need to think about for it?	Typically we choose 20% for the test set. For a large dataset we can choose a smaller percentage for the test data.	Rajiv	krajiv.2018@gmail.com
60	54	There is a terminology Out of Time validation. Is it same thing as Test set?	The out-of-time validation sample contains data from an entirely different time period or customer campaign than what was used for model development. Validating model performance on a different time period is beneficial to further evaluate the model's robustness.	Puneet Rsstogi	puneetrstg@gmail.com
61	55	What does capping at 15 means?		Aswin Sabaaree	aswin.sabaaree@innovatia.net
62	56	how do you know which data goes to training set and which data will go to test set?	live answered	Stuti Rastogi	e0498211@u.nus.edu	We want the same type of data in training and test. The best way is often to do some random distribution
63	57	Does capping mean that values greater than the cap are converted to the cap value, e.g. median_house_value > 500000 will be converted to 500000?	That is correct	DIvya Pathak	dev.feb88@gmail.com
64	58	@TAs - When do we use MSE and when do we use RMSE?	Srini, I am not sure if there is a perfect answer. Both of them are a measure of the error. The MSE may penalise the error more severely than the RMSE, which may help sometimes and may be a detriment other times.	Srini Boddu	siliconfish@yahoo.com
65	59	Why is training not performed on 100% of data?	Because you need to test the quality of the model you create. If you test on data that was used for training, the model will usually show a good result on it.	Nikhil Sharma	nikhilthemacho@gmail.com	If you train on 100% of the data, you will have no data left to evaluate how your model is performing.
66	60	As humans we can be biased, but how come an ML algorithm is biased?	If an ML algorithm learns from biased data, then of course the final model will be biased :)	Vikas Bhartiya	ghivikas@gmail.com
67	61	What are different mechanism to split data? is Hash most famous?	We will cover diff techniques now	Rajiv	krajiv.2018@gmail.com
68	62	time based observations		Sourav Ghosh	souravghosh@hotmail.com
69	63	what is biased?		VED	parmarvedpro5@gmail.com
70	64	Might the training and test data have the same row if we use the random permutation?	No, the training data and test data are disjoint.	Nini Nursiah	nursiah.neelesh28@gmail.com
71	65	but can't the splitting be done initially and then models be run with the same training and test set?		Preedesh M	Preedesh@gmail.com
72	66	Why 42 is passed to random.seed?	It is just an arbitrary number; you can choose a different one.	Anantpadmanabh Divanji	apgd14@gmail.com	You can use another number. The results may vary a little.	We can pass any number as the random seed, but make sure to use that same number throughout your model training process.
73	67	Seed can be any values or based on any observation we need to consider ?	It can be any value	Satya Sunil	kvv.satyasunil@gmail.com
74	68	ok thanks Praveen		Anmol Khopade	anmolck@gmail.com
75	69	what exactly is Hash?		Puneet Rsstogi	puneetrstg@gmail.com		A hash is a function that converts one value to another. Hashing data is a common practice in computer science and is used for several different purposes. Examples include cryptography, compression, checksum generation, and data indexing
76	70	what is the relative advantages of using md5 vis-a-vis the other one pls?	Typically any hash that randomises sufficiently will do the work	Sourav Ghosh	souravghosh@hotmail.com
77	71	what is hash. Please elaborate	https://en.wikipedia.org/wiki/Hash_function Hashes are used to create a random representation of the data. This removes any systematic grouping like all data from the same area may be together.	Sugandhita	sugandhitap@gmail.com
78	72	why can't we just select the last 20% of the dataset for test set?	The last 20% may come from the same area and may not be representative of the entire data set	Nini Nursiah	nursiah.neelesh28@gmail.com	Then the data used for training the model will not be inclusive.
79	73	why do we need to append data at the end?		Aakash Sinha	post2aakash@gmail.com
80	74	can more comments be added into the python file to explain the code section ? reason is there are code sections which are alternative ways to achieve similar goal.	noted the feedback	Prakhar Prasad	prakhar.prasad@gmail.com
81	75	Do we need to create an Id column if we use train_test_split()?	If we are using the scikit-learn function then no; it will split based on the row index, I believe.	Nitin Nigam	nknigam@gmail.com
82	76	What is the best way of getting the train and test data: creating our own function or using the scikit-learn function?	using scikit-learn function	DIvya Pathak	dev.feb88@gmail.com	You can do both. Using the scikit-learn function is easier and usually better.
83	77	Can we see an example of sampling bias?	Hi Sourav, is your question answered now? Prof just explained it again for Jia’s question	Sourav Ghosh	souravghosh@hotmail.com
84	78	sampling bias arises due to the way we collect the data right?	yes	Nini Nursiah	nursiah.neelesh28@gmail.com
85	79	Like in CNN's cant we take entire data and split it into 80-20 ratio where 80% is training and 20% validation and we refine based on testing accuracy. Random just seems too uncontrollable even with larger dataset		Anmol Khopade	anmolck@gmail.com
86	80	How do you take a sample of 100 out of a 200 million adult population? But I do realize that they take the samples from the whole population set? What is the process to incorporate the whole population?	It's usually not possible to incorporate the entire population, so you try to sample a smaller set for your exercise.	Srini Boddu	siliconfish@yahoo.com
87	81	Does this sampling bias exist when we acquire the data, or does it also exist while splitting test and train data?	We are going to show this	Puneet Rsstogi	puneetrstg@gmail.com
88	82	how is stratified samples different from cluster samples		Krishna Mohan	kmiitan96@gmail.com
89	83	what is sampling bias in the example shown ? please explain		jia sharma	jiavidhi.sharma@gmail.com
90	84	I got my answer - Thanks		Srini Boddu	siliconfish@yahoo.com
91	85	How do we measure if a status is large enough?		Domenico Fioravanti	nicodom@gmail.com
92	86	If know data in advance, we can categories the income		Veeru(VeeraNancharaiah Javvaji)	jsriveeru@gmail.com
93	87	for future data, how can ?	For near future term we can assume that the income will stay in similar range.	Veeru(VeeraNancharaiah Javvaji)	jsriveeru@gmail.com
94	88	*stratus		Domenico Fioravanti	nicodom@gmail.com
95	89	How do the prof determine the the income as the measure for stratified sampling? or did he pick up randomly?		Srini Boddu	siliconfish@yahoo.com
96	90	Wouldn’t it be better to use bins=[0., 2, 3.0, 4.5, 6., np.inf], to increase the size of the first stratus?		Domenico Fioravanti	nicodom@gmail.com
97	91	2 to 6 has most		AG	abhijeetgadgil@gmail.com
98	92	so how to represent 2 to 6 in right manner	Hi AG, I did not get your question. Can you please elaborate?	AG	abhijeetgadgil@gmail.com
99	93	If I look at bins we say 0 to 1.5 and then 1.5 to 3, but why are bins picked in 1.5 ranges in this example	There are a couple of ways to determine a suitable bin size. First: find the smallest and largest data points, lower the minimum a little and raise the maximum a little, decide how many bins you need, divide the range by the number of bins to get the bin width, and finally create the bin boundaries. Second: use Sturges' rule, K = 1 + 3.322 log10(N), where K is the number of class intervals and N is the number of observations.	AG	abhijeetgadgil@gmail.com
100	94	Why was numerical converted to categorical? I didn't understand that part	Categorical was converted to numerical, to enable modelling	Nini Nursiah	nursiah.neelesh28@gmail.com
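
The stratified sampling discussed in questions 82 to 94 (binning income, then keeping the train and test sets representative of each bin) can be sketched as below. This is a minimal illustration under stated assumptions, not the code used in the session: it relies only on the Python standard library, the incomes are synthetic, and the helper names `income_stratum` and `stratified_split` are hypothetical. The bin edges 1.5, 3.0, 4.5 and 6.0 follow the bins mentioned in questions 90 and 93, the cap at 15 and the seed 42 follow questions 45 and 66, and the 80:20 ratio follows question 50.

```python
import bisect
import random
from collections import Counter, defaultdict

# Upper edges of strata 1-4 (assumed from the bins quoted in the questions);
# stratum 5 is everything above 6.0.
BIN_EDGES = [1.5, 3.0, 4.5, 6.0]


def income_stratum(income):
    """Map a numeric income to a stratum label 1..5 (upper edge inclusive)."""
    return bisect.bisect_left(BIN_EDGES, income) + 1


def stratified_split(rows, key, test_ratio=0.2, seed=42):
    """Send about test_ratio of every stratum to the test set, the rest to train."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[key(row)].append(row)

    train, test = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)  # random choice within each stratum
        n_test = round(len(members) * test_ratio)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Synthetic right-skewed incomes, capped at 15 (made-up data for illustration).
data_rng = random.Random(0)
incomes = [min(data_rng.lognormvariate(1.2, 0.5), 15.0) for _ in range(10_000)]

train, test = stratified_split(incomes, key=income_stratum)

# Compare the share of each stratum in the full data and in the test set.
overall = Counter(income_stratum(x) for x in incomes)
in_test = Counter(income_stratum(x) for x in test)
for stratum in sorted(overall):
    share_all = overall[stratum] / len(incomes)
    share_test = in_test[stratum] / len(test)
    print(f"stratum {stratum}: overall {share_all:.3f}, test {share_test:.3f}")
```

The two shares printed for each stratum should be close to each other, which is the point of stratifying: the test set mirrors the income distribution of the full data. In practice, scikit-learn's `train_test_split` accepts a `stratify` argument that serves the same purpose.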