A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Question | Answer(s) | Asker Name | Asker Email | |||||||||||||||||||||
2 | Where are earlier recorded sessions posted, please? | They will be on the My Courses page | Go to My Courses, and you will find them under topics #10 and #11 | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||
3 | Thanks | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
4 | I could not find yesterday’s session under topic 11 (Session-3 Recording) | Please send us an email with a screenshot, we will look into it. | Please check under topic 10 | https://cloudxlab.com/assessment/displayslide/5014/understanding-and-visualisation-of-data?course_id=84&playlist_id=464 | Kajal | kajalchatterjee@gmail.com | |||||||||||||||||||
5 | sure, will do. | Kajal | kajalchatterjee@gmail.com | ||||||||||||||||||||||
6 | Seed function | The seed method initializes Python's pseudorandom number generator. The random module uses the seed value as the starting point for the numbers it generates. If no seed value is given, it uses the current system time or, where available, a randomness source provided by the operating system. If you set the same seed value before generating random data, it will produce the same data each time. | VED | parmarvedpro5@gmail.com | |||||||||||||||||||||
7 | Got it. Thank you. | Kajal | kajalchatterjee@gmail.com | ||||||||||||||||||||||
8 | how to determine the value of parameters of function seed | You can check all the methods of the random library from the below link: https://docs.python.org/3/library/random.html | VED | parmarvedpro5@gmail.com | |||||||||||||||||||||
9 | ok | VED | parmarvedpro5@gmail.com | ||||||||||||||||||||||
10 | Will we get the same output as you if we use 42 in seed? | Yes | Nishant Singh | nishant1695@gmail.com | |||||||||||||||||||||
11 | what are we trying to achieve with randomness? | Randomness is for shuffling the data | Karan K | karankarnik47@gmail.com | |||||||||||||||||||||
12 | Is there a shortcut to open the API doc from a line of code? | You can use help(<func>) in Jupyter. For example, help(pd.read_csv) | jia sharma | jiavidhi.sharma@gmail.com | |||||||||||||||||||||
13 | How do we know which function is part of which lib, like which one belongs to numpy and which one belongs to pandas? | Once you start practicing you will know the common ones, as you will be using them frequently. If you need anything special, you can always search on Google. | Hemanta Lenka | hemanta.lenka@gmai.com | |||||||||||||||||||||
14 | Why do we compare contents of Test data and Train data | VED | parmarvedpro5@gmail.com | ||||||||||||||||||||||
15 | Is there a more extensive course for Pandas? The prerequisites in the course have brief content about DataFrame, but it seems pretty limited compared to the coverage of the overall Pandas topic. | Rohit Arora | rohit.arora@creation-tec.com | ||||||||||||||||||||||
16 | How do we decide which existing features to use to engineer new feature? Is it intuition based? | Nishant Singh | nishant1695@gmail.com | You may have to iterate various combinations and evaluate performance for each combination and decide which one to choose | So this way you can choose - Prof Durga | ||||||||||||||||||||
17 | Why is data engineering required here | DIvya Pathak | dev.feb88@gmail.com | ||||||||||||||||||||||
18 | ? | DIvya Pathak | dev.feb88@gmail.com | ||||||||||||||||||||||
19 | Can you repeat the drop() part at line | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
20 | #113 | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
21 | The data we were looking at was that not clean data? | So far the data is not clean, we will clean the data | AG | abhijeetgadgil@gmail.com | |||||||||||||||||||||
22 | Higher side as in there is any %age we take into account? | Sugandhita | sugandhitap@gmail.com | ||||||||||||||||||||||
23 | Sir, don't data cleaning and tidying come before creating train and test data sets? | Sanjeeb Bose | sanjeeb.bose@oracle.com | ||||||||||||||||||||||
24 | instead of dropping can we not assign it some default value | Chinmay Athavale | chinmayat@gmail.com | ||||||||||||||||||||||
25 | we created a train_set and test_set data structures, while whatever prof is showing now as an example incomplete rows as an example have to be taken out finally from the train_set data structure, right? | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
26 | Observations and predictors aren’t the same? | Observations are input data records (rows). Predictors are variables used for prediction (columns). | Srihari M | srihariblr12@gmail.com | |||||||||||||||||||||
27 | so that all rows of housing are deleted | Dr. Santosh Kumar | ksantosh.11@gmail.com | ||||||||||||||||||||||
28 | how we do data cleaning in case of categorical value | DIvya Pathak | dev.feb88@gmail.com | ||||||||||||||||||||||
29 | As you just explained, Median is NOT sensitive to outliers. What if we have a data set such as 5, 6, 8, 10, 7, 8, 9, 200? In this case I guess it is affected by the outlier (200). | Manoj Kumar | manoj.gupta.91@gmail.com | ||||||||||||||||||||||
30 | we saw that total_bedroom has very less correlation, why do we need to correct the column.. probably need to drop those irrelevant columns | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
31 | Won't imputation create an overfitting scenario? | Aakash Sinha | post2aakash@gmail.com | ||||||||||||||||||||||
32 | can we not use numerical translation for categorical values ? | Vikas Bhartiya | ghivikas@gmail.com | ||||||||||||||||||||||
33 | If we remove categorical attributes before imputation, do we merge the data frame back into the main one post imputation so that we don't miss the categorical values in the data set? The categorical attributes may still be important to make predictions.... | Preedesh M | Preedesh@Gmail.com | ||||||||||||||||||||||
34 | I mean merge the imputed dataframe to the original | Preedesh M | Preedesh@Gmail.com | ||||||||||||||||||||||
35 | Still not completely clear about stratified sampling, can you point to some resources? thanks | Karan K | karankarnik47@gmail.com | ||||||||||||||||||||||
36 | What is the difference between imputer.fit(housing_num) and X = imputer.transform(housing_num)? | With fit, the imputer calculates a statistic for each column (for example, the mean) from the data it is given; with transform, it applies those stored statistics to some data, which just means replacing missing values with them. - Prof Durga | Nitin Nigam | nknigam@gmail.com | imputer.fit takes the data in and analyses it; it does not change the data. imputer.transform then uses the results of that analysis to fill in the missing values and returns the transformed data. | The "mean" is just one example of such a statistic. | That is, the "mean" computed during the analysis (fit) is the one used when doing the transform. | ||||||||||||||||||
37 | cool | Aakash Sinha | post2aakash@gmail.com | ||||||||||||||||||||||
38 | imputer.fit(housing_num) - Does this impute all numerical columns with median value where there are NAN values? | Srini Boddu | siliconfish@yahoo.com | ||||||||||||||||||||||
39 | bins = labels = categories = can be imputed, but if you put numerical like 1,2,3,4,5 then not good idea to impute, right? | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
40 | A recorded video of every class will be available before the next day of the session. Instructors will keep adding Slides, Questions, and Projects on a weekly basis. where I can find Questions and project sir as given above | It will be updated in LMS | sshrivastava | sshrivastava@imtnag.ac.in | |||||||||||||||||||||
41 | Is it recommended to apply the OrdinalEncoder object to fit & transform the test data as well ? | Prakhar Prasad | prakhar.prasad@gmail.com | ||||||||||||||||||||||
42 | When Sir | sshrivastava | sshrivastava@imtnag.ac.in | ||||||||||||||||||||||
43 | I cant get any question and project till date | It is available under the slides on the session page | sshrivastava | sshrivastava@imtnag.ac.in | |||||||||||||||||||||
44 | can we do PCA after encoding ? suppose we have 50 categories , it will create 50 more columns , how to deal with that? we can't control number of categories , if column is having strong relationship | Arpit vijaywargiya | arpitvw16@gmail.com | ||||||||||||||||||||||
45 | Sir is it comes under Machine Learning Specialization | sshrivastava | sshrivastava@imtnag.ac.in | ||||||||||||||||||||||
46 | Please suggest | sshrivastava | sshrivastava@imtnag.ac.in | ||||||||||||||||||||||
47 | Although my course is Artificial Intelligence Deep Learning IIT Roorkee | sshrivastava | sshrivastava@imtnag.ac.in | ||||||||||||||||||||||
48 | Which one is usually used StandardScaler or MinMaxScaler and whether there be difference in the model performance due to the scaler chosen ? | Prakhar Prasad | prakhar.prasad@gmail.com | ||||||||||||||||||||||
49 | Is it necessary to create CombinedAttributesAdder class for newly added features? | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
50 | cant we use inbuilt transform function? | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
51 | Since we already added to df | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
52 | what was , 16 | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
53 | Correct me if I'm wrong but transformer that we built is just automating feature engineering process and appending it to our data set, right? | Nishant Singh | nishant1695@gmail.com | ||||||||||||||||||||||
54 | diff between data fit vs transform with eg | kunal upadhyay | kupadhy@gmail.com | ||||||||||||||||||||||
55 | But we will be using only one transformation on our data, right? What is the benefit of combining? | Dr. Santosh Kumar | ksantosh.11@gmail.com | ||||||||||||||||||||||
56 | I mean 16 in | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
57 | (16512, 16) | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
58 | PCA will affect encoding ? | Arpit vijaywargiya | arpitvw16@gmail.com | ||||||||||||||||||||||
59 | Can we get some study material to understand the transform concept on numerical and categorical variables, or some reference links? | You can use the O’Reilly book for reference. | jia sharma | jiavidhi.sharma@gmail.com | |||||||||||||||||||||
60 | suppose if we apply PCA on encoding data , How it will affect the model ? | Arpit vijaywargiya | arpitvw16@gmail.com | ||||||||||||||||||||||
61 | please provide some study material or more references | Refer to the O’Reilly book which we referred to earlier | Sarbjit Singh | ssingh@imtnag.ac.in | |||||||||||||||||||||
62 | attr_adder =CombinedAttributesAdder(add_bedrooms_per_room=False) housing_extra_attribs = attr_adder.transform(housing.values) | SUVAIN G | brusuvain@gmail.com | ||||||||||||||||||||||
63 | As PCA is a projection of data onto eigenvectors, how will it lose the information of encoded data? | Arpit vijaywargiya | arpitvw16@gmail.com | ||||||||||||||||||||||
64 | can you please explain the above code lines once again these are from CombinedAttributesAdder | SUVAIN G | brusuvain@gmail.com | ||||||||||||||||||||||
65 | can you please explain the last 2 code lines once again from CombinedAttributesAdder | SUVAIN G | brusuvain@gmail.com | ||||||||||||||||||||||
66 | or instead of onehot if we use ordinal encoding , and then use PCA , | Arpit vijaywargiya | arpitvw16@gmail.com | ||||||||||||||||||||||
67 | so information will be projected as per categories | Arpit vijaywargiya | arpitvw16@gmail.com | ||||||||||||||||||||||
68 | thanks | Sarbjit Singh | ssingh@imtnag.ac.in | ||||||||||||||||||||||
69 | When should we build a pipeline? | anantpadmanabh divanji | apgd14@gmail.com | ||||||||||||||||||||||
70 | So should we make it a habit, as beginners, to incorporate transformers and pipelines in our models at this point of time? What would you suggest? | Nishant Singh | nishant1695@gmail.com | ||||||||||||||||||||||
71 | Which one to go for first missing value analysis or encoding the categorical variables | anantpadmanabh divanji | apgd14@gmail.com | ||||||||||||||||||||||
72 | Should we follow the same steps as you are teaching, as beginners? | Try to follow. If you fall behind, try later using the video and slides. | VED | parmarvedpro5@gmail.com | We recommend that you try along as far as possible. You may not understand everything, but you can run it as is, without any edits. | ||||||||||||||||||||
73 | I was reading about sparse matrix and it was mentioned that they are usually time consuming to work with owing to very few non-zero values . Are there ways to structure them better? | Puneet Rastogi | puneetrstg@gmail.com | ||||||||||||||||||||||
74 | What do we have in housing_labels ? | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
75 | why did we use decision tree here | AG | abhijeetgadgil@gmail.com | ||||||||||||||||||||||
76 | Sir, what does CV=10 mean? | Sanjeeb Bose | sanjeeb.bose@oracle.com | ||||||||||||||||||||||
77 | 10 rows for cross validation? | Sanjeeb Bose | sanjeeb.bose@oracle.com | ||||||||||||||||||||||
78 | i didnt get the concept of negative mean square error please repeat ? | VED | parmarvedpro5@gmail.com | ||||||||||||||||||||||
79 | why can't I use my training set for cross validation? | Because hyperparameters tuned on the same data the model was trained on might be overfitted to that data and might not perform well on the test set. So we tune hyperparameters on data the model was not trained on: the validation fold that each round of cross-validation holds out from the training set. | Nini Nursiah | nursiah.neelesh28@gmail.com | |||||||||||||||||||||
80 | what is random forest | It's a model: an ensemble of decision trees whose predictions are combined. | VED | parmarvedpro5@gmail.com | |||||||||||||||||||||
81 | where can we get more information around models for e.g. RandomForestRegressor? | Rohit Arora | rohit.arora@creation-tec.com | ||||||||||||||||||||||
82 | is there a way to know when should I stop fine tuning? | Nini Nursiah | nursiah.neelesh28@gmail.com | ||||||||||||||||||||||
83 | So pretty much we need to get the lowest RMSE for various models and select that model , is that what we are trying to do here? | Preedesh M | Preedesh@Gmail.com | ||||||||||||||||||||||
84 | Shouldn't the RMSE value lie between 0 and 1? | Swetha Lakshmipathy | swethalpathy@gmail.com | ||||||||||||||||||||||
85 | shall we always get better result from random forest? | Dr. Santosh Kumar | ksantosh.11@gmail.com | ||||||||||||||||||||||
86 | Why negative in np.sqrt() for few model and not for Random Forest ? | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
87 | does it cause overfit, if keep on check performance and tune the feature | Srihari M | srihariblr12@gmail.com | ||||||||||||||||||||||
88 | But then if we just compare how it performs on the test set, if the error is decreasing, then we might get stuck in a local minimum, right? | Nini Nursiah | nursiah.neelesh28@gmail.com | ||||||||||||||||||||||
89 | How do we account for the time dimension of data observations because some observations will become obsolete in due course of time | Rohit Arora | rohit.arora@creation-tec.com | ||||||||||||||||||||||
90 | In case the model evaluation is not promising on the validation data, then we again go back to the training data / revisit the model and iteratively check the performance on the test data. But this would also at some point result in overfitting on the validation data. Should we then have an unseen dataset where we evaluate only when we are fully satisfied with the model performance? | Prakhar Prasad | prakhar.prasad@gmail.com | ||||||||||||||||||||||
91 | Which one should we choose? We have others as well: MSE or MAE? | VED | parmarvedpro5@gmail.com | When we want to penalize large errors more heavily, we use MSE; otherwise MAE. | |||||||||||||||||||||
92 | In real world, should we have pre determined target for confidence/error range on the predictions? | It depends on the real-world domain and problem that you're trying to solve, and how much error in predictions is acceptable for that kind of problem | Puneet Rastogi | puneetrstg@gmail.com | |||||||||||||||||||||
93 | When we are using K-Fold cross validation, the training set is different for each training epoch, why did we have to set the seed to get pseudo random split of train and test sets? if we had continued without setting the seed then it would have behaved as K-Fold cross validation. Am I missing something here for splitting the train and test sets | Vinod | vinods.kumar@gmail.com | ||||||||||||||||||||||
94 | How do I know which model is performing better? Only from the mean error, is it enough to know how good a model is? | Nini Nursiah | nursiah.neelesh28@gmail.com | ||||||||||||||||||||||
95 | What is the meaning of Negative Mean squared error? Does it mean that model is bad? | Srini Boddu | siliconfish@yahoo.com | ||||||||||||||||||||||
96 | What is the good RMSE to see if it gives satisfactory prediction? | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
97 | OK, I get it, CV is within the train data set | Vinod | vinods.kumar@gmail.com | ||||||||||||||||||||||
98 | What is random_state and bootstrap ? | Nitin Nigam | nknigam@gmail.com | ||||||||||||||||||||||
99 | can we apply ab testing to this dataset? | anantpadmanabh divanji | apgd14@gmail.com | ||||||||||||||||||||||
100 | can you explain with example hyper large vs scale space? | AG | abhijeetgadgil@gmail.com |
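
The answers to the seed questions (rows 6 and 10) can be illustrated with a minimal sketch. It assumes Python's standard-library random module, as in the answer to row 6; the helper name draw and the seed value 7 are made up for illustration, and the class notebook may have seeded NumPy's generator instead, which is seeded separately.

```python
import random


def draw(seed, n=5):
    """Seed the generator, then draw n pseudorandom integers in [0, 99].

    `draw` is a hypothetical helper for this illustration only.
    """
    random.seed(seed)  # reset the generator to a known starting point
    return [random.randint(0, 99) for _ in range(n)]


first = draw(42)
second = draw(42)  # same seed as `first`
other = draw(7)    # a different, arbitrarily chosen seed

# Re-seeding with the same value replays the same sequence.
print("same seed, same numbers:", first == second)
# A different seed is expected to give a different sequence.
print("different seed, same numbers:", first == other)
```

Re-seeding with the same value on the same Python version replays the same sequence, which is why everyone who seeds with 42 before shuffling or splitting the data sees the same shuffle or split.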