Statistical Inference-Homework
In this homework, we will explore some common extensions to hypothesis testing. This will preview some of the concepts introduced in upcoming lessons on Sample Size Calculations for Surveys and Power Calculations. As in the previous quiz, we will use the EG DIB evaluation dataset. Note that the variable "total_ely#" contains Year # endline test scores and the variable "treatment" denotes EG students (=1) and non-EG students (=0).
First, compare Year 2 endline test scores for EG and non-EG students by running a t-test. Ignore clustering & covariates for now.
1. What is the average Year 2 endline test score for non-EG students? Round your answer to the nearest 0.001. *
1 point
2. What is the average Year 2 endline test score for EG students? Round your answer to the nearest 0.001. *
1 point
3. If the true difference in average test scores between EG and non-EG students were 0, what is the probability that you would find a difference this large or larger by chance (assuming that random assignment was carried out correctly)? Round your answer to the nearest 0.001. *
1 point
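To make the mechanics of the t-test concrete, here is a minimal Python sketch of the equal-variance (pooled) two-sample t statistic. The scores are made up for illustration and are not taken from the EG DIB dataset, and the function name pooled_t is hypothetical. The two-sided p-value is then the probability that a t-distributed variable with the reported degrees of freedom is at least as large as |t| in absolute value.

```python
from math import sqrt
from statistics import mean, variance

# Made-up scores for illustration only, NOT from the EG DIB dataset.
control = [10.0, 12.0, 14.0, 16.0]        # treatment == 0
treated = [13.0, 15.0, 17.0, 19.0, 21.0]  # treatment == 1


def pooled_t(a, b):
    """Return (difference in means b - a, pooled standard error, t statistic, df)."""
    na, nb = len(a), len(b)
    diff = mean(b) - mean(a)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return diff, se, diff / se, df


diff, se, t_stat, df = pooled_t(control, treated)
print(f"control mean = {mean(control):.3f}, treated mean = {mean(treated):.3f}")
print(f"difference = {diff:.3f}, se = {se:.3f}, t = {t_stat:.3f}, df = {df}")
```

The sketch stops at the t statistic because the Python standard library has no t-distribution CDF; statistical software reports the p-value directly.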
Now, compare Year 2 endline test scores for EG and non-EG students again, this time by running a regression. Using the regress command, regress Year 2 endline test scores on treatment assignment. (Note that the regression output rounds the p-value to the nearest 0.01).
4. Which of the following statements is true? *
1 point
One problem with the default regression command is that it assumes the variance of the errors is the same in both groups. This may not hold. For instance, if the treatment improves test scores more for the lowest-performing students than for the top-performing students, variance could be lower in the treatment group. That is, the errors would be heteroskedastic. Rather than assume equal variance, we use robust standard errors to account for possible heteroskedasticity. Note that robust standard errors remain valid even when there is no heteroskedasticity, so they should really be your default.
Rerun the regression, this time using robust standard errors by specifying the ", robust" option in your regression command.
5. Which of the following statements about the regression output is true, relative to the previous regression? *
1 point
6. Which of the following statements about the regression output is true, relative to the previous regression? *
1 point
7. Which of the following statements about the regression output is true, relative to the previous regression? *
1 point
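For readers curious about what the robust option changes under the hood, the following Python sketch computes the slope of a one-regressor OLS fit together with two standard errors: the classical one, which assumes equal error variance, and a heteroskedasticity-robust one using the HC1 scaling n/(n-2). The data are made up (the treated group is deliberately less spread out than the control group), and the function name ols_slope_with_ses is hypothetical.

```python
from math import sqrt

# Made-up data for illustration only, NOT from the EG DIB dataset.
# x is the 0/1 treatment indicator, y is a test score.
x = [0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [8.0, 12.0, 14.0, 18.0, 16.0, 17.0, 17.0, 17.0, 18.0]


def ols_slope_with_ses(x, y):
    """Return (slope, classical SE, HC1 robust SE) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: a single residual variance shared by all observations.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = sqrt(s2 / sxx)

    # HC1: each observation contributes its own squared residual.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = sqrt(n / (n - 2) * meat / sxx ** 2)
    return slope, se_classical, se_robust


slope, se_classical, se_robust = ols_slope_with_ses(x, y)
print(f"slope = {slope:.3f}")
print(f"classical SE = {se_classical:.3f}, robust SE = {se_robust:.3f}")
```

Whether the robust standard error comes out larger or smaller than the classical one depends on the data, so the toy numbers here say nothing about the homework dataset.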
A much more consequential problem with the previous regression specification is that it assumed treatment was randomized at the individual level. In fact, treatment was randomized at the village level. We'll see in the coming lessons exactly why ignoring this leads us to underestimate the standard error on treatment. For now, let's correct the regression specification and examine the implications for inference.
Rerun the regression, this time clustering standard errors at the village level (village_id_rand) by specifying the ", cluster()" option. Note that you do not need to specify robust standard errors again, since clustered standard errors are automatically robust.
8. Which of the following statements about the regression output is true, relative to the previous regression? *
1 point
9. Which of the following statements about the regression output is true, relative to the previous regression? *
1 point
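The effect of clustering can be sketched in a few lines of Python. The example below uses made-up data with four villages of three students each, treatment assigned by village, and scores that move together within a village. It compares an observation-level robust (HC1) standard error with a cluster-robust one. The finite-sample scaling G/(G-1) x (N-1)/(N-K) is an assumption about the convention in use; the village labels and function name are hypothetical.

```python
from math import sqrt

# Made-up data for illustration only, NOT from the EG DIB dataset.
village = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]
x = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # treatment, constant within village
y = [8.0, 9.0, 10.0, 15.0, 16.0, 17.0, 13.0, 14.0, 15.0, 20.0, 21.0, 22.0]


def slope_with_robust_and_cluster_se(x, y, cluster):
    """Return (slope, HC1 robust SE, cluster-robust SE) for y = a + b*x."""
    n, k = len(x), 2  # k counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Observation-level robust SE: squared scores summed one by one.
    meat_obs = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = sqrt(n / (n - k) * meat_obs / sxx ** 2)

    # Cluster-robust SE: scores are summed within each cluster BEFORE squaring,
    # so residuals that share a sign within a village reinforce each other.
    sums = {}
    for c, d, e in zip(cluster, dx, resid):
        sums[c] = sums.get(c, 0.0) + d * e
    g = len(sums)
    meat_cl = sum(s * s for s in sums.values())
    scale = g / (g - 1) * (n - 1) / (n - k)  # assumed finite-sample correction
    se_cluster = sqrt(scale * meat_cl / sxx ** 2)
    return slope, se_robust, se_cluster


slope, se_robust, se_cluster = slope_with_robust_and_cluster_se(x, y, village)
print(f"slope = {slope:.3f}")
print(f"robust SE = {se_robust:.3f}, clustered SE = {se_cluster:.3f}")
```

In this toy example the clustered standard error is noticeably larger than the unclustered one, because students in the same village carry much less independent information than the raw sample size suggests.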
The test score variable is the sum of three separate components: hindi_ely2 + math_ely2 + english_ely2. Try running similar regressions on each of these components.
10. In which subject do treatment students score highest, on average? *
1 point
11. In which subject do treatment students outperform control students the most, on average? *
1 point
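Questions 10 and 11 ask about two different quantities: the level of the treatment group's scores, and the gap between treatment and control. With a single 0/1 regressor, the regression coefficient on treatment equals the difference in group means, so a small Python sketch with group means is enough to show the distinction. The rows below are made up; only the column names come from this homework, and the helper group_means is hypothetical.

```python
from statistics import mean

# Made-up rows for illustration only; column names follow the homework text.
rows = [
    {"treatment": 0, "hindi_ely2": 10.0, "math_ely2": 8.0, "english_ely2": 5.0},
    {"treatment": 0, "hindi_ely2": 12.0, "math_ely2": 10.0, "english_ely2": 7.0},
    {"treatment": 1, "hindi_ely2": 13.0, "math_ely2": 12.0, "english_ely2": 6.0},
    {"treatment": 1, "hindi_ely2": 15.0, "math_ely2": 14.0, "english_ely2": 8.0},
]
subjects = ["hindi_ely2", "math_ely2", "english_ely2"]


def group_means(rows, column):
    """Return (control mean, treated mean) of one column."""
    control = [r[column] for r in rows if r["treatment"] == 0]
    treated = [r[column] for r in rows if r["treatment"] == 1]
    return mean(control), mean(treated)


summary = {}
for s in subjects:
    c, t = group_means(rows, s)
    summary[s] = {"control_mean": c, "treated_mean": t, "difference": t - c}
    print(f"{s}: control = {c:.3f}, treated = {t:.3f}, difference = {t - c:.3f}")

# Highest treated-group level versus largest treatment-control gap.
highest_level = max(subjects, key=lambda s: summary[s]["treated_mean"])
largest_gap = max(subjects, key=lambda s: summary[s]["difference"])
print("highest treated mean:", highest_level)
print("largest difference:", largest_gap)
```

In the toy data the subject with the highest treated mean is not the subject with the largest gap, which is why the two questions need to be answered separately.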
This form was created inside of Idinsight.org.