Statistical Inference-Homework

In this homework, we will explore some common extensions to hypothesis testing. This will preview some of the concepts introduced in upcoming lessons on Sample Size Calculations for Surveys and Power Calculations. As in the previous quiz, we will use the EG DIB evaluation dataset. Note that the variable "total_ely#" contains the Year # endline test scores (for example, total_ely2 for Year 2) and the variable "treatment" denotes EG students (=1) and non-EG students (=0).

First, compare Year 2 endline test scores for EG and non-EG students by running a t-test. Ignore clustering & covariates for now.
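One way to run this comparison in Stata is sketched below. It assumes the EG DIB evaluation dataset is already loaded in memory and that the Year 2 score is stored in total_ely2, following the "total_ely#" naming pattern described above.

```stata
* Two-sample t-test of Year 2 endline scores, EG vs. non-EG students.
* Assumes the dataset is already in memory; by() splits on the 0/1 treatment variable.
* By default, ttest with by() assumes equal variances in the two groups.
ttest total_ely2, by(treatment)
```

The output table lists the mean for each group, and the two-sided p-value appears under the alternative hypothesis "Ha: diff != 0".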

1. What is the average Year 2 endline test score for non-EG students? Round your answer to the nearest 0.001. *

1 point

2. What is the average Year 2 endline test score for EG students? Round your answer to the nearest 0.001. *

1 point

3. If the true difference in average test scores between EG and non-EG students were 0, what is the probability that you would find a difference this large or larger by chance (assuming that random assignment was carried out correctly)? Round your answer to the nearest 0.001. *

1 point

Now, compare Year 2 endline test scores for EG and non-EG students again, this time by running a regression. Using the regress command, regress Year 2 endline test scores on treatment assignment. (Note that the regression output rounds the p-value to the nearest 0.001.)
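A minimal sketch of this regression, again assuming the dataset is loaded and that total_ely2 holds the Year 2 score:

```stata
* OLS regression of Year 2 endline scores on the treatment indicator.
* No covariates; default standard errors (equal variance assumed).
regress total_ely2 treatment

* Optional: r(table) stores the unrounded results, including the p-value row.
matrix list r(table)
```

The p-value for the treatment coefficient is reported in the P>|t| column of the regression table.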

4. Which of the following statements is true? *

1 point

The p-value calculated from the regression is smaller than the p-value calculated from the t-test.

The p-value calculated from the regression is the same as the p-value calculated from the t-test.

The p-value calculated from the regression is larger than the p-value calculated from the t-test.

Not enough information

One problem with the default regression command is that it assumes that the variance in the two groups is equal. This may not be the case. For instance, if the treatment improves test scores for the lowest-performing students more than it does for the top-performing students, variance could be lower in the treatment group. That is, the errors would be heteroskedastic. Rather than assume the variance is equal, we use robust standard errors to account for possible heteroskedasticity. Note that robust standard errors are valid even if there is no heteroskedasticity, so they should really be your default.

Rerun the regression, this time using robust standard errors by specifying the ", robust" option in your regression command.
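As a sketch, the robust version of the earlier regression could look like this (same assumptions about the loaded dataset and the total_ely2 variable name):

```stata
* Same regression, now with heteroskedasticity-robust standard errors.
regress total_ely2 treatment, robust

* Equivalent spelling of the same option:
regress total_ely2 treatment, vce(robust)
```

Comparing this table with the previous one, row by row, is the quickest way to see what the option changes and what it leaves alone.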

5. Which of the following statements about the regression output is true, relative to the previous regression? *

1 point

The coefficient on treatment is LARGER now

The coefficient on treatment is EXACTLY THE SAME now

The coefficient on treatment is SMALLER now

6. Which of the following statements about the regression output is true, relative to the previous regression? *

1 point

The standard error on the treatment coefficient is LARGER now

The standard error on the treatment coefficient is EXACTLY THE SAME now

The standard error on the treatment coefficient is SMALLER now

7. Which of the following statements about the regression output is true, relative to the previous regression? *

1 point

The p-value on the treatment coefficient is LARGER now

The p-value on the treatment coefficient is EXACTLY THE SAME now

The p-value on the treatment coefficient is SMALLER now

A much more consequential problem with the previous regression specification is that we assumed treatment was randomized at the individual level. In fact, treatment was randomized at the village level. We'll see in the coming lessons exactly why this leads us to underestimate the standard error on treatment. For now, let's correct the regression specification and examine the implications for inference.

Rerun the regression, this time clustering at the village level (village_id_rand) by specifying the ", cluster()" option. Note that you do not need to specify robust standard errors again, since clustered standard errors are automatically robust.
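A sketch of the clustered specification, assuming the same loaded dataset and the total_ely2 variable name:

```stata
* Same regression, with standard errors clustered at the village level.
* village_id_rand identifies the unit at which treatment was randomized.
regress total_ely2 treatment, cluster(village_id_rand)

* Equivalent spelling of the same option:
regress total_ely2 treatment, vce(cluster village_id_rand)
```

With clustering, Stata notes the number of clusters in the output, which is worth checking against the number of villages you expect.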

8. Which of the following statements about the regression output is true, relative to the previous regression? *

1 point

The coefficient on treatment is LARGER now

The coefficient on treatment is EXACTLY THE SAME now

The coefficient on treatment is SMALLER now

9. Which of the following statements about the regression output is true, relative to the previous regression? *

1 point

The p-value on the treatment coefficient is LARGER now

The p-value on the treatment coefficient is EXACTLY THE SAME now

The p-value on the treatment coefficient is SMALLER now

The test score variable is the sum of three separate components: hindi_ely2 + math_ely2 + english_ely2. Try running similar regressions on each of these components.
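One possible way to run the three subject regressions without repeating the command is a loop, sketched below. It assumes the dataset is loaded and keeps the village-level clustering from the previous step; the loop variable name subject is arbitrary.

```stata
* Run the clustered regression once per subject component.
foreach subject in hindi math english {
    regress `subject'_ely2 treatment, cluster(village_id_rand)
}

* Average score in each subject among treatment (EG) students only.
summarize hindi_ely2 math_ely2 english_ely2 if treatment == 1
```

In each regression, the constant is the control-group mean and the treatment coefficient is the difference between the treatment and control means.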

10. In which subject do treatment students score highest, on average? *

1 point

Hindi

Math

English

Can't say

11. In which subject do treatment students outperform control students the most, on average? *

1 point

Hindi

Math

English

Can't say


This form was created inside of Idinsight.org.
