Econometrics Analysis Project
Impact of Health and Living Conditions on Childhood Mortality
Kimberly Lynch
ECON-5330
Utah State University
Introduction
Model
This regression model examines the mortality rate of children under five and the factors that may contribute to it. The factors examined are: the percentage of children aged 12 to 23 months who received a DPT immunization before they were 12 months old, the percentage of the population with access to an improved drinking water source, and the percentage of the population aged 15 to 49 with HIV. The model rests on the theory that health conditions affect mortality rates, and it estimates whether each of these variables is associated with a lower or higher mortality rate. Model:
$\text{Mortality}_i = B_1 + B_2\,\text{IMM}_i + B_3\,\text{Water}_i + B_4\,\text{HIV}_i + u_i$
Data
The cross-sectional data used in this model were retrieved from The World Bank, an organization that gathers data from countries on living conditions, mortality, and health information.¹ Ninety-five countries had information on all three of the factors studied in this model. I adjusted the mortality data to simplify the interpretation of the model's results. The World Bank reports the number of children who die before reaching the age of five per 1,000 births; to express this as a percentage, I rescaled it to deaths per 100 births.
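The rescaling is simply a division by 10. A minimal sketch, assuming the mortality figures are held in a plain list; the helper name `per_1000_to_per_100` and the sample values are hypothetical, not taken from the World Bank data:

```python
def per_1000_to_per_100(rate_per_1000: float) -> float:
    """Convert deaths per 1,000 births to deaths per 100 births (a percentage)."""
    return rate_per_1000 / 10.0


# Hypothetical under-five mortality figures, for illustration only.
raw_rates = [45.0, 120.0, 8.5]
rescaled = [per_1000_to_per_100(r) for r in raw_rates]
print(rescaled)  # [4.5, 12.0, 0.85]
```

Because the dependent variable is divided by 10, every OLS coefficient (including the intercept) is also divided by 10, while t statistics, p-values, and R² are unchanged.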
Hypothesis:
H0: $B_2 = B_3 = B_4 = 0$
H1: at least one of $B_2, B_3, B_4 \neq 0$
These hypotheses test whether each variable is individually related to the mortality rate (a t test of each coefficient against zero) and whether the variables matter jointly (an F test of the null above).
Results
Regression
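$\widehat{\text{Mortality}}_i = 21.75 - 0.06\,\text{IMM}_i - 0.14\,\text{Water}_i + 0.10\,\text{HIV}_i$
Term | Coefficient | t | p-value |
Intercept | 21.75 | 14.30 | 6.942E-25 |
IMM | -.06 | -3.37 | .001 |
Water | -.14 | -8.28 | 1.058E-12 |
HIV | .10 | 2.33 | .0220 |
R² = .63   Adjusted R² = .62
F = 51.79   Significance F = 1.53E-19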
Regression Interpretation
This model suggests that each variable has a meaningful association with child mortality. Access to improved water appears to be the most influential variable: it has the largest slope coefficient in absolute value and the largest slope t statistic. Holding other factors constant, a one-percentage-point increase in the share of the population with improved water access is associated with an average decrease of 0.14 deaths per 100 births (0.14 percentage points, not 14%). HIV also appears important: holding other factors constant, a one-percentage-point increase in HIV prevalence is associated with an average increase of 0.10 deaths per 100 births. For DPT immunization, a one-percentage-point increase in coverage is associated with an average decrease of 0.06 deaths per 100 births.
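The interpretation above can be checked by plugging the reported coefficients into the fitted equation. This is a minimal sketch: the country profile is hypothetical, the helper names are illustrative, and because the coefficients are rounded to two decimals the outputs are approximate.

```python
# Coefficients from the regression output (rounded to two decimals).
COEF = {"const": 21.75, "imm": -0.06, "water": -0.14, "hiv": 0.10}


def predicted_mortality(imm: float, water: float, hiv: float) -> float:
    """Fitted under-five deaths per 100 births; all inputs are in percent."""
    return (COEF["const"] + COEF["imm"] * imm
            + COEF["water"] * water + COEF["hiv"] * hiv)


def change_in_mortality(var: str, delta_points: float) -> float:
    """Change in deaths per 100 births for a change in one regressor, others held fixed."""
    return COEF[var] * delta_points


# Hypothetical country: 90% DPT coverage, 80% water access, 2% HIV prevalence.
print(round(predicted_mortality(imm=90, water=80, hiv=2), 2))  # 5.35
print(round(change_in_mortality("water", 10), 2))              # -1.4
```

For example, a 10-point rise in water access lowers predicted mortality by about 1.4 deaths per 100 births, regardless of the starting level, because the model is linear.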
The R² value of .63 means the model explains about 63% of the variation in child mortality rates across countries. The adjusted R² (.62) penalizes R² for the number of estimated parameters, so it never exceeds R²; the small gap between the two indicates that the penalty for using three regressors is minor relative to the explanatory power they add.
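The adjusted R² can be reproduced from R², the number of observations n, and the number of estimated parameters k. A minimal sketch using this model's n = 95 and k = 4 (B1 through B4); the function name is illustrative:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k estimated parameters (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# n = 95 countries, k = 4 parameters; R^2 = .63 as reported (rounded).
print(round(adjusted_r2(0.63, 95, 4), 3))  # 0.618
```

The penalty factor (n - 1)/(n - k) = 94/91 is only about 1.03 here, which is why adjusted R² (about .618, reported as .62) sits so close to R².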
Hypothesis Conclusion
Each t statistic is the coefficient divided by its standard error. All three slope t statistics exceed, in absolute value, the 5% two-sided critical value (about 1.99 with 95 - 4 = 91 degrees of freedom), so the null hypothesis that each variable is unrelated to mortality is rejected. The p-values confirm this: they are far below .05 for water access and DPT immunization and equal .022 for HIV, so each variable is statistically significant. The F statistic tests the null that the variables do not matter jointly; with F = 51.79 and a p-value of 1.53E-19, the variables are clearly significant collectively. Access to improved water, DPT immunizations, and HIV prevalence therefore appear to be significantly related to child mortality, both individually and jointly.
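Both tests can be reconstructed from the reported figures. A minimal sketch: the standard errors are backed out from t = coefficient / standard error, and the overall F statistic is recovered from R². Because the inputs are rounded, the results are approximate; the critical value 1.99 is an approximate table value for 91 degrees of freedom, and the helper names are illustrative.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all slope coefficients are zero (k parameters incl. intercept)."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def standard_error(coef: float, t_stat: float) -> float:
    """Back out a standard error from a coefficient and its t statistic."""
    return coef / t_stat


T_CRIT_5PCT = 1.99  # approximate two-sided 5% critical value of t with 91 df

reported = {"IMM": (-0.06, -3.37), "Water": (-0.14, -8.28), "HIV": (0.10, 2.33)}
for name, (coef, t) in reported.items():
    se = standard_error(coef, t)
    print(f"{name}: se ~ {se:.4f}, reject H0 at 5%: {abs(t) > T_CRIT_5PCT}")

# About 51.65 with the rounded R^2, close to the reported 51.79.
print(round(overall_f(0.63, 95, 4), 2))
```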
Post Estimations
RESET Test
The RESET test was performed to check whether the model is misspecified. It adds the squared and cubed fitted values of Y to the regression and uses an F test of whether they are jointly significant. If they are, the model is misspecified; if they are not, the test finds no evidence of misspecification. The RESET test on my model gave an F-value of 3.98, which exceeds the critical value of 2.60 at the 5% level of significance, so I tried other functional forms.
I began with a log model: because the data are already in percentage form, it seemed plausible that logs would be appropriate. I used a Lin-Log model so that the dependent variable stays linear and the test statistics and R² values remain comparable. I also estimated a model using dummy variables, as they can sometimes help a model fit the data better. I created the dummy variables to flag countries with water access below 75%, DPT immunization coverage below 75%, and HIV prevalence above 5%. The following table lists the models examined with the RESET test, with their F-values and R² values:
Model | RESET F-value (5% critical value = 2.60) | R² |
Linear | 3.98 | .63 |
ANOVA (dummy variables) | 4.58 | .40 |
Lin-Log | 6.35 | .62 |
Although the RESET test uses the F-value as its decision statistic, I also compared the R² values side by side for further insight. The table above shows that all three functional forms exceed the critical value, so the test indicates misspecification in each. The Lin-Log model had similar explanatory power, but its F-value was considerably higher than the linear model's. The ANOVA model's F-value was fairly close to the linear model's but still higher, indicating worse specification, and its R² was much lower. After considering all these factors, I kept the linear form of the model because it had both the highest R² and the lowest RESET F-value.
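The RESET procedure described above can be sketched in plain Python. This is a minimal illustration, assuming the design matrix is a list of rows whose first column is a constant; it adds the squared and cubed fitted values (q = 2 restrictions), which is one common version of the test. The helpers `ols_rss` and `reset_f` are hypothetical names, and the demonstration data are synthetic, not the World Bank sample.

```python
from statistics import mean, pstdev


def ols_rss(X, y):
    """Residual sum of squares and fitted values from OLS of y on the columns of X.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting; adequate for a handful of well-scaled regressors.
    """
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)), fitted


def reset_f(X, y):
    """RESET F statistic from adding squared and cubed fitted values (q = 2)."""
    n, k = len(X), len(X[0])
    rss_r, fitted = ols_rss(X, y)
    # Standardize the fitted values before taking powers: this keeps the normal
    # equations well conditioned and, because X contains a constant, spans the
    # same columns as yhat^2 and yhat^3, so the F statistic is unchanged.
    m, s = mean(fitted), pstdev(fitted)
    z = [(f - m) / s for f in fitted]
    X_aug = [row + [zi ** 2, zi ** 3] for row, zi in zip(X, z)]
    rss_u, _ = ols_rss(X_aug, y)
    q = 2
    return ((rss_r - rss_u) / q) / (rss_u / (n - k - q))


# Synthetic demonstration: a curved relationship fitted with a straight line.
xs = list(range(1, 21))
ys = [x ** 2 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
X = [[1.0, float(x)] for x in xs]
print(reset_f(X, ys))  # very large, so the linear form is rejected
```

The statistic is compared with an F critical value with q and n - k - q degrees of freedom.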
Heteroscedasticity
Heteroscedasticity occurs when the variance of the error term is not constant across observations, for example when it grows or shrinks with the level of an independent variable. To check visually whether my data may be heteroscedastic, I plotted the squared residuals against each variable:
The charts point toward heteroscedasticity: the squared residuals are clumped together and clearly patterned rather than evenly and randomly scattered.
White’s Heteroscedasticity Test. This test was used to confirm formally what the plots suggested. It regresses the squared residuals on the independent variables, their squares, and a cross-product term, and checks whether these jointly explain the variation in the squared residuals. The auxiliary regression was:
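$e_i^2 = A_1 + A_2\,\text{IMM}_i + A_3\,\text{Water}_i + A_4\,\text{HIV}_i + A_5\,\text{IMM}_i^2 + A_6\,\text{Water}_i^2 + A_7\,\text{HIV}_i^2 + A_8\,(\text{IMM}_i)(\text{Water}_i)(\text{HIV}_i) + u_i$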
White’s test gave a chi-squared value (nR²) of 19.9453, which exceeds the 5% critical value of 14.0671 for 7 degrees of freedom. This result points to heteroscedasticity in the model, misspecification, or both. The result was unsurprising: heteroscedasticity is common in cross-sectional data, and the RESET test had already indicated that the model’s functional form is slightly misspecified or that the model is underfit.
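White’s statistic is n times the R² of the auxiliary regression, compared with a chi-squared critical value whose degrees of freedom equal the number of slope terms (seven in the equation above, which matches the critical value 14.0671). The sketch below builds one row of auxiliary regressors following that equation and shows the nR² comparison; the auxiliary R² itself would come from an OLS routine not shown here. The textbook form of White’s test uses the three pairwise cross-products instead of a single triple product, which would give nine degrees of freedom; this sketch follows the specification above, and the helper names are illustrative.

```python
def white_aux_row(imm: float, water: float, hiv: float) -> list:
    """Auxiliary regressors: constant, levels, squares, and the triple product."""
    return [1.0, imm, water, hiv, imm ** 2, water ** 2, hiv ** 2, imm * water * hiv]


def white_statistic(n: int, r2_aux: float) -> float:
    """Lagrange-multiplier form of White's test: n * R^2 of the auxiliary regression."""
    return n * r2_aux


CHI2_CRIT_7DF_5PCT = 14.0671  # 5% chi-squared critical value with 7 df

# The reported statistic 19.9453 with n = 95 implies an auxiliary R^2 of about 0.21.
implied_r2 = 19.9453 / 95
print(round(implied_r2, 3))                                   # 0.21
print(white_statistic(95, implied_r2) > CHI2_CRIT_7DF_5PCT)   # True
```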
Autocorrelation
Autocorrelation is most common in time-series data, but it can also appear in cross-sectional data, where it can be another indicator of misspecification. Autocorrelation is correlation among the error terms; in cross-sectional data, tests for it depend on the order in which the observations are listed.
Durbin-Watson Test. This test is based on rho, the coefficient of first-order autocorrelation in $u_i = \rho u_{i-1} + \varepsilon_i$. If rho equals zero, successive error terms are uncorrelated, indicating an absence of autocorrelation. The Durbin-Watson d statistic is approximately $2(1 - \hat{\rho})$, and whether it indicates autocorrelation is judged against lower and upper critical values ($d_L$ and $d_U$) that depend on the number of observations and the number of regressors. The following shows the d critical values related to my model and the possible zones in which the d-value can fall.
The d-value for my model was 1.8920, which falls between $d_U$ and $4 - d_U$, so the test finds no evidence of first-order autocorrelation in the model.
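The d statistic and its decision zones can be sketched as follows. This is a minimal illustration: the critical values $d_L$ and $d_U$ must be looked up in a Durbin-Watson table for the model’s n and number of regressors, and the values used in any example call are hypothetical. The helper names are illustrative.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences divided by the residual sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_zone(d: float, d_lower: float, d_upper: float) -> str:
    """Classify d using the table's lower and upper critical values."""
    if d < d_lower:
        return "positive autocorrelation"
    if d < d_upper:
        return "inconclusive"
    if d <= 4 - d_upper:
        return "no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs push d toward 4
print(round(1 - 1.8920 / 2, 3))               # 0.054: rho implied by this model's d
```

Because d depends on the order of the residuals, reordering the countries in a cross-sectional data set can change its value.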
Conclusion
Problems
This model has some major specification problems, made obvious by its failure of the RESET test and White’s Heteroscedasticity Test. My attempt to change the model’s functional form was unsuccessful, so I believe the problem is most likely the omission of one or more relevant variables. Unfortunately, the data provided by The World Bank Group were limited, and I restricted myself to data in percentage or per capita form to keep the analysis sensible. The remaining candidate variables were too close to those I already had. For example, there was a lot of data on immunizations, but I suspected that the percentage of children who received a DPT immunization would be similar to the percentage who received a polio immunization, which would produce high correlation between the regressors (multicollinearity). The model is most likely underfit, and without more data in percentage or per capita form the problem would be difficult to remedy.
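The suspicion about DPT and polio coverage is a multicollinearity concern, and it can be checked before a variable is added. A minimal sketch with hypothetical coverage numbers (not World Bank data): it computes the Pearson correlation and a variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one regressor on the others. With only two regressors that R² equals the squared correlation; in the full model it would come from an auxiliary regression. The helper names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_from_r2(r2_aux: float) -> float:
    """Variance inflation factor from the R^2 of one regressor on the others."""
    return 1.0 / (1.0 - r2_aux)


# Hypothetical coverage rates (percent) for two vaccines in five countries.
dpt = [95, 88, 70, 60, 99]
polio = [94, 90, 72, 58, 98]
r = pearson_r(dpt, polio)
print(round(r, 3))
print(round(vif_from_r2(r ** 2), 1))  # two regressors only: auxiliary R^2 = r^2
```

A VIF above about 10 is a common rule of thumb for problematic collinearity.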
Usefulness
This model supports the theory that poor living and health conditions are associated with higher childhood mortality. The R² of .63 is moderate rather than large, but the regressors are jointly highly significant. The model could be reasonably useful for predicting a country’s childhood mortality rate from improved water access, DPT immunizations, and HIV prevalence, although the specification problems above call for caution. Many other factors likely affect childhood mortality, and including them would make the model better specified, but that would require more suitable data. Gathering such data would most likely improve the model’s explanatory and predictive power; in its current form, the model still does a decent job of outlining some of the factors that contribute to childhood mortality.
Citations