1 of 20

Webinar Series in Applied Quantitative Analysis - Updated

https://www.airp3-africa.org/applied-quantitative-analysis-webinar-series

Date	Session	Topic
February 29	Session One	Potential Outcomes and Omitted Variable Bias I (Theory)
March 7	Session One	Potential Outcomes and Omitted Variable Bias II (Application)
March 21	Session Two	Difference-in-differences I (Theory)
March 28	Session Two	Difference-in-differences II (Application)
April 25	Session Three	Power analysis, clustering and sample size calculations I (Theory)
May 2	Session Three	Power analysis, clustering and sample size calculations II (Application)
May 23	Session Four	Propensity score matching (Theory)
May 30	Session Four	Propensity score matching (Application)
June 20	Session Five	Fixed-effects I (Theory)
June 27	Session Five	Fixed-effects II (Application)
July 25	Session Six	Instrumental variables I (Theory)
August 1	Session Six	Instrumental variables II (Application)
August 22	Session Seven	Lagged dependent variables and the Arellano-Bond Estimator I (Theory)
August 29	Session Seven	Lagged dependent variables and the Arellano-Bond Estimator II (Application)

2 of 20

Listening to language interpretation (Windows | macOS)

1. In your meeting/webinar controls, click Interpretation.

2. Click the language that you would like to hear. (We will have French.) There is no need to choose English; it is the language of the main Zoom room.

3. (Optional) To hear the interpreted language only, click Mute Original Audio.

Notes:

You must join the meeting audio through your computer audio/VoIP. You cannot listen to language interpretation if you use the dial-in or call-me phone audio features.
As a participant joining a language channel, you can broadcast back into the main audio channel if you unmute your audio and speak.

3 of 20

Listening to language interpretation (Windows | macOS)

1. In your meeting/webinar controls, click Interpretation.
2. Click the language that you would like to hear. (We will have French.) There is no need to choose English; that is the language in the main Zoom room.
3. (Optional) To hear the interpreted language only, click Mute Original Audio.

Notes:

You must join the meeting audio through your computer audio/VoIP. You cannot listen to language interpretation if you use the dial-in or call-me phone audio features.
As a participant joining a language channel, you can broadcast back into the main audio channel if you unmute your audio and speak.

4 of 20

Dynamic Panel Data Estimation I - Theory

Ashu Handa

Institute Fellow – AIR

Kenan Eminent Professor of Public Policy – UNC-CH

August 22, 2024

5 of 20

Why this topic?

Large number of panel data sets slowly becoming available in Africa

Panel data supports more sophisticated statistical methods to address endogeneity (e.g. fixed effects, fixed effects with IV—FEIV)

The Arellano-Bond (1991) and Blundell-Bond (1998) methods provide a solution to a specific problem where there is an endogenous lagged dependent variable

6 of 20

Panel data sets you should have on your computer!

World Bank Integrated Surveys on Agriculture (LSMS-ISA)

Eight countries, national panel data, multiple waves
https://www.worldbank.org/en/programs/lsms/initiatives/lsms-ISA

Ghana Socioeconomic Panel Survey

Three waves, national
https://dataportal.isser.edu.gh/index.php/catalog/4

Transfer Project cash transfer evaluation data sets (panels)

Kenya, Lesotho, Ghana; Malawi and Zambia coming soon
https://transfer.cpc.unc.edu/datasets/

Kagera (TZ), NIDS (RSA), Young Lives in Ethiopia, etc

https://www.younglives.org.uk/

7 of 20

Why would we have a lagged dependent variable?

Almost all human behavior is conditioned on past choices – what we did in the past influences what we do in the present

Panel data lets us build statistical models that include past choices

More importantly, many outcomes of interest are highly dependent on their past value

$Y_{it} = f(Y_{i,t-1}, X)$, where $i$ is the individual/household and $t$ is time

8 of 20

Why would we have a lagged dependent variable?

Education: child does poorly this year, more likely to do poorly next year
Nutrition: child is low weight or small this period, more likely to be low weight or small next period
Physical health: adult has high BMI this year, likely to have high BMI next year
Mental health, chronic disease, many other examples
State dependency: the extent to which an outcome or variable depends on its past value

9 of 20

Use the case of child nutrition (height or stunting - H) to illustrate the problem

$H_{it} = \beta_0 + \beta_1 H_{i,t-1} + \beta_2 X_{it} + \beta_3 X_{ht} + (\varepsilon_{it} + \mu_i)$

Current period ($t$) height depends on lagged height ($t-1$)

$\varepsilon_{it}$: truly random error, not correlated with the Xs or with $H_{i,t-1}$

$\mu_i$: health endowment of the child, fixed over time and correlated with $H_i$ in each period
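
As a rough illustration of why the health endowment matters, the sketch below simulates a stripped-down version of this model and checks that the endowment is correlated with height in every period. Everything specific here is an assumption made for illustration and is not from the slides: the helper names (`simulate_panel`, `cov`, `corr`), the omission of the X covariates, the normal errors with unit variance, and the value 0.5 for the coefficient on lagged height.

```python
import random

def simulate_panel(n=20000, periods=5, beta1=0.5, seed=0):
    """Simulate H_it = beta1 * H_i,t-1 + mu_i + eps_it; return (paths, mu)."""
    rng = random.Random(seed)
    paths, mu = [], []
    for _ in range(n):
        m = rng.gauss(0, 1)  # mu_i: fixed endowment of child i (assumed N(0, 1))
        # Draw the first period from the long-run distribution of the process.
        h = m / (1 - beta1) + rng.gauss(0, 1) / (1 - beta1 ** 2) ** 0.5
        path = [h]
        for _ in range(periods - 1):
            h = beta1 * h + m + rng.gauss(0, 1)  # eps_it: fresh noise each period
            path.append(h)
        paths.append(path)
        mu.append(m)
    return paths, mu

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def corr(a, b):
    """Sample correlation of two equal-length lists."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

paths, mu = simulate_panel()
# Correlation between the endowment and height, one value per period.
corr_by_period = [corr(mu, [p[t] for p in paths]) for t in range(5)]
print([round(c, 2) for c in corr_by_period])
```

Under these assumed parameters the correlation is strongly positive in every period, which is the sense in which the endowment is "correlated with H in each period".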

10 of 20

National Education Longitudinal Study (NELS)

12 of 20

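We have an endogeneity problem: $H_{i,t-1}$ is correlated with the error term, because $\mu_i$ is part of the error and also helps determine $H_{i,t-1}$!

$H_{it} = \beta_0 + \beta_1 H_{i,t-1} + \beta_2 X_{it} + \beta_3 X_{ht} + (\varepsilon_{it} + \mu_i)$

We can solve this using IV. Instruments must satisfy two criteria (remember?)

Must be highly correlated with $H_{i,t-1}$
Must not directly affect $H_{it}$ (it may affect $H_{it}$ only through $H_{i,t-1}$)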
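
To get a feel for the size of the problem, the sketch below fits a naive pooled OLS regression of current height on lagged height in simulated data and compares the estimate with the true coefficient. The helper names, the absence of X covariates, the normal unit-variance errors and the true value of 0.5 are all illustrative assumptions, not taken from the slides.

```python
import random

def simulate_panel(n=20000, periods=5, beta1=0.5, seed=0):
    """Simulate H_it = beta1 * H_i,t-1 + mu_i + eps_it; return (paths, mu)."""
    rng = random.Random(seed)
    paths, mu = [], []
    for _ in range(n):
        m = rng.gauss(0, 1)  # mu_i: fixed endowment of child i (assumed N(0, 1))
        # Draw the first period from the long-run distribution of the process.
        h = m / (1 - beta1) + rng.gauss(0, 1) / (1 - beta1 ** 2) ** 0.5
        path = [h]
        for _ in range(periods - 1):
            h = beta1 * h + m + rng.gauss(0, 1)  # eps_it: fresh noise each period
            path.append(h)
        paths.append(path)
        mu.append(m)
    return paths, mu

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

TRUE_BETA1 = 0.5
paths, _ = simulate_panel(beta1=TRUE_BETA1)

# Stack all (H_i,t-1, H_it) pairs across children and periods.
lagged = [p[t - 1] for p in paths for t in range(1, len(p))]
current = [p[t] for p in paths for t in range(1, len(p))]

# Pooled OLS slope with a constant: cov(x, y) / var(x).
ols_beta1 = cov(lagged, current) / cov(lagged, lagged)
print(f"true beta1 = {TRUE_BETA1}, pooled OLS estimate = {ols_beta1:.2f}")
```

In this simulated setting the OLS estimate lands well above the true value, because the regression credits lagged height with the persistent effect of the endowment.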


13 of 20

Now let us look at the Arellano-Bond (1991) strategy

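$H_{it} = \beta_0 + \beta_1 H_{i,t-1} + \beta_2 X_{it} + \beta_3 X_{ht} + (\varepsilon_{it} + \mu_i)$ (1)

$H_{i,t-1} = \beta_0 + \beta_1 H_{i,t-2} + \beta_2 X_{i,t-1} + \beta_3 X_{h,t-1} + (\varepsilon_{i,t-1} + \mu_i)$ (2)

Take the difference of these two equations, period $t$ minus period $t-1$, that is (1) minus (2):

$\Delta H_{it} = \beta_1 \Delta H_{i,t-1} + \beta_2 \Delta X_{it} + \beta_3 \Delta X_{ht} + \Delta\varepsilon_{it}$ (3)

$\mu_i$ has been removed in (3) – good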

But there is still a problem

14 of 20

Now let us look at the Arellano-Bond (1991) strategy

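$\Delta H_{it} = \beta_1 \Delta H_{i,t-1} + \beta_2 \Delta X_{it} + \beta_3 \Delta X_{ht} + \Delta\varepsilon_{it}$ (3)

where $\Delta H_{i,t-1} = H_{i,t-1} - H_{i,t-2}$ and $\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$

$H_{i,t-1}$ is obviously correlated with $\varepsilon_{i,t-1}$, so $\Delta H_{i,t-1}$ is correlated with $\Delta\varepsilon_{it}$

Arellano-Bond propose $H_{i,t-2}$ as an instrument for $(H_{i,t-1} - H_{i,t-2})$

Is this a valid instrument?

$H_{i,t-2}$ depends on $\mu_i$ of course, but $\mu_i$ is not in (3)!! And $H_{i,t-2}$ was determined before $\varepsilon_{i,t-1}$ and $\varepsilon_{it}$, so it is uncorrelated with $\Delta\varepsilon_{it}$ as long as the $\varepsilon$ terms are not serially correlated.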
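
The logic of the differenced equation can be checked on simulated data. The sketch below compares OLS on the differenced equation with a simple IV estimate that uses the twice-lagged level as the single instrument. This is the basic just-identified version of the idea (often called the Anderson-Hsiao form), not the full Arellano-Bond GMM estimator, which also uses further lags as instruments. The helper names, the lack of X covariates, the normal unit-variance errors and the true coefficient of 0.5 are illustrative assumptions.

```python
import random

def simulate_panel(n=20000, periods=5, beta1=0.5, seed=0):
    """Simulate H_it = beta1 * H_i,t-1 + mu_i + eps_it; return (paths, mu)."""
    rng = random.Random(seed)
    paths, mu = [], []
    for _ in range(n):
        m = rng.gauss(0, 1)  # mu_i: fixed endowment of child i (assumed N(0, 1))
        # Draw the first period from the long-run distribution of the process.
        h = m / (1 - beta1) + rng.gauss(0, 1) / (1 - beta1 ** 2) ** 0.5
        path = [h]
        for _ in range(periods - 1):
            h = beta1 * h + m + rng.gauss(0, 1)  # eps_it: fresh noise each period
            path.append(h)
        paths.append(path)
        mu.append(m)
    return paths, mu

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

TRUE_BETA1 = 0.5
paths, _ = simulate_panel(beta1=TRUE_BETA1)

d_h, d_h_lag, h_lag2 = [], [], []
for p in paths:
    for t in range(2, len(p)):
        d_h.append(p[t] - p[t - 1])          # Delta H_it (outcome)
        d_h_lag.append(p[t - 1] - p[t - 2])  # Delta H_i,t-1 (endogenous regressor)
        h_lag2.append(p[t - 2])              # H_i,t-2 (instrument, in levels)

# OLS on the differenced equation: still biased, since Delta H_i,t-1
# and Delta eps_it both contain eps_i,t-1.
fd_ols = cov(d_h_lag, d_h) / cov(d_h_lag, d_h_lag)

# Just-identified IV: cov(instrument, outcome) / cov(instrument, regressor).
iv_beta1 = cov(h_lag2, d_h) / cov(h_lag2, d_h_lag)

print(f"true = {TRUE_BETA1}, differenced OLS = {fd_ols:.2f}, IV = {iv_beta1:.2f}")
```

With these assumed parameters, differenced OLS falls far below the true value while the IV estimate lands close to it.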

15 of 20

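The Arellano-Bond (1991) strategy when facing an endogenous lagged dependent variable

$\Delta H_{it} = \beta_1 \Delta H_{i,t-1} + \beta_2 \Delta X_{it} + \beta_3 \Delta X_{ht} + \Delta\varepsilon_{it}$ (3)

where $\Delta H_{i,t-1} = H_{i,t-1} - H_{i,t-2}$ and $\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$

$H_{i,t-1}$ is correlated with $\varepsilon_{i,t-1}$

Use $H_{i,t-2}$ as an instrument for $(H_{i,t-1} - H_{i,t-2})$

“Use the two-period lagged value of the dependent variable in levels as an instrument in the differenced equation”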

16 of 20

Blundell-Bond 1998 (building on Arellano-Bover 1995)

When N is large and T is small

These are exactly the data sets I listed at the beginning: large, national data sets (N in the thousands) with just a few waves (T around 3, 4 or 5)

In these data, $Y_{i,t-2}$ tends to be a very weak instrument for $(Y_{i,t-1} - Y_{i,t-2})$
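
One way to see how the lagged level can become a weak instrument is to look at its correlation with the lagged difference in simulated data. The sketch below does this for a weakly persistent and a highly persistent outcome. Linking weakness to persistence is the standard motivation for the system estimator rather than something stated on the slide, and the helper names, the coefficient values 0.3 and 0.9, and the normal unit-variance errors are illustrative assumptions.

```python
import random

def simulate_panel(n=20000, periods=5, beta1=0.5, seed=0):
    """Simulate Y_it = beta1 * Y_i,t-1 + mu_i + eps_it; return (paths, mu)."""
    rng = random.Random(seed)
    paths, mu = [], []
    for _ in range(n):
        m = rng.gauss(0, 1)  # mu_i: fixed unit effect (assumed N(0, 1))
        # Draw the first period from the long-run distribution of the process.
        y = m / (1 - beta1) + rng.gauss(0, 1) / (1 - beta1 ** 2) ** 0.5
        path = [y]
        for _ in range(periods - 1):
            y = beta1 * y + m + rng.gauss(0, 1)  # eps_it: fresh noise each period
            path.append(y)
        paths.append(path)
        mu.append(m)
    return paths, mu

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def first_stage_corr(beta1):
    """Correlation between the instrument Y_i,t-2 and the regressor Delta Y_i,t-1."""
    paths, _ = simulate_panel(beta1=beta1)
    y_lag2 = [p[t - 2] for p in paths for t in range(2, len(p))]
    d_y_lag = [p[t - 1] - p[t - 2] for p in paths for t in range(2, len(p))]
    return cov(y_lag2, d_y_lag) / (cov(y_lag2, y_lag2) * cov(d_y_lag, d_y_lag)) ** 0.5

corr_low_persistence = first_stage_corr(0.3)
corr_high_persistence = first_stage_corr(0.9)
print(f"beta1 = 0.3: corr = {corr_low_persistence:.2f}")
print(f"beta1 = 0.9: corr = {corr_high_persistence:.2f}")
```

In this setup the correlation shrinks toward zero as the outcome becomes more persistent, so the instrument carries little information about the regressor.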

Their solution is a ‘system estimator’

xtdpdsys or xtdpd in Stata (we will use it next week)

17 of 20

What is the Blundell-Bond (1998) solution?

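$H_{it} = \beta_0 + \beta_1 H_{i,t-1} + \beta_2 X_{it} + \beta_3 X_{ht} + (\varepsilon_{it} + \mu_i)$ (1)

Use $(H_{i,t-1} - H_{i,t-2})$ as an instrument for $H_{i,t-1}$

But isn’t $(H_{i,t-1} - H_{i,t-2})$ correlated with $\mu_i$?

$H_{i,t-1}$ and $H_{i,t-2}$ are both dependent on $\mu_i$

But when you subtract the two, $\mu_i$ cancels out!! (This relies on $\mu_i$ contributing equally to $H$ in every period, so that the change in $H$ is uncorrelated with $\mu_i$.)

Estimate the levels equation (1), using the lagged difference in the dependent variable as the instrument, together with the Arellano-Bond (1991) estimator, as a system: the “system estimator”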
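
The levels half of this idea can be sketched on simulated data: instrument the lagged level with the lagged difference and see whether the estimate lands near the true coefficient. This shows only the levels-equation IV step, not the full system GMM estimator that Stata's commands implement. The helper names, the lack of X covariates, the normal unit-variance errors, the true value of 0.5, and the choice to start each child from the long-run distribution (so that changes in height are unrelated to the endowment) are all illustrative assumptions.

```python
import random

def simulate_panel(n=20000, periods=5, beta1=0.5, seed=0):
    """Simulate H_it = beta1 * H_i,t-1 + mu_i + eps_it; return (paths, mu)."""
    rng = random.Random(seed)
    paths, mu = [], []
    for _ in range(n):
        m = rng.gauss(0, 1)  # mu_i: fixed endowment of child i (assumed N(0, 1))
        # Draw the first period from the long-run distribution of the process.
        h = m / (1 - beta1) + rng.gauss(0, 1) / (1 - beta1 ** 2) ** 0.5
        path = [h]
        for _ in range(periods - 1):
            h = beta1 * h + m + rng.gauss(0, 1)  # eps_it: fresh noise each period
            path.append(h)
        paths.append(path)
        mu.append(m)
    return paths, mu

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

TRUE_BETA1 = 0.5
paths, _ = simulate_panel(beta1=TRUE_BETA1)

h, h_lag, d_h_lag = [], [], []
for p in paths:
    for t in range(2, len(p)):
        h.append(p[t])                       # H_it (outcome, in levels)
        h_lag.append(p[t - 1])               # H_i,t-1 (endogenous regressor)
        d_h_lag.append(p[t - 1] - p[t - 2])  # Delta H_i,t-1 (instrument)

# Naive OLS in levels, for comparison.
ols_beta1 = cov(h_lag, h) / cov(h_lag, h_lag)

# Levels equation instrumented with the lagged difference.
levels_iv = cov(d_h_lag, h) / cov(d_h_lag, h_lag)

print(f"true = {TRUE_BETA1}, levels OLS = {ols_beta1:.2f}, levels IV = {levels_iv:.2f}")
```

Under these assumptions the IV estimate lands close to the true value while OLS in levels overshoots it.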

18 of 20

Recap of the Blundell-Bond (1998) system estimator

Levels equation with a lagged differenced instrument:

$H_{it} = \beta_0 + \beta_1 H_{i,t-1} + \beta_2 X_{it} + \beta_3 X_{ht} + (\varepsilon_{it} + \mu_i)$ (1)

Use $(H_{i,t-1} - H_{i,t-2})$ as an instrument for $H_{i,t-1}$

Differenced equation with a lagged levels instrument:

$\Delta H_{it} = \beta_1 \Delta H_{i,t-1} + \beta_2 \Delta X_{it} + \beta_3 \Delta X_{ht} + \Delta\varepsilon_{it}$ (3)

Use $H_{i,t-2}$ as the instrument for $(H_{i,t-1} - H_{i,t-2})$
