RPPL Playbook
Purpose & Audience
The RPPL Playbook describes our emerging thinking on the different types of studies RPPL will engage in together. It provides guardrails for Anchor Studies, Shared Micro-Studies, and Contributing Studies, and it will form the basis for assessing proposed studies for funding.
This includes an assessment of:
Study Findings Impact Practice & Field
We Generate Actionable, Generalizable Findings
Build Organizational Capacity for Research
Research is the Way We Do Our Work
Systematize Self-Sustaining Ecosystem
We Spread the Word & Invite Others In
Key Terms
Descriptive Analysis
Answers Questions About: what is happening, e.g., patterns, needs, and experiences across a sample
Contribution: descriptive evidence and hypotheses that can inform program design; does not support claims about program impact
Methods: Qualitative methods such as case studies, interviews with thematic analysis, and ethnography; quantitative methods such as surveys and correlational analysis of administrative data
Causal Analysis
Answers Questions About: whether a program or intervention caused a change in outcomes
Contribution: evidence about program impact and effectiveness
Methods: Whether we can make a causal claim depends on research design. Robust designs include experiments and natural experiments in which individuals are assigned to different interventions/treatments based on factors outside their control.
Anchor Studies (RPPL Term)
Anchor Studies are major cross-organizational studies designed to gain deeper insight on a key topic of interest in the RPPL learning agenda.
Note: An Anchor Study does not require RPPL organizations to find and develop new partners. (We understand that this is time-intensive and limiting, so ideally, your organization can leverage existing partnerships.) However, an Anchor Study typically does require a change in standard practice (e.g., developing a second version of a PL or tweaking a program to align with research questions).
Link to Framework
Micro-Studies (RPPL Term)
Shared Micro-Studies engage multiple organizations in addressing related questions on a single topic, with short-term outcomes that can provide more rapid evidence. A set of small trials can collectively build a powerful knowledge base about a topic.
Shared Micro-Studies comprise multiple trials.
Where needed, this Playbook clarifies whether guidelines apply to the collective Shared Micro-Study or to its individual trials.
Link to Framework
Contributing Studies (RPPL Term)
Contributing Studies are organization-driven studies that take on a question tied to the Learning Agenda and of interest both to the organization and to the broader RPPL membership.
Note: Contributing Studies do not need to be new in scope for RPPL organizations. You can definitely receive funding for existing work! RPPL aims to help each organization develop its capacity for high-quality research and will meet each organization where it is in its development.
Link to Framework
Supporting Studies (RPPL Term)
RPPL funds three types of studies (Anchor Studies, Shared Micro-Studies, and Contributing Studies; see framework) using the guardrails in this RPPL Playbook; Supporting Studies fall outside these funded categories.
Often, however, a Supporting Study may serve as the foundation for a future RPPL-funded study (e.g., it may grow into a Shared Micro-Study through participation by more organizations, even though it may not be large enough to receive funding as a Contributing Study on its own).
Study Selection Guardrails
Framing
The guardrails in this RPPL Playbook are ones we’ll build towards together over time. Year over year, we will build our collective capacity for high-quality research within RPPL organizations.
The nature of research is that we need to plant many seeds, only some of which will sprout into findings that provide actionable evidence and move the field forward.
We do not intend for these guardrails to be limiting, but instead to articulate the bar for alignment and quality. They set the best conditions for sprouting studies that provide usable findings.
Part 1: Guardrails for Study Learning Objectives
This section contains guardrails around study learning objectives. These are to be used as the first screen for potential RPPL study ideas.
Learning Objectives
Each study type is screened against three learning objectives: a valuable contribution to participating organizations, a valuable contribution to the literature, and alignment to the Learning Agenda [Note 1].
Anchor Studies: YES + YES + YES
Shared Micro-Studies: YES + YES + YES
Contributing Studies: YES + YES or YES [Note 2]
Note 1. The Learning Agenda is centered around Teacher Professional Learning. Research questions on other enabling conditions (such as school leadership) to support teacher outcomes are also considered aligned to the Learning Agenda.
Note 2. Depending on organization capacity for high quality research, a contributing study may support development of this capacity, more so than meaningfully contribute to the Learning Agenda (i.e., study design may not be rigorous enough to support causal inferences).
Note 3. Each Contributing Study should have a theory about how it can progress to a more rigorous study or relate to a larger RPPL theme of studies (e.g., another rigorous study has been conducted and a Contributing Study supports implementation of those findings).
Part 2: Guardrails for Study Rigor
This section contains guardrails around the rigor of the study design and the rigor of the methodology, which are evaluated in conjunction with each other.
Rigor of Study Design
Study designs fall along a five-point scale, grouped into three categories:
Experiments (support robust causal inferences):
(5) Random Assignment; RCT
Strong Quasi-Experimental Designs (could support causal inferences):
(4) Comparison Group Design with Exogenous Assignment
(3) Well-Matched Comparison Group Design (may support causal inferences)
Descriptive Studies (do not support causal inferences):
(2) Designs with Comparison Groups of Unknown Match
(1) Designs without Comparison Groups
The scale also marks the minimum design rigor expected for Contributing Studies, Micro-Studies, and Anchor Studies.
See Appendix for Examples
RPPL also plans to produce guardrails for qualitative methods
Rigor of the Methodology
Five criteria are evaluated in conjunction with each other to determine rigor*:
1. Treatment Contrast: (5) Strong to (1) Weak
2. Sample Size: (5) Large to (1) Small
3. Measurement Type: (5) High to (1) Low
4. Measurement Quality: (5) High to (1) Low
5. Attrition: (5) Low to (1) High
*All of these criteria intersect (e.g., sample size, treatment contrast, and measurement quality interact), but we consider them individually for simplicity.
1. Treatment Contrast
How different are the two treatment conditions we are testing? The larger the difference, the easier it will be to detect effects.
(5) Strong to (1) Weak. This criterion is a subjective assessment.
If treatment contrast is weak but sample size is large and measurement quality is high, the methodology could still support causal inferences, as in a high-quality natural experiment.
2. Sample Size
Larger sample sizes ensure that the differences we detect between groups are not just due to chance. There are important considerations about the level of randomization (school, team, teacher) that vary by context and study design.
(5) Large to (1) Small. This criterion determines whether there is sufficient power to detect treatment impacts.
Minimum for Anchor Studies: 75 randomizable units per organization
Minimum for Micro-Studies: 50 randomizable units per trial
Anticipated for Contributing Studies: 75-100 randomizable units
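To make the power logic behind these minimums concrete, here is a minimal sketch (ours, not a Playbook requirement) that estimates the minimum detectable effect size at roughly these sample sizes. It assumes a two-arm trial with individual-level randomization, a 50/50 split, alpha = 0.05, and 80% power, and it uses Python's statsmodels library; clustered designs (randomizing schools or teams) would need more units than shown.

```python
# Minimal sketch: minimum detectable effect size (MDES) at the Playbook's
# sample-size anchors, assuming individual-level randomization, a 50/50 split,
# alpha = 0.05, and 80% power. These design parameters are illustrative
# assumptions, not Playbook requirements.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()

for label, total_units in [("Micro-Study trial (min 50)", 50),
                           ("Anchor Study per org (min 75)", 75),
                           ("Contributing Study (75-100)", 100)]:
    nobs1 = total_units // 2  # units per arm under a 50/50 split
    mdes = power_analysis.solve_power(effect_size=None, nobs1=nobs1,
                                      alpha=0.05, power=0.80, ratio=1.0)
    print(f"{label}: MDES ~= {mdes:.2f} standard deviations")
```

Smaller samples can only detect fairly large effects, which is one reason treatment contrast and measurement quality are weighed alongside sample size.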
3. Measurement Type
More proximal measures that are easier to collect may be less expensive and more likely to show effects, but more distal measures that reflect practice and impact on students are likely more meaningful.
(5) High: student outcomes; observational measures of instructional practice (minimum for Anchor Studies; preferred for Micro-Studies)
(3) Medium: survey measures of instructional practice or mindsets/attitudes (preferred for Contributing Studies)
(1) Low: measures not aligned to the PL's desired outcomes
4. Measurement Quality
Are we measuring what we want to be measuring? (reliability & validity)
Note: Reliability and validity can be enhanced either by choosing existing measures or by piloting measures prior to use. Scores should show evidence of face validity (e.g., experts agree they will measure the intended construct) or construct validity (e.g., via cognitive interviews, factor analyses, or correlational analyses). With smaller sample sizes, more reliable measures are needed.
Common measures across organizations, where applicable, would enhance the research.
(5) High: scores show evidence of reliability (e.g., test-retest correlation of 0.7 or internal reliability of 0.85) and strong face and construct validity
(3) Medium: score reliability of 0.7-0.85; scores have strong face validity and some evidence of construct validity (minimum for all studies)
(1) Low: measures mostly noise (e.g., random answers) or measures a construct other than the one intended
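As an illustration of how the reliability thresholds above might be checked during piloting, the sketch below computes internal consistency (Cronbach's alpha) and a test-retest correlation on hypothetical survey data; the data, item names, and sample size are placeholders, not Playbook requirements.

```python
# Minimal sketch: estimating reliability for a piloted survey scale.
# All data here are simulated placeholders.
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Internal-consistency reliability for a set of items (rows = respondents)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical pilot: 40 teachers answer 5 items from the same survey scale.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))  # hypothetical underlying trait
pilot = pd.DataFrame(latent + rng.normal(scale=0.7, size=(40, 5)),
                     columns=[f"item_{i}" for i in range(1, 6)])
print(f"Internal reliability (Cronbach's alpha): {cronbach_alpha(pilot):.2f}")

# Test-retest reliability: correlate total scores from two hypothetical administrations.
scores_t1 = pilot.sum(axis=1)
scores_t2 = scores_t1 + rng.normal(scale=1.0, size=len(scores_t1))
print(f"Test-retest correlation: {np.corrcoef(scores_t1, scores_t2)[0, 1]:.2f}")
```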
5. Attrition
Do we have the full picture of the outcome? High rates of attrition, and particularly differential attrition (the difference in attrition rates between the treatment groups), can bias study results.
Note: Attrition cannot be fully known until after the study is complete. At the design stage, the study should demonstrate high confidence that outcome measures will be collected for 100% of randomized units.
(5) Low: <20% overall and <5 percentage point differential
(3) Medium: <25% overall and <7 percentage point differential (minimum for all studies)
(1) High: >30% overall and >10 percentage point differential
Source: What Works Clearinghouse Standards Handbook (p. 11)
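The sketch below illustrates how overall and differential attrition could be computed from randomized and analyzed counts and compared against the anchors above; the counts are hypothetical, and cases falling between the (3) and (1) anchors are simply flagged as below the minimum.

```python
# Minimal sketch: rating attrition against the Playbook anchors from
# hypothetical randomized vs. analyzed counts.
def attrition_rating(randomized_treatment: int, analyzed_treatment: int,
                     randomized_control: int, analyzed_control: int) -> str:
    """Compare overall and differential attrition against the (5)/(3) anchors."""
    attr_t = 1 - analyzed_treatment / randomized_treatment
    attr_c = 1 - analyzed_control / randomized_control
    overall = 1 - (analyzed_treatment + analyzed_control) / (randomized_treatment + randomized_control)
    differential = abs(attr_t - attr_c)
    summary = f"overall {overall:.1%}, differential {differential:.1%} points"
    if overall < 0.20 and differential < 0.05:
        return f"(5) Low attrition: {summary}"
    if overall < 0.25 and differential < 0.07:
        return f"(3) Medium attrition (minimum for all studies): {summary}"
    return f"Below the (3) Medium threshold: {summary}"

# Hypothetical example: 60 teachers randomized per arm; 53 treatment and 55
# control teachers provide outcome data.
print(attrition_rating(60, 53, 60, 55))
```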
Appendix
1 - Designs without Comparison Groups
Example: Measure outcomes of teachers before and after they participate in a PL and see whether the PL improved outcomes; survey teachers in a district about their experiences with curriculum adoption; interview teachers who participated in a PL
Opportunities: Can provide helpful descriptive evidence: interesting patterns across the sample, needs, feedback to inform program design
Constraints: Not able to make any claims about program impact or program effectiveness – without a comparison group, we do not know whether these teachers would have improved without the PL
2 - Designs with Comparison Groups of Unknown Match
Example: Compare outcomes for teachers who participate in a PL to other teachers in the school or district
Opportunities: Quite limited. When coupled with rich data about who participates, can potentially yield some hypotheses about whether a program is effective.
Constraints: Not able to make any claims about program impact or program effectiveness – teachers who do and do not participate in the PL differ in unknown ways, so we cannot tell whether differences in outcomes come from the PL.
3 - Well-Matched Comparison Group Design
Example: Compare outcomes for teachers who participate in a PL with teachers who did not but who are similar on many different characteristics (i.e., well-matched). Comparison can be done via matching or robust regression adjustment.
Opportunities: Surfaces robust hypotheses about program impact for further testing.
Constraints: Even though teachers are similar in ways that we can observe, teachers who participate and do not participate in the PL may well be different in unobserved ways. For example, teachers more motivated to learn might be more likely to participate. As a result, we are not certain that the differences in outcomes come from the PL.
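As a rough illustration of the regression-adjustment approach mentioned above, the sketch below estimates a covariate-adjusted PL effect on simulated teacher data using Python's statsmodels; the variable names and data are hypothetical, and the adjusted estimate still cannot rule out unobserved differences between participants and non-participants.

```python
# Minimal sketch: regression adjustment for a well-matched comparison group
# design. Data and variable names (years_experience, baseline_score,
# participated_pl, outcome_score) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "years_experience": rng.integers(1, 25, n),
    "baseline_score": rng.normal(50, 10, n),
})
df["participated_pl"] = (rng.random(n) < 0.5).astype(int)
df["outcome_score"] = (df["baseline_score"] + 2 * df["participated_pl"]
                       + 0.2 * df["years_experience"] + rng.normal(0, 5, n))

# Covariate-adjusted comparison of PL participants and non-participants. The
# coefficient on participated_pl is the adjusted difference in outcomes, but it
# cannot rule out unobserved differences (e.g., motivation to learn).
model = smf.ols("outcome_score ~ participated_pl + years_experience + baseline_score",
                data=df).fit()
print(model.params["participated_pl"], model.bse["participated_pl"])
```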
4 - Comparison Group Design with Exogenous Assignment
Example: Districts fund PL opportunities in the lowest-performing schools, so for schools near the cutoff participation in the PL is essentially “as good as random”
Opportunities: Rigorous studies with exogenous assignment can lead to causal inferences about program impact
Constraints: Hard to find opportunities for true exogenous assignment
See the additional regression discontinuity example in this guide for states ("Example 2: Summer PD Academy on Differentiated Instruction," beginning on PDF p. 22).
Source:
Perez-Johnson, Irma, Kirk Walters, Michael Puma, and others. "Evaluating ARRA Programs and Other Educational Reforms: A Guide for States." Resource document developed jointly by the American Institutes for Research and Mathematica Policy Research, Inc., April 2011.
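As a rough illustration of the "as good as random" logic in this example, the sketch below estimates a regression discontinuity effect on simulated school data: schools below a performance cutoff receive the PL, and a local linear regression compares outcomes just below and just above the cutoff. The cutoff, bandwidth, and data are hypothetical, and a real analysis would require bandwidth sensitivity checks and other diagnostics.

```python
# Minimal sketch: regression discontinuity around a hypothetical performance
# cutoff. All data are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 400
schools = pd.DataFrame({"baseline_rating": rng.uniform(0, 100, n)})
cutoff = 40
schools["received_pl"] = (schools["baseline_rating"] < cutoff).astype(int)
schools["outcome"] = (0.5 * schools["baseline_rating"] + 3 * schools["received_pl"]
                      + rng.normal(0, 4, n))

# Local linear regression within a bandwidth around the cutoff, allowing the
# slope to differ on each side. The coefficient on received_pl is the estimated
# effect at the cutoff.
bandwidth = 10
near = schools[(schools["baseline_rating"] - cutoff).abs() <= bandwidth].copy()
near["centered"] = near["baseline_rating"] - cutoff
rd = smf.ols("outcome ~ received_pl * centered", data=near).fit()
print(rd.params["received_pl"], rd.bse["received_pl"])
```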
5 - Random Assignment; RCT
Example: Schools, grade-level teams, or individual teachers are randomly assigned to receive one of two different PL opportunities
Opportunities: Clear causal inferences about program impacts
Constraints: Randomization requires additional buy-in
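For concreteness, the sketch below shows one way random assignment might be carried out at the grade-level-team level, blocking by school so each school contributes teams to both PL versions; the team roster and seed are hypothetical, and the actual level of randomization should follow the study design.

```python
# Minimal sketch: blocked random assignment of grade-level teams to two PL
# versions, with schools as blocks. The roster is a hypothetical placeholder.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2024)

# Hypothetical roster of grade-level teams, nested within schools.
teams = pd.DataFrame({
    "school": ["A", "A", "B", "B", "B", "C", "C", "D", "D", "D"],
    "team":   ["A1", "A2", "B1", "B2", "B3", "C1", "C2", "D1", "D2", "D3"],
})

# Block by school: within each school, shuffle its teams and assign the first
# half (rounding up) to PL version 1 and the rest to PL version 2.
assignments = []
for school, group in teams.groupby("school"):
    shuffled = group.sample(frac=1, random_state=int(rng.integers(1_000_000)))
    n_version1 = -(-len(shuffled) // 2)  # ceiling division
    shuffled["condition"] = (["PL version 1"] * n_version1
                             + ["PL version 2"] * (len(shuffled) - n_version1))
    assignments.append(shuffled)

assignments = pd.concat(assignments).sort_values(["school", "team"])
print(assignments.to_string(index=False))
```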