1 of 64

Dylan Wiliam (@dylanwiliam)

Leadership for teacher learning

www.dylanwiliam.net

2 of 64

Outline: Four questions

  • Where should our efforts be focused?
  • Where does formative assessment fit in?
  • What makes effective teacher learning?
  • What doesn’t get done?

3 of 64

What determines how quickly students learn?

  • Student characteristics
  • School/college organization
  • Instructional quality (i.e., teaching quality)
    • The time teachers have to plan teaching
    • The size of classes
    • The resources available
    • The quality of the curriculum
    • The skills of the teacher
  • All of these are important, but the curriculum and the teacher seem to be especially important

4 of 64

Curriculum

5 of 64

Curriculum matters

  • Comparison of four elementary maths curricula
    • A: Investigations in Number, Data, and Space (Pearson)
    • B: Math Expressions (Houghton Mifflin)
    • C: Saxon Math (Harcourt Achieve)
    • D: Scott Foresman-Addison Wesley Mathematics (Pearson)
  • Curricula randomly assigned to schools
  • Impact on achievement
    • 1st grade: B, C > A, D
    • 2nd grade: B, C, D > A
  • But, it’s not possible to predict outcomes in advance

Agodini, Harris, Seftor, Remillard, and Thomas (2013)

6 of 64

Teacher quality: Why it matters

7 of 64

Teaching quality and teacher quality

  • Teaching quality is more important than teacher quality
  • But our measures of teaching quality are weak.
  • What we can do is determine whether some teachers are more effective than others

8 of 64

Teacher quality and student achievement

Correlation with progress in reading and math

Study                                      Location         Reading   Math
Rockoff (2004)                             New Jersey       0.10      0.11
Nye, Konstantopoulos, and Hedges (2004)    Tennessee        0.26      0.36
Rivkin, Hanushek, and Kain (2005)          Texas            0.15      0.11
Aaronson, Barrow, and Sander (2007)        Chicago          -         0.13
Kane, Rockoff, and Staiger (2008)          New York City    0.08      0.11
Jacob and Lefgren (2008)                   -                0.12      0.26
Kane and Staiger (2008)                    -                0.18      0.22
Koedel and Betts (2009)                    San Diego        -         0.23
Rothstein (2010)                           North Carolina   0.11      0.15
Hanushek and Rivkin (2010)                 -                -         0.11
Chetty et al. (2014)                       -                0.12      0.16

Hanushek and Rivkin (2010)

9 of 64

What does this mean for student progress?

  • Take a group of 50 teachers:
    • Students taught by the most effective teacher in that group of 50 teachers learn in six months what those taught by the average teacher learn in a year
    • Students taught by the least effective teacher in that group of 50 teachers will take two years to achieve the same learning
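The comparison above can be restated as relative learning rates. This is a minimal sketch, assuming "rate" means the share of a typical year's learning covered per calendar year; the function name is illustrative and does not come from the slides.

```python
def relative_rate(months_needed, reference_months=12.0):
    """Learning rate relative to the average teacher.

    months_needed: time students need to cover what the average
    teacher's students cover in reference_months.
    """
    return reference_months / months_needed


most_effective = relative_rate(6)    # six months for a year's learning
least_effective = relative_rate(24)  # two years for a year's learning

print(most_effective)                    # 2.0
print(least_effective)                   # 0.5
print(most_effective / least_effective)  # 4.0
```

On these assumptions, students of the most effective teacher in the group progress at four times the rate of students of the least effective teacher.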

10 of 64

Teacher quality: How to get more of it

11 of 64

Strategies for improving teacher quality

  • Replace existing teachers with better ones
    • Replace departing teachers with better ones
    • Accelerate the process by removing less effective teachers
  • Help existing teachers improve

12 of 64

Teacher preparation and selection

  • Teacher preparation has little—and often no—relation to teacher quality (Harris & Sass, 2007)
  • Screening instruments such as California’s Reading Instruction Competency Assessment do not predict teacher quality (Buddin & Zamarro, 2010)

13 of 64

Evaluating teaching

14 of 64

Do we know a good teacher when we see one?

  • Experiment 1
    • Seven teachers (3 high-performing, 4 not)
      • Group 1: at least 0.5 sd above mean value-added for 3 years
      • Group 2: never 0.5 sd above average value-added in 3 years
    • 7 video clips shown to 100 raters
    • Average number of correct ratings: 2.8

Distribution of total correct ratings

Correct ratings   0    1     2     3     4     5    6    7
Share of raters   1%   11%   29%   36%   13%   9%   1%   0%

Strong, Gargani, and Hacifazlioğlu (2011)
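One way to read the Experiment 1 result is to compare it with pure guessing. The sketch below assumes each clip is rated as a binary high/low call and that a guesser is right half the time; both are assumptions for illustration, not details given on the slide. The observed shares are taken from the distribution above.

```python
from math import comb


def chance_distribution(n_clips, p=0.5):
    """Probability of 0..n_clips correct ratings under pure guessing."""
    return [comb(n_clips, k) * p**k * (1 - p) ** (n_clips - k)
            for k in range(n_clips + 1)]


def mean_correct(dist):
    """Expected number of correct ratings for a distribution over 0..n."""
    return sum(k * prob for k, prob in enumerate(dist))


# Observed shares from the slide (0 to 7 correct), as proportions.
observed = [0.01, 0.11, 0.29, 0.36, 0.13, 0.09, 0.01, 0.00]

chance = chance_distribution(7)
print(round(mean_correct(chance), 2))    # 3.5
print(round(mean_correct(observed), 2))  # 2.8
```

Under these assumptions the raters' average of 2.8 correct is below the 3.5 expected from coin-flipping.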

15 of 64

Ratings by rater type

Rater                                     Number   Accuracy (%)
Teachers                                  10       37
Parents                                   7        37
Mentors                                   10       47
University professors                     9        41
Administrators (school leaders/deputes)   10       31
Teacher educators                         10       31
College students                          11       36
Math educators                            10       34
Other adults                              11       43
Primary school students                   12       50

16 of 64

What if the difference is larger?

  • Experiment 2
    • Two groups of teachers (4 teachers in each group)
      • Group 1: at least 0.5 sd above average value-added
      • Group 2: at least 0.5 sd below average value-added
    • 8 video clips shown to 165 experienced school leaders
    • Average number of correct ratings: 3.85

Distribution of total correct ratings

Correct ratings   0    1    2     3     4     5     6    7    8
Share of raters   1%   3%   11%   25%   25%   24%   9%   1%   0%

17 of 64

Can we identify good teachers after training?

18 of 64

Framework for teaching (Danielson 1996)

  • Four domains of professional practice
    • Planning and preparation
    • Classroom environment
    • Instruction
    • Professional responsibilities
  • Links with student achievement (Sartain et al., 2011)
    • Domains 1 and 4: no impact on student achievement
    • Domains 2 and 3: some impact on student achievement

19 of 64

Observations and teacher quality

Sartain, Stoelinga, Brown, Luppescu, Matsko, Miller, Durwood, Jiang, and Glazer (2011)

So, the highest rated teachers are 30% more productive than the lowest rated

But the best teachers are 400% more productive than the least effective

20 of 64

Unreliability in lesson observations

Achieving a reliability of 0.9 in judging teacher quality through lesson observation is likely to require observing a teacher teaching 6 different classes, and for each lesson to be judged by 5 independent observers.

Hill, Charalambous and Kraft (2012)

22 of 64

Bias in lesson observations

Steinberg and Garrett (2016)

23 of 64

Bias in lesson observations

  • A study of 834 teachers from six large US school districts found that teachers were more likely to be given a higher observation rating if they were teaching students with higher achievement.
  • Compared with teachers teaching the lowest achieving students (bottom 20%), those teaching the highest achieving students (top 20%) were:
    • 2.5 times as likely to be top-rated in English
    • 6 times as likely to be top-rated in mathematics

24 of 64

Can we identify good teachers from test scores?

25 of 64

Short-term and long-term effects

  • Data: 10,534 students attending USAFA (2000-2007)
  • Students randomly allocated to calculus instructors

Carrell and West (2010)

Instructors                   Less qualified, less experienced   More qualified, more experienced
End-of-course scores          Higher                             Lower
Scores on follow-on courses   Lower                              Higher
End-of-course evaluations     Higher                             Lower

26 of 64

Can we identify good teachers by combining evidence from different sources?

27 of 64

Measures of Effective Teaching project

  • Three sources of evidence on teacher effectiveness
    • Value-added estimates
    • Classroom observation
    • Student perception surveys
  • Prediction accuracy maximised with these weights
    • Value-added estimates 81%
    • Classroom observation 17%
    • Student perception surveys 2%
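To make the weighting concrete, the sketch below combines the three sources into a single score as a weighted sum. It assumes each measure has already been standardized (z-scores) and that the combination is linear; the slide gives only the weights, so the function, the dictionary keys, and the example teacher are all hypothetical.

```python
# Weights from the slide; the key names are illustrative.
WEIGHTS = {"value_added": 0.81, "observation": 0.17, "student_survey": 0.02}


def composite_score(scores):
    """Weighted sum of standardized scores, one per evidence source."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical teacher: z-scores on each measure.
teacher = {"value_added": 0.5, "observation": -1.0, "student_survey": 2.0}
print(round(composite_score(teacher), 3))  # 0.275
```

With these weights the composite is dominated by the value-added estimate: a very high survey score barely moves the result.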

Bill and Melinda Gates Foundation (2012)

28 of 64

For secondary English teachers (S1 to S3)

  • Correlation with standardized test score gains: 0.69
  • Correlation with higher-order assessments: 0.29
  • Reliability: 0.51

To get a 90% reliable prediction of a teacher’s quality, you would need to collect data on each teacher for 9 years
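The nine-year figure is consistent with the Spearman-Brown prophecy formula, which relates the reliability of a single measurement to the reliability of the average of several. The sketch below assumes that 0.51 is the reliability of one year of data and that each year counts as a parallel measurement; the slide does not say how its figure was derived, so this is one plausible reconstruction.

```python
from math import ceil


def years_needed(single_year_reliability, target):
    """Spearman-Brown formula solved for the number of measurements."""
    r = single_year_reliability
    return target * (1 - r) / (r * (1 - target))


n = years_needed(0.51, 0.90)
print(round(n, 2))  # 8.65
print(ceil(n))      # 9
```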

29 of 64

This is what a correlation of 0.69 looks like…

[Scatter plot: actual vs. predicted]

30 of 64

…and this is a correlation of 0.29…

[Scatter plot: actual vs. predicted]
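For readers who want to see scatter of this kind for themselves, the sketch below draws simulated (predicted, actual) pairs with a chosen population correlation. It assumes both variables are standard normal, which is an illustrative choice and not a description of the data behind the slides.

```python
import random
from math import sqrt


def simulate(r, n, seed=0):
    """Draw n (predicted, actual) pairs whose population correlation is r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        pairs.append((x, r * x + sqrt(1 - r * r) * noise))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


for r in (0.69, 0.29):
    sample = simulate(r, n=20000)
    # r squared is the share of variance in one variable
    # accounted for by a linear fit on the other.
    print(r, round(pearson(sample), 2), round(r * r, 2))
```

The squared correlations are about 0.48 and 0.08, so even the stronger relationship leaves roughly half of the variance unexplained.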

31 of 64

What is the impact of removing less effective teachers?

32 of 64

What if we remove low-performing teachers?

  • Data: reading scores for 4th and 5th grade students in Florida’s public schools from 2004-05 to 2008-09
  • A total of 227,014 students (96%) are matched to 15,152 teachers responsible for teaching reading
  • A value-added score is estimated for each teacher each year
  • Two policy options explored for teacher removal:
    • Value-added score below threshold for two consecutive years
    • Two-year average value-added score below threshold

Winters and Cowen (2013)

33 of 64

System-wide impact

Policy             Severity (percentile)   Increase in teacher value-added   Extra weeks of learning per student per year
Consecutive        5th                     .003                              0.0
Consecutive        10th                    .006                              0.1
Consecutive        25th                    .020                              0.3
Two-year average   5th                     .020                              0.3
Two-year average   10th                    .031                              0.4
Two-year average   25th                    .050                              0.7

34 of 64

What does this all mean?

  • The only way to improve student achievement at scale is to invest in the teachers we already have
  • The “love the one you’re with” strategy

35 of 64

Evaluation vs. improvement

  • Evaluation frameworks:
    • of necessity, have to be comprehensive
    • include all aspects of teachers’ work
    • at best, incentivize improvement on all aspects of practice
    • at worst, incentivize improvement on aspects of practice that are easy to improve
  • Improvement frameworks:
    • are selective
    • focus on those aspects of practice with the biggest payoff for students
  • To maximize improvement, evaluation frameworks have to be used selectively
  • Key focus: Opportunity cost

36 of 64

The ‘next big thing’

37 of 64

Things that don’t work

  • Getting smarter people into teaching
  • Paying good teachers more
  • Brain Gym®
  • Learning styles
  • Copying other countries

38 of 64

Things that might work

  • Differentiation
  • Lesson study/Learning study
  • Social and emotional aspects of learning
  • Educational neuroscience
  • Grit

39 of 64

Things that do work—a bit

  • Firing bad teachers
  • Class size reduction
  • Growth mindset

40 of 64

There is no ‘next big thing’

Just lots of small, mostly old, things

41 of 64

Understanding meta-analysis

  • A technique for aggregating results from different studies by expressing results with a common measure
  • Problems with meta-analysis
    • Inappropriate comparisons
    • Aptitude x treatment interaction
    • The “file drawer” problem
    • Variations in intervention quality
    • Selection of studies
  • Problems with effect sizes
    • Variation in population variability
    • Sensitivity of outcome measures
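The "variation in population variability" problem can be shown with a small calculation. The sketch below uses a standardized mean difference (raw difference divided by a standard deviation) as the common measure; the slide does not name a specific effect-size statistic, and the scores and standard deviations are hypothetical.

```python
def effect_size(mean_treatment, mean_control, sd):
    """Standardized mean difference: raw difference divided by the SD."""
    return (mean_treatment - mean_control) / sd


# The same raw gain of 5 points in two hypothetical populations.
mixed_ability = effect_size(55.0, 50.0, sd=15.0)   # wide range of achievement
selected_group = effect_size(55.0, 50.0, sd=5.0)   # restricted range

print(round(mixed_ability, 2))   # 0.33
print(round(selected_group, 2))  # 1.0
```

An identical intervention effect appears three times larger in the restricted-range sample, which is why effect sizes from studies with different populations are hard to compare directly.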

42 of 64

Meta-analysis in education

  • Some problems are unavoidable:
    • Aptitude x treatment interactions
    • Sensitivity to instruction
    • Selection of studies
  • Some problems are avoidable:
    • Inappropriate comparisons
    • File-drawer problems
    • Intervention quality
    • Variation in variability
  • Unfortunately, many of the people doing meta-analysis in education:
    • don’t discuss the unavoidable problems, and
    • don’t avoid the avoidable ones

43 of 64

So what does this mean?

  • Meta-analysis is hard to do well anywhere
  • In education
    • Meta-analysis is really hard to do well
    • Meta-meta-analysis is impossible to do well
  • Rejoinders
    • The effects average out
    • The rank order of effects is still OK
    • But there is no reason to suppose that either of these is the case
  • Conclusion
    • Meta-meta-analysis is an unsound basis for determining the impact of any educational intervention on student achievement

44 of 64

Learning from research

  • Some questions we should ask of research
    • Does it solve a problem we have?
    • How much extra achievement will it yield?
    • How much will it cost?
    • Can we implement it here?
  • The most important question
    • What use of teacher professional development time will have the greatest impact on our students’ achievement?

45 of 64

Classroom formative assessment

46 of 64

Formative assessment

               Span                                Length                            Impact
Long-cycle     Across terms, teaching units        Four weeks to one year            Monitoring, curriculum alignment
Medium-cycle   Within and between teaching units   One to four weeks                 Student-involved assessment
Short-cycle    Within and between lessons          Minute-by-minute and day-by-day   Engagement, responsiveness

47 of 64

Unpacking Formative Assessment

Columns: Where the learner is going | Where the learner is now | How to get the learner there

Rows: Teacher | Peer | Student

  • Clarifying, sharing, and understanding learning intentions
  • Eliciting evidence of learning
  • Providing feedback that moves learners forward
  • Activating students as learning resources for one another
  • Activating students as owners of their own learning

48 of 64

Unpacking Formative Assessment

  • Responsive teaching
  • The learner’s role
  • Before you can begin

50 of 64

51 of 64

Strategies and techniques

  • Clarifying, understanding and sharing learning intentions
    • Ranking examples
  • Eliciting evidence
    • All-student response systems
  • Feedback that moves learning forward
    • Making feedback into detective work
  • Learners as learning resources for one another
    • Best composite response
  • Learners as owners of their own learning
    • Plus/Minus/Interesting

52 of 64

So much for the easy bit

53 of 64

Reasons not to do formative assessment

  • Higher achievement isn’t needed
  • These students lack the aptitude
  • I don’t need to improve; I get great results
  • It’s not relevant to my subject
  • I don’t have time
  • We have a syllabus to cover
  • I’m doing it already
  • Parents won’t like it

54 of 64

What makes effective teacher learning?

55 of 64

A model for teacher learning

  • Content, then process
  • Content (what we want teachers to change):
    • Evidence
    • Ideas (strategies and techniques)
  • Process (how to go about change):
    • Choice
    • Flexibility
    • Small steps
    • Accountability
    • Support

56 of 64

Supportive accountability

  • What is needed from teachers:
    • A commitment to:
      • The continual improvement of practice
      • Focus on those things that make a difference to students
  • What is needed from leaders:
    • A commitment to engineer effective learning environments for teachers by:
      • Creating expectations for continually improving practice
      • Keeping the focus on the things that make a difference to students
      • Providing the time, space, dispensation, and support for innovation
      • Supporting risk-taking

57 of 64

A “signature pedagogy” for teacher learning

  • Every monthly TLC meeting follows the same structure and sequence of activities:
    • Activity 1: Introduction (5 minutes)
    • Activity 2: Starter activity (5 minutes)
    • Activity 3: Feedback (25–50 minutes)
    • Activity 4: New learning about formative assessment (20–40 minutes)
    • Activity 5: Personal action planning (15 minutes)
    • Activity 6: Review of learning (5 minutes)
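The agenda above implies a range of meeting lengths, which can be checked by summing the shortest and longest duration of each activity. This is a small sketch; the tuple layout is an illustrative choice, and the durations are those listed on the slide.

```python
# (activity, minimum minutes, maximum minutes)
AGENDA = [
    ("Introduction", 5, 5),
    ("Starter activity", 5, 5),
    ("Feedback", 25, 50),
    ("New learning about formative assessment", 20, 40),
    ("Personal action planning", 15, 15),
    ("Review of learning", 5, 5),
]

shortest = sum(low for _, low, _ in AGENDA)
longest = sum(high for _, _, high in AGENDA)
print(shortest, longest)  # 75 120
```

The structure therefore fits meetings of between 75 and 120 minutes.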

58 of 64

Every TLC needs a leader

  • The job of the TLC leader(s):
    • To ensure that all necessary resources (including refreshments!) are available at meetings
    • To ensure that the agenda is followed
    • To maintain a collegial and supportive environment
  • But most important of all:
    • It is not to be the formative assessment “expert.”

59 of 64

Peer observation

  • Run to the agenda of the observed, not the observer:
    • Observed teacher specifies focus of observation:
      • E.g., teacher wants to increase wait time.
    • Observed teacher specifies what counts as evidence:
      • Provides observer with a stopwatch to log wait times.
    • Observed teacher owns any notes made during the observation.

60 of 64

We’ll know it’s working when…

  • Leading indicators of success
    • Teachers are given time to meet, and do so
    • Teachers increasingly act as “critical friends” to others
    • The prevalence of classroom formative assessment practices is increasing
    • Students are more engaged in classrooms
    • Teachers modify the techniques in appropriate ways, indicating an understanding of the underlying theory
    • There is a shift in the ownership of the reform
  • Lagging indicators of success
    • Increased student achievement

61 of 64

The empirical evidence: Large-scale trials

  • Embedding Formative Assessment
    • Whole-school teacher-led 2-year PD program
    • Focus on five strategies of formative assessment
    • Detailed resource packs for groups of 8–14 teachers
      • 18 monthly 75-minute meetings (1% of contracted time)
      • Peer observations between meetings
    • Cost:
      • Teacher time: 1% of contracted time
      • Additional cost: $2 per student per year

62 of 64

Evaluation

  • Design
    • Pre-registered, cluster randomized evaluation
  • Participating schools
    • 58 treatment, 66 control
    • 22,709 students beginning 9th grade in September 2015
  • Outcome measure
    • “Attainment 8”
      • Average score on exams in 8 subjects taken in May 2017
  • “Intention to treat” analysis
  • Increase in achievement over two years
    • Control schools 0.52 sd
    • Experimental schools 0.65 sd
    • A 25% increase in the rate of student progress
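The 25% figure follows from the two gains reported above, taking the control schools' progress as the baseline. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def relative_increase(control_gain, treatment_gain):
    """Extra progress in treatment schools as a fraction of control progress."""
    return (treatment_gain - control_gain) / control_gain


# Gains over two years, in standard deviations, from the slide.
print(round(100 * relative_increase(0.52, 0.65)))  # 25
```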

Speckesser, Runge, Foliano, Bursnall, Hudson-Sharpe, Rolfe, and Anders (2018)

63 of 64

To find out more…

64 of 64

…and even more...