1 of 74

Measuring Growth

What growth is (and isn’t)
How to estimate growth correctly (and incorrectly)
What’s wrong with the broken CDE Dashboard

by Steve Rees, K12 Measures
For the California Charter School Association

March 20, 2024

2 of 74

Measuring Growth of Learning of a Single Student

3 of 74

Not as easy as measuring height. But we are aiming for something as clear, and just as easy to explain.

4 of 74

Designed to measure growth

Scale enables estimating growth within a year and across the grade spans

Given 3x/year in reading, language usage and math

Delivers 45-60 questions, untimed

Norms for scale score and growth

Using the NWEA Measures of Academic Progress
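The idea behind growth norms can be sketched in a few lines. Everything below is hypothetical: `growth_vs_norm` is an illustrative helper, and the 9-point norm gain is an assumed value, not an actual NWEA norm.

```python
# Sketch: comparing a student's fall-to-spring RIT gain to a normative gain.
# The norm_gain value used below is HYPOTHETICAL, not an actual NWEA figure.

def growth_vs_norm(fall_score: float, spring_score: float, norm_gain: float) -> float:
    """Return the observed gain minus the normative gain for the same span."""
    return (spring_score - fall_score) - norm_gain

# Hypothetical 4th-grade math example: fall RIT 201, spring RIT 212,
# with an assumed normative fall-to-spring gain of 9 points.
surplus = growth_vs_norm(201, 212, norm_gain=9)
print(surplus)  # 2 points above the assumed norm
```

The point is only that MAP-style growth reporting compares an observed gain to an expected gain for the same span, rather than judging the scale score alone.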

5 of 74

Individual Student Growth

Morgan from grade 4 to start of grade 7

math

6 of 74

Individual Student Growth

Morgan from grade 4 to start of grade 7

reading

7 of 74

Individual Student Growth

Morgan from grade 4 to start of grade 7

math

reading

8 of 74

Analytic Exercise

What does a multi-year view of test scores reveal that a one-year view does not?

Guiding questions

9 of 74

Analytic Exercise

What does a multi-year view of test scores reveal that a one-year view does not?

What more do you know by seeing results for both math and reading together?

Guiding questions

10 of 74

Individual Student Growth

Connor from grade 4 to start of grade 7

math

reading

11 of 74

Individual Student Growth

Leilani from grade 4 to start of grade 7

math

reading

12 of 74

Growth is not linear. Expect ups and downs.

Some students have extended periods of flat results, followed by bursts of growth.

Sometimes, gains in scores occur over summer, when no schooling has occurred.

Math and reading patterns often differ.

Observations of Learning Growth Patterns

13 of 74

Measuring Growth of Learning of Grad Class Cohorts Over Time

14 of 74

Planning with evidence from NWEA MAP results

15 of 74

Planning with evidence from NWEA MAP results

Our 3rd graders’ reading scores improved a lot in just 15 weeks, from about the 24th to about the 38th percentile.

16 of 74

NWEA MAP contains assumptions you need to know.

Entity

Students (individual)

Subgroup

Classroom

Grade level

School or district

Graduating class cohort

Metric

Scale score

Distance from standard

Percentage of students meeting or exceeding standard

Percentile

Time

<1 year

1 year

2 years

3 years

4+ years

Context

Your school alone

Your district

Your county average

Similar schools

All schools

State average

Norms

Vantage Point

Cross-sectional

Quasi-longitudinal

Longitudinal

17 of 74

Designed to measure growth using CAASPP

Looking at more or less the same kids over 3 to 8 years

Comparing results to schools with highly similar students

Norms for scale score and growth

Using the K12 Measures Assessment Explorer

18 of 74

What is a school’s effect on what students know and can do?

The question we’re asking drives the evidence we’re building

19 of 74

Growth at School Level

What is the question we are trying to answer? Too often, it is this one. Not good.

“Did our kids in grades 3-5 make as much progress in math last year as kids in the same grade level in California?”

20 of 74

Analyzing the same kids over a longer time reduces the noise of student variability and lets us see the school’s effect.

“Did our kids in the graduating classes of 2027 and 2028 make as much progress in math as California kids over the three years they’ve taken the CAASPP?”
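A minimal sketch of that cohort calculation, using made-up scale scores and a hypothetical statewide mean gain:

```python
# Sketch: matched-cohort growth from CAASPP-style scale scores.
# All scores below are invented for illustration.

# {student_id: (grade-3 score, grade-5 score)} for students tested in both years
cohort = {
    "s1": (2380, 2470),
    "s2": (2405, 2501),
    "s3": (2350, 2434),
}

# Gain per student, then the school's mean gain for the matched cohort
gains = [later - earlier for earlier, later in cohort.values()]
school_mean_gain = sum(gains) / len(gains)

state_mean_gain = 88  # hypothetical statewide mean gain over the same span

print(school_mean_gain, school_mean_gain - state_mean_gain)
```

Restricting the calculation to students present in both years is what makes this a growth estimate rather than a comparison of two different groups of kids.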

21 of 74

Adding a context of schools with highly similar students enables you to make claims like this.

“Over the last 3 years, our kids in the graduating classes of 2027 and 2028 made more progress in math than highly similar kids in 12 of 15 schools a lot like ours.”
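The “12 of 15” style of claim reduces to a simple count. The gains below are invented three-year scale-score gains, chosen only to illustrate the arithmetic:

```python
# Sketch: ranking one school's cohort gain against highly similar schools.
# All gains are hypothetical scale-score-point gains over three years.

our_gain = 92
similar_school_gains = [95, 88, 90, 81, 99, 85, 87, 83, 91, 80, 78, 86, 84, 96, 89]

# Count how many similar schools our cohort outgrew
outgrown = sum(1 for g in similar_school_gains if our_gain > g)
print(f"more progress than {outgrown} of {len(similar_school_gains)} similar schools")
```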

22 of 74

Elements of similarity

Students

  • Parent education
  • Free or reduced-price lunch
  • English language fluency

23 of 74

Elements of student similarity

24 of 74

Elements of student similarity

25 of 74

Elements of student similarity

26 of 74

Decide whose growth to measure

View that comparison from a certain vantage point

Select the right metric (scale score)

Choose a period of time

Decide who to compare to whom

To estimate a school’s effect on students, we have to …

27 of 74

Restructured results by graduating class cohorts

Used scale scores

Viewed same students (more or less)

Over as many years as possible

Compared to highly similar students in schools serving same grade range

To build growth estimates from CAASPP results …

28 of 74

The Assumptions of the K12 Measures Assessment Explorer

Entity

Students (individual)

Subgroup

Classroom

Grade level

School or district

Graduating class cohort

Metric

Scale score

Distance from standard

Percentage of students meeting or exceeding standard

Percentile

Time

<1 year

1 year

2 years

3 years

4+ years

Context

Your school alone

Your district

Your county average

Similar schools

All schools

State average

Norms

Vantage Point

Cross-sectional

Quasi-longitudinal

Longitudinal

29 of 74

CAASPP reporting site looks at grad class cohorts

Source: CAASPP reporting site

30 of 74

The Case of Napa Valley USD’s Middle School Math Sag: Did COVID Cause It?

Napa Valley USD

  • About 16,500 students
  • Math results in elementary schools contrast with results in middle schools
  • Can comparability shed light on the question: is it due to COVID?

31 of 74

Napa Valley USD’s Class of 2027

State average scale score

32 of 74

Napa Valley USD’s Class of 2027 in Context

33 of 74

Napa Valley USD’s Class of 2027 in Context

Gilroy USD

Napa Valley USD

34 of 74

Napa Valley USD’s Class of 2026 in Context

35 of 74

Napa Valley USD’s Class of 2028 in Context

36 of 74

Napa Valley USD’s Class of 2028 in Context

37 of 74

When evidence conflicts

The Dashboard versus K12 Measures and the Stanford Educational Opportunity Project

Yuba River Charter School

  • 189 students tested in 2023
  • Grass Valley, Nevada County
  • Waldorf model
  • Serving K-8 students
  • Launched in 1994

38 of 74

39 of 74

Designed to measure growth (learning rate)

National in scope, covering schools and districts

Looks at state tests from 2009-2018

Provides a context of socio-economic status

Stanford Educational Opportunity Explorer

40 of 74

Average Students’ Test Scores, 2009-2018

By Stanford Educational Opportunity Explorer

41 of 74

Average Students’ Learning Rates, 2009-18

By Stanford Educational Opportunity Explorer

42 of 74

By Stanford Educational Opportunity Explorer

Yuba River Charter School As Viewed by the Stanford Educational Opportunity Explorer 2009-2018

ELA and math results are combined to reach these conclusions.

43 of 74

Yuba River Charter School (Class of 2028) as K12 Measures Assessment Explorer Sees It

44 of 74

Yuba River Charter School (Class of 2027) as K12 Measures Assessment Explorer Sees It

45 of 74

Yuba River Charter School (Class of 2026) as K12 Measures Assessment Explorer Sees It

46 of 74

How can the Dashboard’s results conflict to this degree?

47 of 74

How can the Dashboard’s results conflict to this degree?

The Dashboard’s errors are fundamental flaws of four types

48 of 74

How can the Dashboard’s results conflict to this degree?

The Dashboard’s errors are fundamental flaws of four types

  1. Joining year-to-year change with status

49 of 74

How can the Dashboard’s results conflict to this degree?

The Dashboard’s errors are fundamental flaws of four types

  1. Joining year-to-year change with status
  2. Failing to measure changes for the same students over time

50 of 74

How can the Dashboard’s results conflict to this degree?

The Dashboard’s errors are fundamental flaws of four types

  1. Joining year-to-year change with status
  2. Failing to measure changes for the same students over time
  3. Disregarding imprecision and uncertainty

51 of 74

How can the Dashboard’s results conflict to this degree?

The Dashboard’s errors are fundamental flaws of four types

  1. Joining year-to-year change with status
  2. Failing to measure changes for the same students over time
  3. Disregarding imprecision and uncertainty
  4. Comparing a subgroup to the whole it’s part of to calculate “gaps”

52 of 74

Joining year-to-year change with status is a basic logic error

53 of 74

When they are related, like height and weight, the combo has meaning


54 of 74

Weather Bureau created a true “signal” when it created the windchill factor

55 of 74

A cowboy joke about joining two things that should be kept apart.

What do you get when you cross a jack rabbit with an antelope?

56 of 74

What do you get when you cross a jack rabbit with an antelope?

… a Jack-a-lope

A cowboy joke about joining two things that should be kept apart.

57 of 74

Failing to measure changes for the same students over time

58 of 74

California’s Official Dashboard View

Graduating Class of 2020 in 2016 is yellow

Graduating Class of 2021 in 2016 is green

Graduating Class of 2022 in 2016 is blue

How the CDE Dashboard evaluates CAASPP results for this middle school. The students in this school met standard.

59 of 74

California’s Official Dashboard View

Graduating Class of 2022 in 2016 is blue

Graduating Class of 2021 in 2016 is green

Graduating Class of 2020 in 2016 is yellow

60 of 74

California’s Official Dashboard View

Two years of zero “difference from standard”

Graduating Class of 2022 in 2016 is blue

Graduating Class of 2021 in 2016 is green

Graduating Class of 2020 in 2016 is yellow

61 of 74

California’s Official Dashboard View Ignores Graduating Class Cohorts


Net 50 scale score point gain (DFS) for grad class of 2023, tinted apricot here.

But summing up the results in each year across all three grade levels leads you to zero. Measuring “change” year to year, the Dashboard would conclude “no change” occurred.

Graduating Class of 2020 in 2016 is yellow

Graduating Class of 2021 in 2016 is green

Graduating Class of 2022 in 2016 is blue

62 of 74

Disregarding imprecision and uncertainty

63 of 74

Scores to the right from Morgan Hill USD

Disregarding imprecision: student level

Imprecision for a student: +/- 25 to +/- 35 scale score points

64 of 74

School level imprecision

CAASPP Online Reporting System live report for Morgan Hill schools

Disregarding imprecision: school level

+/- 7

+/- 9

65 of 74

School level imprecision

CAASPP Online Reporting System live report for Morgan Hill schools

Disregarding imprecision

+/- 7

+/- 9

66 of 74

School level imprecision

CAASPP Online Reporting System live report for Morgan Hill schools

Disregarding imprecision

+/- 7

+/- 9

Maintained +/- 3 pts
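One way to take margins of error seriously, assuming the two years’ errors are independent (a simplification), is to combine them in quadrature before judging a change. The 5- and 15-point changes are hypothetical; the +/- 7 and +/- 9 margins echo the school-level figures shown here:

```python
# Sketch: is a school-level score change bigger than the measurement noise?
import math

def change_exceeds_error(change: float, moe_year1: float, moe_year2: float) -> bool:
    """Treat the two margins as independent and combine them in quadrature."""
    combined = math.sqrt(moe_year1**2 + moe_year2**2)
    return abs(change) > combined

# With +/- 7 and +/- 9 margins, the combined error is about 11.4 points,
# so a 5-point "gain" is indistinguishable from zero.
print(change_exceeds_error(5, 7, 9))   # False
print(change_exceeds_error(15, 7, 9))  # True
```

A "Maintained" band of +/- 3 points, as on this slide, is well inside that combined error, so labeling such a change meaningful disregards the imprecision.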

67 of 74

Gaps compare a subgroup to the whole to which it belongs

68 of 74

Logic Error: Gaps

“… of any student group was two or more performance levels below the ‘all student’ performance …”

69 of 74

Logic Error: Gaps

Comparing the part to the whole to which it belongs

70 of 74

Logic Error: Gaps

Pine Beetle Infestation in Pacific Northwest

Why would you compare the CALIF rate to the rate of infestation in (WA + OR + CA)?

71 of 74

Logic Error: Gaps

Should be comparing each part to the other parts to measure differences
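A small numeric sketch of why part-versus-whole dilutes the comparison. Group sizes and mean scores are hypothetical:

```python
# Sketch: comparing a subgroup to the whole it belongs to shrinks the gap,
# because the subgroup's own scores are inside the "all students" mean.

def weighted_mean(pairs):
    """Weighted mean from (n_students, mean_score) pairs."""
    total = sum(n for n, _ in pairs)
    return sum(n * m for n, m in pairs) / total

subgroup = (40, 2400)   # (n students, mean scale score)
rest = (60, 2500)       # everyone NOT in the subgroup

all_students = weighted_mean([subgroup, rest])  # whole includes the subgroup
gap_vs_whole = subgroup[1] - all_students       # diluted gap
gap_vs_rest = subgroup[1] - rest[1]             # true part-to-part contrast
```

Here the part-to-part gap is 100 points, but measuring the subgroup against “all students” shrinks it to 60, because 40% of the whole is the subgroup itself.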

72 of 74

Put the Dashboard aside. Rely on higher quality evidence.

Look at CAASPP results for the same students over 3+ years.

Frame CAASPP results within context of highly similar schools.

Six steps you can take now to get a handle on vital signs

73 of 74

Ask more specific questions about evidence of learning.

Interrogate the evidence together, like doctors at a case conference.

If you need evidence that isn’t available, build it yourself.

Six steps you can take now to get a handle on vital signs

74 of 74

Steve Rees

Email: steve.rees@schoolwisepress.com

Book website: https://k12measures.com

Company site: https://schoolwisepress.com

Company: K12 Measures team, a project of School Wise Press

Discount code SS254 for 25% off