A Tale of Two Schools
What authorizers can do when evidence conflicts
by Steve Rees, K12 Measures
For the conference of the California Charter Authorizing Professionals
June 20, 2024
Using the K12 Measures Assessment Explorer
Designed to measure growth using CAASPP
Looking at more or less the same kids over 3 to 8 years
Comparing results to schools with highly similar students
Norms for scale score and growth
Growth at School Level
To estimate a school’s effect on students, we have to …
Decide whose growth we’re measuring
Compare using what metric
View that comparison from a certain vantage point
Compare over a certain period of time
Compare who to whom (context)
Growth at School Level
To build growth estimates from CAASPP results …
Restructured results by graduating class cohorts
Used scale scores
Viewed the same students (more or less)
Over as many years as possible
Compared results to schools with highly similar students
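Restructuring results by graduating class cohorts can be sketched in a few lines. This is a minimal illustration, not the K12 Measures implementation: it assumes normal grade promotion, so a student tested in a given grade and spring test year belongs to the class that finishes grade 12 accordingly (e.g., the deck’s 6th graders in 2022 are the Grad Class of 2028).

```python
# Sketch: map a CAASPP test record (test year, grade) to its graduating
# class cohort. Assumes normal one-grade-per-year promotion.
def graduating_class(test_year: int, grade: int) -> int:
    return test_year + (12 - grade)

print(graduating_class(2022, 6))  # 2028 -> matches the Grad Class 2028 slide
print(graduating_class(2019, 5))  # 2026
```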
The K12 Measures Assessment Explorer’s assumptions
Entity: Students (individual), Subgroup, Classroom, Grade level, School or district, Graduating class cohort
Metric: Scale score, Distance from standard, Percentage of students meeting or exceeding standard, Percentile
Time: <1 year, 1 year, 2 years, 3 years, 4+ years
Context: Your school alone, Your district, Your county average, Similar schools, All schools, State average, Norms
Vantage point: Cross-sectional, Quasi-longitudinal, Longitudinal
When evidence conflicts
The Dashboard versus K12 Measures and the Stanford Educational Opportunity Project
Yuba River Charter School
Assigned to the middle tier in the CDE’s March 2024 evaluation.
ELA 47.6% statewide
Math 34.6% statewide
Stanford Educational Opportunity Explorer
Designed to measure growth (learning rate)
National geographic scope
Looks at state tests from 2009-2018
Provides socio-economic status as context
Average Students’ Test Scores, 2009-18
By Stanford Educational Opportunity Explorer
Average Students’ Learning Rates, 2009-18
By Stanford Educational Opportunity Explorer
Yuba River Charter School As Viewed by the Stanford Educational Opportunity Explorer 2009-2018
ELA and math results are combined to reach these conclusions.
By Stanford Educational Opportunity Explorer
Yuba River Charter School Grad Class 2026 as K12 Measures Assessment Explorer Sees It
5th grade n = 28
Yuba River Charter School Grad Class 2027 as K12 Measures Assessment Explorer Sees It
4th grade n = 31
Yuba River Charter School Grad Class 2028 as K12 Measures Assessment Explorer Sees It
6th grade n = 29
When evidence conflicts
The Dashboard versus K12 Measures and the Stanford Educational Opportunity Project
Winston Churchill Middle in San Juan USD
Average Students’ Test Scores, 2009-18
By Stanford Educational Opportunity Explorer
Average Students’ Learning Rates, 2009-18
By Stanford Educational Opportunity Explorer
Winston Churchill Middle School As Viewed by the Stanford Educational Opportunity Explorer 2009-2018
ELA and math results are combined to reach these conclusions.
Winston Churchill Middle School Grad Class 2027 as K12 Measures Assessment Explorer Sees It
n = 319 students
Winston Churchill Middle School Grad Class 2028 as K12 Measures Assessment Explorer Sees It
n = 247 students
Summary: Two Schools’ Evidence
Dashboard conflicts with Stanford Ed Opportunity Explorer and K12 Measures Assessment Explorer results
Why do the Dashboard’s results conflict to this degree?
How can the Dashboard’s results conflict to this degree?
The Dashboard’s errors are fundamental flaws of four types
Joining year-to-year change with status is a deep logic error
When they are related, like height and weight, the combo has meaning
Weather Bureau created a true “signal” when it joined wind with temperature
In Montana, they have a joke about joining two things that should be kept apart.
What do you get when you cross a jack rabbit with an antelope?
… a Jack-a-lope
Failing to measure changes for the same students over time
California’s Official Dashboard View
Graduating Class of 2020 in 2016 is yellow
Graduating Class of 2021 in 2016 is green
Graduating Class of 2022 in 2016 is blue
How the CDE Dashboard evaluates CAASPP results for this middle school. The students in this school met standard.
California’s Official Dashboard View
Two years of zero “difference from standard”
California’s Official Dashboard View Ignores Graduating Class Cohorts
Net 50 scale score point gain (DFS) for grad class of 2023, tinted apricot here.
But summing the results in each year across all three grade levels yields zero. Measuring “change” year to year, the Dashboard would conclude that no change occurred.
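The arithmetic behind this slide can be made concrete. The DFS values below are hypothetical, invented only to reproduce the pattern the deck describes: every year’s school-wide average is zero, so a year-over-year “change” measure reports nothing, while following one graduating class along the diagonal shows a 50-point gain.

```python
# Hypothetical distance-from-standard (DFS) values, in scale score points,
# keyed by (grade, test year) for one middle school. Invented numbers.
dfs = {
    (6, 2016): -30, (7, 2016): 10, (8, 2016): 20,
    (6, 2017): -25, (7, 2017): -5, (8, 2017): 30,
    (6, 2018): -35, (7, 2018): 15, (8, 2018): 20,
}

# Dashboard-style view: average DFS across all grades tested each year.
def school_average(year):
    vals = [v for (g, y), v in dfs.items() if y == year]
    return sum(vals) / len(vals)

# Cohort view: follow one graduating class along the diagonal
# (grade 6 in 2016 -> grade 7 in 2017 -> grade 8 in 2018).
def cohort_gain(start_grade, start_year, n_years):
    path = [dfs[(start_grade + i, start_year + i)] for i in range(n_years)]
    return path[-1] - path[0]

print([school_average(y) for y in (2016, 2017, 2018)])  # [0.0, 0.0, 0.0] -> "no change"
print(cohort_gain(6, 2016, 3))  # 50 -> the same students gained 50 points
```

The two views disagree because the Dashboard compares this year’s 6th-8th graders to last year’s, not to themselves a year earlier.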
Disregarding imprecision and uncertainty
Scores to the right from Morgan Hill USD
Disregarding imprecision
Imprecision for a student: +/- 25 to +/- 35 scale score points
School-level imprecision: +/- 7 to +/- 9 scale score points
CAASPP Online Reporting System live report for Morgan Hill schools
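One way to take those margins seriously is to check whether two schools’ error bands overlap before declaring a difference. This is a simplified sketch (overlapping intervals are a conservative screen, not a formal significance test), and the school scores below are hypothetical; only the +/- 7 to +/- 9 school-level margins come from the deck.

```python
# Sketch: do two DFS estimates differ by more than their margins of error?
# Returns True when the confidence bands overlap (difference may be noise).
def intervals_overlap(score_a, moe_a, score_b, moe_b):
    lo_a, hi_a = score_a - moe_a, score_a + moe_a
    lo_b, hi_b = score_b - moe_b, score_b + moe_b
    return lo_a <= hi_b and lo_b <= hi_a

# A 10-point gap looks decisive until the error bands are drawn:
print(intervals_overlap(-5, 8, 5, 9))   # True  -> the gap may be noise
print(intervals_overlap(-20, 7, 5, 9))  # False -> a meaningful difference
```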
Gaps compare a subgroup to the whole to which it belongs
Logic Error
“… any student group was two or more performance levels below the ‘all student’ performance …”
Comparing the part to the whole to which it belongs
Pine Beetle Infestation in the Pacific Northwest: why would you compare the California rate to the infestation rate in (WA + OR + CA)?
Should be comparing each part to the other parts to measure differences
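The part-to-whole error is easy to quantify. In this sketch (counts and proficiency rates are hypothetical), a subgroup’s gap measured against the “all student” figure is smaller than its gap against the other students, because the subgroup’s own scores drag down the whole it is being compared to.

```python
# Sketch: gap of a subgroup vs. the whole (which contains it) compared
# with the gap vs. the complement (everyone else). Hypothetical numbers.
def gaps(sub_n, sub_rate, other_n, other_rate):
    whole_rate = (sub_n * sub_rate + other_n * other_rate) / (sub_n + other_n)
    gap_vs_whole = sub_rate - whole_rate     # the Dashboard-style comparison
    gap_vs_others = sub_rate - other_rate    # the part-to-part comparison
    return gap_vs_whole, gap_vs_others

gap_vs_whole, gap_vs_others = gaps(sub_n=40, sub_rate=0.30, other_n=160, other_rate=0.60)
print(round(gap_vs_whole, 3))   # -0.24 -> understated: the subgroup is inside the whole
print(round(gap_vs_others, 3))  # -0.3  -> the actual difference between the parts
```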
What can you do now to get a handle on growth?
Put the Dashboard aside.
Look at CAASPP results for the same students over years.
Frame CAASPP results within the context of highly similar schools.
Ask smarter, more specific questions about evidence of learning.
Steve Rees
Email: steve.rees@schoolwisepress.com
Book website: https://k12measures.com
Company site: https://schoolwisepress.com
Company: K12 Measures team, a project of School Wise Press
Resources
Yuba River Charter School’s Assessment Explorer:
https://public.tableau.com/shared/BSKGZBWBT?:display_count=n&:origin=viz_share_link
Winston Churchill Middle School’s Assessment Explorer:
https://public.tableau.com/shared/C4GF6K6ZZ?:display_count=n&:origin=viz_share_link
Link to chapter of “Mismeasuring Schools’ Vital Signs”