A Tale of Two Schools
What authorizers can do when evidence conflicts
by Steve Rees, K12 Measures
For the conference of the California Charter Authorizing Professionals
June 20, 2024
Using the K12 Measures Assessment Explorer
Designed to measure growth using CAASPP
Looking at more or less the same kids over 3 to 8 years
Comparing results to schools with highly similar students
Norms for scale score and growth
Growth at School Level
To estimate a school’s effect on students, we have to …
Decide whose growth we’re measuring
Compare using what metric
View that comparison from a certain vantage point
Compare over a certain period of time
Compare who to whom (context)
Growth at School Level
To build growth estimates from CAASPP results …
Restructured results by graduating class cohorts
Used scale scores
Viewed the same students (more or less)
Over as many years as possible
Compared results to schools with highly similar students
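Restructuring results by graduating class cohorts can be sketched in a few lines. This is a minimal illustration, not the K12 Measures implementation: it assumes normal grade promotion, so a student tested in a given grade and spring test year belongs to the class that finishes grade 12 accordingly (e.g., the deck’s 6th graders in 2022 are the Grad Class of 2028).

```python
# Sketch: map a CAASPP test record (test year, grade) to its graduating
# class cohort. Assumes normal one-grade-per-year promotion.
def graduating_class(test_year: int, grade: int) -> int:
    return test_year + (12 - grade)

print(graduating_class(2022, 6))  # 2028 -> matches the Grad Class 2028 slide
print(graduating_class(2019, 5))  # 2026
```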
The K12 Measures Assessment Explorer’s assumptions
Entity: Students (individual), Subgroup, Classroom, Grade level, School or district, Graduating class cohort
Metric: Scale score, Distance from standard, Percentage of students meeting or exceeding standard, Percentile
Time: <1 year, 1 year, 2 years, 3 years, 4+ years
Context: Your school alone, Your district, Your county average, Similar schools, All schools, State average, Norms
Vantage point: Cross-sectional, Quasi-longitudinal, Longitudinal
When evidence conflicts
The Dashboard versus K12 Measures and the Stanford Educational Opportunity Project
Yuba River Charter School
Assigned to the middle tier in the CDE’s March 2024 evaluation.
ELA 47.6% statewide
Math 34.6% statewide
Stanford Educational Opportunity Explorer
Designed to measure growth (learning rate)
National geographic scope
Looks at state tests from 2009-2018
Provides socio-economic status as context
Average Students’ Test Scores, 2009-18
By Stanford Educational Opportunity Explorer
Average Students’ Learning Rates, 2009-18
By Stanford Educational Opportunity Explorer
Yuba River Charter School As Viewed by the Stanford Educational Opportunity Explorer 2009-2018
ELA and math results are combined to reach these conclusions.
By Stanford Educational Opportunity Explorer
Yuba River Charter School Grad Class 2026 as K12 Measures Assessment Explorer Sees It
5th grade n = 28
Yuba River Charter School Grad Class 2027 as K12 Measures Assessment Explorer Sees It
4th grade n = 31
Yuba River Charter School Grad Class 2028 as K12 Measures Assessment Explorer Sees It
6th grade n = 29
When evidence conflicts
The Dashboard versus K12 Measures and the Stanford Educational Opportunity Project
Winston Churchill Middle in San Juan USD
Average Students’ Test Scores, 2009-18
By Stanford Educational Opportunity Explorer
Average Students’ Learning Rates, 2009-18
By Stanford Educational Opportunity Explorer
Winston Churchill Middle School As Viewed by the Stanford Educational Opportunity Explorer 2009-2018
ELA and math results are combined to reach these conclusions.
Winston Churchill Middle School Grad Class 2027 as K12 Measures Assessment Explorer Sees It
n = 319 students
Winston Churchill Middle School Grad Class 2028 as K12 Measures Assessment Explorer Sees It
n = 247 students
Summary: Two Schools’ Evidence
Dashboard conflicts with Stanford Ed Opportunity Explorer and K12 Measures Assessment Explorer results
Why do the Dashboard’s results conflict to this degree?
How can the Dashboard’s results conflict to this degree?
The Dashboard’s errors are fundamental flaws of four types
Joining year-to-year change with status is a deep logic error
When they are related, like height and weight, the combo has meaning
Weather Bureau created a true “signal” when it joined wind with temperature
In Montana, they have a joke about joining two things that should be kept apart.
What do you get when you cross a jack rabbit with an antelope?
… a Jack-a-lope
Failing to measure changes for the same students over time
California’s Official Dashboard View
Graduating Class of 2020 in 2016 is yellow
Graduating Class of 2021 in 2016 is green
Graduating Class of 2022 in 2016 is blue
How the CDE Dashboard evaluates CAASPP results for this middle school. The students in this school met standard.
California’s Official Dashboard View
Two years of zero “difference from standard”
California’s Official Dashboard View Ignores Graduating Class Cohorts
Net 50 scale score point gain (DFS) for grad class of 2023, tinted apricot here.
But summing the results in each year across all three grade levels yields zero. Measuring “change” year to year, the Dashboard would conclude that no change occurred.
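The arithmetic behind this slide can be made concrete. The DFS values below are hypothetical, invented only to reproduce the pattern the deck describes: every year’s school-wide average is zero, so a year-over-year “change” measure reports nothing, while following one graduating class along the diagonal shows a 50-point gain.

```python
# Hypothetical distance-from-standard (DFS) values, in scale score points,
# keyed by (grade, test year) for one middle school. Invented numbers.
dfs = {
    (6, 2016): -30, (7, 2016): 10, (8, 2016): 20,
    (6, 2017): -25, (7, 2017): -5, (8, 2017): 30,
    (6, 2018): -35, (7, 2018): 15, (8, 2018): 20,
}

# Dashboard-style view: average DFS across all grades tested each year.
def school_average(year):
    vals = [v for (g, y), v in dfs.items() if y == year]
    return sum(vals) / len(vals)

# Cohort view: follow one graduating class along the diagonal
# (grade 6 in 2016 -> grade 7 in 2017 -> grade 8 in 2018).
def cohort_gain(start_grade, start_year, n_years):
    path = [dfs[(start_grade + i, start_year + i)] for i in range(n_years)]
    return path[-1] - path[0]

print([school_average(y) for y in (2016, 2017, 2018)])  # [0.0, 0.0, 0.0] -> "no change"
print(cohort_gain(6, 2016, 3))  # 50 -> the same students gained 50 points
```

The two views disagree because the Dashboard compares this year’s 6th-8th graders to last year’s, not to themselves a year earlier.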
Disregarding imprecision and uncertainty
Scores to the right from Morgan Hill USD
Disregarding imprecision
Imprecision for a student: +/- 25 to +/- 35 scale score points
School-level imprecision: +/- 7 to +/- 9 scale score points
CAASPP Online Reporting System live report for Morgan Hill schools
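One way to take those margins seriously is to check whether two schools’ error bands overlap before declaring a difference. This is a simplified sketch (overlapping intervals are a conservative screen, not a formal significance test), and the school scores below are hypothetical; only the +/- 7 to +/- 9 school-level margins come from the deck.

```python
# Sketch: do two DFS estimates differ by more than their margins of error?
# Returns True when the confidence bands overlap (difference may be noise).
def intervals_overlap(score_a, moe_a, score_b, moe_b):
    lo_a, hi_a = score_a - moe_a, score_a + moe_a
    lo_b, hi_b = score_b - moe_b, score_b + moe_b
    return lo_a <= hi_b and lo_b <= hi_a

# A 10-point gap looks decisive until the error bands are drawn:
print(intervals_overlap(-5, 8, 5, 9))   # True  -> the gap may be noise
print(intervals_overlap(-20, 7, 5, 9))  # False -> a meaningful difference
```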
Gaps compare a subgroup to the whole to which it belongs
Logic Error
“… any student group was two or more performance levels below the ‘all student’ performance …”
Comparing the part to the whole to which it belongs
Pine Beetle Infestation in the Pacific Northwest: why would you compare the California rate to the infestation rate in (WA + OR + CA)?
Should be comparing each part to the other parts to measure differences
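The part-to-whole error is easy to quantify. In this sketch (counts and proficiency rates are hypothetical), a subgroup’s gap measured against the “all student” figure is smaller than its gap against the other students, because the subgroup’s own scores drag down the whole it is being compared to.

```python
# Sketch: gap of a subgroup vs. the whole (which contains it) compared
# with the gap vs. the complement (everyone else). Hypothetical numbers.
def gaps(sub_n, sub_rate, other_n, other_rate):
    whole_rate = (sub_n * sub_rate + other_n * other_rate) / (sub_n + other_n)
    gap_vs_whole = sub_rate - whole_rate     # the Dashboard-style comparison
    gap_vs_others = sub_rate - other_rate    # the part-to-part comparison
    return gap_vs_whole, gap_vs_others

gap_vs_whole, gap_vs_others = gaps(sub_n=40, sub_rate=0.30, other_n=160, other_rate=0.60)
print(round(gap_vs_whole, 3))   # -0.24 -> understated: the subgroup is inside the whole
print(round(gap_vs_others, 3))  # -0.3  -> the actual difference between the parts
```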
What can you do now to get a handle on growth?
Put the Dashboard aside.
Look at CAASPP results for the same students over years.
Frame CAASPP results within the context of highly similar schools.
Ask smarter, more specific questions about evidence of learning.
Steve Rees
Email: steve.rees@schoolwisepress.com
Book website: https://k12measures.com
Company site: https://schoolwisepress.com
Company: K12 Measures team, a project of School Wise Press
Resources
Yuba River Charter School’s Assessment Explorer:
https://public.tableau.com/shared/BSKGZBWBT?:display_count=n&:origin=viz_share_link
Winston Churchill Middle School’s Assessment Explorer:
https://public.tableau.com/shared/C4GF6K6ZZ?:display_count=n&:origin=viz_share_link
Link to chapter of “Mismeasuring Schools’ Vital Signs”