Measuring Growth
What growth is (and isn’t)
How to estimate growth correctly (and incorrectly)
What’s wrong with the broken CDE Dashboard
by Steve Rees, K12 Measures
For the California Charter School Association
March 20, 2024
Measuring Growth of Learning of a Single Student
Not as easy as measuring height. But we are aiming for something just as clear and just as easy to explain.
Designed to measure growth
Scale enables estimating growth within a year and across the grade spans
Given 3x/year in reading, language usage and math
Delivers 45-60 questions, untimed
Norms for scale score and growth
Using the NWEA Measures of Academic Progress
Individual Student Growth
Morgan from grade 4 to start of grade 7
math
reading
Analytic Exercise
Guiding questions
What does a multi-year view of test scores reveal that a one-year view does not?
What more do you know by seeing results for both math and reading together?
Individual Student Growth
Connor from grade 4 to start of grade 7
math
reading
Individual Student Growth
Leilani from grade 4 to start of grade 7
math
reading
Growth is not linear. Expect ups and downs.
Some students have extended periods of flat results, followed by bursts of growth.
Sometimes, gains in scores occur over summer, when no schooling has occurred.
Math and reading patterns often differ.
Observations of Learning Growth Patterns
Measuring Growth of Learning of Grad Class Cohorts Over Time
Planning with evidence from NWEA MAP results
Our 3rd graders’ reading scores improved a lot in just 15 weeks, from about the 24th to about the 38th percentile.
NWEA MAP contains assumptions you need to know.
Entity: Students (individual), Subgroup, Classroom, Grade level, School or district, Graduating class cohort
Metric: Scale score, Distance from standard, Percentage of students meeting or exceeding standard, Percentile
Time: <1 year, 1 year, 2 years, 3 years, 4+ years
Context: Your school alone, Your district, Your county average, Similar schools, All schools, State average, Norms
Vantage point: Cross-sectional, Quasi-longitudinal, Longitudinal
Designed to measure growth using CAASPP
Looking at more or less the same kids over 3 to 8 years
Comparing results to schools with highly similar students
Norms for scale score and growth
Using the K12 Measures Assessment Explorer
What is a school’s effect on what students know and can do?
The question we’re asking drives the evidence we’re building
Growth at School Level
What is the question we are trying to answer? Too often, it is this one. Not good.
“Did our kids in grades 3-5 make as much progress in math last year as kids in the same grade level in California?”
Analyzing the same kids over a longer time lets us reduce the noise of student variability and see the school’s effect.
“Did our kids in the graduating classes of 2027 and 2028 make as much progress in math as California kids over the three years they’ve taken the CAASPP?”
Adding a context of schools with highly similar students enables you to make claims like this.
“Over the last 3 years, our kids in the graduating classes of 2027 and 2028 made more progress in math than highly similar kids in 12 of 15 schools a lot like ours.”
Elements of similarity
Students
Elements of student similarity
Decide whose growth to measure
View that comparison from a certain vantage point
Select the right metric (scale score)
Choose a period of time
Decide who to compare to whom
To estimate a school’s effect on students, we have to …
Restructured results by graduating class cohorts
Used scale scores
Viewed same students (more or less)
Over as many years as possible
Compared to highly similar students in schools serving same grade range
To build growth estimates from CAASPP results …
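The first step, restructuring by graduating class cohort, amounts to mapping a tested grade and spring test year to the class that graduates 12th grade (12 - grade) years later. A minimal sketch, where the function name and sample records are my own illustrations:

```python
def graduating_class(grade: int, spring_year: int) -> int:
    """Map a tested grade and spring test year to a graduating class cohort."""
    return spring_year + (12 - grade)

# The same cohort surfaces across CAASPP's tested grades (3-8 and 11):
records = [(3, 2018), (5, 2020), (8, 2023), (11, 2026)]
print([graduating_class(g, y) for g, y in records])  # [2027, 2027, 2027, 2027]
```

Grouping each year's results by this cohort key, rather than by grade level, is what lets you follow "more or less the same kids" across years.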
The Assumptions of the K12 Measures Assessment Explorer
Entity: Students (individual), Subgroup, Classroom, Grade level, School or district, Graduating class cohort
Metric: Scale score, Distance from standard, Percentage of students meeting or exceeding standard, Percentile
Time: <1 year, 1 year, 2 years, 3 years, 4+ years
Context: Your school alone, Your district, Your county average, Similar schools, All schools, State average, Norms
Vantage point: Cross-sectional, Quasi-longitudinal, Longitudinal
CAASPP reporting site looks at grad class cohorts
Source: CAASPP reporting site
The Case of Napa Valley USD’s Middle School Math Sag: Did COVID Cause It?
Napa Valley USD
Napa Valley USD’s Class of 2027
State average scale score
Napa Valley USD’s Class of 2027 in Context
Gilroy USD
Napa Valley USD
Napa Valley USD’s Class of 2026 in Context
Napa Valley USD’s Class of 2028 in Context
When evidence conflicts
The Dashboard versus K12 Measures and the Stanford Educational Opportunity Project
Yuba River Charter School
Designed to measure growth (learning rate)
National in scope, covering schools and districts
Looks at state tests from 2009-2018
Provides a context of socio-economic status
Stanford Educational Opportunity Explorer
Average Students’ Test Scores, 2009-2018
By Stanford Educational Opportunity Explorer
Average Students’ Learning Rates, 2009-18
By Stanford Educational Opportunity Explorer
Yuba River Charter School As Viewed by the Stanford Educational Opportunity Explorer 2009-2018
ELA and math results are combined to reach these conclusions.
Yuba River Charter School (Class of 2028) as the K12 Measures Assessment Explorer Sees It
Yuba River Charter School (Class of 2027) as the K12 Measures Assessment Explorer Sees It
Yuba River Charter School (Class of 2026) as the K12 Measures Assessment Explorer Sees It
How can the Dashboard’s results conflict to this degree?
The Dashboard’s errors are fundamental flaws of four types
Joining year-to-year change with status is a basic logic error
When they are related, like height and weight, the combo has meaning
Weather Bureau created a true “signal” when it created the windchill factor
A cowboy joke about joining two things that should be kept apart.
What do you get when you cross a jack rabbit with an antelope?
… a Jack-a-lope
Failing to measure changes for the same students over time
California’s Official Dashboard View
Graduating Class of 2020 in 2016 is yellow
Graduating Class of 2021 in 2016 is green
Graduating Class of 2022 in 2016 is blue
How the CDE Dashboard evaluates CAASPP results for this middle school. The students in this school met standard.
Two years of zero “difference from standard”
California’s Official Dashboard View Ignores Graduating Class Cohorts
Net 50 scale score point gain (DFS) for grad class of 2023, tinted apricot here.
But summing up the results in each year across all three grade levels leads you to zero. Measuring “change” year to year, the Dashboard would conclude “no change” occurred.
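The arithmetic behind this slide can be sketched in a few lines. The DFS values below are invented to match the pattern described: each grade level looks flat year to year, yet the cohort moving through those grades gains 50 points.

```python
# Hypothetical DFS (distance-from-standard) averages for one middle school.
# Keyed by school year, then grade level. All values are illustrative.
dfs = {
    2021: {6: -25, 7: 0, 8: 25},
    2022: {6: -25, 7: 0, 8: 25},
    2023: {6: -25, 7: 0, 8: 25},
}

# Dashboard-style view: pool all grades each year, then compare years.
yearly = {year: sum(grades.values()) / len(grades) for year, grades in dfs.items()}
change = yearly[2023] - yearly[2022]
print(change)  # 0.0 -> the Dashboard would report "no change"

# Cohort view: follow the class that entered grade 6 in 2021 through grade 8.
cohort_gain = dfs[2023][8] - dfs[2021][6]
print(cohort_gain)  # 50 -> the same students gained 50 DFS points
```

The same data yields "no change" or a 50-point gain depending entirely on whether you track grade levels or cohorts.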
Disregarding imprecision and uncertainty
Scores shown are from Morgan Hill USD.
Disregarding imprecision: student level
Imprecision for a single student: +/- 25 to +/- 35 scale score points
Disregarding imprecision: school level
School-level imprecision: +/- 7 to +/- 9 scale score points
Source: CAASPP Online Reporting System live report for Morgan Hill schools
The Dashboard’s “Maintained” band: +/- 3 points, narrower than the measurement error itself.
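One way to honor that imprecision is to require a year-over-year change to clear the combined margin of error before labeling it a gain or decline. This is a sketch of that idea, not the Dashboard’s actual method; the +/- 7 point margin is taken from the slide, and the quadrature rule assumes the two years’ estimates are independent.

```python
import math

def significant_change(this_year, last_year, moe_this=7.0, moe_last=7.0):
    """Return (change, whether it exceeds the combined margin of error)."""
    change = this_year - last_year
    # Independent margins of error combine in quadrature.
    combined_moe = math.sqrt(moe_this**2 + moe_last**2)
    return change, abs(change) > combined_moe

# A 5-point "gain" is well inside the ~9.9-point combined margin.
change, is_real = significant_change(2551.0, 2546.0)
print(change, is_real)  # 5.0 False
```

By this test, any change the Dashboard would color as growth or decline within roughly +/- 10 points is indistinguishable from noise.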
Gaps compare a subgroup to the whole to which it belongs
Logic Error: Gaps
“… of any student group was two or more performance levels below the ‘all student’ performance …”
Comparing the part to the whole to which it belongs
Analogy: pine beetle infestation in the Pacific Northwest. Why would you compare the California rate to the rate of infestation in (WA + OR + CA)?
Should be comparing each part to the other parts to measure differences
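The dilution is easy to demonstrate with made-up numbers. In this two-group school, the subgroup’s gap against the “all students” average is smaller than its gap against the other group, because the subgroup drags down the very average it is compared to.

```python
# Invented enrollment counts and average scores for two subgroups.
groups = {"A": (80, 75.0), "B": (20, 55.0)}  # name -> (n_students, avg_score)

n_total = sum(n for n, _ in groups.values())
overall = sum(n * avg for n, avg in groups.values()) / n_total  # weighted: 71.0

gap_vs_whole = overall - groups["B"][1]         # part vs. whole: 16.0 (diluted)
gap_vs_other = groups["A"][1] - groups["B"][1]  # part vs. other part: 20.0
print(gap_vs_whole, gap_vs_other)
```

The larger the subgroup’s share of enrollment, the more the part-vs-whole comparison understates the true difference between groups.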
Six steps you can take now to get a handle on vital signs
1. Put the Dashboard aside. Rely on higher-quality evidence.
2. Look at CAASPP results for the same students over 3+ years.
3. Frame CAASPP results within the context of highly similar schools.
4. Ask more specific questions about evidence of learning.
5. Interrogate the evidence together, like doctors at a case conference.
6. If you need evidence that isn’t available, build it yourself.
Steve Rees
Email: steve.rees@schoolwisepress.com
Book website: https://k12measures.com
Company site: https://schoolwisepress.com
Company: K12 Measures team, a project of School Wise Press
Discount code SS254 for 25% off