Evaluating humans in the age of AI
A backward-design approach to assessing student-AI collaborative work
Kiera Allison, University of Virginia
Cold open
Have your answers to any of these questions changed since the emergence of GenAI?
Old questions
New urgency
What is the unit of comparison in educational assessment?
HUMAN
HUMAN+
“[I]t is not just the ‘person-solo’ who learns, but the ‘person-plus,’ the whole system of interrelated factors.”
Salomon, Distributed Cognitions
“To measure something is to separate it from its neighbors”
--James Vincent, 2022
Old tensions
New urgency
Student-AI Competency
Student?
“If someone is terrific at writing memos but horrible at using AI, they could have the same output as someone who is terrible at memo writing yet brilliant at using AI.”
How to assess the human in the context of AI
Mega-SoTL Investigation of Artificial Intelligence In Teaching And Learning: Student Perspectives
Kiera Allison, Charlotte Hoopes, Gianluca Guadagni, Katya Koubek, Breana Bayraktar, Dayna Henry, Jess Taggart
This project is supported by a Fund for Excellence and Innovation (FFEI) grant from the State Council of Higher Education for Virginia (SCHEV).
Our study
HUMAN
HUMAN+
PART 1: Assessing the human within the collaboration
HUMAN
HUMAN+
PART 2: Assessing the collaboration as collaboration
1: The “human-alone”
What students said they contributed, and how they would want that work evaluated
Grade the assignment as usual
“I think it could be assessed based on the quality of the work since one must plug in a detailed prompt for a well written and completed answer.”
“I think that it should be assessed the same, because I think that you do need a genuine understanding of the material and topics to ensure that AI is not giving you incorrect answers.”
“[Y]es it is written by AI but I think if you use AI to do vast research and incorporate it into assignments, it will look very different from an assignment that is simply just done by AI by giving it the prompt.”
“Based on the output, the same as any other assignment. The better you use [AI], the better the assignment.”
“Input = output”
Grade for humanness
“It should pass the ‘eye test’ - shouldn't look like it was written with AI. Beyond that, I think standard expectations are enough. If an AI-assisted assignment is good enough to earn an A without the reader knowing about the AI assistance, it should be good enough regardless.”
“I would grade it on how non-AI the assignment sounds. That is, people don't want to read something or look at something that they think was made by a chatbot. So, I'd say that the business skill here is passing off AI's work as that of a human.”
Compare student work to AI benchmarks
“I think it could be worth seeing the difference between a student written letter and an ai letter and judge which one is better.”
“I think the assignment should be compared to an all AI written assignment to understand the contribution of the human writer.”
“Rather than banning AI, let’s just ban all C work.” (p.151)
Target the human advantage
“I think it should be assessed based on the quality of the ideas and how much they make sense. I don't think it should be assessed on the writing style or punctuation because AI does a good job in that, (sometimes it sounds robotic) but usually the grammar is correct.”
“I think it should be assessed largely based on effort and not knowledge or content, because those largely come from the AI.”
“I think it should be assessed based on the strength of the argument, and analysis of the human interactions in the call. Some things AI cannot detect, and we as humans can, such as the tone of the call, certain phrases used, all this.”
“It should be assessed based on clarity and conciseness of the argument, as AI tends to seek compromise instead of choosing a side.”
Input = output
“In normal assignments, a grade is a reflection of correctness, skill, effort, and time put in, weighted in that order. For an AI based assignment, I would argue it is just a measure of time put in, and even that is not measured accurately.”
Input ≠ output
Grade the process
“I think you should judge the process not the outcome. If someone is terrific at writing memos but horrible at using AI, they could have the same output as someone who is terrible at memo writing yet brilliant at using AI. The process should be included in the grading.”
“I think it should be assessed purely on what the student can control. I wouldn't grade too heavily on what the AI outputs, but more on how the student is interacting with it.”
“Because not everyone will have the optimum version of AI, such as ChatGPT 4.0, the generated answers may give a disadvantage to those not having access to those premium versions. The best assessment would be on how the student prompts the AI to generate answers.”
Or take the AI out of it
“I think it should be assessed solely on the content that is not utilizing AI.”
“…on the quality of work presented, organization, and clear difference in effort from human and ai intelligence.”
“I think that what should be assessed is what the student produces, not what AI produces. The student must have the ideas and the AI just helps bring it to life.”
“I think that anything created by the generative AI should be completely and wholeheartedly separate. I think the idea of having reflections as well as documented interactions with AI does not devalue the work, and relying on the honor system in the other, non-AI, portions of the assignment is effective in assessment.”
“I feel that it should be kept separate while being assessed.”
“The work of the human should be separated from AI. AI should handle the less important details and humans should deal with the deeper analysis.”
“Although I have never completed an assignment that was designed to be completed jointly with generative AI, I think that these assignments should be grade[d] more-so on portion of the task where the work is produced by the individual rather than by a computer.”
“I think it should be assessed based on what I produce, not necessarily what AI produces.”
We’re asking students to collaborate with AI on an activity…
…then separate themselves from AI for assessment purposes?
2: The “human-plus”
Suggestions toward a collaborative grading paradigm
HUMAN
HUMAN+
AI
AI+HUMAN
Serina Chang et al. (Microsoft Research, UC Berkeley), 2025
“ChatBench: From Static Benchmarks to Human-AI Evaluation”
“I think the goal of our memo was to effectively make a recommendation for a stock. Regardless of whether AI did a lot of the heavy lifting or a human did, as long as the memo communicates the goal effectively, it should be graded the same.”
“The goal of this class is to be able to communicate as efficiently as possible, if AI takes it to the next level, then that should be acknowledged as an advantage over not using it.”
“The more that a students ‘challenges’ an AI tool to produce the ‘best’ or most ‘sophisticated’ output should be reflected in the grade they receive.”
“[Assessment should be based on] Thoughtfulness and obvious openness to the assignment (AI is still unpredictable)”
“Based on willingness to fully collaborate with AI”
“There was no clear separation of what was AI and what was human.”
How to assess human-AI collaboration (working recommendations)
3: “Do Something Impossible with AI”
Praxis 3
Persuasion Impossible
Choose a persuasive task that feels unusually difficult or impossible.
Then work with AI to achieve the task.
The Challenge
Scaffolding
Students know their strengths and weaknesses
They know when they’ve been successful
They have a large AI toolbox
Part One: Initial Proposal (200-400 words)
Part Two: Document Your Process
Part Three: Presentation Showcase
Part Four: Rubric Design
The project will be graded on:
Task Ambition
Centers a genuinely difficult challenge, shows clear vision of enhanced outcomes, and deploys AI strategically to help you go further than ever before as a persuader.
AI Collaboration
Demonstrates sophisticated use of AI, clearly documenting your learning, key takeaways, and breakthrough moments.
Persuasive Impact
Polished and professional in-class presentation within time limits. Convincingly demonstrates “impossible” achievement and innovative use of AI.
Meta-Analysis
Clear insights into human-AI dynamics in persuasion, combined with deep awareness of persuasion as a discipline. Rubric contains specific, measurable criteria for evaluating your human contribution to the project.
Project assessment = the rubric you’ve created!
(10% of your course grade)
Old tensions
New urgency
New opportunities