1 of 49

Evaluating humans in the age of AI

A backward-design approach to assessing student-AI collaborative work

Kiera Allison, University of Virginia


3 of 49

Cold open

  1. What is the primary purpose of assessment in your courses? (e.g. formative, summative, informational for you/the student, accreditation)
  2. To what standard are you comparing student performance? (e.g. assignment benchmarks, rubric criteria, professional standards, other students, their own past performance)
  3. What variables do you try to control or standardize in your assessment? (e.g. time, resource access, open/closed notes, collaboration rules)
  4. How do you distinguish between acceptable and unacceptable forms of assistance in your assessment?
  5. What's the greatest challenge to fair assessment in your context?

Have your answers to any of these questions changed since the emergence of GenAI?

4 of 49

Old questions

New urgency


6 of 49

“[I]t is not just the ‘person-solo’ who learns, but the ‘person-plus,’ the whole system of interrelated factors.”

Salomon, Distributed Cognitions

7 of 49

What is the unit of comparison in educational assessment?

HUMAN

HUMAN+

“To measure something is to separate it from its neighbors”

--James Vincent, 2022

8 of 49

Old tensions

New urgency




13 of 49

Student-AI Competency

Student?

“If someone is terrific at writing memos but horrible at using AI, they could have the same output as someone who is terrible at memo writing yet brilliant at using AI.”



16 of 49

How to assess the human in the context of AI

  • Option 1: Eliminate (or postpone) the tool
  • Option 2: Standardize the tool
  • Option 3: Students document how they are using the tool

17 of 49

  1. HOW CAN WE EFFECTIVELY MEASURE AND ASSESS THE COGNITIVE WORK OF STUDENTS WITHIN THEIR AI COLLABORATIONS? 
  2. HOW DO STUDENTS PERCEIVE THEIR ROLE AND CONTRIBUTION WITHIN AI-ASSISTED PROJECTS?

Mega-SoTL Investigation of Artificial Intelligence in Teaching and Learning: Student Perspectives

Kiera Allison, Charlotte Hoopes, Gianluca Guadagni, Katya Koubek, Breana Bayraktar, Dayna Henry, Jess Taggart

This project is supported by a Fund for Excellence and Innovation (FFEI) grant from the State Council of Higher Education for Virginia (SCHEV).

18 of 49

Our study

  • Students collaborate with AI on an assignment
  • Students then develop a rubric to assess their contribution to the AI-assisted work
  • A pre- and post-assignment survey is conducted to capture students’ experience with AI-enhanced work and their perspectives on how that work should be assessed
  • Consenting students have their work and survey data collected and analyzed following the submission of course grades

19 of 49

HUMAN

HUMAN+

PART 1: Assessing the human within the collaboration

20 of 49

HUMAN

HUMAN+

PART 2: Assessing the collaboration as collaboration

21 of 49

1: The “human-alone”

What students said they contributed, and how they would want that work evaluated

22 of 49

Grade the assignment as usual (“Input = output”)

“I think it could be assessed based on the quality of the work since one must plug in a detailed prompt for a well written and completed answer.”

“I think that it should be assessed the same, because I think that you do need a genuine understanding of the material and topics to ensure that AI is not giving you incorrect answers.”

“[Y]es it is written by AI but I think if you use AI to do vast research and incorporate it into assignments, it will look very different from an assignment that is simply just done by AI by giving it the prompt.”

“Based on the output, the same as any other assignment. The better you use [AI], the better the assignment.”


24 of 49

Grade for humanness

“It should pass the ‘eye test’ - shouldn't look like it was written with AI. Beyond that, I think standard expectations are enough. If an AI-assisted assignment is good enough to earn an A without the reader knowing about the AI assistance, it should be good enough regardless.”

“I would grade it on how non-AI the assignment sounds. That is, people don't want to read something or look at something that they think was made by a chatbot. So, I'd say that the business skill here is passing off AI's work as that of a human.”


26 of 49

Compare student work to AI benchmarks

“I think it could be worth seeing the difference between a student written letter and an ai letter and judge which one is better.”

“I think the assignment should be compared to an all AI written assignment to understand the contribution of the human writer.”

“Rather than banning AI, let’s just ban all C work.” (p.151)

27 of 49

Target the human advantage

“I think it should be assessed based on the quality of the ideas and how much they make sense. I don't think it should be assessed on the writing style or punctuation because AI does a good job in that, (sometimes it sounds robotic) but usually the grammar is correct.”

“I think it should be assessed largely based on effort and not knowledge or content, because those largely come from the AI.”

28 of 49

Target the human advantage

“I think it should be assessed based on the strength of the argument, and analysis of the human interactions in the call. Some things AI cannot detect, and we as humans can, such as the tone of the call, certain phrases used, all this.”

“It should be assessed based on clarity and conciseness of the argument, as AI tends to seek compromise instead of choosing a side.”

“I think it should be assessed based on the quality of the ideas and how much they make sense. I don't think it should be assessed on the writing style or punctuation because AI does a good job in that, (sometimes it sounds robotic) but usually the grammar is correct.”

“I think it should be assessed largely based on effort and not knowledge or content, because those largely come from the AI.”

29 of 49

Input = output

“In normal assignments, a grade is a reflection of correctness, skill, effort, and time put in, weighted in that order. For an AI based assignment, I would argue it is just a measure of time put in, and even that is not measured accurately.”

Input ≠ output

30 of 49

Grade the process

“I think you should judge the process not the outcome. If someone is terrific at writing memos but horrible at using AI, they could have the same output as someone who is terrible at memo writing yet brilliant at using AI. The process should be included in the grading.”

“I think it should be assessed purely on what the student can control. I wouldn't grade too heavily on what the AI outputs, but more on how the student is interacting with it.”

“Because not everyone will have the optimum version of AI, such as ChatGPT 4.0, the generated answers may give a disadvantage to those not having access to those premium versions. The best assessment would be on how the student prompts the AI to generate answers.”

31 of 49

Or take the AI out of it

“I think it should be assessed solely on the content that is not utilizing AI…”

“…on the quality of work presented, organization, and clear difference in effort from human and ai intelligence…”

“I think that what should be assessed is what the student produces, not what AI produces. The student must have the ideas and the AI just helps bring it to life.”

“I think that anything created by the generative AI should be completely and wholeheartedly separate. I think the idea of having reflections as well as documented interactions with AI does not devalue the work, and relying on the honor system in the other, non-AI, portions of the assignment is effective in assessment”

“I feel that it should be kept separate while being assessed”

“The work of the human should be separated from AI. AI should handle the less important details and humans should deal with the deeper analysis”

“Although I have never completed an assignment that was designed to be completed jointly with generative AI, I think that these assignments should be grade[d] more-so on portion of the task where the work is produced by the individual rather than by a computer.”

“I think it should be assessed based on what I produce, not necessarily what AI produces.”

32 of 49

We’re asking students to collaborate with AI on an activity…

…then asking them to separate themselves from AI for assessment purposes?

33 of 49

2: The “human-plus”

Suggestions toward a collaborative grading paradigm

34 of 49

HUMAN

HUMAN+

AI

AI+HUMAN

Serina Chang et al. (Microsoft Research, UC Berkeley), 2025

“ChatBench: From Static Benchmarks to Human-AI Evaluation”

35 of 49


37 of 49

“I think the goal of our memo was to effectively make a recommendation for a stock. Regardless of whether AI did a lot of the heavy lifting or a human did, as long as the memo communicates the goal effectively, it should be graded the same.”

“The goal of this class is to be able to communicate as efficiently as possible, if AI takes it to the next level, then that should be acknowledged as an advantage over not using it.”

“The more that a student ‘challenges’ an AI tool to produce the ‘best’ or most ‘sophisticated’ output should be reflected in the grade they receive.”

“[Assessment should be based on] Thoughtfulness and obvious openness to the assignment (AI is still unpredictable)”

“Based on willingness to fully collaborate with AI”

“There was no clear separation of what was AI and what was human.”


39 of 49

How to assess human-AI collaboration (working recommendations)

  1. Assess the collaboration as a collaboration
  2. Have a clearly defined objective and assignment parameters
  3. Make it challenging: the assignment should be achievable by the human + AI together, not by the human or the AI alone

40 of 49

3: “Do Something Impossible with AI”

41 of 49

Praxis 3

Persuasion Impossible

42 of 49

The Challenge

Choose a persuasive task that feels unusually difficult or impossible.

Then work with AI to achieve the task.

43 of 49

Scaffolding

  • Multiple formative and summative assessments
  • Grounded, experiential learning
  • Multiple guided AI collaborations (role-play, interview simulations, debate with AI, challenge AI to change your mind…)

Students know their strengths and weaknesses

They know when they’ve been successful

They have a large AI toolbox

44 of 49

Part One: Initial Proposal (200-400 words)

  • Identify your task and explain how it will stretch or exceed your persuasive abilities.
  • Describe how you and AI will work together to achieve the task:
    • What unique strengths do you bring to the challenge?
    • What unique strengths does (or might) AI bring to the challenge?
    • What do you imagine you and AI will achieve in collaboration that you could not do alone?
  • How will you know you succeeded? List 3-5 specific quantitative and/or qualitative benchmarks.

45 of 49

Part Two: Document Your Process

  • Log at least five of your interactions with AI using the provided “Activity Log” template
  • Entries should include
    • The date of the interaction
    • The AI tool used
    • The purpose and goal of the interaction, e.g. “Soundboard initial ideas” or “Get feedback from a skeptical investor”
    • Highlights from the interaction (best prompts and outputs)
    • Key lessons and takeaways

46 of 49

Part Three: Presentation Showcase

  • Prepare a 6-min presentation for the class
    • Context (1 min): Tell us whom you’re trying to persuade, what you’re persuading them about, and why it seemed impossible
    • Persuasion (3 mins): Present your work, with the aim either of persuading the class OR demonstrating how you persuaded your intended audience
    • Working with AI (1-2 mins): Demonstrate your AI collaboration process, including key breakthroughs and takeaways
    • Note: All presentation formats are acceptable: speech, video, slides, conversation role-play, poster, podcast, etc. Be creative!
  • Judging criteria
    • Impossibility: How convincingly hard was the task for a human alone?
    • AI Collaboration: How creatively and innovatively did you use AI?
    • Top scorers win a prize!

47 of 49

Part Four: Rubric Design

  • Imagine you’re developing a rubric to assess future student-AI collaborations in COMM 4644. What would be important to measure and how would you measure it?
  • Use the following questions to guide your rubric design. Include your answers in the project document.
    1. What did the AI contribute to the process? In what ways, if any, did it enhance or accelerate your work?
    2. What did you contribute to the process? What aspects of your labor do you want spotlighted?
    3. What do you think are the hallmarks of a top-notch human-AI collaboration as opposed to a mediocre one?
    4. In an era of AI-augmented work, how do you measure the value added by humans?
  • Drawing from your responses to these questions and the record of your interactions with AI, identify and describe 5-7 evaluation criteria and add them to the provided rubric template.

48 of 49

The project will be graded on:

Task Ambition

Centers a genuinely difficult challenge, shows clear vision of enhanced outcomes, and deploys AI strategically to help you go further than ever before as a persuader.

AI Collaboration

Demonstrates sophisticated use of AI, clearly documenting your learning, key takeaways, and breakthrough moments.

Persuasive Impact

Polished and professional in-class presentation within time limits. Convincingly demonstrates “impossible” achievement and innovative use of AI.

Meta-Analysis

Clear insights into human-AI dynamics in persuasion, combined with deep awareness of persuasion as a discipline. Rubric contains specific, measurable criteria for evaluating your human contribution to the project.

Project assessment = 10% of your course grade

Assessed with the rubric you’ve created!

49 of 49

Old tensions

New urgency

New opportunities