1 of 16

The Impact of Student-AI Collaborative Feedback Generation on Learning Outcomes

Anjali Singh

PhD Candidate

School of Information

Christopher Brooks

Associate Professor

School of Information

Xu Wang

Assistant Professor

Computer Science and Engineering

2 of 16

Background: Hint Generation at Scale

Peer Feedback

+ Effective form of active learning

- Writing good hints is challenging for students

AI for Generating Hints

+ Helpful for scaling and automation

- Can make learners over-reliant on AI support

- Limited success in generating good hints in complex domains

As we know, formative feedback plays an important role in learning. But in rapidly growing domains like data science education, teachers have limited capacity to provide individualized feedback, such as hints, to learners who are struggling to learn a concept.

Now there are two popular approaches that have been explored for tackling this:

**CLICK**

The first is peer feedback, where students provide feedback to each other by evaluating each other’s work; this also serves as an effective form of active learning for students.

A challenge in this approach is that writing good feedback is not easy. For instance, to write good hints, a student first needs to identify the mistakes and then provide guidance to fix those mistakes without giving away the full solution. So this can be challenging for students.

**CLICK**

Another popular approach is to use AI to generate feedback. Recently, some researchers have explored the use of LLMs for this task.

And while some of these studies had promising results, others found limited success in generating accurate or pedagogically valuable feedback. Another potential issue with using AI is the risk of students becoming over-reliant on AI support.

--------

3 of 16

Proposed Solution: Student-AI Collaboration

Singh et al. 2024

Students revised GPT-4 generated hints for given incorrect solutions

Findings:

+ Students who revised GPT-4 hints wrote better-phrased and more specific hints

- Students were biased by the GPT-4 hint; low-accuracy GPT-4 hints led to low-accuracy student-generated hints

Implication: Provide AI-generated hints to students after they have attempted the task independently, as a ‘second opinion’

Solution B: <Incorrect code solution>

Singh, A., Brooks, C., Wang, X., Li, W., Kim, J., & Pandey, D. (2023). Bridging Learnersourcing and AI: Exploring the Dynamics of Student-AI Collaborative Feedback Generation. arXiv preprint arXiv:2311.12148.

4 of 16

A Three-arm Experiment

AI-assistance

Students can get immediate assistance from GPT-4 during hint-writing

Research Question: How does the design of the hint-writing activity affect students’ learning outcomes?

AI-revision

Students first write a hint, then see the GPT-4 hint, and then rewrite their hint

No AI-support

Students write hint without any AI support

Level of AI support

So we conducted another study in an online introductory data science course at the University of Michigan, where each student was randomly assigned to one of three experimental conditions.

**CLICK**

These conditions varied in the level of AI support provided to students in the hint writing activity, as we wanted to understand how we can use AI to support students, while ensuring that they remain engaged in the learning process.

**CLICK**

In the first condition, students wrote a hint on their own without any AI support. We expected these students to have the best learning outcomes.

**CLICK**

The second condition was AI-revision, where students first wrote a hint on their own, then they read the GPT hint and finally, they were asked to rewrite their hint. We expected this condition to produce the highest quality hints as students would be less likely to get biased by the GPT hint.

**CLICK**

The third condition was AI assistance, where students could ask for immediate assistance by clicking on a button to display the GPT hint.

This mimics the design typically followed by popular AI assistants like Grammarly’s writing assistant. We expected this to be the most time-efficient condition, but less likely to help students learn deeply or produce high-quality hints.

Our goal is to see which of these three designs leads to the highest quality hints and the best learning outcomes, and in today’s talk I will focus on the latter, that is, the impact on student learning.

5 of 16

Hint Writing Activity

The hint-writing activity was designed to give students the opportunity to reflect on their own work. Hint writing involves reflection and self-explanation, which have been found to improve students’ understanding and retention of concepts.

So, we asked students to write a hint for an incorrect Solution A to a programming assignment that they had recently solved, by comparing it to their own correct Solution B. If a student’s programming assignment solution was incorrect, we asked them to compare Solution A to the instructor solution.

Since data science programming problems can usually be solved in several ways, we wanted to ensure that the task of comparing Solutions A and B was not too cognitively demanding for the students. So we selected Solution A from incorrect submissions made by a previous cohort of students, using a similarity matching algorithm, such that it was the most similar in its approach to Solution B.
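
The selection of Solution A can be sketched roughly as follows. The talk does not say which similarity matching algorithm was used, so this minimal example assumes a plain text-similarity ratio from Python's standard `difflib.SequenceMatcher`. The function name `most_similar_incorrect` and the sample submissions are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def most_similar_incorrect(solution_b: str, incorrect_pool: list[str]) -> str:
    """Return the incorrect submission whose text is closest to solution_b.

    Assumption: similarity is difflib's ratio on raw source text. The
    matching algorithm actually used in the study is not specified.
    """
    if not incorrect_pool:
        raise ValueError("incorrect_pool must not be empty")
    return max(
        incorrect_pool,
        key=lambda candidate: SequenceMatcher(None, solution_b, candidate).ratio(),
    )


# Hypothetical data: Solution B is the student's own correct answer.
solution_b = "df.groupby('state')['pop'].mean()"
previous_cohort_incorrect = [
    "df.groupby('state')['pop'].sum()",
    "for s in states:\n    total += s",
]
solution_a = most_similar_incorrect(solution_b, previous_cohort_incorrect)
print(solution_a)
```

A text-level ratio is only a stand-in here; a measure that compares program structure would likely track "similar approach" more closely.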

6 of 16

Study Outline

Week 1

Pre-test consisting of 10 MCQs (single correct answer) on basic Python programming

Week 2

Hint-writing assignment based on the programming assignment for Week 2

Week 3

Hint-writing assignment based on the programming assignment for Week 3

Week 4

Post-test consisting of 6 MCQs (2 with single correct answer and 4 with more than one correct answer) based on course concepts and debugging skills

7 of 16

Prompting GPT-4

Write a brief hint in less than 100 words on the given incorrect solution for the given assignment. The hint should be written keeping in mind that the hint receiver is a novice data science learner with only introductory Python programming and statistics knowledge. The hint should have the following qualities:
- It should help the student who wrote the incorrect solution identify their mistakes and fix them, without giving away the full solution to them.
- It should be specific, i.e., it should provide information about how and where the code does or does not meet the assignment goals.
- It should not refer to the correct solution provided below

Start your response with “Hint:” and highlight keywords, variable names, messages, line numbers and error names in bold. Do not write any text in addition to the hint.
————————————————————————
<Programming Assignment Problem Statement>

Correct Solution: <Correct Code>

Incorrect Solution: <Incorrect Code>
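
One way to fill the three placeholders in this prompt is a small template function. This is only a sketch: `build_hint_prompt`, its parameter names, the hyphen separator, and the sample inputs are hypothetical, and it makes no call to any model API.

```python
SEPARATOR = "-" * 24  # stands in for the rule line shown on the slide


def build_hint_prompt(
    instructions: str,
    problem_statement: str,
    correct_code: str,
    incorrect_code: str,
) -> str:
    """Assemble the prompt in the order shown on the slide:
    instructions, separator, problem statement, correct code, incorrect code.
    """
    return (
        f"{instructions}\n"
        f"{SEPARATOR}\n"
        f"{problem_statement}\n\n"
        f"Correct Solution: {correct_code}\n\n"
        f"Incorrect Solution: {incorrect_code}"
    )


# Hypothetical inputs. In practice the full instruction text from the
# slide would be passed as `instructions`; only its first sentence is
# used here to keep the example short.
prompt = build_hint_prompt(
    instructions=(
        "Write a brief hint in less than 100 words on the given "
        "incorrect solution for the given assignment."
    ),
    problem_statement="Compute the mean population per state.",
    correct_code="df.groupby('state')['pop'].mean()",
    incorrect_code="df.groupby('state')['pop'].sum()",
)
print(prompt)
```

Keeping the instructions separate from the per-student fields means the same instruction text can be reused for every (correct, incorrect) pair.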

8 of 16

Impact of Hint-writing Designs on Learning Outcomes

9 of 16

Study Results

Selected a total of 55 students after propensity score matching, so that students from each group had similar pre-test scores.
AI-assistance group had the lowest mean post-test score
Difference in post-test scores between conditions was not statistically significant (p = 0.18)

Level of AI support

10 of 16

Implications

Students can possibly learn more when prompted to first think of the solution on their own before seeking AI-based assistance.
Need for research on designs that promote active student engagement with AI tools

11 of 16

Limitations

Small sample size
Study conducted in a single course at a single institution

12 of 16

Next Steps

Creating a model for evaluating the quality of hints generated by LLMs or humans based on accuracy and pedagogical attributes (e.g., specificity, phrasing)

Using the high-quality hints to improve LLM-based hint generation models through chain-of-thought prompting and fine-tuning

13 of 16

Concluding Thoughts

What is the right amount of AI support in a given educational context?
How can we design educational AI tools that encourage active student participation?

Level of AI support

AI-assistance

AI-revision

No AI-support

14 of 16

Anjali Singh
singhanj@umich.edu
@singhanj13

Thank You ☺

Co-authors:

Christopher Brooks

Xu Wang

15 of 16

Data Collected

Per student, we collected the following data:

Time spent on worked example and hint writing activity
Incorrect code shown to them
Correct code shown to them
Hint written by them
Revised hint written by them (for students in AI-revision condition)
GPT-4 hint shown to them (if not in the Baseline condition)
Survey responses (qualitative and Likert scale ratings)

Hint-writing activity experience
Perceived learning benefits

16 of 16

Hint-writing System Design

Finish and submit programming assignment

Start hint-writing assignment

All final submissions

Incorrect submissions from previous cohort

Go through worked example

Compare I and C; identify the mistakes in I

Assign randomly to one of the 3 conditions

C: Fetch learner’s final submission if it is correct, else fetch instructor solution

I: Fetch incorrect code that is most similar to C

AI-assistance: GPT-4 hint for I available on demand by clicking “Show ChatGPT hint” button

AI-revision

Write hint with no support
Read GPT-4 hint
Rewrite hint

Baseline

Write hint with no additional available support

Submit final hint

Importantly, the hint-writing assignment was personalized for each student.

It involved comparing an incorrect solution to a problem from the latest programming assignment with the student’s own solution, if that solution was correct. This was the case for the majority of students, as they were allowed unlimited attempts at the programming assignment. If a student’s solution was incorrect, the instructor solution was used as the correct solution. The incorrect solution was selected from incorrect submissions made by a previous cohort of learners: we selected the one that was most similar in its approach to the correct solution that the student would be comparing it to.

The rationale behind doing this is that these data science programming problems could be solved in many ways. To ensure that the task is not too cognitively demanding, we wanted students to compare their correct solution to an incorrect solution that used a similar approach as theirs.
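
The personalization steps can be sketched as two small functions, one that picks the correct solution C and one that assigns a condition. This is an illustration under stated assumptions: the function names are hypothetical, and the talk does not say how randomization was implemented, so a simple uniform draw is assumed.

```python
import random

# Condition names as they appear on the system design slide.
CONDITIONS = ("Baseline", "AI-revision", "AI-assistance")


def choose_correct_solution(
    learner_code: str, learner_is_correct: bool, instructor_code: str
) -> str:
    """Use the learner's final submission if it is correct,
    otherwise fall back to the instructor solution."""
    return learner_code if learner_is_correct else instructor_code


def assign_condition(rng: random.Random) -> str:
    """Assumption: each condition is equally likely for every student."""
    return rng.choice(CONDITIONS)


# Hypothetical usage with a seeded generator for reproducibility.
rng = random.Random(42)
correct_solution = choose_correct_solution(
    learner_code="learner final submission",
    learner_is_correct=False,
    instructor_code="instructor solution",
)
condition = assign_condition(rng)
print(correct_solution, "|", condition)
```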

------Fix this slide------

Add a simple graphic showing incorrect soln being compared to correct soln. Call them solution A and B and refer to them as that
