1 of 16

Explainable AI in the Loop: An Instructor-Transformer Collaboration for Improving Explainability and Reliability of Feedback in Introductory Programming Classrooms

Muntasir Hoq, Bradford Mott, Seung Lee, Jessica Vandenberg, Narges Norouzi, James Lester, Bita Akram

9th CSEDM Workshop at EDM 2025

2 of 16

Introduction

Adaptive and timely feedback in CS education is imperative for effective active learning [McConnell 1996, Denny et al. 2024, Tang et al. 2024].
Recent advances in LLMs have demonstrated their potential for feedback generation [Phung et al. 2023, Jia et al. 2024, Xu et al. 2024].
Challenges in generating reliable and explainable feedback still persist [Phung et al. 2023, Jacobs et al. 2024, Jia et al. 2024].

3 of 16

Our Approach

Reliable and explainable feedback system.
Collaboration among the instructor, an LLM, and an explainable AI model.
Providing instructor-verified programming feedback based on key instructional points.

[Diagram: Instructor, Explainable AI, and LLM, combined in INSIGHT]

4 of 16

INSIGHT Classroom Assistant System

An intelligent classroom assistant that supports scalable, reliable, and explainable feedback delivery.

Instructor App

Instructor dashboard enables real-time distribution of exercises categorized by topics.
Authoring tool enables instructors to design new exercises in collaboration with LLMs.

Student App

Enables real-time classroom engagement of students.
Allows students to submit assignments and get feedback.

5 of 16

Instructor Authoring Tool

9 of 16

Methodology

The instructor designs a new problem using the authoring tool and marks important and common misconceptions in example solutions, each with associated feedback.
A pretrained SANN model is fine-tuned on synthetic data; for code predicted as incorrect, the subtrees containing logical errors are extracted.
Instructor-verified feedback is propagated when a student code subtree containing a logical error matches an instructor-provided code subtree, with an LLM verification layer as an additional check.
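The matching step above can be sketched in a few lines. This is a minimal illustration only, under several assumptions not stated on the slides: it uses Python's standard `ast` module, it treats every statement and expression subtree as a candidate instead of ranking them by SANN attention, it requires an exact structural match (including variable names), and it omits the LLM verification layer. The names `FEEDBACK_BANK`, `STUDENT_CODE`, and `propagate_feedback` are hypothetical.

```python
import ast

def _signature(node: ast.AST) -> str:
    # ast.dump omits line/column attributes by default, so two subtrees
    # with the same structure and names produce the same string.
    return ast.dump(node)

def student_subtrees(source: str) -> set[str]:
    """Signatures of every statement and expression subtree in the code."""
    tree = ast.parse(source)
    return {_signature(n) for n in ast.walk(tree)
            if isinstance(n, (ast.stmt, ast.expr))}

def snippet_signature(snippet: str) -> str:
    """Signature of an instructor-marked erroneous snippet."""
    node = ast.parse(snippet).body[0]
    if isinstance(node, ast.Expr):  # unwrap a bare expression statement
        node = node.value
    return _signature(node)

def propagate_feedback(source: str, feedback_bank: dict[str, str]) -> list[str]:
    """Return instructor feedback whose erroneous subtree occurs in the code."""
    found = student_subtrees(source)
    return [fb for snippet, fb in feedback_bank.items()
            if snippet_signature(snippet) in found]

# Hypothetical instructor-authored entry: erroneous subtree -> feedback.
FEEDBACK_BANK = {
    "range(len(xs) + 1)": "The loop runs one index past the end of the list.",
}

# Hypothetical student submission with an off-by-one error.
STUDENT_CODE = """
def total(xs):
    s = 0
    for i in range(len(xs) + 1):
        s += xs[i]
    return s
"""

print(propagate_feedback(STUDENT_CODE, FEEDBACK_BANK))
```

Exact matching keeps the sketch short but would miss the same error written with different variable names; a real system would need some form of normalization or similarity scoring.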

10 of 16

Subtree-based Attention Neural Network (SANN) [Hoq et al. 2023]

Trained on code correctness prediction.
Subtrees with higher attention contribute more to the model predicting code as incorrect and tend to contain the logical errors.
[Hoq et al. 2025]
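To make the idea of "subtrees with higher attention" concrete, the sketch below normalizes raw per-subtree scores with a softmax and returns the highest-weighted subtrees. It is not the SANN architecture: the scores are invented for illustration, and in SANN the attention weights are learned by the network. The function names `softmax` and `top_attended` are hypothetical.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_attended(subtrees: list[str], scores: list[float], k: int = 1):
    """Return the k (subtree, weight) pairs with the highest attention weight."""
    weights = softmax(scores)
    ranked = sorted(zip(subtrees, weights), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Invented scores for three subtrees of an incorrect submission.
subtrees = ["s = 0", "range(len(xs) + 1)", "return s"]
scores = [0.1, 2.0, 0.3]

for subtree, weight in top_attended(subtrees, scores, k=1):
    print(subtree, round(weight, 2))
```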

11 of 16

Feedback Propagation Pipeline

12 of 16

Code Correctness Prediction on Unseen Problem

Model	Accuracy	Precision	Recall	F1-score
Non-pretrained SANN	0.83	0.81	0.76	0.78
Mixed Fine-tuned SANN (1% pretraining data + synthetic data)	0.84	0.83	0.77	0.80
Fine-tuned SANN (Synthetic Only)	0.89	0.89	0.84	0.86
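As a quick consistency check, the F1-score column can be recomputed from the reported precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The inputs below are the rounded values from the table, so small rounding differences would be possible in general; here all three rows agree.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the table above.
rows = {
    "Non-pretrained SANN": (0.81, 0.76),
    "Mixed Fine-tuned SANN": (0.83, 0.77),
    "Fine-tuned SANN (Synthetic Only)": (0.89, 0.84),
}

for model, (p, r) in rows.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```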

13 of 16

Feedback Selection Performance

Correct Feedback: 56.8%

Correlates highly with instructor-provided errors and feedback.

Incorrect Feedback: 28.4%

Mostly cases of student misinterpretation or errors outside the coverage of instructor-authored examples.

No Feedback: 14.8%

Causes are similar to those in the incorrect feedback category.
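Two derived quantities help read these percentages. They are not reported on the slide; the definitions below are assumptions made for illustration: "coverage" is the share of submissions that received any feedback, and "reliability" is the share of delivered feedback that was correct.

```python
# Percentages reported on the slide.
correct = 56.8
incorrect = 28.4
no_feedback = 14.8

# Assumed definitions, for illustration only.
coverage = correct + incorrect    # submissions that received some feedback
reliability = correct / coverage  # fraction of delivered feedback that was correct

print(f"coverage = {coverage:.1f}%")
print(f"reliability = {reliability:.1%}")
```

Under these definitions, roughly two out of three delivered feedback messages were correct, which is the quantity an instructor would trade against coverage.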

14 of 16

Contribution

INSIGHT classroom assistant to promote active learning in CS classrooms.
Automated feedback system with instructor-LLM-explainable AI collaboration.
Step toward a reliable, explainable and instructor-verified feedback propagation system.
This approach allows instructors to select their preferred balance between reliability and coverage.
Feedback propagation focusing on key instructional points identified by instructors.

15 of 16

Limitations

High incorrect feedback propagation.
Handling multiple errors in one code.
Coverage of instructor-generated solutions and synthetic data.

Future Work

Handling the coverage issue.
Handling multiple error detection and composite feedback generation.
Investigating sequential, step-by-step, and immediate feedback supporting cognitive load management [Anderson et al. 1989].
Conducting classroom studies to measure student learning gains.
