1 of 46

Practical, Explainable, and Actionable AI for Secure Software Development

CI/CD

CI Build Failure Prediction [FSE’24]

Coding

PyCoder: Code Completion [IST’24]

Testing

Code to test case generation

Text to Test Case Generation, Fuzzing [ICSE’23, ICSE’24]

Code Review

Reviewer Recommendation [SANER’15]

Review Localization [SANER’22]

Review Comment Generation [FSE’22]

Bug Reports

Bug Localization, Bug Triage [ISSRE’13]

Defects

Defect Prediction [ICSE’15, ICSE’16, TSE’15,’16,’17,’19]

Agile

GPT2SP: Story Point Estimation [TSE’22]; Augmented Agile [IEEE SW]

Vulnerability

Make AI4SE More Practical, Explainable, and Actionable!

XAI4SE Book/Tutorial/Papers [ASE’21, MSR’21, TSE’22]

Locate vulnerability/defective lines [MSR’21, MSR’22]

Explain vulnerability/defects [ASE’21, TSE’23, TOSEM’24]

Repair the vulnerability/defects [FSE’22, ICSE’22]

AIBugHunter in VSCode [EMSE’23]

AI for SE

Dr. Kla Tantithamthavorn

Chief Investigator

Responsible AI in MLOps/LLMOps

SE for AI

Fairness, biases, ethical issues in AI software [EMSE’24, TOSEM’24]

The Risks of Using ChatGPT and LLMs for SE Tasks [2xTOSEM’24]

FIT Collab

2 of 46

Explainable AI

Dr. Kla Tantithamthavorn

Senior Lecturer in Software Engineering

Monash University, Australia

http://chakkrit.com @klainfo

Software Engineering in the Age of Generative AI, Yet Not Explainable!

3 of 46

Dr. Kla Tantithamthavorn

  • Expertise in Explainable AI, Software Engineering
  • Co-authored the first online book on Explainable AI for Software Engineering (http://xai4se.github.io), attracting over 20,000 page views from 83 countries worldwide
  • Co-edited an IEEE Software Special Issue on XAI for SE
  • Awarded ARC DECRA Fellowship, JSPS Fellowship, ACM SIGSOFT Distinguished Paper, Distinguished Reviewer
  • Received strong media attention from Gizmodo, Australian Cyber Security Magazine, TechXplore, Cybersecurity Connect, Australian Computer Society, etc.

Learn more http://chakkrit.com/

4 of 46

  • This is not a comprehensive introduction to Explainable AI theories or algorithms.
  • This is not an exhaustive survey of XAI, since there is a massive body of Explainable AI Research (2,000+ papers) in many venues (AI, ML, HCI, Social Science, and Software Engineering).
  • The ultimate goal of my research is to make AI-powered software development tools more explainable and actionable for practitioners, leading to worldwide adoption in practice
  • This talk aims to:
    • Motivate the Importance of Explainable AI for SE
    • Present Recent Advances in Explainable AI for Software Engineering
    • Discuss SE in the Age of Generative AI, including Opportunities, Challenges, and Roads Ahead Toward Using Generative AI Responsibly for Software Engineering

Disclaimer

5 of 46

Traditionally, AI is Used to Support Decision-Making

“Actionable Analytics: Stop Telling Me What It Is; Please Tell Me What To Do.” Tantithamthavorn, Chakkrit, et al. IEEE Software, 2021.

“A Survey on Deep Learning for Software Engineering.” Yang, Yanming, et al. ACM Computing Surveys 54.10s (2022): 1-73.

6 of 46

Predict Defects, So What? Tell Me Why?

7 of 46

Explainable AI for Defect Prediction (PyExplainer)

8 of 46

Explainable AI for Defect Prediction (PyExplainer)

“To build a local model to approximate the behaviours of the global model”
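
The local-surrogate idea can be sketched in a few lines. This is an illustrative simplification of the general technique (as popularized by LIME), not PyExplainer's actual code; the black-box model and the two code metrics are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a global black-box defect model: a nonlinear function
# of two code metrics (say, size and churn). Purely illustrative.
def black_box(X):
    return (np.tanh(X[:, 0]) + X[:, 1] ** 3 > 0).astype(float)

# The instance to explain: one commit's (normalised) metric vector.
x0 = np.array([0.2, -0.3])

# 1) Generate synthetic neighbours around the instance.
neighbours = x0 + rng.normal(scale=0.3, size=(300, 2))
# 2) Label the neighbours with the black-box model's predictions.
labels = black_box(neighbours)
# 3) Fit a simple interpretable (linear) surrogate that mimics the
#    black box only in this neighbourhood.
A = np.hstack([neighbours, np.ones((300, 1))])  # add an intercept column
coef, *_ = np.linalg.lstsq(A, labels, rcond=None)

# The local weights indicate which metric pushes this particular
# prediction towards "defective".
print("local weights:", coef[:2], "intercept:", coef[2])
```

The surrogate is only trusted near the instance; its weights are the local explanation.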

9 of 46

Explainable AI for Defect Prediction (PyExplainer)

What if we change this? Would it reverse the prediction of the defect model?
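
A what-if probe can be sketched as follows; this is a crude illustration in the spirit of counterfactual tools such as DiCE, not PyExplainer's actual algorithm, and the model weights are invented:

```python
import numpy as np

# Stand-in black-box defect model over two metrics (e.g., lines added
# and churn); the weights and threshold are made up for illustration.
def predict_defective(x):
    return float(0.8 * x[0] + 0.6 * x[1] > 1.0)

x0 = np.array([1.0, 0.8])   # currently predicted defective

# Crude counterfactual search: shrink one metric at a time until the
# prediction flips, then report that change as the "what-if" answer.
def counterfactual(x, step=0.05, max_steps=100):
    for i in range(len(x)):
        cand = x.copy()
        for _ in range(max_steps):
            cand[i] -= step
            if predict_defective(cand) == 0.0:
                return i, cand
    return None

feature, cf = counterfactual(x0)
print(f"reducing metric {feature} to {cf[feature]:.2f} flips the prediction")
```

Real counterfactual methods additionally constrain the change to be small and plausible; the greedy loop above only shows the flip test itself.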

10 of 46

Explainable AI for Agile Story Point Estimation (GPT2SP)

A GPT-2 pre-trained language model to estimate story points with explanations.

GPT2SP: A Transformer-Based Agile Story Point Estimation Approach. Michael Fu and Chakkrit Tantithamthavorn. IEEE Transactions on Software Engineering, 2023.

What are the best supporting examples that have the same word and the same story point from the same project?

11 of 46

AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities, Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Yuki Kume, Van Nguyen, Dinh Phung, John Grundy, Empirical Software Engineering (2023).

Explainable AI for Cybersecurity (AIBugHunter)

Problem: Cyber criminals continue to find new methods of attack, e.g., stealing credit card information and passwords, demanding payments, and damaging businesses’ reputations and customers’ trust.

Solution: AIBugHunter is an AI-powered approach trained on millions of software projects to learn the patterns of vulnerabilities, so it can automatically detect, locate, explain, and suggest corrections in real time.

12 of 46

Decades of Explainability in SE

13 of 46

Early Days of Explainability in SE (2000-2015)

AI4SE Booming Age (2015-2020)

14 of 46

40 Years of Defect Prediction Studies: Do They Help at All?

15 of 46

Practitioners’ Needs (User Survey)

Researchers’ Focuses (Literature Review)

16 of 46

The REAL NEEDS of Explainable AI in SE

17 of 46

http://xai4se.github.io

18 of 46

Explainable AI in a Nutshell

Data

Black box AI

AI Product

Data

Explainable AI

Explainable AI product

Decision, recommendation

Decision & explanation

Feedback

Confusion with the AI black box:

Why did you do that?

Why did you not do that?

Why do you succeed or fail?

How do I correct an error?

Clear & transparent predictions

I understand why

I understand why not

I know why you succeed or fail

I understand, so I trust you

Explainable AI (XAI) aims to create a suite of AI/ML techniques that (David Gunning, 2016):

  • Produce more explainable models, while maintaining a high level of prediction accuracy; and
  • Enable human users to understand the predictions and build appropriate trust in them

19 of 46

Two Types + Scopes of Explainable AI

Interpretable AI

  • Using models that are inherently interpretable, e.g., small decision trees or linear models
  • Generate a global explanation

Explainable AI

  • Applying a model-agnostic method to explain the black-box model (post-hoc)
  • Generate a local explanation for each individual prediction
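
A toy contrast (not from the talk) makes the two types concrete: an inherently interpretable model is its own global explanation, while a black box needs a post-hoc, per-prediction probe. The rule thresholds and the perturbation scheme below are invented for illustration:

```python
# Interpretable AI: a hand-written two-rule model over code metrics.
# The rules themselves ARE the global explanation.
def interpretable_model(loc, churn):
    if churn > 100:
        return "defective"   # Rule 1: high churn
    if loc > 500:
        return "defective"   # Rule 2: very large file
    return "clean"

# Explainable AI (post-hoc): for a black box we can only probe around
# a single instance. Perturb each feature and report which ones flip
# the prediction; this mimics the flavour of model-agnostic local
# methods (LIME, SHAP), not any specific tool.
def local_explanation(black_box, x, delta=0.1):
    base = black_box(*x)
    influential = []
    for i in range(len(x)):
        probe = list(x)
        probe[i] *= 1 + delta
        if black_box(*probe) != base:
            influential.append(i)
    return influential

# Treat the rule model as a black box and explain one prediction:
print(local_explanation(interpretable_model, (480, 95)))  # → [0, 1]
```

For the instance (480, 95), nudging either metric upward crosses a rule threshold, so both features are locally influential even though neither rule fired for the original prediction.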

20 of 46

There Are Many Explainable AI Toolkits

21 of 46

Different Explainable AI Techniques = Different Explanations

22 of 46

Different Stakeholders Need Different Explanations

23 of 46

To make AI more explainable, we need to:

  • (Step 1) Domain Analysis to understand the AI problem, contexts, stakeholders
    • Stakeholders: Who do we want to explain? e.g., developers
  • (Step 2) Requirement Elicitation to understand practitioners’ needs
    • Goals: What is their purpose? e.g., gain deeper insights
    • Questions: What do we want to explain? e.g., why a file/commit is predicted as defective?
  • (Step 3) Multimodal explanation design
    • Scopes: Global Level, Local Level (Instance to be explained)
    • AI Models: What kind of AI model are we trying to explain? e.g., classification, regression, NLP, etc.
    • Forms: Variable Importance, Rule, Integrated Gradients, Example-based, Attention, Heatmap
    • XAI Techniques: LIME, LORE, SHAP, Anchors, PDP, DICE, Surrogate.

Human-centric SE approaches must be used to design explanations that best suit practitioners’ needs

24 of 46

25 of 46

AI Has Emerged as a Powerful Tool for Software Companies

Syntax-Aware On-the-Fly Code Completion, Wannita Takerngsaksiri, Chakkrit Tantithamthavorn, Yuan-Fang Li, Under Review at IEEE Transactions on Software Engineering (2023)

26 of 46

Generative AI (ChatGPT, LLMs, etc.)

  • Since its introduction in November 2022, ChatGPT has rapidly gained popularity due to its remarkable language understanding and human-like responses.
  • Generative AI is a type of artificial intelligence that can create original and unique content, such as images, videos, music, or text, by learning patterns and styles from existing data and generating new content.
  • ChatGPT, based on the GPT-3.5 architecture, has shown great promise for revolutionizing various fields, including code generation, testing, and bug fixing, by improving efficiency, enhancing creativity, and reducing costs.

27 of 46

What Software Engineering Tasks Can ChatGPT Help With?

28 of 46

ChatGPT for Software Planning (Generate Business Requirements)

29 of 46

ChatGPT for Software Planning (Generate User Stories)

30 of 46

ChatGPT for Software Design

31 of 46

ChatGPT for Software Design (Generate a Class Diagram)

32 of 46

ChatGPT for Software Design (Generate a State Diagram)

33 of 46

ChatGPT for Coding

34 of 46

ChatGPT for Coding

35 of 46

ChatGPT for Software Testing

36 of 46

ChatGPT for Software Testing (Generate acceptance test cases)

37 of 46

ChatGPT for Software Testing (Generate test cases)

38 of 46

ChatGPT Is Dumber Than You Think

Here is the fundamental problem with ChatGPT: it can provide answers and information that no one can ever verify, because its output is not referenceable.

39 of 46

Challenge 1: ChatGPT Can’t Generate High-Quality Code

We analyzed 4,066 ChatGPT-generated code snippets implemented in two popular programming languages, i.e., Java and Python, for 2,033 LeetCode programming tasks.

Key Findings:

  • Code quality issues commonly occur both in code that passes test cases and in code that fails them, highlighting the need to characterize and address these concerns alongside functional correctness.
  • Issues in ChatGPT-generated code fall into four categories: Compilation & Runtime Errors, Wrong Outputs, Code Style & Maintainability, and Performance & Efficiency.
  • Wrong Outputs and Code Style & Maintainability issues are the most common challenges in ChatGPT-generated code, while Compilation & Runtime Errors and Performance & Efficiency issues are less prevalent.

Liu, Yue, et al. "Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues." Under Review at TOSEM, 2023.

40 of 46

Challenge 2: ChatGPT Can’t Generate Secure Code

Context: this program receives an email address as input, and passes it to a program (as a parameter) through a shell.

Problem: Handling input in this manner allows a malicious adversary to execute arbitrary code by appending shell instructions to a fictitious email. CWE: Arbitrary code execution (CWE-94)

First Attempt

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions, IEEE S&P 2021
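
The flaw class can be sketched as follows. The study's examples are in C; this Python sketch is only illustrative, and the `/usr/bin/mail` command is a hypothetical stand-in:

```python
import shlex

MALICIOUS = "victim@example.com; rm -rf /"

# Vulnerable pattern (do NOT use): the email is interpolated into a
# string that will be handed to a shell, so the "; rm -rf /" suffix
# becomes a second shell command (the slide's CWE-94: arbitrary code
# execution).
def build_vulnerable(email):
    return f"echo {email} | /usr/bin/mail admin"

# Safer pattern: keep the address as one argv element so no shell
# ever parses it (e.g., pass this list to subprocess.run without
# shell=True), or at minimum shell-quote it.
def build_safer(email):
    return ["/usr/bin/mail", email]  # argv list, no shell involved

print(build_vulnerable(MALICIOUS))  # the injected command survives intact
print(shlex.quote(MALICIOUS))       # quoting neutralises the metacharacters
```

The functions here only build the command rather than execute it, so the contrast between the interpolated string and the argv list can be inspected safely.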

41 of 46

Challenge 2: ChatGPT Can’t Generate Secure Code

Second Attempt

42 of 46

Challenge 2: ChatGPT Can’t Generate Secure Code

Last Attempt, Otherwise I will give up!

43 of 46

Challenge 2: ChatGPT Can’t Generate Secure Code

With human intervention!

44 of 46

Challenge 3: ChatGPT Can’t Explain the Answers

45 of 46

Take-Away Messages:

  • Generative AI will transform SE.
  • However, it can’t generate high-quality, secure, explainable code.
  • Human-Centric Explainable AI and Quality Assurance techniques are needed.

Learn more http://chakkrit.com/

46 of 46

Designing an Explanation

  • What are the questions for explaining Generative AI? (explaining a generation is not the same as explaining code)
  • Can we generate different explanations for different stakeholders?
  • What is the best form of explanations for SE tasks that are most understandable by software practitioners?

Developing and Evaluating XAI4SE techniques

  • How to explain LLMs and Generative AI?
  • How to evaluate such explanations?
  • What are the benefits of such explanations on real-world practices?

Learn more http://chakkrit.com/