1 of 12

FOUNDATIONS OF ARTIFICIAL INTELLIGENCE

Towards a working understanding of AI and its relationship to evaluation

[Image: an oil painting by Matisse of a humanoid robot playing chess, generated by AI using DALL-E]

2 of 12

Machine Learning Systems

At the core of AI:

  • Data
  • Pattern detection
  • Extensible model

3 of 12

How Machine Learning Systems are Trained

Machine learning systems may be supervised (trained on labeled examples) or unsupervised (left to find structure in unlabeled data)

Models can only be as strong as the data on which they are trained

The patterns these models detect and rely on are not always accessible to humans
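The supervised/unsupervised distinction can be sketched in a few lines of plain Python. The 1-D points and labels below are invented for illustration: a nearest-neighbor rule predicts from known labels (supervised), while a two-centroid clustering routine groups the same points with no labels at all (unsupervised).

```python
def nearest_neighbor_predict(train, label_of, x):
    """Supervised: labels are known; predict the label of the closest training point."""
    closest = min(train, key=lambda t: abs(t - x))
    return label_of[closest]

def two_means_cluster(points, iterations=10):
    """Unsupervised: no labels; group points around two centroids."""
    a, b = min(points), max(points)          # crude initial centroids
    for _ in range(iterations):
        group_a = [p for p in points if abs(p - a) <= abs(p - b)]
        group_b = [p for p in points if abs(p - a) > abs(p - b)]
        a = sum(group_a) / len(group_a)      # move centroids to group means
        b = sum(group_b) / len(group_b)
    return group_a, group_b

train = [1.0, 1.2, 8.0, 8.3]
labels = {1.0: "low", 1.2: "low", 8.0: "high", 8.3: "high"}
print(nearest_neighbor_predict(train, labels, 1.1))  # → low
print(two_means_cluster(train))                      # → ([1.0, 1.2], [8.0, 8.3])
```

Both routines detect the same pattern (two clusters of points); only the supervised one can name it, because only it was given labels.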

4 of 12

5 of 12

6 of 12

Random Forests: an example

https://datahacker.rs/012-machine-learning-introduction-to-random-forest/
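To make the linked example concrete, here is a toy random forest in plain Python: each "tree" is a one-split decision stump trained on a bootstrap (sampled-with-replacement) copy of the data, and the forest predicts by majority vote. The 2-D points, labels, and tree count are invented; real implementations grow full trees and randomly subsample features as well.

```python
import random
from collections import Counter

def train_stump(sample):
    """Find the single feature/threshold split with the best accuracy on the sample."""
    best = None
    for f in range(len(sample[0][0])):                 # try each feature index
        for x0, _ in sample:                           # candidate thresholds from the data
            t = x0[f]
            left = [y for x, y in sample if x[f] <= t]
            right = [y for x, y in sample if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]   # majority label on each side
            r_lab = Counter(right).most_common(1)[0][0]
            acc = sum((l_lab if x[f] <= t else r_lab) == y for x, y in sample)
            if best is None or acc > best[0]:
                best = (acc, f, t, l_lab, r_lab)
    if best is None:                                   # degenerate bootstrap: majority label
        maj = Counter(y for _, y in sample).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def train_forest(data, n_trees=9, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def forest_predict(forest, x):
    """Majority vote across the trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]

data = [((1, 5), "red"), ((2, 4), "red"), ((1, 4), "red"),
        ((8, 1), "blue"), ((9, 2), "blue"), ((8, 2), "blue")]
forest = train_forest(data)
print(forest_predict(forest, (1, 4.5)))   # votes for "red"
```

The bootstrap sampling is what makes the trees differ from one another; averaging their votes is what makes the forest more robust than any single tree.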

7 of 12

Example ML System Types

Natural language processing

  • Used to convert human language into machine-readable signals

Neural networks/deep learning

  • Approximates "cognition" by looking for patterns and yielding outputs that match those patterns
  • Neural networks and humans may come to the same conclusions but using different rationales

Generative AI

  • Creates novel content in response to a prompt
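As a concrete illustration of the first system type, converting human language into machine-readable signals, here is a minimal bag-of-words vectorizer in plain Python. The tiny corpus is invented, and production NLP systems use far richer representations (subword tokens, embeddings), but the core move is the same: text in, numbers out.

```python
def build_vocabulary(corpus):
    """Map each distinct lowercase word to a fixed column index."""
    words = sorted({w for doc in corpus for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def vectorize(doc, vocab):
    """Count how often each vocabulary word appears in the document."""
    vec = [0] * len(vocab)
    for w in doc.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1
    return vec

corpus = ["the model detects patterns", "patterns in the data"]
vocab = build_vocabulary(corpus)
print(vectorize("the data the model", vocab))  # → [1, 0, 0, 1, 0, 2]
```

Once every document is a vector of counts, downstream models can compare, cluster, or classify texts with ordinary arithmetic.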

[Image: angle view of a circuit shaped like a brain, recommended by AI based on text in this slide]

8 of 12

AI AS BOTH SUBJECT AND OBJECT IN EVALUATION

Potential uses for AI in an evaluation space

vs

Evaluating work produced by AI

[Image: an abstract painting of artificial intelligence, generated by AI using DALL-E]

9 of 12

Evaluation using AI

Non-generative AI

  • Summarization
      • https://quillbot.com/summarize
  • Image analysis
      • https://cloud.google.com/vision
  • Text analysis

Generative AI

  • Code generation and translation
  • Annotated bibliography and background narrative
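As a rough sketch of what a summarization tool like the one linked above does under the hood, here is a frequency-based extractive summarizer in plain Python: score each sentence by how often its words appear in the whole text, and keep the top-scoring sentences. The sample text and stopword list are invented, and real services use far more sophisticated models.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "are"}

def summarize(text, n_sentences=1):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = [w for s in sentences for w in s.lower().split() if w not in STOPWORDS]
    freq = Counter(words)                     # word frequencies across the whole text
    def score(s):
        return sum(freq[w] for w in s.lower().split() if w not in STOPWORDS)
    kept = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return ". ".join(s for s in sentences if s in kept) + "."

text = ("Evaluators collect data from many sources. "
        "Data quality shapes every evaluation. "
        "Good evaluation depends on good data.")
print(summarize(text, n_sentences=1))  # → Good evaluation depends on good data.
```

Because this approach only extracts existing sentences, it is non-generative: unlike generative AI, it can never produce a claim that was not already in the source text.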

[Image: computer script on a screen, recommended by AI based on text in this slide]

10 of 12

Evaluation of AI

Qualitative

    • Explanation Goodness
    • User satisfaction
    • User Curiosity/Attention Engagement
    • User Trust/Reliance
    • User understanding
    • Productivity of use
    • System Controllability/Interactivity

Mixed Methods

    • User trust
    • User preference
    • User confidence
    • Level of disagreement

    • Galvanic Skin Response
    • Blood Volume Pulse
    • Number of operations depending on model size
    • Interaction strength

Quantitative

    • D, the performance difference between models and XAI method execution
    • R, the number of rules in the model explanation (rule-based)
    • F, the number of features used to construct the explanation
    • S, the stability of the models' explanations

Hsiao, J.H.-W.; Ngai, H.H.T.; Qiu, L.; Yang, Y.; Cao, C.C. Roadmap of Designing Cognitive Metrics for Explainable Artificial Intelligence (XAI). arXiv 2021, arXiv:2108.01737

Lin, Y.-S.; Lee, W.-C.; Berkay Celik, Z. What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors. arXiv 2020, arXiv:2009.10639

Rosenfeld, A. Better Metrics for Evaluating Explainable Artificial Intelligence. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, Virtual Event, UK, 3–7 May 2021

Measures in this field are emergent; consistent evaluation metrics have yet to crystallize
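As one illustration of how such a metric might be operationalized, the stability measure S listed above could be approximated as the average Jaccard overlap between the feature sets an explainer returns on repeated runs. This is a sketch under invented data, not a formula taken from the cited papers.

```python
def jaccard(a, b):
    """Overlap between two feature sets: 1.0 = identical, 0.0 = disjoint."""
    return len(a & b) / len(a | b)

def stability(runs):
    """Average pairwise Jaccard similarity across explanation runs."""
    pairs = [(runs[i], runs[j])
             for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# three hypothetical runs of the same explainer on the same input
runs = [{"age", "income", "tenure"},
        {"age", "income", "tenure"},
        {"age", "income", "region"}]
print(round(stability(runs), 3))  # → 0.667
```

A score near 1.0 would suggest the explainer surfaces the same features each time; a low score would be a warning sign for evaluators relying on any single explanation.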

11 of 12

Limitations to consider

  • Algorithmic discrimination
  • Validity and reliability concerns
  • Intellectual property
  • Labor concerns
  • Barriers to access

12 of 12

A Meta Conclusion:

What AI thinks evaluators need to know about AI