1 of 18

Explainability Across

the AI Lifecycle

Engineering Trust, Reproducibility, and Accountability

in Development Measurement

Dr. Mohammed Ba-Aoum

Blue Cross Blue Shield NC | NIH

Presented at MeasureDev 2026 | World Bank

This presentation addresses a problem that sits at the intersection of technical practice and institutional accountability: the gap between what AI systems do and what the people affected by them can understand, scrutinise, or contest.

The argument is straightforward. As machine learning models are embedded in consequential decisions (social protection targeting, health resource allocation, credit access, labour market screening), the question of whether those decisions can be explained becomes a governance question, not merely a technical one. Explainability is the mechanism through which AI systems become auditable, correctable, and ultimately trustworthy.

This is not a talk about making AI simpler. Many of the most effective models are deliberately complex. It is a talk about making AI accountable, and about the tools, frameworks, and institutional commitments that make accountability possible.

2 of 18

“Acceptance of black-box models, if it ever did happen, would be a strange, technocratic coup in which modellers have gained the power to shape the assumptions, perceptions, and conclusions of decision-makers in a way the decision-makers themselves do not quite understand.”

Meadows & Robinson, 1985 — written forty years before the current debate, still unresolved.

3 of 18

Roadmap of This Talk

Why Explainability Matters

Business · Regulation · Development Contexts

Key Thesis & Conceptual Spectrum

The Right Reason · Interpretability → Understanding

Three Levels of Explainability

Data · Model · Outcome

Techniques & XAI Lifecycle

LIME · SHAP · By-Design vs Post-Hoc

System-Level Explainability

From Model to System Accountability

LLMs & Open Science

New Frontier · Development Contexts

The talk is organised around six interconnected themes.

We begin with the normative case: why explainability matters in regulatory, ethical, and development contexts.

We then sharpen the conceptual vocabulary: the terms interpretability, explainability, and understanding are frequently conflated in ways that generate confusion in both technical and policy discussions.

From there, we move to a three-level analytical framework: explainability at the level of data, model, and outcome. We then survey the primary technical instruments (LIME, SHAP, counterfactuals, and related methods) and consider how they fit into a continuous governance lifecycle.

The final two sections address system-level accountability and the emerging role of large language models as explainability interfaces.

This is a work-in-progress framework: a way of thinking about explainability decisions in context, rather than a catalogue of methods.

4 of 18

The Core Problem & Key Thesis

The Problem

AI is embedded in healthcare, development policy, finance, and social systems, but understanding has not kept pace.

WHAT

Prediction

≠

WHY

Explanation

This gap creates a fundamental tension between performance and accountability, felt most acutely in development contexts where decisions affect vulnerable populations.

The Challenge

"Did the model predict correctly, or for the right reason?"

A model can achieve high accuracy and still:

⚖️ Produce biased outcomes against vulnerable groups

🔗 Rely on misleading patterns that break in the field

💥 Fail when context shifts: shortcut models collapse

🚫 Generate decisions that cannot be justified to affected people

In development contexts, where data are sensitive, decisions affect vulnerable populations, and institutional trust is paramount, a correct prediction only partially solves the problem. We need to know the why.

The foundational tension in this talk is captured by the distinction between two questions: what will happen, and why will it happen?

Predictive models are optimised for the first question. They learn statistical associations between inputs and outcomes, and they do so with considerable effectiveness across a range of domains.

But the first question alone is insufficient for high-stakes decision-making. A model that accurately predicts household poverty may rely on features that are proxies for protected characteristics. A model that flags disease risk may depend on spurious correlations that disappear when the context shifts. A model that determines benefit eligibility may produce outcomes that cannot be explained to the people they affect.

In each case, accuracy without explanation is an incomplete basis for institutional action. The slide's central question (did the model predict correctly, or for the right reason?) is not rhetorical. It is a diagnostic test that every deployment decision should pass.

5 of 18

Why Explainability Matters

01 ⚖️ Regulation & Governance

▸ EU AI Act mandates documentation, testing & transparency for high-risk systems.

▸ Development datasets (health records, poverty surveys) may be subject to laws such as HIPAA and GDPR, and to national data sovereignty laws.

▸ Organisations must generate auditable trails to demonstrate compliance.

02 🎯 Fairness, Bias & Equity

▸ Models inherit historical biases. Explainability reveals proxy discrimination (location → income) before deployment.

▸ Decisions affect underrepresented populations with limited ability to contest outcomes; the right to explanation is a matter of equity.

▸ Fragile institutional trust: opaque AI systems actively erode the legitimacy needed for effective policy intervention.

03 📊 Trust, Adoption & ROI

▸ Transparent systems drive acceptance among field staff, beneficiaries, and partner governments

▸ Explainability enables faster debugging → reduces the cost of model failure in production environments with high data costs.

▸ ROI framing is more relevant for private sector AI — here, trust and mission alignment are the primary returns.

Explainability is now subject to formal legal requirements in multiple jurisdictions. The EU AI Act classifies certain systems as high-risk and mandates transparency, documentation, and human oversight as conditions of deployment. GDPR Articles 13-15 establish rights to meaningful information about automated decision-making. Sector-specific frameworks in healthcare, credit, and public administration add further obligations.

Beyond legal compliance, the fairness case is empirically grounded. Models trained on historical data encode historical disparities. Without systematic inspection of how features contribute to predictions, proxy discrimination (where variables such as postcode or employment type serve as de facto racial or socioeconomic filters) may go undetected until it has already caused harm.

The trust dimension is particularly significant for development institutions. Programmes that cannot explain their targeting criteria to field staff, community representatives, or beneficiaries will encounter legitimate resistance. Opacity undermines the political legitimacy that social programmes depend on, irrespective of technical performance.

6 of 18

The Full Spectrum: Three Distinct Concepts

INTERPRETABILITY

By Design (Glass-Box)

• Concerns the internal mechanics of the model, answering HOW it arrives at results.

• A model is interpretable if a human can consistently predict its output from its structure.

"interpretability is the degree to which a human can understand the cause of a decision" (Miller, 2019)

• Example: Linear regression — each coefficient directly quantifies feature influence on the prediction.

• Contrast: Deep Neural Networks are not inherently interpretable ("black-box") due to complex, non-linear transformations across many layers and very large numbers of parameters.

EXPLAINABILITY

Post-Hoc Justification

• Applies techniques to black-box models AFTER training to gain insight into their inner workings.

• Global methods (Feature Importance): Which features matter across the entire dataset?

• Local methods (LIME, SHAP): Why did the model make this specific prediction for this instance?

• Critical for justifying individual high-stakes decisions to affected individuals or regulators.

UNDERSTANDING

Holistic AI Governance

• The holistic appreciation of a model's capabilities, limitations, and societal impact.

• Synthesizes interpretability + explainability with domain expertise, ethics, and business context.

- Does the model logic make sense?

- Is it fair?

- Does it meet strategic goals?

• Without this, an organization runs a complex system whose true value and harms remain opaque.

(Biran & Cotton, 2017; Miller, 2019)

The three concepts presented here describe different things and should be used precisely.

Interpretability is an intrinsic property of a model: the degree to which a human can trace the reasoning from inputs to outputs by examining the model's structure directly. Linear regression and shallow decision trees are interpretable; deep neural networks are not.

Explainability refers to the post-hoc practice of applying analytical tools to a model to generate insight that the model does not produce by design. SHAP values, LIME approximations, and saliency maps are explainability techniques. They provide useful information about model behaviour without revealing the model's internal mechanics in full.

Understanding is the broader epistemic goal: a synthesis of technical insight, domain knowledge, ethical evaluation, and institutional judgment. A practitioner may have full interpretability of a model and still lack understanding if they cannot evaluate whether the model's logic is appropriate for the context. All three levels are necessary; they are not substitutes for one another.

7 of 18

Three Goals of Interpretability

"Interpretability is not the destination — it is an instrument for better modeling, better accountability, and better learning."

01 IMPROVE

Debug, Validate & Enhance the Model

• Reveal shortcuts, data leakage, incorrect feature effects.

• Example: LIME explanations showed that snow in the background (not animal features) drove a wolf/husky classifier. (Ribeiro et al., 2016)

• Workflow: Train → Inspect → Identify → Improve features → Retrain.

• Interpretability is quality assurance, not just explanation.

02 JUSTIFY

Explain to Stakeholders, Regulators & Affected Individuals

• Different stakeholders need different justifications: Creators, Operators, Executors, Decision Subjects, Auditors. (Tomsett et al., 2018)

• Decision subjects need recourse: what would change the outcome?

• Regulators need auditable trails demonstrating appropriate behavior.

• In medical devices: interpretability is part of compliance and approval.

03 DISCOVER

Extract Insights from Models & Data

• Models are not only prediction machines — they encode learned relationships.

• Example: Churn model reveals drivers (price sensitivity, service quality) enabling targeted intervention, not just risk identification.

• Interpretability turns prediction into understanding.

Molnar (2025)

The three goals presented here provide a practical organising principle for explainability work. They shift the question from "what tools should I use?" to "what am I trying to accomplish?"

The improvement goal (debugging, validation, feature quality) is arguably the most underappreciated. Practitioners who treat explainability as a reporting obligation rather than a development tool miss an opportunity to identify problems before they enter production.

The justification goal is the most visible in governance and regulatory discussions. Different audiences require different explanations: a regulator needs an auditable trail, a programme beneficiary needs an account they can evaluate and contest, and a data scientist needs technical specificity.

The discovery goal connects explainability to scientific inference. When interpretable models are applied to development data (agricultural yields, health outcomes, labour market transitions), the learned feature relationships can generate substantive hypotheses about causal mechanisms, not merely predictive signals.

8 of 18

Three Levels of Explainability

Explainability operates at distinct but interconnected levels — each requiring different methods and serving different goals.

DATA LEVEL

Understanding the data before any model is trained

Key Questions:

Are there biases or demographic imbalances?

Are proxy variables present (e.g., zip code as proxy for race)?

Is there data leakage that will inflate performance?

Which features are meaningfully related to the outcome?

Methods: Feature distributions, correlation analysis, PDP/PFI on simple surrogates, bias audits

💡 Poor data → misleading models → misleading explanations. Explainability must start here.

MODEL LEVEL

Understanding how the model processes inputs & makes predictions

Key Questions:

Which features drive overall model behavior?

Is the model relying on shortcuts or spurious correlations?

Does the model logic align with domain knowledge?

Is behavior consistent across demographic subgroups?

Methods: Global: Feature Importance, PDP, ALE, Surrogate Models. Model-specific: attention maps, SHAP global

💡 This is where dangerous shortcuts are caught before deployment.

OUTCOME LEVEL

Understanding why a specific individual decision was made

Key Questions:

Why was this household excluded from the benefit program?

What would need to change to get a different outcome?

Can this decision be contested and challenged?

Methods: Local: LIME, SHAP (instance), Counterfactuals, Anchors, ICE

💡 Critical for affected populations, regulators, and auditors. Enables contestability and recourse.

The three-level framework presented here reflects a practical reality: explanatory failures can occur at any point in the pipeline, and each level requires distinct analytical tools and raises distinct governance questions.

Data-level explainability concerns the inputs to model training. Before a model is fitted, the data should be examined for demographic imbalances, proxy variables, distributional shifts, and leakage. These are not problems that post-hoc explanation methods can correct after the fact. If the training data systematically underrepresents a population subgroup, the model will learn biased associations, and SHAP values computed on that model will accurately describe, but not resolve, the underlying problem.

Model-level explainability concerns the learned function: which features drive predictions, whether the model relies on theoretically coherent relationships, and whether behaviour is consistent across demographic subgroups. This is where methods like permutation feature importance, partial dependence plots, and SHAP global summaries are most relevant.

Outcome-level explainability concerns individual decisions. Why was this application rejected? What change in circumstances would produce a different result? Can the affected individual understand and contest the decision? These are the questions that define accountability in practice, and they require local explanation methods (LIME, instance-level SHAP, counterfactuals) that operate at the case level rather than the dataset level.
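As one concrete instance of the model-level methods named above, permutation feature importance can be sketched in a few lines. This is a minimal illustration, assuming a regression-style model scored by mean squared error; the names `permutation_importance` and `toy_model` and the synthetic data are hypothetical and not taken from any particular library.

```python
import random

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in mean squared error when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy model: strong dependence on feature 0, weak on feature 1,
# none on feature 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
```

A feature the model ignores receives an importance of zero, while features it relies on receive larger scores. In a real audit the same comparison would be run per demographic subgroup to check whether the model leans on different features for different populations.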

9 of 18

XAI Techniques: A Mental Map

BY DESIGN vs POST-HOC

By Design (Intrinsic)

Restrict model class to ensure transparency. Linear regression, decision trees, logistic regression. Coefficients are the explanation.

Post-Hoc

Train any model, then add interpretation afterward. Works on black-boxes. Can be model-agnostic or model-specific.

MODEL-AGNOSTIC vs MODEL-SPECIFIC

Model-Agnostic

Treat model as black box. SIPA principle: Sample → Intervene → Predict → Aggregate. Works for any algorithm. (Scholbeck et al., 2020)

Model-Specific

Use internal structure (weights, gradients, attention). Powerful but not transferable across model types. Best for neural network internals.

LOCAL vs GLOBAL

Local Methods

Explain one specific prediction. LIME, SHAP (instance), Counterfactuals, Anchors, ICE. Essential for "right to explanation" and auditing individual decisions.

Global Methods

Explain overall model behavior across dataset. PDP, ALE, Permutation Feature Importance, Surrogate Models. Reveal dominant patterns and potential biases.

Scholbeck et al. (2020)

The taxonomy presented here is a diagnostic tool. Before selecting an explainability method, the analyst should be able to answer three questions:

Is interpretability needed by design, or is post-hoc analysis sufficient?
Is the method required to generalise across model types, or can it exploit internal structure?
Is the goal to explain the model's overall behaviour, or to explain a specific decision?

The answers to these questions substantially narrow the relevant technique space.

A regulatory audit of a black-box credit model, requiring local explanations for individual decisions, points toward model-agnostic local methods such as LIME or instance-level SHAP.

A scientific analysis of feature relationships across a development dataset points toward global methods such as permutation importance or accumulated local effects. A deployment context where interpretability is a legal requirement may necessitate a by-design approach: accepting a glass-box model rather than adding post-hoc explanation to a black-box.

Understanding the taxonomy prevents a common error: selecting a method because it is familiar rather than because it is appropriate for the task.

10 of 18

Key Techniques:

LIME — Local Interpretable Model-Agnostic Explanations

HOW:

Generates synthetic data around a specific instance → observes how predictions change → fits a simple interpretable model to those local perturbations.

USE:

Why was this loan rejected? Justifying individual decisions to regulators.
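The perturb, predict, and fit steps described for LIME can be sketched as follows. This is a simplified illustration, assuming Gaussian perturbations, an exponential proximity kernel, and a plain weighted least-squares surrogate. The names `lime_sketch`, `solve_linear_system`, and `black_box`, and all parameter values, are hypothetical; this is not the API of the `lime` package.

```python
import math
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lime_sketch(predict, instance, n_samples=500, scale=0.1,
                kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns [intercept, coef_0, coef_1, ...] of the local surrogate,
    expressed in terms of offsets from the instance.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # 1. Generate a synthetic point near the instance.
        z = [x + rng.gauss(0.0, scale) for x in instance]
        offsets = [a - b for a, b in zip(z, instance)]
        # 2. Query the black-box model.
        rows.append([1.0] + offsets)
        targets.append(predict(z))
        # 3. Weight the sample by its proximity to the instance.
        dist2 = sum(o ** 2 for o in offsets)
        weights.append(math.exp(-dist2 / kernel_width ** 2))
    # 4. Weighted normal equations: (X^T W X) beta = X^T W y.
    p = len(instance) + 1
    XtWX = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(p)] for i in range(p)]
    XtWy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
            for i in range(p)]
    return solve_linear_system(XtWX, XtWy)

# Hypothetical black-box: non-linear in feature 0, linear in feature 1.
def black_box(x):
    return x[0] ** 2 + 2.0 * x[1]

beta = lime_sketch(black_box, instance=[1.0, 0.5])
```

Near the point (1.0, 0.5) the black-box has slope 2 in both features, so the surrogate coefficients should land close to 2. The surrogate says nothing about behaviour far from that point, which is the sense in which the explanation is local.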

SHAP — SHapley Additive exPlanations

HOW:

Assigns each feature a contribution score based on game-theoretic Shapley values. Works globally (feature importance) and locally (instance-level attribution).

USE:

Credit model auditing: confirming that the model's logic aligns with lending best practices and fairness guidelines.
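For a handful of features, the Shapley attribution described above can be computed exactly by enumerating coalitions. The sketch below assumes that "absent" features are replaced by a fixed baseline value, which is one common simplification; the `shap` library uses other estimators, and the names `shapley_values` and `score` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for `predict` at `instance`, relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    as 2^n, so this is only feasible for a small number of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical scoring model with an interaction between features 0 and 1;
# feature 2 is ignored by the model.
def score(x):
    return 2.0 * x[0] + x[1] + x[0] * x[1]

instance = [1.0, 2.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(score, instance, baseline)
```

The attributions sum to the difference between the prediction at the instance and the prediction at the baseline, the interaction term is split equally between the two features involved, and the ignored feature receives zero.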

Counterfactuals — Minimal-Change Explanations

HOW:

Determines the smallest change to input features that would flip the model's prediction — directly answering "What would need to be different?"

USE:

Recourse: rejected loan applicants understand what actions could change the outcome.
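The minimal-change idea can be illustrated with a brute-force search over a small grid of candidate changes. This is a sketch under stated assumptions: the eligibility rule, the feature names, the step grid, and the choice of L1 distance are all hypothetical, and practical counterfactual methods use more efficient optimisation and add plausibility constraints.

```python
from itertools import product

def find_counterfactual(decide, instance, candidate_steps, mutable):
    """Search for the smallest L1 change to mutable features that flips the decision.

    candidate_steps: deltas to try for each mutable feature (should include 0.0).
    Returns (counterfactual, distance), or None if no candidate flips the decision.
    """
    original = decide(instance)
    best = None
    for deltas in product(candidate_steps, repeat=len(mutable)):
        candidate = list(instance)
        for idx, delta in zip(mutable, deltas):
            candidate[idx] += delta
        if decide(candidate) != original:
            distance = sum(abs(d) for d in deltas)
            if best is None or distance < best[1]:
                best = (candidate, distance)
    return best

# Hypothetical eligibility rule over [income, savings, age].
# Age is treated as immutable, so it is excluded from the search.
def approve(x):
    return 3.0 * x[0] + 2.0 * x[1] >= 252.0

applicant = [40.0, 30.0, 35.0]          # score 180, so the applicant is rejected
steps = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
result = find_counterfactual(approve, applicant, steps, mutable=[0, 1])
```

Here the search returns an income increase of 25 with savings unchanged as the smallest change on the grid that flips the decision. Restricting the search to mutable features is what turns a counterfactual into actionable recourse.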

PDP / ALE — Partial Dependence & Accumulated Local Effects

HOW:

Show the marginal effect of one or two features on the predicted outcome across the entire dataset. ALE corrects for feature correlations.

USE:

Development analytics: understanding the overall relationship between a development indicator and a predicted outcome across populations.
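Partial dependence is simple enough to compute by hand: force the feature of interest to each grid value for every row, then average the predictions. The sketch below assumes a generic `predict` callable; `partial_dependence`, `model`, and the tiny dataset are hypothetical. It does not implement ALE, which additionally restricts each evaluation to rows whose observed value lies near the grid point.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction over the dataset with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value   # overwrite the feature for every row
            total += predict(modified)
        curve.append(total / len(X))
    return curve

# Hypothetical model: linear in feature 0, quadratic in feature 1.
def model(x):
    return 2.0 * x[0] + x[1] ** 2

X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
pdp = partial_dependence(model, X, feature=0, grid=[0.0, 1.0, 2.0])
```

The curve rises by 2 for each unit step in feature 0, recovering the model's marginal slope. Because every row is evaluated at every grid value, correlated features can produce unrealistic combinations, which is the artefact ALE is designed to reduce.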

Molnar (2025)

Each technique addresses a distinct explanatory question and is suited to a distinct use case.

LIME (Ribeiro et al., 2016) answers: why did the model make this specific prediction? It does so by constructing a locally faithful approximation: a simple linear model fitted on perturbed samples near the instance of interest. The approximation is not globally accurate; it is intended to characterise the model's behaviour only in the neighbourhood of the prediction, and its output can vary with how that neighbourhood is sampled. It is particularly useful for justifying individual decisions to non-technical stakeholders.

SHAP (Lundberg & Lee, 2017) answers: how much did each feature contribute to this prediction, relative to a baseline? The Shapley value framework from cooperative game theory provides a theoretically principled attribution that satisfies additivity, consistency, and dummy-player properties. Its applicability at both the global and local level makes it one of the most versatile tools currently available.

Counterfactual explanations answer: what is the minimal change to inputs that would change the prediction? This is the recourse question: the information a rejected applicant needs to understand what actionable steps might alter their outcome. It is the explanation form most directly aligned with the right to explanation as understood in policy contexts.

PDP and ALE answer: what is the marginal relationship between a feature and the predicted outcome across the dataset? ALE corrects for feature correlation artefacts that affect PDP under collinearity, making it preferable for development datasets where features such as income, education, and geography tend to co-vary.

11 of 18

Key Tensions & Design Principles

Accuracy

Interpretability

Simple models ↑ transparency but ↓ performance.

In high-stakes development contexts, interpretability may matter MORE than marginal accuracy gains. Challenge this trade-off deliberately.

Fidelity

Simplicity

Short explanations necessarily omit causes — useful but incomplete. Good explanations must be simple enough for humans while faithful enough to the model.

Portability

Detail

Model-agnostic methods work for any algorithm (high portability) but sacrifice depth. Model-specific methods are more precise but become obsolete when you switch architectures.

Global

Local

Global methods reveal systemic behavior (bias, dominant patterns).

Local methods explain individual decisions (auditability, recourse). Both are essential — neither is sufficient alone.

Transparency

Gaming Risk

Revealing model logic can enable strategic manipulation (credit scoring). This argues for causal features over proxy features, and thoughtful disclosure policies.

All Explanations

Are Incomplete

There is never one true explanation — only selected explanations.

We choose which story to tell. Explainability can mislead if presented as complete truth.

(Molnar, 2025)

Rather than thinking of explainability as a problem we can fully solve, it is more accurate to think of it as a set of design tensions. Improving one aspect, such as accuracy or simplicity, usually comes at the cost of another. For example, increasing model accuracy often reduces interpretability, and simpler explanations are easier to understand but may not fully reflect the model's true reasoning.

Large language models introduce capabilities that are genuinely novel in the explainability landscape. The ability to produce natural-language explanations that adapt to the expertise level of the audience, from technically precise accounts for data scientists to plain-language summaries for programme beneficiaries, addresses a long-standing usability barrier. Traditional explanation outputs such as SHAP waterfall plots or LIME feature weight tables require interpretive competence that many end users and decision-makers do not possess.

The risks, however, are proportionally serious. The hallucination problem, where LLMs generate explanations that are syntactically fluent and semantically coherent but factually incorrect or ungrounded in the model's actual computation, is not a minor technical limitation. In high-stakes contexts, a confident but inaccurate explanation is more dangerous than no explanation, because it may foreclose questioning rather than invite it.

The faithfulness gap is a related but distinct problem: an explanation may accurately describe some aspect of model behaviour without capturing the features that actually drove the prediction. Current evaluation methods for LLM-generated explanations are insufficiently developed to detect these failures reliably. Institutions considering LLM-based explainability interfaces should treat them as experimental and subject them to rigorous validation against ground-truth explanation methods before relying on them for accountability purposes.

12 of 18

“The assumptions and reasoning behind a decision are not examinable, even to the decider. The logic, if there is any, leading to a social policy, is unclear to most people affected by the policy. As far as the general public and even many policymakers themselves are concerned, today's vital decisions are about as understandable and accessible as if they had been handed down by a Delphic oracle.”

Meadows & Robinson, 1985

On the opacity of policy decisions based on mental models

13 of 18

The XAI Lifecycle Framework

Explainability is not a one-time feature — it is a continuous governance process supported by MLOps.

INCEPTION

& DATA

▸ Bias Detection: Analyze feature distributions; identify demographic imbalances before model training.

▸ Feature Selection: Use PDP/PFI on surrogate models to identify relevant variables.

▸ Data Leakage Check: Interpretability reveals features that should not be available at inference.

MODEL

BUILDING

▸ Debugging: LIME/SHAP on individual training instances to expose incorrect learning patterns.

▸ Fairness Auditing: Check model behavior across cohorts before deployment.

▸ Validation: Confirm model logic aligns with domain expertise.

DEPLOYMENT

& INFERENCE

▸ Real-Time Explanation: Local explanations for every live prediction — essential for user trust and regulatory "right to explanation."

▸ User Trust & Adoption: Clear explanations of model reasoning drive acceptance in high-stakes contexts.

MONITORING

& GOVERNANCE

▸ Drift Detection: If feature importance or feature-prediction relationships change, XAI tools flag model drift requiring retraining.

▸ Continuous Auditing: MLOps automates XAI so governance is ongoing, not episodic.

MLOps Integration: Automates XAI across the full lifecycle — ensuring explainability is a continuous, governed process, not a one-off effort.

1 INCEPTION

2 MODEL BUILD

3 DEPLOYMENT

4 MONITORING

The lifecycle framing responds to a common implementation failure: explainability treated as a compliance checkpoint rather than a continuous practice. A model that passes an explainability review at the point of deployment can become unexplainable in production if feature distributions shift, model updates are applied, or the decision context changes.

At the inception and data stage, interpretability tools serve a diagnostic function: identifying features that should not be included, detecting imbalances that will propagate into predictions, and validating that the feature set is theoretically coherent. At the model-building stage, local explanation methods applied to training instances can reveal incorrect learning patterns that aggregate performance metrics conceal.

At deployment, the right to explanation creates an operational requirement for local explanations to be available at inference time, not computed retrospectively. At the monitoring stage, changes in global feature importance distributions can be an early indicator of model drift, one that may precede deterioration in headline performance metrics.

MLOps infrastructure makes this lifecycle approach operationally feasible by automating XAI computations at each stage and integrating outputs into monitoring dashboards and governance documentation.
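The monitoring idea above, comparing global feature importance over time, can be sketched as a simple distance between two importance profiles. This is an illustrative assumption about how such a check might look, not a standard metric from any MLOps tool: the function name, the feature names, the use of total variation distance, and the alert threshold are all hypothetical.

```python
def importance_drift(reference, current):
    """Total variation distance between two normalised feature-importance profiles.

    Both arguments map feature name -> importance score. Returns a value in
    [0, 1]; 0 means the relative importances are identical.
    """
    def normalise(scores):
        total = sum(abs(v) for v in scores.values())
        return {k: abs(v) / total for k, v in scores.items()}

    ref, cur = normalise(reference), normalise(current)
    features = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(f, 0.0) - cur.get(f, 0.0)) for f in features)

# Hypothetical importances at deployment time and in a later monitoring window.
reference = {"income": 0.5, "household_size": 0.3, "region": 0.2}
current = {"income": 0.2, "household_size": 0.3, "region": 0.5}

ALERT_THRESHOLD = 0.15   # assumed value; would need calibration per system
drift = importance_drift(reference, current)
needs_review = drift > ALERT_THRESHOLD
```

In this example the model has shifted weight from income to region, which a governance process might want to review even if headline accuracy has not yet moved. Scheduling a check like this in the monitoring pipeline is one way to make auditing continuous rather than episodic.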

14 of 18

Explainability as a Partial Substitute for Transparency

In contexts where data cannot be fully shared (privacy, sovereignty, ethics), explainability functions as a PARTIAL SUBSTITUTE for full transparency — enabling structured auditing, reproducibility checks, and verification of model logic.

🔍 Structured Auditing Without Data Sharing

XAI provides auditable trails — feature attributions, decision paths, bias checks — that allow external reviewers to validate model behavior without accessing raw sensitive data. Critical for development institutions handling population-level surveys or health records.

♻️ Reproducibility via Explanation Documentation

When data cannot be re-shared, documenting the model's learned logic (global feature effects, fairness diagnostics) enables reproducibility checks that would otherwise require the original dataset. Explanations become the reproducible artifact.

🏛️ Institutional Trust in Governance Contexts

Development institutions (World Bank, UN agencies, national statistics offices) require not just accurate predictions but justifiable ones. Explainability connects AI outputs to domain knowledge, policy objectives, and fairness norms — building institutional confidence in AI-powered measurement.

⚖️ Aligning with the Rashomon Insight

Multiple models may achieve similar performance. Explainability helps institutions choose WHICH model to trust — not just which performs best — by evaluating whether the model's learned logic is consistent with theory and values.

15 of 18

From Model Explainability to System Explainability

The critical shift for development institutions — and the culmination of this framework.

MODEL EXPLAINABILITY

Why did this model make this prediction?

→

SYSTEM EXPLAINABILITY

Is the entire AI system — from data to decision — understandable, trustworthy, and accountable?

System Explainability encompasses:

📊

Bias audits, feature validity, leakage checks — explainability before training begins

🤖

Interpretable or explainable predictions — local and global methods applied continuously

⚙️

MLOps integration — automated XAI across preprocessing, training, deployment, and monitoring

🏛️

Governance structures — who reviews explanations, who can contest decisions, what documentation exists

🎯

Explanations tailored to stakeholders: technical for auditors, accessible for affected populations

📊 Data

🤖 Models

⚙️ Pipelines

🏛️ Institutions

🎯 Decisions

the full pipeline — from data to decision

Building on the lifecycle perspective we just discussed, this slide extends the idea further, moving from model explainability to system explainability.

In practice, decisions are not made by models alone; they are produced by systems that include the data, pipelines, and governance.

The distinction between model explainability and system explainability marks a maturity threshold.

Model explainability, explaining why a specific model produced a specific prediction, is a technical capability. System explainability, ensuring that the entire sociotechnical pipeline from data collection to consequential decision is understandable, auditable, and contestable, is an institutional capability.

The transition matters because consequential harms in AI-assisted decision-making rarely originate in a single model prediction. They arise from the cumulative effect of choices made throughout the system: how data are collected and labelled, what features are included or excluded, how model outputs are translated into administrative decisions, who has access to challenge those decisions, and what documentation exists to support accountability after the fact. Development institutions (the World Bank, UN agencies, national statistics offices, and their implementing partners) operate in contexts where each of these choices carries significant distributional consequences.

System explainability requires not only technical tools but governance structures: defined roles for explanation review, formal contestability mechanisms, documentation standards, and audit trails that persist beyond the active lifecycle of any individual model.

16 of 18

LLMs & The New Frontier of Interpretability

From Static Explanations to Interactive Understanding

OPPORTUNITIES

Natural Language Explanations

LLMs explain complex model behaviors in formats humans naturally understand, reducing the usability barrier of technical tools like saliency maps. Multi-level: from simple to technical.

Interactive & Conversational

"Why did you choose this answer?" or "What would happen if this input changed?" Users engage dynamically, not passively. Strongly preferred by decision-makers over static outputs.

Dataset-Level Understanding

LLMs extend interpretability beyond models to explaining entire datasets, identifying patterns, subgroups, and latent structure through natural language narrative.

Cross-Modality Bridge

LLMs interpret complex domains (genomics, chemistry, images) in human-readable form, enabling interpretability where traditional tools fail.

CHALLENGES

Hallucination Risk

LLMs produce explanations that are fluent and convincing but factually incorrect or not grounded in the model's actual reasoning. Most critical challenge — directly undermines trust.

Opacity at Scale

Hundreds of billions of parameters — impossible to directly inspect. Many LLMs are API-based, blocking access to weights or gradients needed for mechanistic analysis.

Faithfulness Gap

Natural-language flexibility increases the risk of post-hoc rationalization: explanations that appear coherent but do not logically entail the prediction. (Atanasova et al.)

Evaluation Difficulty

Simply asking if users "like" an explanation is insufficient. Empirical evidence shows explanations can vary from highly beneficial to completely unhelpful depending on context.

(Singh et al. 2024)
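
The faithfulness gap can be probed with a simple deletion-style check: reset the features an explanation cites and see how much of the prediction actually disappears. The sketch below is an illustration under stated assumptions, not the evaluation method of any paper cited on this slide. The toy linear model, the feature names, the zero baseline, and the 0.5 threshold are all hypothetical, and the share calculation assumes contributions that do not cancel each other out.

```python
def linear_score(weights, features):
    """Toy stand-in for a deployed model: a weighted sum of the inputs."""
    return sum(weights[name] * value for name, value in features.items())


def attribution_share(predict, features, cited, baseline=0.0):
    """Fraction of the score (relative to an all-baseline input) that
    disappears when only the cited features are reset to the baseline."""
    all_reset = {name: baseline for name in features}
    cited_reset = {
        name: (baseline if name in cited else value)
        for name, value in features.items()
    }
    total = predict(features) - predict(all_reset)
    if total == 0:
        return 0.0  # nothing to attribute, so no explanation can be supported
    return (predict(features) - predict(cited_reset)) / total


def supports_prediction(predict, features, cited, min_share=0.5):
    """True if the cited features account for at least min_share of the score."""
    return attribution_share(predict, features, cited) >= min_share


# Hypothetical weights and one hypothetical household record.
WEIGHTS = {"household_size": 0.5, "distance_to_clinic": 0.1, "asset_index": 2.0}
HOUSEHOLD = {"household_size": 4.0, "distance_to_clinic": 3.0, "asset_index": 1.0}


def predict(features):
    return linear_score(WEIGHTS, features)
```

In this toy case, an explanation citing `household_size` and `asset_index` accounts for about 0.93 of the score and passes the check, while a fluent explanation citing only `distance_to_clinic` accounts for about 0.07 and fails. The point is that a natural-language explanation should be tested against the model's behavior, not judged on how convincing it sounds.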

17 of 18

Towards Responsible AI in Development Measurement

Explainability is a system-level requirement — spanning data, model, deployment, and monitoring — not a post-hoc add-on.

Three goals guide every decision: Improve the model · Justify to stakeholders · Discover insights.

In data-constrained environments, explainability is the open-science bridge — enabling auditing, reproducibility, and trust without data sharing.

LLMs open a new frontier: interactive, conversational interpretability, but hallucination and faithfulness gaps demand rigorous validation.

Interpretability is not about choosing one method — it is about choosing the right combination for your goal and your audience.

Explainability is about making AI decisions understandable, trustworthy, and usable in real-world systems.

Explainability is not just a technical feature: it is a system-level requirement that spans the entire lifecycle, from data to deployment and monitoring.

To close, five propositions that summarise the framework developed across this talk.

First: explainability is a system-level requirement. It must be built into data collection, model development, deployment architecture, and governance procedures. It cannot be added at the end of a pipeline as a reporting function.

Second: the three goals (improve, justify, discover) provide a practical basis for selecting methods and evaluating whether explainability work has been done adequately. If none of these goals is being served, the activity is likely compliance theatre rather than substantive accountability.

Third: in data-constrained environments, documented model explanations function as a substitute for data transparency. They are the open-science contribution that development institutions can make when data cannot be shared.

Fourth: LLMs represent a genuine advance in explainability interfaces, but they introduce risks, particularly hallucination and faithfulness failures, that require rigorous validation before these tools are deployed in accountability-critical contexts.

Fifth: method selection is a contextual judgment. There is no universal best approach. The appropriate combination of techniques depends on the decision being explained, the audience receiving the explanation, the regulatory environment, and the institutional capacity to act on what the explanation reveals. Developing that judgment is the core practitioner competence that this framework is intended to support.

18 of 18

A CLOSING REFLECTION

A visitor to the home of Niels Bohr noticed a horseshoe hanging over the door and asked:

“Surely you, a scientist, do not believe in superstitions?”

Bohr replied: “Of course not. But I have been told it works whether you believe in it or not.”

One approach to AI: use it because it works, without needing to understand why.

Bohr could afford that.

In development contexts affecting vulnerable populations, we need to understand how and why it works.

The Bohr anecdote is used here for a specific purpose, not merely as comic relief. The horseshoe story illustrates the distinction between personal belief and institutional function. Bohr does not believe in superstitions; he hangs the horseshoe anyway, because it is reported to work independently of belief.

The parallel for explainability is direct. Organisations may not believe that their models require explanation: they may consider the demand for interpretability a regulatory imposition, a distraction, or an unnecessary cost. The evidence suggests that explainability generates value regardless of that belief: it improves model quality, reduces deployment failures, builds the institutional trust on which programme legitimacy depends, and provides the documentation that regulatory accountability requires.

The operational implication is the same as Bohr's. Build it into the system. Whether stakeholders are convinced at the outset is a secondary consideration.