Explainability Across the AI Lifecycle
Engineering Trust, Reproducibility, and Accountability in Development Measurement
Dr. Mohammed Ba-Aoum
Blue Cross Blue Shield NC | NIH
Presented at MeasureDev 2026 | World Bank
“Acceptance of black-box models, if it ever did happen, would be a strange, technocratic coup in which modellers have gained the power to shape the assumptions, perceptions, and conclusions of decision-makers in a way the decision-makers themselves do not quite understand.”
Meadows & Robinson, 1985 — written forty years before the current debate, still unresolved.
Dr. Mohammed Ba-Aoum · MeasureDev 2026 · 4/22
Roadmap of This Talk
01
Why Explainability Matters
Business · Regulation · Development Contexts
02
Key Thesis & Conceptual Spectrum
The Right Reason · Interpretability → Understanding
03
Three Levels of Explainability
Data · Model · Outcome
04
Techniques & XAI Lifecycle
LIME · SHAP · By-Design vs Post-Hoc
05
System-Level Explainability
From Model to System Accountability
06
LLMs & Open Science
New Frontier · Development Contexts
The Core Problem & Key Thesis
The Problem
AI is embedded in healthcare, development policy, finance, and social systems, but understanding has not kept pace.
WHAT (Prediction) ≠ WHY (Explanation)
This gap creates a fundamental tension between performance and accountability, felt most acutely in development contexts where decisions affect vulnerable populations.
The Challenge
"Did the model predict correctly, or for the right reason?"
A model can achieve high accuracy and still:
⚖️ Produce biased outcomes against vulnerable groups
🔗 Rely on misleading patterns that break in the field
💥 Fail when context shifts: shortcut models collapse
🚫 Generate unjustifiable decisions to affected people
In development contexts, where data are sensitive, decisions affect vulnerable populations, and institutional trust is paramount, a correct prediction only partially solves the problem. We need to know the why.
Why Explainability Matters
01 ⚖️ Regulation & Governance
▸ EU AI Act mandates documentation, testing & transparency for high-risk systems.
▸ Development datasets (health records, poverty surveys) are subject to HIPAA, GDPR, and national data sovereignty laws.
▸ Organizations must generate auditable trails to demonstrate compliance.
02 🎯 Fairness, Bias & Equity
▸ Models inherit historical biases. Explainability reveals proxy discrimination (location → income) before deployment.
▸ Decisions affect underrepresented populations with limited ability to contest outcomes, so the right to explanation is a matter of equity.
▸ Fragile institutional trust: opaque AI systems actively erode the legitimacy needed for effective policy intervention.
03 📊 Trust, Adoption & ROI
▸ Transparent systems drive acceptance among field staff, beneficiaries, and partner governments
▸ Explainability enables faster debugging → reduces the cost of model failure in production environments with high data costs.
▸ ROI framing is more relevant for private sector AI — here, trust and mission alignment are the primary returns.
The Full Spectrum: Three Distinct Concepts
INTERPRETABILITY
By Design (Glass-Box)
• Concerns the internal mechanics of the model, answering HOW it arrives at results.
• A model is interpretable if a human can consistently predict its output from its structure.
"interpretability is the degree to which a human can understand the cause of a decision" (Miller, 2019)
• Example: Linear regression — each coefficient directly quantifies feature influence on the prediction.
• Contrast: Deep neural networks are not inherently interpretable ("black-box") because of complex, non-linear transformations across many layers and millions of parameters.
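The linear regression example can be made concrete with a minimal sketch. The data below are synthetic, and the feature names (income, household size) are assumptions chosen only for illustration. The point is that the fitted coefficients are themselves the explanation.

```python
import numpy as np

# Synthetic data (assumption for illustration): the outcome depends linearly
# on two hypothetical features, "income" and "household_size".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=200)

# Ordinary least squares with an explicit intercept column.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, w_income, w_household = coef

# Each coefficient is the change in the prediction per one-unit change in
# that feature, holding the other feature fixed.
print(f"intercept={intercept:.2f} income={w_income:.2f} household_size={w_household:.2f}")
```

A reader can predict the model's output from these three numbers alone, which is the sense of "interpretable by design" used here.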
EXPLAINABILITY
Post-Hoc Justification
• Applies techniques to black-box models AFTER training to gain insight into their inner workings.
• Global methods (Feature Importance): Which features matter across the entire dataset?
• Local methods (LIME, SHAP): Why did the model make this specific prediction for this instance?
• Critical for justifying individual high-stakes decisions to affected individuals or regulators.
UNDERSTANDING
Holistic AI Governance
• The holistic appreciation of a model's capabilities, limitations, and societal impact.
• Synthesizes interpretability + explainability with domain expertise, ethics, and business context.
- Does the model logic make sense?
- Is it fair?
- Does it meet strategic goals?
• Without this, an organization runs a complex system whose true value and harms remain opaque.
(Biran & Cotton, 2017; Miller, 2019)
Three Goals of Interpretability
"Interpretability is not the destination — it is an instrument for better modeling, better accountability, and better learning."
01 IMPROVE
Debug, Validate & Enhance the Model
• Reveal shortcuts, data leakage, incorrect feature effects.
• Example: LIME explanations showed that snow in the background (not animal features) drove a wolf/husky classifier. (Ribeiro et al., 2016)
• Workflow: Train → Inspect → Identify → Improve features → Retrain.
• Interpretability is quality assurance, not just explanation.
02 JUSTIFY
Explain to Stakeholders, Regulators & Affected Individuals
• Different stakeholders need different justifications: Creators, Operators, Executors, Decision Subjects, Auditors. (Tomsett et al., 2018)
• Decision subjects need recourse: what would change the outcome?
• Regulators need auditable trails demonstrating appropriate behavior.
• In medical devices: interpretability is part of compliance and approval.
03 DISCOVER
Extract Insights from Models & Data
• Models are not only prediction machines — they encode learned relationships.
• Example: Churn model reveals drivers (price sensitivity, service quality) enabling targeted intervention, not just risk identification.
• Interpretability turns prediction into understanding.
(Molnar, 2025)
Three Levels of Explainability
Explainability operates at distinct but interconnected levels — each requiring different methods and serving different goals.
01
DATA LEVEL
Understanding the data before any model is trained
Key Questions:
Methods: Feature distributions, correlation analysis, PDP/PFI on simple surrogates, bias audits
💡 Poor data → misleading models → misleading explanations. Explainability must start here.
02
MODEL LEVEL
Understanding how the model processes inputs & makes predictions
Key Questions:
Methods: Global: Feature Importance, PDP, ALE, Surrogate Models. Model-specific: attention maps, SHAP global
💡 This is where dangerous shortcuts are caught before deployment.
03
OUTCOME LEVEL
Understanding why a specific individual decision was made
Key Questions:
Methods: Local: LIME, SHAP (instance), Counterfactuals, Anchors, ICE
💡 Critical for affected populations, regulators, and auditors. Enables contestability and recourse.
XAI Techniques: A Mental Map
BY DESIGN vs POST-HOC
By Design (Intrinsic)
Restrict the model class to ensure transparency: linear regression, logistic regression, decision trees. The coefficients or decision rules are the explanation.
Post-Hoc
Train any model, then add interpretation afterward. Works on black-boxes. Can be model-agnostic or model-specific.
MODEL-AGNOSTIC vs MODEL-SPECIFIC
Model-Agnostic
Treat model as black box. SIPA principle: Sample → Intervene → Predict → Aggregate. Works for any algorithm. (Scholbeck et al., 2020)
Model-Specific
Use internal structure (weights, gradients, attention). Powerful but not transferable across model types. Best for neural network internals.
LOCAL vs GLOBAL
Local Methods
Explain one specific prediction. LIME, SHAP (instance), Counterfactuals, Anchors, ICE. Essential for "right to explanation" and auditing individual decisions.
Global Methods
Explain overall model behavior across dataset. PDP, ALE, Permutation Feature Importance, Surrogate Models. Reveal dominant patterns and potential biases.
Scholbeck et al. (2020)
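Permutation Feature Importance is one global, model-agnostic method that follows the Sample → Intervene → Predict → Aggregate pattern. The sketch below is a simplified illustration: the `model` function and the synthetic data are hypothetical stand-ins for any trained black box and its evaluation set.

```python
import random

def model(row):
    # Hypothetical black-box model: only the first two features matter.
    return 2.0 * row[0] + 0.5 * row[1]

def mse(rows, targets, predict):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, predict, feature, n_repeats=10, seed=0):
    shuffler = random.Random(seed)
    baseline = mse(rows, targets, predict)
    increases = []
    for _ in range(n_repeats):
        # Intervene: shuffle one feature column, breaking its link to the target.
        column = [r[feature] for r in rows]
        shuffler.shuffle(column)
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        # Predict on the intervened data and record the increase in loss.
        increases.append(mse(permuted, targets, predict) - baseline)
    # Aggregate: average loss increase over the repeats.
    return sum(increases) / n_repeats

# Sample: a synthetic evaluation set with three features.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
targets = [model(r) for r in rows]

importances = [permutation_importance(rows, targets, model, j) for j in range(3)]
print(importances)
```

The third feature is ignored by the model, so shuffling it leaves the loss unchanged and its importance is zero.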
Key Techniques:
LIME — Local Interpretable Model-Agnostic Explanations
HOW:
Generates synthetic data around a specific instance → observes how predictions change → fits a simple interpretable model to those local perturbations.
USE:
Why was this loan rejected? Justifying individual decisions to regulators.
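The three LIME steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the `lime` library's implementation: `black_box` is a hypothetical model, and the Gaussian perturbations and exponential proximity kernel are choices made for this example.

```python
import numpy as np

def black_box(X):
    # Hypothetical non-linear model standing in for any trained predictor.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def lime_sketch(predict, x, n_samples=2000, scale=0.3, kernel_width=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Generate synthetic points around the instance of interest.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    # 2. Observe how the black-box predictions change.
    preds = predict(Z)
    # 3. Weight each sample by its proximity to the instance.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 4. Fit a weighted linear surrogate on the local perturbations.
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    return coef[1:]  # local slope of the surrogate for each feature

x = np.array([0.0, 1.0])
local_effects = lime_sketch(black_box, x)
print(local_effects)
```

The returned slopes approximate the model's local sensitivity near `x` (here roughly 1 and 2), which is what a loan officer or regulator would read as "which inputs pushed this decision, and in which direction".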
SHAP — SHapley Additive exPlanations
HOW:
Assigns each feature a contribution score based on game-theoretic Shapley values. Works globally (feature importance) and locally (instance-level attribution).
USE:
Credit model auditing: confirming that the model's logic aligns with lending best practices and fairness guidelines.
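For a small number of features, Shapley values can be computed exactly by enumerating feature coalitions. The sketch below assumes a hypothetical credit-score `model` and a single all-zeros baseline. The `shap` library instead averages over a background dataset and uses faster approximations, so treat this as an illustration of the idea only.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical credit-score model with an income-tenure interaction.
    income, debt, tenure = x
    return 2.0 * income - 1.0 * debt + 0.5 * income * tenure

def shapley_values(predict, x, baseline):
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance's values; the rest take baseline values.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(set(subset) | {i}) - value(set(subset)))
        phi.append(total)
    return phi

x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)  # contributions for income, debt, tenure
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between income and tenure.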
Counterfactuals — Minimal-Change Explanations
HOW:
Determines the smallest change to input features that would flip the model's prediction — directly answering "What would need to be different?"
USE:
Recourse: rejected loan applicants understand what actions could change the outcome.
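A counterfactual search can be sketched as a brute-force scan over small changes to actionable features. Everything here is hypothetical: the `approve` rule, the feature names, the grid of allowed changes, and the cost measure (number of grid steps moved). Practical methods use more careful distance measures and plausibility constraints.

```python
from itertools import product

def approve(applicant):
    # Hypothetical loan model: approve when a simple score clears a threshold.
    score = 0.04 * applicant["income_k"] - 0.5 * applicant["open_debts"]
    return score >= 1.0

def nearest_counterfactual(predict, applicant, steps):
    """Return the cheapest change on the grid that flips the prediction, or None."""
    original = predict(applicant)
    names = list(steps)
    best, best_cost = None, float("inf")
    for combo in product(*(list(enumerate(steps[n])) for n in names)):
        candidate = dict(applicant)
        cost = 0
        for name, (index, delta) in zip(names, combo):
            candidate[name] += delta
            cost += index  # later grid entries are larger changes, so they cost more
        if predict(candidate) != original and cost < best_cost:
            best, best_cost = candidate, cost
    return best

applicant = {"income_k": 30, "open_debts": 2}
steps = {"income_k": [0, 5, 10, 15, 20], "open_debts": [0, -1, -2]}
counterfactual = nearest_counterfactual(approve, applicant, steps)
print(counterfactual)
```

For this made-up applicant, the cheapest flip is closing both open debts while income stays the same, which is the kind of actionable statement recourse requires.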
PDP / ALE — Partial Dependence & Accumulated Local Effects
HOW:
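Show the average effect of one or two features on the predicted outcome across the entire dataset. PDP can mislead when features are correlated, because it averages over unrealistic feature combinations; ALE is designed to remain reliable in that case.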
USE:
Development analytics: understanding the overall relationship between a development indicator and a predicted outcome across populations.
Molnar (2025)
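Partial dependence is simple enough to compute by hand, which helps show what the curve does and does not say. In this minimal sketch the `model`, the features (years of schooling, a rural flag), and the four data rows are made up for illustration. ALE is not shown.

```python
def model(row):
    # Hypothetical model: the outcome rises with schooling, less steeply in rural areas.
    schooling, rural = row
    return 1.0 + 0.8 * schooling - 0.3 * schooling * rural

def partial_dependence(predict, rows, feature, grid):
    curve = []
    for value in grid:
        # Force the feature to `value` for every row; keep other features as observed.
        preds = [predict(r[:feature] + [value] + r[feature + 1:]) for r in rows]
        # Average the predictions over the dataset.
        curve.append(sum(preds) / len(preds))
    return curve

rows = [[4, 0], [6, 1], [9, 0], [12, 1]]  # [years_schooling, rural_flag]
grid = [0, 5, 10]
pdp = partial_dependence(model, rows, feature=0, grid=grid)
print(list(zip(grid, pdp)))
```

The curve averages the urban and rural slopes into one line, so it describes the population as a whole and hides the subgroup difference. ICE curves (one line per row) would reveal it.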
Key Tensions & Design Principles
Accuracy vs Interpretability
Simple models tend to raise transparency but may lower predictive performance.
In high-stakes development contexts, interpretability may matter MORE than marginal accuracy gains. Challenge this trade-off deliberately.
Fidelity vs Simplicity
Short explanations necessarily omit causes — useful but incomplete. Good explanations must be simple enough for humans while faithful enough to the model.
Portability vs Detail
Model-agnostic methods work for any algorithm (high portability) but sacrifice depth. Model-specific methods are more precise but become obsolete when you switch architectures.
Global vs Local
Global methods reveal systemic behavior (bias, dominant patterns).
Local methods explain individual decisions (auditability, recourse). Both are essential — neither is sufficient alone.
Transparency vs Gaming Risk
Revealing model logic can enable strategic manipulation (credit scoring). This argues for causal features over proxy features, and thoughtful disclosure policies.
All Explanations Are Incomplete
There is never one true explanation — only selected explanations.
We choose which story to tell. Explainability can mislead if presented as complete truth.
(Molnar, 2025)
“The assumptions and reasoning behind a decision are not examinable, even to the decider. The logic, if there is any, leading to a social policy, is unclear to most people affected by the policy. As far as the general public and even many policymakers themselves are concerned, today's vital decisions are about as understandable and accessible as if they had been handed down by a Delphic oracle.”
Meadows & Robinson, 1985, on the opacity of policy decisions based on mental models.
The XAI Lifecycle Framework
Explainability is not a one-time feature — it is a continuous governance process supported by MLOps.
INCEPTION & DATA
▸ Bias Detection: Analyze feature distributions; identify demographic imbalances before model training.
▸ Feature Selection: Use PDP/PFI on surrogate models to identify relevant variables.
▸ Data Leakage Check: Interpretability reveals features that should not be available at inference.
MODEL BUILDING
▸ Debugging: LIME/SHAP on individual training instances to expose incorrect learning patterns.
▸ Fairness Auditing: Check model behavior across cohorts before deployment.
▸ Validation: Confirm model logic aligns with domain expertise.
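A fairness audit across cohorts can start with something as simple as comparing positive-prediction rates by group. The sketch below uses made-up predictions and a hypothetical urban/rural grouping. Demographic parity is only one of several fairness criteria, and the right one depends on the decision at hand.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Share of positive predictions within each cohort."""
    counts, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        counts[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / counts[g] for g in counts}

def demographic_parity_gap(predictions, groups):
    """Largest difference in selection rate between any two cohorts."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Made-up model outputs (1 = eligible) and cohort labels.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["urban", "urban", "urban", "urban", "rural", "rural", "rural", "rural"]

rates = selection_rates(predictions, groups)
gap = demographic_parity_gap(predictions, groups)
print(rates, gap)
```

A large gap does not prove discrimination by itself, but it tells the team where to point local explanation methods next.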
DEPLOYMENT & INFERENCE
▸ Real-Time Explanation: Local explanations for every live prediction — essential for user trust and regulatory "right to explanation."
▸ User Trust & Adoption: Clear explanations of model reasoning drive acceptance in high-stakes contexts.
MONITORING & GOVERNANCE
▸ Drift Detection: If feature importance or feature-prediction relationships change, XAI tools flag model drift requiring retraining.
▸ Continuous Auditing: MLOps automates XAI so governance is ongoing, not episodic.
MLOps Integration: Automates XAI across the full lifecycle — ensuring explainability is a continuous, governed process, not a one-off effort.
1 INCEPTION → 2 MODEL BUILD → 3 DEPLOYMENT → 4 MONITORING
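Drift detection based on explanations can be sketched as a comparison of feature-importance profiles between a reference window and the current window. The function name, the importance values, and the 0.10 threshold are all assumptions made for illustration. In an MLOps pipeline the inputs would come from a scheduled SHAP or permutation-importance job.

```python
def importance_drift(reference, current, threshold=0.10):
    """Flag features whose share of total importance moved by more than `threshold`."""
    def normalize(importances):
        total = sum(abs(v) for v in importances.values())
        return {k: abs(v) / total for k, v in importances.items()}

    ref, cur = normalize(reference), normalize(current)
    flagged = {}
    for name, ref_share in ref.items():
        shift = cur.get(name, 0.0) - ref_share
        if abs(shift) > threshold:
            flagged[name] = round(shift, 3)
    return flagged

# Hypothetical importance profiles at training time and in production.
reference = {"income": 0.50, "region": 0.30, "age": 0.20}
current = {"income": 0.30, "region": 0.50, "age": 0.20}

flagged = importance_drift(reference, current)
print(flagged)
```

Here the model has shifted weight from income to region, which is a prompt for human review (possible proxy discrimination or data drift) rather than an automatic retraining trigger.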
Explainability as a Partial Substitute for Transparency
In contexts where data cannot be fully shared (privacy, sovereignty, ethics), explainability functions as a PARTIAL SUBSTITUTE for full transparency — enabling structured auditing, reproducibility checks, and verification of model logic.
🔍 Structured Auditing Without Data Sharing
XAI provides auditable trails — feature attributions, decision paths, bias checks — that allow external reviewers to validate model behavior without accessing raw sensitive data. Critical for development institutions handling population-level surveys or health records.
♻️ Reproducibility via Explanation Documentation
When data cannot be re-shared, documenting the model's learned logic (global feature effects, fairness diagnostics) enables reproducibility checks that would otherwise require the original dataset. Explanations become the reproducible artifact.
🏛️ Institutional Trust in Governance Contexts
Development institutions (World Bank, UN agencies, national statistics offices) require not just accurate predictions but justifiable ones. Explainability connects AI outputs to domain knowledge, policy objectives, and fairness norms — building institutional confidence in AI-powered measurement.
⚖️ Aligning with the Rashomon Insight
Multiple models may achieve similar performance. Explainability helps institutions choose WHICH model to trust — not just which performs best — by evaluating whether the model's learned logic is consistent with theory and values.
From Model Explainability to System Explainability
The critical shift for development institutions — and the culmination of this framework.
MODEL EXPLAINABILITY
Why did this model make this prediction?
→
SYSTEM EXPLAINABILITY
Is the entire AI system — from data to decision — understandable, trustworthy, and accountable?
System Explainability encompasses:
📊 Data: bias audits, feature validity, leakage checks; explainability before training begins
🤖 Models: interpretable or explainable predictions, with local and global methods applied continuously
⚙️ Pipelines: MLOps integration, with automated XAI across preprocessing, training, deployment, and monitoring
🏛️ Institutions: governance structures (who reviews explanations, who can contest decisions, what documentation exists)
🎯 Decisions: explanations tailored to stakeholders, technical for auditors and accessible for affected populations
The full pipeline, from data to decision.
LLMs & The New Frontier of Interpretability
From Static Explanations to Interactive Understanding
OPPORTUNITIES
Natural Language Explanations
LLMs explain complex model behaviors in formats humans naturally understand, reducing the usability barrier of technical tools like saliency maps. Multi-level: from simple to technical.
Interactive & Conversational
"Why did you choose this answer?" or "What would happen if this input changed?" Users engage dynamically, not passively. Strongly preferred by decision-makers over static outputs.
Dataset-Level Understanding
LLMs extend interpretability beyond models to explaining entire datasets, identifying patterns, subgroups, and latent structure through natural language narrative.
Cross-Modality Bridge
LLMs interpret complex domains (genomics, chemistry, images) in human-readable form, enabling interpretability where traditional tools fail.
CHALLENGES
Hallucination Risk
LLMs produce explanations that are fluent and convincing but factually incorrect or not grounded in the model's actual reasoning. Most critical challenge — directly undermines trust.
Opacity at Scale
Hundreds of billions of parameters — impossible to directly inspect. Many LLMs are API-based, blocking access to weights or gradients needed for mechanistic analysis.
Faithfulness Gap
Natural-language flexibility increases the risk of post-hoc rationalization: explanations that appear coherent but do not logically entail the prediction. (Atanasova et al.)
Evaluation Difficulty
Simply asking if users "like" an explanation is insufficient. Empirical evidence shows explanations can vary from highly beneficial to completely unhelpful depending on context.
(Singh et al., 2024)
Towards Responsible AI in Development Measurement
01
Explainability is a system-level requirement — spanning data, model, deployment, and monitoring — not a post-hoc add-on.
02
Three goals guide every decision: Improve the model · Justify to stakeholders · Discover insights.
03
In data-constrained environments, explainability is the open-science bridge — enabling auditing, reproducibility, and trust without data sharing.
04
LLMs open a new frontier: interactive, conversational interpretability, but hallucination and faithfulness gaps demand rigorous validation.
05
Interpretability is not about choosing one method — it is about choosing the right combination for your goal and your audience.
Explainability is about making AI decisions understandable, trustworthy, and usable in real-world systems.
A CLOSING REFLECTION
A visitor to the home of Niels Bohr noticed a horseshoe hanging over the door and asked:
“Surely you, a scientist, do not believe in superstitions?”
Bohr replied: “Of course not. But I have been told it works whether you believe in it or not.”
One approach to AI: use it because it works, without needing to understand why.
Bohr could afford that.
In development contexts affecting vulnerable populations, we need to understand how and why it works.