NeuroSymbolic AI
for
Grounding,
Instructibility, and
Explainability
Tutorial 2025
Spread the Word
Making LLMs Explainable, Grounded, and Instructible
Focus of Tutorial
NeuroSymbolic AI and Instructible AI
Vector Symbolic Architectures
Explainability with Knowledge-infused Learning
Grounding with Retrieval Augmented Generation
OpenCHA: Domain Knowledge-driven
LLM-based Conversational Agent for Health
Tutorial’s Central Question
The "black box" nature of AI systems in
high-stakes decision-making applications
has raised concerns about transparency and reproducibility.
How can we go about reducing black-boxness?
Attention Maps and Feature Visualization
Layer Analysis and Activation Patterns
Attention Analysis and Probing Tasks
Token Attribution and Hidden States with Sparse Autoencoders
Mechanistic Interpretability
Behavioral Testing
Statistical AI is a Blackbox
NeuroSymbolic AI
Why NeuroSymbolic AI
Amit Sheth, Kaushik Roy, Manas Gaur, Neurosymbolic Artificial Intelligence (Why, What, and How), IEEE Intelligent Systems, 38 (3), May-June 2023
Explainability
🡪 Explanations that use terms and connections specific to a particular field or industry are more useful than generic wording that does not help people take action.
NeuroSymbolic AI
Neuro-symbolic AI techniques incorporate broader forms of knowledge (lexical, domain-specific, common-sense, and constraint-based) into addressing limitations of either symbolic or statistical AI approaches, such as model interpretations and user-level explanations. Compared to powerful statistical AI that exploits data, NeSy benefits from data and knowledge.
Neurosymbolic AI in the Era of Large Language Models
Applications of NeuroSymbolic AI with a focus on Grounding, Explainability, and Instructibility (EGI)
NeuroSymbolic in Machine Learning and Natural Language Processing
Petroni, F., Rocktäschel, T., Riedel, S., Lewis, P., Bakhtin, A., Wu, Y., & Miller, A. (2019, November). Language Models as Knowledge Bases?. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Neural AI
Data
Hyperparameters
Activation Functions
Loss Functions
Computing Power
and Model
Compression
Optimization
Symbolic AI
Knowledge Graphs
Lexicons
Rules
Workflow or Procedural Knowledge
Constraints
Benchmarking Example
WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions. In BlackboxNLP @ EMNLP 2024
Wellness Dimension
Wellness Dimension Definitions and Questionnaire
https://store.samhsa.gov/sites/default/files/sma16-4958.pdf
MultiWD and
WellXplain Datasets
Content from 4,000 users
6 Wellness Dimensions
Clinical expert explanations
Input
Attention Matrix
Explanation
Prediction
Input
Attention Matrix
Explanation
Prediction
Definitions In-Context Learning
Design 1
Design 2
Input
Attention Matrix
Explanation
Prediction
Questionnaire
Workflow-based
In-Context Learning
Design 4
Input
Attention Matrix
Explanation
Prediction
Chain of Thoughts with Definitions
Design 3
Domain-Specific LLMs
General Purpose LLMs
Hybridized Architectures: NeuroSymbolic AI
Process Knowledge-infused Learning
Simple Text Classification
I am really struggling with my bisexuality, which is causing chaos in my relationship with a girl. Being a fan of the LGBTQ community, I am equal to worthless to her. I’m now starting to get drunk because I can’t cope with the obsessive, intrusive thoughts I need to get out of my head.
Don’t want to live anymore. Sexually assault, ignorant family members, and my never-ending loneliness brights up my path to death.
I do have the potential to live a decent life, but not with people who abandon me. Hopelessness and feelings of betrayal have turned my nights into days. I am developing insomnia because of my restlessness.
I just can’t take it anymore. Been abandoned yet again by someone I cared about. I've been diagnosed with borderline for a while, and I’m just going to isolate myself and sleep forever.
Y = Suicide Ideation
Process Knowledge Infusion is a better approach than purely data-driven classification
Process Knowledge-based Classification
Has the subject wished he was dead or wished he could go to sleep and not wake up?
YES
Has the subject had any thoughts of killing himself?
YES
Has the subject been thinking about how he might do this?
NO
Has the subject had these thoughts and some intention of acting on them?
NO
NeuroSymbolic AI in Social Media
NeuroSymbolic AI in Social Media
Symbolic AI
NeuroSymbolic AI in Social Media
Symbolic AI Scoring
NeuroSymbolic in Machine Learning and Natural Language Processing
Neural AI with Symbolic information
NeuroSymbolic in Machine Learning and Natural Language Processing
Category | Representative phrases |
Anxiety | Depression, cognitive distortions, panic attacks, hopelessness, physical sensations |
Depression | Mood swings, weight gain, rapid cycling, depressive episodes, impulsivity, antisocial conduct, personality disorder |
Addiction | Buying oxycodone, pain management, chronic pain, alienation, crippling alcohol, dependent on crack |
NeuroSymbolic in Machine Learning and Natural Language Processing
The tables compare model performance for mental health classification across Precision, Recall, and F1-Score. The left table shows traditional models’ results with and without the Neurosymbolic approach, while the right table contrasts the NeuroSymbolic model with state-of-the-art LLMs like LLama, Phi, and Mistral.
The NeuroSymbolic model consistently outperforms both traditional models and state-of-the-art LLMs, achieving higher performance metrics and adaptability in mental health sentiment classification.
NB: Naïve Bayes
RF: Random Forest
CREST
Neural AI and Symbolic AI for achieving Consistency
Bonagiri, Vamshi Krishna, Sreeram Vennam, Priyanshul Govil, Ponnurangam Kumaraguru, and Manas Gaur. "SaGE: Evaluating Moral Consistency in Large Language Models." In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation LREC-COLING 2024.
Claim: LLMs are not semantically consistent and can give contradictory answers to paraphrased questions
NeuroSymbolic Empirical Analysis
Semantic Graph-driven Consistent LLM Training (SaGE)
Moral Consistency Corpus (MCC)
Forbes, Maxwell, Jena D. Hwang, Vered Shwartz, Maarten Sap, and Yejin Choi. "SOCIAL CHEMISTRY 101: Learning to Reason about Social and Moral Norms." In EMNLP. 2020.
BLEURT
BLEU
ROUGE-L
BERT Score
SaGE (LLAMA 3)
GPT-4
Integrated Mental Health Instruction Dataset (105K Samples)
Yang, Kailai, Tianlin Zhang, Ziyan Kuang, Qianqian Xie, Jimin Huang, and Sophia Ananiadou. "MentaLLaMA: interpretable mental health analysis on social media with large language models." In Proceedings of the ACM Web Conference 2024, pp. 4489-4500. 2024.
NeuroSymbolic AI for Reliability
Reliability
Grounding
Ensemble of Large Language Models
Bias Awareness
Mechanistic Interpretability
NeuroSymbolic AI for Reliability
Reliability
Grounding
Ensemble of Large Language Models
Bias Awareness
Mechanistic Interpretability
Explainability
Knowledge Gap Resolution
Source Attribution
https://www.transformer-circuits.pub/2022/mech-interp-essay - By Chris Olah
Gorti, Atmika, Aman Chadha, and Manas Gaur. "Unboxing Occupational Bias: Debiasing LLMs with US Labor Data." In Proceedings of the AAAI Symposium Series, 2024.
Instructibility
Grounding
A successful AI teammate requires several cognitive capacities, including situation assessment, task behavior, language comprehension and generation, and knowledge gap resolution. Grounding enables agents with different capabilities to communicate.
Knowledge Gap Resolution
Language Gap: Pay Attention to Important Domain Concepts
What Is the Knowledge Gap?
More Details during
Grounding with Retrieval Augmented Generation
NeuroSymbolic AI for Instructibility
Why Does Instructibility Need More Than Instruction Tuning?
Lin, Bill Yuchen, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, and Yejin Choi. "The unlocking spell on base llms: Rethinking alignment via in-context learning." arXiv preprint arXiv:2312.01552
Most Importantly …
🚫 Why This Is Not Enough
How Instructibility Relates to Knowledge Gaps
Weighted Contextual Mutual Information
Knowledge gap
Gou, Tian, Boyao Zhang, Zhenglie Sun, Jing Wang, Fang Liu, Yangang Wang, and Jue Wang. "Rationality of thought improves reasoning in large language models." In International Conference on Knowledge Science, Engineering and Management, pp. 343-358. Singapore: Springer Nature Singapore, 2024.
LLM Generator
LLM Evaluator
Summary
Definition Integration: The WellDunn framework formalizes the incorporation of clinical definitions into mental health assessment systems, enabling a more accurate understanding of psychological conditions
Rule of Thumb Extraction and Contextualization: SaGE extracts clinical heuristics from mental health knowledge bases as rules of thumb, making LLM agents more empathetic and grounded.
Semantic Encoding and Decoding Optimization: The SEDO framework preserves nuanced psychological semantics when integrating expert knowledge into mental health assessment systems.
Process Knowledge-infused Learning demonstrates how therapeutic processes and intervention sequences can be incorporated into AI systems to provide ethically sound mental health support.
Knowledge Gaps Assessment enables LLMs to dynamically measure intrinsic contextual uncertainty during conversations, strategically resolving persona knowledge gaps through targeted questions rather than producing hallucinated responses when information is incomplete.
Handoff
Vector Symbolic Architectures
Vector Symbolic architectures in deep learning
Edward Raff
AAAI TUTORIAL NEUROSYMBOLIC AI FOR EGI | 24 FEB 2025
Innovation center, Washington, D.C.
Vector Symbolic Architectures
This document is confidential and intended solely for the client to whom it is addressed.
VSA operations
Holographic Reduced Representations: a primer
Booz Allen Hamilton
Consider a d = 3 dimensional space where we wish to compute c † ⊗ (c ⊗ x); we get the result that:
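The identity above is easy to check numerically. A minimal HRR sketch in NumPy (illustrative only, not the tutorial's code): binding is circular convolution, and the approximate inverse c† keeps the first element and reverses the rest, so c† ⊗ (c ⊗ x) recovers a noisy copy of x. A larger d than 3 is used here so the noise is small.

```python
import numpy as np

def bind(a, b):
    # HRR binding: circular convolution, computed via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inverse(a):
    # Approximate inverse (involution): keep element 0, reverse the rest
    return np.concatenate(([a[0]], a[1:][::-1]))

rng = np.random.default_rng(0)
d = 256
c = rng.normal(0, 1 / np.sqrt(d), d)
x = rng.normal(0, 1 / np.sqrt(d), d)

retrieved = bind(inverse(c), bind(c, x))   # c† ⊗ (c ⊗ x) ≈ x (plus noise)
sim = retrieved @ x / (np.linalg.norm(retrieved) * np.linalg.norm(x))
```

The recovered vector is only approximately x; its cosine similarity to x improves as d grows.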
Extreme Multi-label classification (XML)
Eli Chien, Jiong Zhang, Cho-Jui Hsieh, Jyun-Yu Jiang, Wei-Cheng Chang, Olgica Milenkovic, and Hsiang-Fu Yu. 2023. PINA: leveraging side information in extreme multi-label classification via predicted instance neighborhood aggregation. In Proceedings of the 40th International Conference on Machine Learning (ICML'23), Vol. 202. JMLR.org, Article 224, 5616–5630.
A symbolic version of XML
Smaller models
Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations
Connectionist Symbolic Pseudo Secrets (CSPS)
CSPS classification accuracy
Do you get any data saving?
Are we giving away any secrets?
Network’s Input
Network’s Output
What if the adversary was even smarter
Random Guess |
10% |
10% |
10% |
1% |
1% |
Adversary has access to some of the original images, can it learn the secret using projected gradient descent (PGD)?
WHAT ABOUT USING ADVERSARIAL ML?
Adversary has access to some of the original images, can it learn by training an Autoencoder?
WHAT ABOUT TRAINING AN AUTO-ENCODER?
Self Attention
Self Attention with HRRs
We can think of Self-Attention as a fuzzy dictionary. We are finding the match between query and key and returning the corresponding value – made fuzzy by averaging based on similarity.
So we can perform querying against this “dictionary”, using HRRs as an inductive bias toward key/value lookups!
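The fuzzy-dictionary view can be sketched directly: superpose bound key/value pairs into one memory vector, then unbind with a key to retrieve a noisy copy of its value. This is an illustrative NumPy sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1024

def vec():
    return rng.normal(0, 1 / np.sqrt(d), d)

def bind(a, b):
    # Circular convolution (HRR binding)
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inv(a):
    # Approximate HRR inverse
    return np.concatenate(([a[0]], a[1:][::-1]))

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

k1, v1, k2, v2 = vec(), vec(), vec(), vec()
memory = bind(k1, v1) + bind(k2, v2)   # one vector holds both key/value pairs

out = bind(inv(k1), memory)            # query the "dictionary" with key k1
```

Here cos(out, v1) is large while cos(out, v2) stays near zero, which is exactly the key/value lookup behavior.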
Self Attention with HRRs: Implementation
Self Attention with HRRs: Noise?
Our self-attention works without Gaussian i.i.d. coefficients. How? Consider the H-dimensional vectors a, b, c, d, and z. If each element of these vectors is sampled from N(0, 1/H), then we would expect that (a ⊗ b + c ⊗ d)⊤a† ≈ 1. Similarly, since the value z is not present, we expect that (a ⊗ b + c ⊗ d)⊤z† ≈ 0. Now let's pretend we have 2D data:
We can query for a + z and get:
Or we can do c + z and get:
In either case, the noise terms share many coefficients and will result in similar-magnitude noise. We can interpret this as an additional noise constant ε that we must add to each noise term. When we apply the softmax operation, we benefit from the fact that softmax is invariant to constant shifts in its input, i.e., ∀ε ∈ R, softmax(x + ε) = softmax(x). Thus, our softmax effectively acts as a clean-up operation over the original values!
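The softmax shift-invariance used above is easy to verify numerically (a generic check, not tied to any specific model):

```python
import numpy as np

def softmax(x):
    # Subtracting the max is the standard numerical-stability trick;
    # it also demonstrates the shift invariance being exploited.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.5])
eps = 3.7   # a constant noise term shared by all entries

shifted = softmax(scores + eps)   # identical to softmax(scores)
```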
Long Range Arena Results
Interpretability
Fast & Low Memory Training
Fast Predictions
Malware results
A Walsh Hadamard Derived Linear Vector Symbolic Architecture
Properties of the HLB
Good at classical VSA tasks
Better at XML classification
Does better at CSPS
Questions?
Edward Raff
EdwardRaff.com
Raff_Edward@bah.com
We can use VSAs to create neuro-symbolic AI methods
Handoff
Grounding Blackbox Language Models with Retrieval Augmented Generation of Diverse Knowledge Form
Deepa Tilwani, Ph.D. Candidate, University of South Carolina
Introduction and Motivation (Part 1)
Progress in Language Modelling
Symbolic Era: Pre-1990
Statistical Era: 1990–2006
Scale Era: 2006 onwards
Turing Test: 1950
ELIZA: 1966
ChatGPT: 2022
ELIZA (1966): THE FIRST CHATBOT
An early NLP program developed by Joseph Weizenbaum at MIT. It created the illusion of a conversation by rephrasing user statements as questions using pattern matching and substitution. One of the first programs capable of attempting the Turing test.
Try it out at https://web.njit.edu/~ronkowit/eliza.html
The LLM Era: How Do They Work?
Word Embeddings
Represent each word using a “vector” of numbers.
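As a toy illustration of word vectors (the values below are made up, not from any trained model), similarity between embeddings is typically measured with cosine similarity:

```python
import numpy as np

# Hypothetical 4-dimensional word embeddings, for illustration only
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.7, 0.2, 0.9]),
    "apple": np.array([0.1, 0.2, 0.9, 0.3]),
}

def cosine(a, b):
    # Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

With these vectors, "king" sits closer to "queen" than to "apple", mirroring how real embeddings place related words near each other.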
Seq2Seq Models
Recurrent Neural Networks (RNNs)
● Long Short-Term Memory Networks (LSTMs)
● Capture dependencies between input tokens
● Gates control the flow of information
A simple RNN shown unrolled in time. Network layers are recalculated for each time step, while weights U, V and W are shared across all time steps.
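A minimal sketch of the unrolled RNN in the caption, with made-up dimensions: the same weights U (input-to-hidden), W (hidden-to-hidden), and V (hidden-to-output) are reused at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
U = rng.normal(0, 0.1, (d_h, d_in))   # input-to-hidden weights
W = rng.normal(0, 0.1, (d_h, d_h))    # hidden-to-hidden weights, shared across time
V = rng.normal(0, 0.1, (2, d_h))      # hidden-to-output weights

def rnn_step(h, x):
    # One recurrence step; the same U and W are applied at every step
    return np.tanh(U @ x + W @ h)

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # a toy sequence of 5 input vectors
    h = rnn_step(h, x)
y = V @ h                             # output read from the final hidden state
```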
Transformers
In encoding the word "it", one attention head is focusing most on "the animal", while another is focusing on "tired". The model's representation of the word "it" thus bakes in some of the representation of both "animal" and "tired".
https://jalammar.github.io/illustrated-transformer/
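The attention heads described above compute scaled dot-product attention; a single-head sketch with random weights (illustrative only, not a full Transformer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project inputs into queries, keys, and values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot products
    return softmax(scores) @ V               # similarity-weighted sum of values

rng = np.random.default_rng(0)
n, d = 6, 16                                 # 6 tokens, 16-dim embeddings
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Each output row mixes the value vectors of all tokens, weighted by query/key similarity — how "it" can bake in parts of "animal" and "tired".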
Pre-Training: Data Preparation
A typical data preparation pipeline for pre-training LLMs:
W. Zhao et al. A Survey of Large Language Models. 2023.
What Can LLMs Do?
Evolution of LMs from Perspective of Task-Solving Capacity
W. Zhao et al. A Survey of Large Language Models. 2023.
Few-Shot Prompting
T. Brown et al. Language Models are Few-Shot Learners. NeurIPS 2020.
"Great product, 10/10": {"label": "positive"}
"Didn't work very well": {"label": "negative"}
"Super helpful, worth it": {"label": "positive"}
Instruction:
Classify the sentiment of the given text as either positive or negative based on the examples provided.
Few shots examples:
Input: "Amazing quality and fast shipping!"
LLM
Ideal Output:
{"label": "positive"}
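Assembling the few-shot prompt above programmatically might look like this (a plain template sketch; the actual call to an LLM is omitted):

```python
# The instruction and examples mirror the slide; the template itself is illustrative.
examples = [
    ('"Great product, 10/10"', '{"label": "positive"}'),
    ('"Didn\'t work very well"', '{"label": "negative"}'),
    ('"Super helpful, worth it"', '{"label": "positive"}'),
]

instruction = ("Classify the sentiment of the given text as either positive "
               "or negative based on the examples provided.")

lines = [instruction, ""]
for text, label in examples:
    lines.append(f"{text}: {label}")        # one demonstration per line
lines += ["", 'Input: "Amazing quality and fast shipping!"', "Output:"]
prompt = "\n".join(lines)                    # string sent to the LLM
```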
Chain-of-Thought Prompting
Instruction: Classify the sentiment of the given text as either positive or negative. Follow a step-by-step reasoning process to determine the sentiment.
Reasoning:
Output: {"label": "positive"}
LLM
Input: "Wow! This is fantastic quality and fast shipping!"
Examples:
J. Wei et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
From Prompting to Fine-Tuning
Unlike prompting, fine-tuning actually changes the model under the hood, giving better domain- or task-specific performance.
https://x.com/karpathy/status/1655994367033884672
Fine-Tuning
Custom Trained Model in Law: Harvey AI
Open AI Customer Stories: Harvey. April 2024.
Parameter Efficient Fine-Tuning (PEFT)
Techniques like LoRA construct a low-rank parameterization for parameter efficiency during training. For inference, the model can be converted to its original weight parameterization to ensure unchanged inference speed.
E. Hu et al. LoRA: Low-Rank Adaptation of Large Language Models. ICLR 2022.
GPT-3 175B validation accuracy vs. number of trainable parameters of several adaptation methods on WikiSQL. LoRA exhibits better scalability and task performance.
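The LoRA idea can be sketched in a few lines: freeze W0, train only the low-rank factors B and A, then merge B·A back into W0 so inference speed is unchanged. Dimensions here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                         # hidden size and low rank, r << d

W0 = rng.normal(size=(d, d))          # frozen pretrained weight matrix
A = rng.normal(0, 0.01, (r, d))       # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero at init

def lora_forward(x):
    # Base path plus low-rank update; equals W0 @ x at initialization (B = 0)
    return W0 @ x + B @ (A @ x)

# For inference, merge the update so speed matches the original model:
W_merged = W0 + B @ A

x = rng.normal(size=d)
```

Only A and B are trained: 2·d·r = 8,192 parameters here versus d² = 262,144 for full fine-tuning.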
Why the need for Trustworthiness in Generative AI?
Unreliable Reasoning Even On Simple Tasks
Probably due to tokenization!
Generated by gpt-4o’s tokenizer.
Try it out at:
Easy reasoning, Sure!
Got confused?
Jailbreaking Can Bypass Safety
Jailbreaking is the process of altering prompts to evade an LLM’s safeguards, resulting in harmful outputs.
PAIR, influenced by social engineering attacks, involves an attacker LLM that autonomously generates jailbreaks for a targeted LLM. The attacker LLM repeatedly interacts with the target LLM, refining and improving a jailbreak—often within twenty queries.
P. Chao et al. Jailbreaking Black Box Large Language Models in Twenty Queries. 2023.
The Story of a Lawyer Who Employed ChatGPT … trust issues remain
A lawyer, representing a client against an airline, turned to AI assistance for drafting legal documents. The results were less than ideal. https://www.nytimes.com/2023/05/27/nyregion/avianca-airline-lawsuit-chatgpt.html
Legal Consequences for Attorneys Using ChatGPT
Lawyer Acknowledges AI Misuse in Court: During court session, an attorney admitted excessively relying on AI, resulting in a legal motion filled with artificial legal references. https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html
Neuro Symbolic Legal AI
Open AI Customer Stories: Harvey. April 2024.
Neuro Symbolic Legal AI
Open AI Customer Stories: Harvey. April 2024.
Neuro Symbolic Legal AI
Open AI Customer Stories: Harvey. April 2024.
Nearly Impossible to Explain or Reason About Generative Answers
Prompt Injections can leak data
Context Windows are and will remain limited
Bias in Large Language Models that Supervised Learning cannot reduce
Reliability Issues: Different Large Language Models Yield Different Outcomes
Inconsistency in Prompts for Completeness in Outcomes
More Challenges for the Generative AI
Grounding (Part 2)
Grounding
Grounding is defined as ensuring that an LLM generates verifiable, well-grounded responses to any prompt, relying solely on information from a user-specified knowledge base.
Grounded means that every claim in the response is attributable to a document in the knowledge base
Verifiably grounded means that every claim is backed by an appropriate citation
The knowledge base may be a private corpus, a public domain, or the entire Web
E.g., a healthcare customer may specify a set of journals they trust
Two Core Approaches to Grounded AI
Grounded Generation – Enhancing AI with Verified Knowledge
Method:
Grounding Verification – Ensuring AI’s Responses Are Factually Correct
Method:
Why Grounded Generation?
LLAMA
“Grounded generation retrieves latest clinical guidelines and provides an evidence-based response”
Not grounded but a generic answer!
Why Grounding Verification?
INPUT:�What is the target blood pressure for men?
Not according to 2017 guidelines
It should first verify who the intended audience is before ensuring factual accuracy.
Types of Grounding in AI & LLMs
1) Symbolic Grounding
AI must retrieve, recognize, and structure information correctly before using it (i.e., LLMs should understand and link symbols such as words, phrases, and numbers to their real-world meanings)
2) Functional Grounding
LLMs should reason, verify, and adapt responses based on context (i.e., apply knowledge correctly in context)
Symbolic Grounding
LLMs lack knowledge beyond their training date, and frequent model updates are impractical.
Idea: Enhance LLMs with a retrieval system!
Lewis, Patrick, et al. "Retrieval-augmented generation for knowledge-intensive nlp tasks." Advances in Neural Information Processing Systems 33 (2020): 9459-9474.
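The retrieve-then-generate idea can be sketched with a toy lexical retriever over a hypothetical three-document corpus (a real RAG system would use a dense embedding model for retrieval and pass the prompt to an LLM):

```python
import re

# Hypothetical corpus; the documents and facts are illustrative only
corpus = [
    "The 2017 ACC/AHA guideline defines stage 1 hypertension as 130/80 mm Hg.",
    "LoRA adapts large models by training low-rank weight updates.",
    "ELIZA was an early pattern-matching chatbot built at MIT.",
]

def tokens(s):
    return set(re.findall(r"\w+", s.lower()))

def retrieve(query, corpus, k=1):
    # Toy retriever: rank documents by Jaccard overlap with the query
    def score(doc):
        q, d = tokens(query), tokens(doc)
        return len(q & d) / len(q | d)
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(query, docs):
    # Instruct the generator to answer only from the retrieved context
    context = "\n".join(f"- {d}" for d in docs)
    return (f"Answer using only the context below and cite it.\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

query = "What is the blood pressure threshold for hypertension?"
docs = retrieve(query, corpus)
prompt = build_prompt(query, docs)
```

The LLM then answers from the supplied context instead of its (possibly stale) parametric knowledge, which is what makes the response attributable.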
Advantages of RAG
Fact-Checking
Safe
Custom Train
Cost-Effective
Continuous Update
Accessible and Affordable
Domain Knowledge
Easier to Customize
Symbolic Retrieval based Grounding
Source Attribution: Retrieve, recognize, and attribute
Grounded and targeted for generating citations with structured metadata
Image: Tilwani, Deepa, et al. "REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs." arXiv preprint arXiv:2405.02228 (2024).
An Evaluation study of Citation Generation on Recent LLMs
How Do We Do Symbolic Retrieval-based Grounding?
But RAG Has a Few Limitations…
Needs an Existing Database
Context Length Limitation
Latency Issues
Dependent on Semantic Search
Hallucination Still Exists
At scale, sensitive to choices of:
1) Chunking Strategy,
2) Embedding Model, and
3) Generation Model.
Flaws in RAG from REASONS Dataset
Only Adv. RAG was able to correctly generate author names
Latency Issues
Domain | OpenAI | M | L | D | RM | RL | P | AdvRAG(L) | AdvRAG(M) |
AI | 34:25 | 26:03 | 11:10 | 34:11 | 74:49 | 73:09 | 34:31 | 156:24 | 163:28 |
CV | 47:45 | 18:35 | 19:24 | 50:22 | 189:20 | 198:45 | 42:05 | 259:32 | 302:14 |
Cryptography | 03:50 | 02:18 | 04:59 | 32:21 | 83:28 | 89:21 | 13:23 | 190:19 | 194:25 |
Graphics | 07:08 | 08:55 | 06:08 | 58:43 | 108:08 | 127:48 | 16:52 | 214:25 | 227:23 |
HCI | 03:01 | 01:10 | 00:42 | 21:56 | 48:32 | 50:51 | 02:47 | 95:56 | 98:44 |
IR | 20:31 | 11:40 | 06:52 | 33:34 | 91:30 | 99:43 | 19:50 | 193:37 | 202:23 |
NLP | 28:26 | 11:42 | 05:09 | 47:24 | 91:07 | 88:40 | 13:06 | 175:58 | 156:49 |
2. Knowledge Graphs (KG) Based Grounding
Speer et al. AAAI’17
Vrandečić et al. ACM Comm’14
Gaur et al. ICSC’19
Miller, ACM Comm’95
ConceptNet
World War I fought_with Poisonous Gas
Subject
Predicate
Object
“KG-based grounding structures information in graphs, linking concepts to improve AI systems' ability to retrieve and generate meaningful responses.”
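Triples like the one above can be stored and queried very simply; a toy sketch (the second triple is an invented example added for illustration):

```python
# A tiny triple store in the (subject, predicate, object) form shown above
triples = [
    ("World War I", "fought_with", "Poisonous Gas"),
    ("Poisonous Gas", "is_a", "Chemical Weapon"),   # hypothetical extra triple
]

def query(subject, predicate, triples):
    # Return all objects matching a (subject, predicate, ?) pattern
    return [o for s, p, o in triples if s == subject and p == predicate]

objects = query("World War I", "fought_with", triples)
```

Real KGs like ConceptNet expose the same pattern at scale, which is what lets a retriever expand queries along linked concepts.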
Illustration of ISEEQ
Sentence BERT Encoder
Sentence BERT Encoder
1. What is gross_domestic_product?
2. What is the measure of gross_domestic product?
3. What is the reason nation income relations gross_domestic_product?
4. What is the influence of inflation to gross_domestic_product?
5. What is the meaning of unemployment in inflation?
6. What is the influence of inflation on cost_of_living?
Query
Title: Economy and Employment Statistics
Description: Learn Information about key economic concepts including gdp, inflation, and the influence on employment
Constituency Parsing
Information + { economy, employment statistics, employment, influence employment, inflation influence employment, gdp, gdp influence employment, key economic concepts}
economics
economy
inflation
employment
gdp
gross domestic product
unemployment
gnp
gross national product
national income
cost of living
income
personal income
income tax
ConceptNet Graph for Semantic Query Expansion
ISQ by Generative Adversarial Reinforcement Learning
(A) An example of curiosity-driven ISQs generated by
ISEEQ. (B) overview of ISEEQ
Gaur, M., Gunaratna, K., Srinivasan, V., & Jin, H. (2022). ISEEQ: Information Seeking Question Generation Using Dynamic Meta-Information Retrieval and Knowledge Graphs. AAAI 2022
Functional Grounding
Helps to work on :
A fact-checking approach for cross-referencing news claims with verified sources before publishing.
Evaluating attribution and identifying specific types of errors with AttrScore. We explore two approaches in AttrScore: (1) prompting LLMs, and (2) fine-tuning LMs on simulated and repurposed datasets from related tasks.
Attribution-Based Functional Grounding
Yue, Xiang et al. “Automatic Evaluation of Attribution by Large Language Models.” EMNLP (2023).
Reinforcement Learning for Grounding
What constitutes a good response for a given query and context is quite nuanced.
Idea: Capture this using a reward model that scores each <query, context, response> on the appropriateness of the response. The model may be trained on a dataset specifying preferences between response pairs
We can then use reinforcement learning to tune the model to maximize reward while staying within a bounded KL-divergence from the initial model.
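The reward-minus-KL objective described above can be sketched on toy next-token distributions (all numbers are made up, and real RLHF operates over whole response sequences, not a single softmax):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy next-token distributions for the tuned policy and the frozen initial model
policy_logits = np.array([2.0, 0.5, -1.0])
ref_logits = np.array([1.5, 0.7, -0.5])
p, q = softmax(policy_logits), softmax(ref_logits)

kl = float(np.sum(p * np.log(p / q)))      # KL(policy || reference), always >= 0

reward = 0.8        # hypothetical reward-model score for <query, context, response>
beta = 0.1          # strength of the KL penalty keeping the policy near the start
objective = reward - beta * kl             # the quantity RL maximizes
```

The β·KL term is what keeps the tuned model within a bounded divergence from the initial model while it chases reward.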
References:
Interactive & Reinforcement-Based Grounding
Interactive & Reinforcement-Based Grounding ensures that LLMs do not just generate blindly but engage in a feedback-driven, iterative process to reason, verify, and adapt responses based on context.
The code-generating language models as an actor network, and introduce a critic network that is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor.
Le, H., Wang, Y., Gotmare, A. D., Savarese, S., & Hoi, S. C. H. Coderl: Mastering code generation through pretrained models and deep reinforcement learning. Neurips 2022
Check out our Dataset for Interactive & Reinforcement-Based Grounding at AAAI 2025
POSTER PRESENTATION ON 28TH FEB
Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation
Seyedreza Mohseni, Seyedali Mohammadi, Deepa Tilwani, Yash Saxena, Gerald Ketu Ndawula, Sriram Vema, Edward Raff, Manas Gaur
Grounding Verification
Despite progress in generating grounded responses, post-hoc verification of generated responses is still indispensable
Symbolic and Functional Grounding Together
LLAMA
Domain Knowledge: PHQ9 Depression ontology
LLAMA + Domain Knowledge Output
S. Dalal, D. Tilwani, M. Gaur, S. Jain, V. L. Shalin and A. P. Sheth, "A Cross Attention Approach to Diagnostic Explainability Using Clinical Practice Guidelines for Depression," in IEEE Journal of Biomedical and Health Informatics
“Grounded generation retrieves latest clinical guidelines and provides an evidence-based response”
“Knowledge Graphs (symbolic grounding) and adapting to domain (functional grounding)”
How to Do Symbolic and Functional Grounding Together?
“Grounded generation retrieves latest clinical guidelines and provides an evidence-based response”
Original Text:
Why do i have sudden bursts of depression know the title probably doesn't make sense but stopped working for a while to peruse business idea had which failed and now i'm about go back into work force only 19 these moments where just feel lost like my family friends as is what dedicated life past 6 months most that time was me sitting in room trying get it off ground floor. really nervous getting job again haven't real one entire am overthinking or will be not bad think.
Self Attention Text (No Highlighting)
(Don’t know Why?)
Attention Over PHQ 1:How often have you been bothered by little interest or pleasure in doing things? (No Highlighting)
Attention Over PHQ 2 : How often are you bothered by feeling down, depressed, or hopeless?
Attention Over PHQ 9: How often have you been bothered by thoughts that you would be better off dead or of hurting yourself in some way ?
Why do i have sudden bursts of depression know the title probably doesn't make sense but stopped working for a while to peruse business idea had which failed and now i'm about go back into work force only 19 these moments where just feel lost like my family friends as is what dedicated life past 6 months most that time was me sitting in room trying get it off ground floor. really nervous getting job again haven't real one entire am overthinking or will be not bad think.
PHQ-1, PHQ-5, and PHQ-6 remain unanswered; these are the relevant follow-up questions to ask.
Similarity scores between phrases highlighted by self-attention and the PHQ-9 questions: roughly equal attention everywhere (confused and unexplainable).
Cumulative cross-attention scores: PHQ-9 infusion explains model attention.
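The similarity scoring above can be illustrated with a toy sketch. The actual work compares learned (cross-)attention over the post with the PHQ-9 questions; the stand-in below uses simple lexical (Jaccard) overlap, and the question texts are abridged.

```python
import re

# Toy stand-in for the similarity scoring above: the tutorial's models use
# learned (cross-)attention scores, while this sketch compares an attended
# phrase to abridged PHQ-9 question texts via Jaccard word overlap.

def words(s: str) -> set:
    """Lowercased word set of a string."""
    return set(re.findall(r"[a-z']+", s.lower()))

def jaccard(a: str, b: str) -> float:
    sa, sb = words(a), words(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

PHQ9 = {  # abridged question texts
    "PHQ-1": "little interest or pleasure in doing things",
    "PHQ-2": "feeling down depressed or hopeless",
    "PHQ-9": "thoughts that you would be better off dead or of hurting yourself",
}

def rank_questions(attended_phrase: str):
    """Rank PHQ-9 questions by lexical similarity to an attended phrase."""
    scores = [(q, jaccard(attended_phrase, t)) for q, t in PHQ9.items()]
    return sorted(scores, key=lambda x: x[1], reverse=True)

print(rank_questions("these moments where just feel lost, feeling down and hopeless")[0])
```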
Check Grounding API [Google Cloud]
Check Grounding determines how grounded a given response is in a given set of facts (context)
Returns a support score for the response, based on a custom NLI model.
Generally available at: https://cloud.google.com/generative-ai-app-builder/docs/check-grounding
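As a rough illustration of what such a check computes: the real API scores support with a custom NLI model, whereas the sketch below is a made-up lexical heuristic (not Google's method) that marks a response sentence as supported when most of its words appear in some fact.

```python
import re

# Hypothetical, minimal stand-in for a grounding check. Google's Check
# Grounding API uses a custom NLI model; here each response sentence is
# instead scored by word overlap with the supplied facts (context).

def support_score(response: str, facts: list) -> float:
    """Fraction of response sentences with >=50% word overlap with some fact."""
    def words(s):
        return set(re.findall(r"[a-z']+", s.lower()))
    sentences = [s for s in re.split(r"[.!?]", response) if s.strip()]
    supported = 0
    for sent in sentences:
        w = words(sent)
        if any(len(w & words(f)) >= 0.5 * len(w) for f in facts if w):
            supported += 1
    return supported / len(sentences) if sentences else 0.0

facts = ["openCHA integrates health data, knowledge, and analytical tools."]
print(support_score("openCHA integrates health data and knowledge.", facts))
```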
Open Questions
Handoff
TH10: Neurosymbolic AI for EGI: Explainable and Grounded Generations
February 25, 2025
Ali Mohammadi, M294@umbc.edu
Ph.D. student, Knowledge Infused AI and Inference (KAI2) Lab, University of Maryland, Baltimore County (UMBC)
Why NeuroSymbolic explainable AI?
Safety in High-Stakes Applications
Alignment with Human Values
User Adoption and Confidence
Debugging and Improving Models
Ethical and Fair AI
Trust and Transparency
Key Focus Areas
Large Language Models (LLMs)
Explainability
Wellness Dimension
External Knowledge
Wellness Dimension Datasets
6 wellness dimensions:
Explanation and Prediction
Fine-tuned LMs
Fine-tuned/Prompted LLMs on WD Datasets
The fall semester was one of the worst experiences of my life, and I barely passed my four classes.
Textual Post
Explanation & Label (Expected/Predicted)
Intellectual and Vocational Aspect
fall semester was one of the worst experiences
barely passed
Wellness dimension sample
External Knowledge
Findings:
General task (e.g., e-SNLI)
Domain-specific task (e.g., WellXplain and HateXplain)
Mohammadi, S., Raff, E., Malekar, J., Palit, V., Ferraro, F., & Gaur, M. (2024). WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 364–388, Miami, Florida, US. ACL.
Instruction
Post: They make me feel unhappy and miserable (SpEA). What should I do?
Output:
SpEA (PA:0, IVA:0, SA:0, SpEA:1)
Explanation: unhappy, miserable
WELLXPLAIN Training Examples
Post: My mum, dad and step-mum (SA) won't leave me alone and they constantly make choices for me and it's starting to get to me.
Output: SA(PA:0, IVA:0, SA:1, SpEA:0)
Explanation: My mum, dad, step-mum
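The examples above can be assembled into a few-shot instruction prompt along these lines. This is a sketch: the template wording and field layout are illustrative, not necessarily what the WellDunn experiments used.

```python
# Sketch: assemble the WELLXPLAIN training examples above into a few-shot
# instruction prompt. Template wording is illustrative only.
EXAMPLES = [
    {
        "post": "They make me feel unhappy and miserable. What should I do?",
        "label": "SpEA (PA:0, IVA:0, SA:0, SpEA:1)",
        "explanation": "unhappy, miserable",
    },
    {
        "post": ("My mum, dad and step-mum won't leave me alone and they "
                 "constantly make choices for me and it's starting to get to me."),
        "label": "SA (PA:0, IVA:0, SA:1, SpEA:0)",
        "explanation": "My mum, dad, step-mum",
    },
]

def build_prompt(test_post: str) -> str:
    """Build a few-shot prompt ending with the unanswered test post."""
    parts = ["Classify the wellness dimension (PA, IVA, SA, SpEA) of each "
             "post and list the words that explain the label."]
    for ex in EXAMPLES:
        parts.append(f"Post: {ex['post']}\nOutput: {ex['label']}\n"
                     f"Explanation: {ex['explanation']}")
    parts.append(f"Post: {test_post}\nOutput:")
    return "\n\n".join(parts)

print(build_prompt("I barely passed my four classes.")[:60])
```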
Evaluation
WELLXPLAIN Test Examples
WellDunn Benchmarking
[Architecture diagram: Tokenizer → Fine-tuned LM (Encoders 1–3 / Decoders 1–3) → FFNN → Prediction and Explanation; evaluation axes: Robustness (SVD) and Explainability (AO), under SCE and GL losses.]
SCE vs GL attention (Post 1)
[Ground Truth Explanation] The fall semester was one of the worst experiences of my life, and I barely passed my four classes.
[Generated Explanation] The fall semester was one of the worst experiences of my life, and I barely passed my four classes.
With SCE Loss:
I don't cry anymore. want to be around anyone, do anything. Work keeps me getting up everyday. Without it would probably stare at my ceiling until passed back out again m so tired know if there is a question in this, There just isn else tell.
With GL:
I don't cry anymore. want to be around anyone do anything Work keeps me getting up everyday Without it would probably stare at my ceiling until passed back out again so tired know if there is a question in this, There just isn't else tell.
Future Directions
Developing a Transparent Classifier Rooted in Clinical Understanding – Addressing the disparities between prediction accuracy and attention.
Improving Attention Alignment with Ground Truth – Enhancing attention explanations to better reflect actual outcomes.
Exploring Different Prompting and Retrieval-Augmented Generation (RAG) Strategies – Testing alternative methods to improve LLM performance.
Developing a Suitable Dataset for Mental Health Applications – Curating knowledge and constructing a well-suited dataset for retrieval-augmented methods.
Wrap up!
Large Language Models (LLMs)
Explainability
Wellness Dimension
External Knowledge
openCHA: Building an Explainable and Personalized Conversational Agent
Iman Azimi, PhD
February 25, 2025
Healthcare chatbots or Conversational Health Agents
Chatbots have the potential to play a crucial role in healthcare, assisting both patients and healthcare providers.
Bedi, Suhana, et al. "A Systematic Review of Testing and Evaluation of Healthcare Applications of Large Language Models (LLMs)." medRxiv (2024): 2024-04.
Why are healthcare chatbots not widely used?
Existing chatbots are not able to provide:
Abbasian, M., Khatibi, E., Azimi, I., Oniani, D., Shakeri Hossein Abad, Z., Thieme, A., ... & Rahmani, A. M. (2024). Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. npj Digital Medicine, 7(1), 82.
openCHA (Conversational Health Agents)
A holistic LLM-powered framework to integrate health data, knowledge, and analytical tools into healthcare chatbots.
openCHA framework
Interface
Acts as a bridge between the users and agents
Orchestrator
Responsible for problem solving, decision making, and response generation
External sources
Obtains essential information from the broader world
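The three components above (Interface, Orchestrator, External sources) can be sketched as plain classes. Class and method names here are illustrative only, not openCHA's actual API; see the GitHub repository for the real interfaces.

```python
# Minimal sketch of the openCHA component layout described above
# (Interface -> Orchestrator -> External sources). Names are illustrative,
# not openCHA's actual API.
class ExternalSource:
    """Obtains information from the broader world (data, knowledge, tools)."""
    def __init__(self, name, fetch):
        self.name, self.fetch = name, fetch

class Orchestrator:
    """Problem solving, decision making, and response generation."""
    def __init__(self, sources):
        self.sources = {s.name: s for s in sources}

    def respond(self, query: str) -> str:
        # Gather evidence from every registered source, then generate a
        # response (an LLM call in the real framework; a string here).
        evidence = [s.fetch(query) for s in self.sources.values()]
        return f"Answer to '{query}' using: " + "; ".join(evidence)

class Interface:
    """Bridge between the user and the agent."""
    def __init__(self, orchestrator):
        self.orchestrator = orchestrator

    def ask(self, query: str) -> str:
        return self.orchestrator.respond(query)

# Hypothetical wearable-data source wired into the agent.
sleep_db = ExternalSource("sleep_data", lambda q: "7.2 h average sleep")
cha = Interface(Orchestrator([sleep_db]))
print(cha.ask("How did I sleep last week?"))
```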
Demo:
Nutrition causal effects
Tasks involved:
Z. Yang, E. Khatibi, N. Nagesh, M. Abbasian, I. Azimi, R. Jain, and A. Rahmani, “ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework,” Elsevier Smart Health, IEEE/ACM CHASE, 2024.
Patient health record reporting (1)
Tasks involved:
Dataset: S. Labbaf, et al. "Physiological and Emotional Assessment of College Students Using Wearable and Mobile Devices During the 2020 Covid-19 Lockdown: An Intensive, Longitudinal Dataset" (2023).
Patient health record reporting (2)
Tasks involved:
Objective stress level estimation
Tasks involved:
Use cases
Future directions
We are looking for contributions from diverse communities: contribute your ideas and connect your tools to openCHA, leading to more precise user responses.
Thank You
Questions?
More info about openCHA:
GitHub repository:
github.com/Institute4FutureHealth/CHA
User guide and quick start:
Should you be interested, please reach out to me at
Slides Available: