1 of 83

Neuro-Symbolic AI for Deep Analysis of Social Media Big Data

Vedant Khandelwal, Manas Gaur, Ugur Kursuncu, Valerie L. Shalin, Amit P. Sheth

Check Tutorial site for latest information: https://aiisc.ai/smbd24/

2 of 83

2

Presenters

Vedant Khandelwal LinkedIn vedant@email.sc.edu

PhD Student @

AI Institute of South Carolina

Manas Gaur Web LinkedIn manas@umbc.edu

Assistant Professor @

Dept. CSE, UMBC

Ugur Kursuncu Web LinkedIn ugur@gsu.edu

Assistant Professor @

Institute for Insight, GSU

Amit Sheth Web LinkedIn amit@sc.edu

NCR Chair & Professor; Founding Director, AI Institute of South Carolina

3 of 83

3

Content

  • Foundations
    • Why Social Media Data
    • LLMs and their challenges
    • NeuroSymbolic Approach
    • Types of Knowledge Infused Learning and their advantages
  • Neurosymbolic Approach for a health application
    • Use-Case COVID-19
  • Hands-on Session

5 of 83

Why Social Media Data?

5

6 of 83

6

The youngest adults stand out in their social media consumption

88% of 18- to 29-year-olds indicate that they use any form of social media.

By Pew Research Center, “Social Media Use Report 2018”

5.2 Billion

Users

7 of 83

7

“Information that comes directly from consumers, often via social media, is deemed more helpful than data from reports or government research.”

Insights from Social Media: How Useful?

8 of 83

8

Contexts where Social Media Matters: the Good & the Bad

A spectrum to demonstrate the good and the bad on social media.

  1. Kursuncu, U., Purohit, H., Agarwal, N., & Sheth, A. (2021). When the Bad is Good and the Good is Bad: Understanding Cyber Social Health through Online Behavioral Change. IEEE Internet Computing, 25(01), 6-11.

Marketing

Understanding & Predicting Consumer Behavior

Monitoring Opioid Usage

Extremism

Illicit Drugs

Disinformation

Harassment

More Good

More Bad

More Good

9 of 83

  • 1,213,046 deaths in the U.S.*
    • 103,436,829 positive cases
  • 7,077,725 deaths globally**
    • 776,973,432 positive cases

Multiple lockdowns, stay-at-home guidance, and social distancing accelerated the use of technology, including social media.

9

COVID-19: Sudden Emergence Leads to Rapid Adaptation

10 of 83

  • Early detection of pandemics and outbreaks
    • Social media data analysis during the COVID-19 pandemic led to early outbreak predictions [1].
  • Pandemic leading to mental health crisis?
  • Real-time detection of mental health crises (e.g., depression, anxiety).
  • Awareness and resource allocation to respond to emerging mental health issues.

10

COVID-19, Public Health and Social Media

  1. Shi, B., Huang, W., Dang, Y., & Zhou, W. (2024). Leveraging social media data for pandemic detection and prediction. Humanities and Social Sciences Communications, 11(1), 1-18.

11 of 83

Prevalence in Mental Health Issues & Online Toxicity

11

51%

↑15 YoY

Teens experienced some form of Online Harassment [1]

52%

↑12 YoY

Online Harassment

Ever Experienced among American Adults [1]

37%

↑10 YoY

Severe online harassment: sexual harassment, physical threats, swatting, doxing, and sustained harassment [1]

  1. Anti-Defamation League (ADL), Online Hate and Harassment Report: The American Experience 2023. https://www.adl.org/resources/report/online-hate-and-harassment-american-experience-2023
  2. Pew Research Center, Parenting in America Today, 2023. https://www.pewresearch.org/social-trends/2023/01/24/parenting-in-america-today/
  3. CDC Adolescent Behaviors and Experiences Survey (ABES), 2021. https://www.cdc.gov/abes/index.html
  4. Forsberg, J. T., & Thorvaldsen, S. (2022). The severe impact of the COVID-19 pandemic on bullying victimization, mental health indicators and quality of life. Scientific reports.

40%

Children struggling with anxiety or depression, reported by parents [2]

37% of U.S. adolescents had regular mental health struggles during COVID-19 pandemic [3].

“... increased prevalence in bullying, more mental health problems and significantly reduced quality of life compared to before the pandemic” [4]

12 of 83

A Social Media Data Concern: Content Quality

  • Social data often contain noise and irrelevant content:
    • Semantic filtering in preprocessing
  • Distinguish genuine mental health indicators in data from unrelated or satirical posts.
    • Contextual understanding through domain-specific knowledge graphs in model learning.

Challenge: moderating problematic content on social media platforms and gaining awareness of potential mental health crises.

12

13 of 83

Increasing Reliance on AI: Content Moderation

13

  • The abundance of online big (social) data enabled recent breakthroughs in AI
    • Limitations: Bias and toxicity have seeped into models
  • Detection and countering are challenging
    • Ambiguity, Subjectivity, Context
  • Impact of greater reliance on AI
    • AI may reinforce existing social biases: racial, gender, sexual orientation.
    • Prohibitive Adverse Implications

14 of 83

14

5% of all Google searches are health-related.

Source: https://googleblog.blogspot.com/2015/02/health-info-knowledge-graph.html

Healthcare data will experience a compound annual growth rate (CAGR) of 36% through 2025.

Source: https://healthitanalytics.com/news/big-data-to-see-explosive-growth-challenging-healthcare-organizations

FDA Sets Goals for Big Data, Clinical Trials, Artificial Intelligence.

Source: https://healthitanalytics.com/news/fda-sets-goals-for-big-data-clinical-trials-artificial-intelligence

15 of 83

15

16 of 83

Information is cheap. Understanding is expensive.

Karl Fast,

Professor of UX Design, Kent State University

16

AI is about converting data into knowledge, insights and actions.

17 of 83

Challenges with Current LLMs

17

18 of 83

Explainability for People, not just Designers and Developers

18

LLAMA

NeuroSymbolic AI

Domain Knowledge: PHQ 9

LLAMA + Domain Knowledge Output

Dalal, S., Tilwani, D., Gaur, M., Jain, S., Shalin, V., & Sheth, A. (2023). A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression. arXiv preprint arXiv:2311.13852.

19 of 83

Knowledge-Verified Prediction via Linking to KGs

19

Really struggling with my bisexuality which is causing chaos in my relationship with a girl. I am equal to worthless for her. I’m now starting to get drunk because I can’t cope with the obsessive, intrusive thoughts, and need to get out of my head.

288291000119102: High risk bisexual behavior

365949003: Health-related behavior finding

365949003: Health-related behavior finding

307077003: Feeling hopeless

365107007: level of mood

225445003: Intrusive thoughts

55956009: Disturbance in content of thought

26628009: Disturbance in thinking

1376001: Obsessive compulsive personality disorder

Multi-hop traversal on medical knowledge graphs

<is symptom>

Obsessive-compulsive disorder is a disorder in which people have obsessive, intrusive thoughts, ideas or sensations that make them feel driven to do something repetitively
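A minimal sketch of multi-hop traversal over a knowledge graph, using the concept codes shown on this slide. The edges between the codes are assumptions made for illustration; they are not read from SNOMED CT, and `multi_hop_paths` is a hypothetical helper name.

```python
from collections import deque

# Concept codes and labels as listed on the slide.
LABELS = {
    "225445003": "Intrusive thoughts",
    "55956009": "Disturbance in content of thought",
    "26628009": "Disturbance in thinking",
    "1376001": "Obsessive compulsive personality disorder",
}

# ASSUMED edges (child -> related concepts), for illustration only.
EDGES = {
    "225445003": ["55956009"],
    "55956009": ["26628009"],
    "26628009": ["1376001"],
}


def multi_hop_paths(start, max_hops):
    """Breadth-first traversal returning every path of 1..max_hops edges."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 < max_hops:
            for nxt in EDGES.get(path[-1], []):
                if nxt not in path:  # avoid cycles
                    queue.append(path + [nxt])
    return paths


for path in multi_hop_paths("225445003", 3):
    print(" -> ".join(LABELS[code] for code in path))
```

Each returned path can be shown to a user as the chain of concepts that links a phrase in the post to a broader clinical concept.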

Gaur, M., Desai, A., Faldu, K., & Sheth, A. (2020). Explainable ai using knowledge graphs. In ACM CoDS-COMAD Conference. Link, slide.

Rawte, V., Chakraborty, M., Roy, K., Gaur, M., Faldu, K., Kikani, P., ... & Sheth, A. P. TDLR: Top Semantic-Down Syntactic Language Representation. In NeurIPS'22 Workshop on All Things Attention: Bridging Different Perspectives on Attention., link

20 of 83

Knowledge-Verified Prediction via Process KGs Structures

20

Process Knowledge Structure in C-SSRS

C-SSRS: Columbia Suicide Severity Rating Scale

“I wish I could give a shit about what would make it to the front page. I have been there and got nothing. Same as my life. I do have a gun.” “I thought I was talking about it. I am not on a ledge or something, but I do have my gun in my lap.” “No. I made sure she got an education and she knows how to get a job. I also have recently bought her clothes to make her more attractive. She has told me she only loves me because I buy her things.”

1. Wish to be dead - Yes

2. Non-specific Active Suicidal Thoughts - Yes

3. Active Suicidal Ideation with Some Intent to Act - Yes

4. Label: Suicide Behavior or Attempt

Interpretable for System Users i.e., Clinicians and Patients

(1,2,3 verify adherence to the clinical guideline on diagnosis which a clinician understands)
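The ordered checks above can be sketched as a walk over a process knowledge structure. This is an illustrative sketch only: the three questions are taken from the slide, while the answer source (a classifier or annotator), the stop-at-first-"no" rule, and the helper names are assumptions. It deliberately returns a trace rather than a clinical label.

```python
# Ordered questions from the slide; answers would come from a model or annotator.
PROCESS = [
    "Wish to be dead",
    "Non-specific Active Suicidal Thoughts",
    "Active Suicidal Ideation with Some Intent to Act",
]


def walk_process(answers):
    """Visit the questions in order and stop at the first 'no'.

    Returns the (question, answer) steps actually visited, which can be
    shown to a clinician as the explanation for the prediction.
    """
    trace = []
    for question in PROCESS:
        answer = bool(answers.get(question, False))
        trace.append((question, answer))
        if not answer:
            break
    return trace


def all_confirmed(trace):
    """True only when every step of the process was visited and answered 'yes'."""
    return len(trace) == len(PROCESS) and all(answer for _, answer in trace)


trace = walk_process({question: True for question in PROCESS})
for step, (question, answer) in enumerate(trace, start=1):
    print(f"{step}. {question} - {'Yes' if answer else 'No'}")
```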

47%

70%

LLMs

Process Knowledge (Ours)

Agreement with Experts

Sheth, A., Gaur, M., Roy, K., Venkataraman, R., & Khandelwal, V. (2022). Process Knowledge-Infused AI: Toward User-Level Explainability, Interpretability, and Safety. IEEE Internet Computing, 26(5), 76-84., link

21 of 83

Generative AI has Significant Potential for Harm!

21

22 of 83

Recent Case of Character.ai

22

https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0

23 of 83

23

24 of 83

How Current Language Models Work

24

What is Mark Zuckerberg’s net worth?

Did you mean: net worth

Did you mean: salary

Did you mean: rich for

net worth:

0.00567%

net worth

salary

rich for

Language Models Predict based on Context-Specific Distributional Mappings

Prediction Context
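A toy illustration of prediction from a context-specific distribution, assuming a simple bigram count model rather than a neural language model. The corpus and the function names are made up for this sketch.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which token follows each token in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, context):
    """Relative frequency of each token observed after `context`."""
    following = counts[context]
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()} if total else {}


# Hypothetical mini-corpus of queries.
corpus = [
    "zuckerberg net worth",
    "zuckerberg net worth today",
    "zuckerberg salary",
]
counts = train_bigram(corpus)
dist = next_token_distribution(counts, "zuckerberg")
print(dist)
```

The model has no notion of what "net worth" means; it only ranks continuations by how often they followed the context in its training data.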

25 of 83

25

26 of 83

26

Longer list of Failures …

LLM

Limited accuracy in complex decision-support requests

ChatGPT showed only 56% accuracy in medical queries (Wei et al., 2023), raising concerns about trustworthiness in clinical use.

Lack of domain-specific expertise

General-purpose LLMs struggle with specialized medical knowledge, leading to errors in diagnosis and treatment recommendations (Szymanski et al., 2024).

Inability to handle and follow guidelines

LLMs often rely on outdated or incomplete information, failing to incorporate the latest medical research or evolving clinical guidelines (Sheth et al., 2024).

Potential for generating harmful or biased content

LLMs can provide inaccurate or harmful suggestions, particularly if the input data is biased or not representative of diverse patient populations (Gupta et al., 2023).

For Decision-Support Assistance

Data: Why train on voluminous open web data?

Knowledge

  • Representing Domain-Specific Information
  • Representing Relevant Facts about the World
  • Representing Domain-relevant Decision Processes, incl. societal/professional value (laws, rules, guidelines, protocols)

Human Expertise: How to ensure knowledge and data are leveraged correctly?

27 of 83

Role of Knowledge in understanding content and deeper analysis through Neurosymbolic AI

27

28 of 83

28

Symbolic AI Statistical AI Neuro-symbolic AI

Where are we in AI Evolution now?

29 of 83

29

Knowledge Graph (Labeled Nodes and Edges)

NeuroSymbolic Reasoning

System 2

Neural Network and Deep Learning

Decisions/Actions

System 1

Low-level Data

Sensors, Text, Image, and Collection

Symbolic Explicit Knowledge Representation

Neural Implicit/Parametric Knowledge Representations

Expert Human

Amit Sheth, Kaushik Roy, Manas Gaur, Neurosymbolic Artificial Intelligence (Why, What, and How), IEEE Intelligent Systems, 38 (3), May-June 2023

NeuroSymbolic AI

30 of 83

30

Knowledge and

Experience

System 1:

Perception:

DL/Neural AI

System 2:

Cognition:

Symbolic AI

Data to Concepts,

Abstractions, Understanding

NeuroSymbolic: System 1 (Neuro) + System 2 (Symbolic)

31 of 83

31

Knowledge and

Experience

System 1:

Perception:

DL/Neural AI

System 2:

Cognition:

Symbolic AI

Data to Concepts,

Abstractions, Understanding

Natural Language (NL)-Processing (P) to NL-U (Understanding)

32 of 83

32

Neural Network

Abstract / Contextualization

ACT

DECIDE

reasoning

Planning

Inference

Apply Process Knowledge: User has Specific concerns due to X, Y, Z Concepts

Action:

Further Interact with System User on their concerns

Explicit Knowledge

Data

Contextualization

is at the heart of

understanding

Natural Language (NL)-Processing (P) to NL-U (Understanding)

33 of 83

33

From NLP to NLU: Deeper understanding of content

34 of 83

34

Neurosymbolic Customized and Compact (NeSy-CC) Copilots

A Granular Look at the Features of NeSy-CC Systems

35 of 83

35

Shallow Infusion

Sheth, Gaur, Kursuncu, & Wickramarachchi, (2019). Shades of knowledge-infused learning for enhancing deep learning. IEEE Internet Computing, 23(6), 54-63., link

Machine Learning Model

How well did the model learn the task?

Shapley plots on Feature Importances or Dependencies

PT_t = t-th topic/phrase extracted from free-form input text

KS_c = c-th concept in a knowledge source (graph, base, ontology, and/or lexicon)

Mapping
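A minimal sketch of the phrase-to-concept mapping in shallow infusion, assuming string similarity as the matching criterion. The slide does not specify how PT_t is matched to KS_c, so the use of `difflib`, the threshold, and the function name are all assumptions.

```python
from difflib import SequenceMatcher


def map_phrases_to_concepts(phrases, concepts, threshold=0.8):
    """Map each extracted phrase to its most similar concept, if similar enough."""
    mapping = {}
    for phrase in phrases:
        best_concept, best_score = None, 0.0
        for concept in concepts:
            score = SequenceMatcher(None, phrase.lower(), concept.lower()).ratio()
            if score > best_score:
                best_concept, best_score = concept, score
        if best_score >= threshold:
            mapping[phrase] = best_concept
    return mapping


# Hypothetical extracted phrases and knowledge-source concepts.
phrases = ["panic attacks", "weather today"]
concepts = ["Panic attack", "Hopelessness"]
mapping = map_phrases_to_concepts(phrases, concepts)
print(mapping)
```

The mapped concepts can then be added as extra features to a conventional machine learning model, which is what keeps this form of infusion "shallow".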

36 of 83

36

Semi-Deep Infusion

Sheth, Gaur, Kursuncu, & Wickramarachchi, (2019). Shades of knowledge-infused learning for enhancing deep learning. IEEE Internet Computing, 23(6), 54-63., link

Machine Learning Model

37 of 83

37

Deep Infusion

Sheth, Gaur, Kursuncu, & Wickramarachchi, (2019). Shades of knowledge-infused learning for enhancing deep learning. IEEE Internet Computing, 23(6), 54-63., link

Backpropagation

Connector acting like a toggle switch

38 of 83

38

Content

  • Foundations
    • Why Social Media Data
    • LLMs and their challenges
    • NeuroSymbolic Approach
    • Types of Knowledge Infused Learning and their advantages
  • Neurosymbolic Approach for a health application
    • Use-Case COVID-19
  • Hands-on Session

39 of 83

COVID-19 Use Case

39

40 of 83

As of December 14th, 2024

  • ~7M deaths and >704M confirmed cases globally
  • ~1.2M deaths with >111M confirmed cases in the US
  • Massive, once-in-a-century societal impact on health, economy & well-being

40

Massive impact of pandemic on health and society

Photo: The European Society for Medical Oncology

41 of 83

Impact on Mental Health

    • Mental Health: Depression, Anxiety
    • Addiction: Substance use/abuse

41

Massive impact of pandemic on health and society

Photo: unsplash.com

Source: https://www.statista.com/statistics/1241055/us-adults-mental-health-changes-covid-vs-last-ten-years-by-gender/

42 of 83

Social media reveals impact

Photo: American Psychological Association

"All the things are being shut down by #Covid19 but my anxiety & depression 🙁"

"A feeling of hopelessness. Seems I am in a dark age. #coronavirus #COVID19"

“I drive the streets of #LA looking 4 my #Homeless kids,drug & alcohol #addicted Often, I find them emaciated & delusional.”

“i blame my parents for manipulating me into thinkin i’m nothing without them and i blame myself for believing it >:| #abusiveparents”

Mental Health

Addiction

43 of 83

43

  • Twitter Data: 12 billion tweets analyzed, capturing public sentiment and mental health signals during the COVID-19 pandemic.
  • Subreddit Data: 2.5 million subreddit posts offering deep community-based insights on mental health topics like depression and anxiety.
  • News Articles: 700,000 COVID-19-related articles providing a broader societal and policy context.
  • Knowledge Graphs: A combination of domain-specific resources like DSM-5 and Drug Abuse Ontology (DAO) and general-purpose graphs like DBpedia and Wikidata.
  • Neologisms: Captured emerging terms such as “Zoom fatigue” and “coronasomnia” from social media, enabling real-time adaptation to evolving language trends.

The Massive Social Media Corpus

44 of 83

44

Technical Approach Overview

Vedant Khandelwal, Manas Gaur, Ugur Kursuncu, Valerie Shalin, and Amit Sheth. "A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19." In Proceedings of the IEEE International Conference on Big Data, 2024

45 of 83

Technical Approach Overview

45

46 of 83

Domain-Specific Topic and Language Modelling

46

  • Topics describing each subreddit are identified through:
    • Skip-gram model to generate n-grams
    • LDA over subreddits
    • LDA over bigrams of subreddits
  • Relevant topics are selected by constraining on a topic coherence measure.
  • We use the UCI topic coherence measure, which is based on pointwise mutual information (PMI).
  • The subreddit language model is trained through:
    • Skip-gram model to generate n-grams
    • Word2Vec over n-grams of subreddits
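The PMI-based coherence check above can be sketched as follows. This is a simplified version that assumes whole documents as co-occurrence windows (the UCI measure is usually computed with sliding windows over a reference corpus); the function name and toy documents are hypothetical.

```python
import math
from itertools import combinations


def uci_coherence(topic_words, documents, eps=1e-12):
    """Average PMI over all word pairs of a topic.

    Probabilities are document frequencies. Every topic word is assumed
    to occur in at least one document.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    pairs = list(combinations(topic_words, 2))
    total = 0.0
    for a, b in pairs:
        denom = prob(a) * prob(b)
        if denom == 0:
            raise ValueError(f"topic word missing from corpus: {a!r} or {b!r}")
        total += math.log((prob(a, b) + eps) / denom)
    return total / len(pairs)


documents = [
    "anxiety panic attack",
    "panic attack at night",
    "weather is nice",
    "anxiety and weather",
]
coherent = uci_coherence(["panic", "attack"], documents)
incoherent = uci_coherence(["panic", "weather"], documents)
print(coherent, incoherent)
```

Topics whose words rarely co-occur score low and can be dropped before the topics are used downstream.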

47 of 83

Some of the Topics Identified after LDA

47

Anxiety

Depression, Cognitive distortions, panic attacks, hopelessness, physical sensations.

Depression

Mood swings, weight gain, rapid cycling, depressive episode, impulsivity, antisocial conduct, personality disorder

Addiction

Buying oxycodone, pain management, chronic pain, alienation, crippling alcohol, dependent on crack

48 of 83

DSM-5: Background

48

The Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5, 2013) is the standard psychiatric reference for diagnosing the mental illnesses that affect an estimated 46.4% of the adult US population.

There are 21 diagnostic categories, of which 20 are specific to mental health.

49 of 83

DSM-5 Catalog

49

Neurodevelopmental Disorders

Schizophrenia Spectrum

Psychotic Disorders

Bipolar and Related Disorders

Depressive Disorders

Anxiety Disorders

Obsessive-Compulsive and Related Disorders

Trauma- and Stressor-Related Disorders

Dissociative Disorders

Sleep-Wake Disorders

Feeding and Eating Disorders

Elimination Disorders

Suicidal Behavior/Ideation Disorders

Sexual Dysfunctions

Gender Dysphoria

Disruptive Impulse Control and Conduct Disorders

Substance Use and Addictive Disorders

Neurocognitive Disorders

Personality Disorders

Paraphilic Disorders

50 of 83

DAO: Drug Abuse Ontology

50

Conceptual framework interconnecting sets of Drug-focused and Health-related concepts.

The advantage of DAO is that it is not limited to medical terminology, but also includes commonly used lay and slang terms for mental health conditions and associated symptoms.

Concept

315

Relations

31

Instances

814

Drug Abuse Ontology

Lokala, Usha, Raminta Daniulaityte, Francois Lamy, Manas Gaur, Krishnaprasad Thirunarayan, Ugur Kursuncu, and Amit P. Sheth. "DAO: An Ontology for Substance Use Epidemiology on Social Media and Dark Web." JMIR Public Health and Surveillance (2020).

51 of 83

Content Enrichment

  • Mental Health - Drug Abuse (MHDA) Knowledge Base:
    • It is obtained by aggregating mental health and drug abuse related entities from PHQ-9, SNOMED-CT, DSM-5, DAO, and MeSH Terms.
  • Entities:
    • Entities are extracted from data sources.
    • Candidate entities are filtered using knowledge bases.
    • The filtered set of entities is then used to enrich lexicon categories.
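The filtering and enrichment steps above can be sketched as a lookup against the knowledge base. The data shapes are assumptions for illustration (a dictionary from term to category), and `enrich_lexicon` is a hypothetical helper, not the tutorial's implementation.

```python
def enrich_lexicon(candidates, knowledge_base, lexicon):
    """Add candidates that appear in the knowledge base to their lexicon category.

    knowledge_base maps a lowercase term to a category name (assumed shape).
    Candidates absent from the knowledge base are dropped.
    """
    enriched = {category: set(terms) for category, terms in lexicon.items()}
    for entity in candidates:
        term = entity.strip().lower()
        category = knowledge_base.get(term)
        if category is not None:
            enriched.setdefault(category, set()).add(term)
    return enriched


# Toy inputs; real terms would come from PHQ-9, SNOMED-CT, DSM-5, DAO, and MeSH.
knowledge_base = {"panic attack": "Anxiety", "oxycodone": "Addiction"}
lexicon = {"Anxiety": {"anxiety"}, "Addiction": set()}
candidates = ["Panic attack", "oxycodone", "lockdown"]
enriched = enrich_lexicon(candidates, knowledge_base, lexicon)
print(enriched)
```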

51

Candidate Entities

Enriched Lexicon

Domain Knowledge

DAO

DSM-5

52 of 83

Neologisms

52

  • The system captures emerging terms like "coronapocalypse" and "Zoom fatigue," reflecting shifts in public discourse during key COVID-19 milestones.
  • These neologisms, derived from semantic filtering, enhance contextual understanding, ensuring the model remains relevant to evolving societal language trends.
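A simple stand-in for neologism detection, assuming a vocabulary-difference heuristic: tokens that are missing from a known vocabulary but recur across posts become candidates. This is not the semantic-filtering procedure the slide refers to; the function name, threshold, and sample posts are hypothetical.

```python
import re
from collections import Counter


def candidate_neologisms(posts, known_vocabulary, min_count=2):
    """Return out-of-vocabulary tokens that occur at least min_count times."""
    counts = Counter()
    for post in posts:
        for token in re.findall(r"[a-z]+", post.lower()):
            if token not in known_vocabulary:
                counts[token] += 1
    return sorted(tok for tok, n in counts.items() if n >= min_count)


posts = [
    "Coronasomnia again tonight",
    "coronasomnia is real",
    "tonight I slept",
]
known_vocabulary = {"again", "tonight", "is", "real", "i", "slept"}
found = candidate_neologisms(posts, known_vocabulary)
print(found)
```

Candidates found this way would still need a relevance check against the domain lexicon before being added to it.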

53 of 83

Content Enrichment

  • Semantic Filtering
    • Removal of irrelevant, noisy data
    • Finding mappings between lexicons and tweet phrases

  • Location Extraction
    • Obtained US state, county, city, and alias information from OpenStreetMap, data.gov.us, and the GeoNames Ontology.
    • Filter tweets based on the location metadata
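The location filter above can be sketched with an alias table. The table contents, the `user_location` field name, and the helper names are assumptions for illustration; the actual alias data comes from the sources named on the slide.

```python
# Hypothetical alias table mapping free-text locations to US state codes.
STATE_ALIASES = {
    "south carolina": "SC",
    "sc": "SC",
    "columbia, sc": "SC",
    "new york": "NY",
    "ny": "NY",
    "nyc": "NY",
}


def resolve_state(location_text):
    """Return a US state code for free-text location metadata, or None."""
    return STATE_ALIASES.get(location_text.strip().lower())


def filter_by_location(tweets):
    """Keep tweets whose location resolves to a state and attach the state code."""
    kept = []
    for tweet in tweets:
        state = resolve_state(tweet.get("user_location") or "")
        if state is not None:
            kept.append({**tweet, "state": state})
    return kept


tweets = [
    {"text": "anxiety is killing me", "user_location": "NYC"},
    {"text": "stay home", "user_location": "somewhere on earth"},
    {"text": "feeling hopeless", "user_location": None},
]
kept = filter_by_location(tweets)
print(kept)
```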

53

54 of 83

Technical Approach Overview

54

55 of 83

55

Semantic Proximity: alignment with MHDA-Kb.

    • Removal of ambiguity.
    • Example: “palpitations and social anxiety are killing me” → “anxiety is killing me”

Semantic Mapping:

  • Every comment or post is already categorized by the subreddit it was posted in.
  • We trained a topic model for each mental-health-related subreddit and generated compound topics.
  • We define semantic mapping as the procedure that matches compound topics from subreddits to those obtained from tweets.

Hit Score Calculation

56 of 83

Hit Score Calculation

56

Medical Knowledge Bases

LDA

LDA over Bi-grams

Hit

Score

DSM-5

Lexicon

<Reddit Post>

DAO

Drug Abuse Ontology

*Gaur, Manas, et al. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.

<Tweets>

N-Gram Key Phrases

57 of 83

Hit Score Calculation

57

S: Index of a particular tweet

D: Concepts extracted from the lexicons of the different categories {Depression, Addiction, Anxiety, HealthCare, Financial, StayAtHome}

HS: Collection of hit scores calculated for S

ng_S: N-grams extracted from S

LDA_S: Compound topics extracted from S

bLDA_S: Compound n-gram topics extracted from S

H(a, b): Number of hits of a that map to hits in b

nhs_{S,D}: Index score
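A rough sketch of how the quantities defined above could combine into a per-category score. The slide defines the symbols but does not show the formula, so summing the three hit counts and normalizing across categories are assumptions, as are the function names.

```python
def hits(a, b):
    """H(a, b): number of items in a that also occur in b (case-insensitive)."""
    b_set = {item.lower() for item in b}
    return sum(1 for item in a if item.lower() in b_set)


def hit_scores(ngrams, lda_topics, bigram_lda_topics, lexicons):
    """Score each lexicon category for one tweet.

    ASSUMPTION: the raw score is the sum of hits from n-grams, LDA topics,
    and bigram LDA topics, normalized so the scores sum to 1.
    """
    raw = {}
    for category, concepts in lexicons.items():
        raw[category] = (
            hits(ngrams, concepts)
            + hits(lda_topics, concepts)
            + hits(bigram_lda_topics, concepts)
        )
    total = sum(raw.values())
    return {c: (s / total if total else 0.0) for c, s in raw.items()}


lexicons = {
    "Anxiety": ["anxiety", "panic attack"],
    "Addiction": ["oxycodone"],
}
scores = hit_scores(
    ngrams=["anxiety", "panic attack", "lockdown"],
    lda_topics=["anxiety"],
    bigram_lda_topics=["panic attack"],
    lexicons=lexicons,
)
print(scores)
```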

*Gaur, Manas, et al. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.

Semantic Proximity

Semantic Mapping

58 of 83

Tweet Examples From Dataset

Anxiety: “All things are being shut down by #COVID19 but my anxiety and depression🙁”

Depression: “A feeling of hopelessness. Seems I am in a dark age. #SARSCOV-2”

Addiction: “I drive the streets of #LA looking 4 my #Homeless kids, drug and alcohol Often, I find them emaciated and delusional.”

HealthCare: “Meanwhile: NHS staff to be asked to treat coronavirus patients without gowns #novelcorona”

StayAtHome: “The stress, uncertainty and isolation of #COVID19 can be even more frightening for people in abusive relationships. #DomesticViolence #COVID19 #stayhome”

Financial: “The <username> has three new credits to help your business through these rough times, including immediate assistance to keep your employees in your payroll. #COVIDreliefUT #business #SmallBusiness #COVID19 #COVID”, “Our child care system is on the verge of collapsing beneath the economic burden of this pandemic. If we don't act, millions of parents will be unable to return to work and our economic recovery will suffer. <username> and I have a plan to fix it—before it's too late. #COVID #creditfreeze #relieffund”

58

59 of 83

Technical Approach Overview

59

60 of 83

Previous Work: Architecture

60

SEDO

Semantic Encoding and Decoding Optimization. It is a procedure to modulate word embedding (vectors) of a word.

Reddit with

DSM-5 labels

Word Embedding Model

Correlation Matrix (Q) over word vectors

Medical Knowledge Bases

Domain

Experts

Correlation Matrix (P)

over DSM-5 Lexicon or DAO

SEDO

Optimize P, Q & Z

DSM-5 Lexicon

DSM-5 Vocabulary Matrix

Word-modulated Word Embeddings

DSM-5 Classification

Cross Correlation Matrix (Z)

between word vectors and DSM-5 Lexicon or DAO

HLF+VLF+FGF

Feature set

DAO

*Gaur, Manas, et al. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.

61 of 83

Semantic Encoding and Decoding Optimization (SEDO)

61

We incorporate background knowledge from the DSM-5-DAO lexicon into the classification process using SEDO.

We introduce SEDO as an approach for obtaining a discriminative weight matrix between the DSM-5 lexicon and the Reddit embedding space.

SEDO modulates the embedding of each word in a user's Reddit content based on the proximity of the word to a DSM-5 category.

Correlation Matrix (Q) over word vectors

Correlation Matrix (P)

over DSM-5 Lexicon or DAO

Cross Correlation Matrix (Z)

between word vectors and DSM-5 Lexicon or DAO

SEDO

Optimize P, Q & Z

*Gaur, Manas, et al. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.

62 of 83

Semantic Encoding and Decoding Optimization

62

12808 Words

300 dimension embedding

300 dimension embedding

20 DSM-5 Categories

R

Reddit Word Embedding Model

DSM-5 -DAO Lexicon

W

Solvable Sylvester Equation
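A small, dependency-free sketch of solving a Sylvester equation of the form A X + X B = C. The slide only states that SEDO reduces to a solvable Sylvester equation; reading it as P W + W Q = Z (with P over the lexicon, Q over word vectors, and Z the cross-correlation) is an assumption made here for illustration, and the 2 x 2 matrices are toy values. In practice a library routine such as SciPy's `solve_sylvester` would be used at the 20 x 300 scale shown above.

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def solve_sylvester(a, b, c):
    """Solve A X + X B = C for X, where A is m x m, B is n x n, C is m x n."""
    m, n = len(a), len(b)
    size = m * n
    system = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i in range(m):
        for j in range(n):
            row = i * n + j
            rhs[row] = c[i][j]
            for k in range(m):  # (A X)[i][j] = sum_k A[i][k] * X[k][j]
                system[row][k * n + j] += a[i][k]
            for k in range(n):  # (X B)[i][j] = sum_k X[i][k] * B[k][j]
                system[row][i * n + k] += b[k][j]
    flat = solve_linear(system, rhs)
    return [flat[i * n:(i + 1) * n] for i in range(m)]


# Toy stand-ins for P (lexicon), Q (word vectors), and Z (cross-correlation).
P = [[2.0, 0.0], [0.0, 3.0]]
Q = [[1.0, 0.0], [0.0, 4.0]]
Z = [[3.0, 12.0], [12.0, 28.0]]
W = solve_sylvester(P, Q, Z)
print(W)
```

A unique solution exists only when A and -B share no eigenvalues, which is why the solver raises an error on a singular system.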

63 of 83

Model Training for Covid-19

63

SEDO

Semantic Encoding and Decoding Optimization. It is a procedure to modulate word embedding (vectors) of a word.

Tweets Ngrams mapped to MHDA Lexicon

Word Embedding Model

Correlation Matrix (Q) over Tweet word vectors

Correlation Matrix (P) over MHDA Lexicon

SEDO

Optimize P, Q & Z

MHDA Vocabulary Matrix

Word-modulated Word Embeddings

Tweet Classification

Cross Correlation Matrix (Z)

between Tweet word vectors and MHDA Lexicon

Modulated Tweet Embedding

*Gaur, Manas, et al. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.

64 of 83

Experimental Setup

  • Purpose: Validate the neurosymbolic approach for dynamic sentiment analysis in mental health discourse during the COVID-19 pandemic.

  • Experiments Conducted:
    • Baseline Classification: Evaluate binary classification for Depression, Addiction, and Anxiety categories.
    • Triangulation Study: Validate generalizability using external datasets.
      • Analyze empirical significance of pre-trained vs. fine-tuned SEDO weight matrices.
    • Comparison with LLMs: Evaluate performance against state-of-the-art language models like Llama, Phi, and Mistral.
    • Focus: Precision, recall, F1 score, and computational efficiency.

64

65 of 83

Results - Baseline Classification

  • The table compares precision, recall, and F1 scores for four models across categories. Values in red parentheses indicate the percentage performance drop when the SEDO matrix is excluded.
  • The Neurosymbolic Balanced Sub-sample Random Forest (BSRF) consistently achieves the highest F1 scores across all categories, underscoring its effectiveness in handling imbalanced data.
  • The significant decrease in performance metrics (precision, recall, and F1-score) without the SEDO matrix (-21% to -29%) demonstrates the critical role of SEDO optimization in improving classification accuracy.
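For reference, the metrics and the percentage drop reported in the table can be computed as below. The confusion counts and the two F1 values are made-up inputs chosen only to show the arithmetic; they are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def percent_drop(with_sedo, without_sedo):
    """Relative change (in percent) when the SEDO matrix is excluded."""
    return 100.0 * (without_sedo - with_sedo) / with_sedo


# Hypothetical numbers for illustration only.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
drop = percent_drop(with_sedo=0.90, without_sedo=0.675)
print(precision, recall, f1, drop)
```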

65

66 of 83

Results - Triangulation Study

  • This study assesses the generalizability and robustness of the proposed SEDO framework by applying it to previously unseen datasets annotated for depression, addiction, and anxiety.
  • The datasets consist of social media posts drawn from published datasets specific to each category (depression, addiction, and anxiety).
  • Two settings are compared: one with the pre-trained SEDO weight matrix and the other with a fine-tuned matrix for improved classification.
  • Fine-tuning the SEDO matrix on new datasets led to significant performance improvements.

66

67 of 83

Results - Comparison with LLMs

  • The neurosymbolic model was compared with three state-of-the-art LLMs: Llama (7B parameters), Phi (2.7B parameters), and Mistral (7B parameters). These LLMs were used in an open-source, instruct-tuned, zero-shot setting.
  • The evaluation dataset included 1,000 tweets per category (depression, addiction, and anxiety), collected across three timeframes: April-May 2020, August-September 2020, and December 2020-January 2021.
  • The neurosymbolic approach outperformed the LLMs in all metrics (precision, recall, F1-score) for all three categories, achieving F1-scores between 88.84% and 91.85%, compared to 68.95%-78.56% for LLMs. This highlights its adaptability and computational efficiency over traditional LLMs.

67

68 of 83

68

A calculated Social Quality Index (SQI) aggregates mental health components (Depression, Anxiety), Addiction and Substance Use Disorders.

Social Quality Index (SQI)

vecteezy.com

  • Change in SQI informs comparisons between states.
  • Raw SQI is transformed into relative state rankings that change over time.
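A minimal sketch of turning raw SQI values into relative state rankings per week. The SQI formula itself is not given on the slide, so the values below are placeholders, and treating a higher SQI as better (rank 1) is an assumption.

```python
def rank_states(sqi_by_state):
    """Rank states so that the highest SQI gets rank 1 (assumed: higher is better)."""
    ordered = sorted(sqi_by_state, key=sqi_by_state.get, reverse=True)
    return {state: rank for rank, state in enumerate(ordered, start=1)}


def rank_trajectories(weekly_sqi):
    """For each state, list its rank in every week, in order."""
    weekly_ranks = [rank_states(week) for week in weekly_sqi]
    return {state: [ranks[state] for ranks in weekly_ranks] for state in weekly_ranks[0]}


# Placeholder SQI values for two consecutive weeks.
weekly_sqi = [
    {"NY": 0.2, "OH": 0.8, "WI": 0.5},
    {"NY": 0.6, "OH": 0.4, "WI": 0.5},
]
trajectories = rank_trajectories(weekly_sqi)
print(trajectories)
```

States can then be grouped by the shape of their trajectory, for example steadily improving, steadily declining, or non-linear.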

69 of 83

69

e.g., IN, NH, OH, OR, WA, WY are worsening.

Results: Relative State Rankings Reveal Patterns

SQI Ranking March 14-20

SQI Ranking March 21-27

SQI Ranking March 28-April 3

SQI Ranking April 4-10

Darker: Better Social Quality

70 of 83

70

IL, NY, MD, AZ, NM, MA

WI, RI, NV, NJ, CT, LA, OK

WA, KS, IN, WY, OH, OR, NH

Relative SQI Ranking

Results: Three of the Observed Temporal Patterns

March 14-20 March 21-27 March 28-April 3 April 4-10

71 of 83

Results: Cluster with Improving SQI Ranking

71

States: IL, NY, MD, AZ, NM, MA

Frequency by category and period:

Period             SQI     Depression  Addiction  Anxiety  Total
March 14-20        bad     125037      92897      81891    299825
March 21-27        better  113830      81810      74080    269720
March 28-April 3   better  81463       60166      45998    187627
April 4-10         better  59088       49086      46887    155061

72 of 83

72

Results: Cluster with Declining SQI Ranking

States: WA, KS, IN, WY, OH, OR, NH

Frequency by category and period:

Period             SQI     Depression  Addiction  Anxiety  Total
March 14-20        good    88491       24373      37725    146589
March 21-27        worse   68491       37846      53189    159526
March 28-April 3   worse   81746       59756      78885    220387
April 4-10         worse   123244      84879      94999    303122

73 of 83

Results: Cluster with a Non-Linear SQI Ranking

73

States: WI, RI, NV, NJ, CT, LA, OK

Frequency by category and period:

Period             SQI     Depression  Addiction  Anxiety  Total
March 14-20        worse   91480       103549     88293    283322
March 21-27        better  62825       81400      54184    198409
March 28-April 3   better  58223       76232      41484    175949
April 4-10         worse   78061       87463      63865    229389

74 of 83

Explanation: Two threads of influence

74

External events

(business and school closing)

Short term Human Coping Processes (content changes in focus of attention)

SQI

75 of 83

Results: Influence of External Events

75

SQI worse

Cluster 4:

CT, LA, NJ, NV, OK, RI, WI.

School Closures: CT, LA, NJ, NV, RI, WV, WI

Business Closures: CT, LA, NJ, RI, WV, WI

Social Distancing Reg: LA, NJ, RI, WV, WI

Business Relief: WI

Unemployment increase:

CT 2.5K %, LA 2.5K %, NJ 1.2K %,

NV 1.2K %, OK 1.2K %, RI 2.5K %, WI 1.2K %.

Stay at home: CT, LA, NJ, OK, RI, WI, WV

Extension School: CT, WV

Major Disaster: NJ

Business Relief: NJ

Unemployment increase:

CT 180%, LA 0 %, NJ 64 %,

NV 0 %, OK 99 %, RI -23%, WI 99 %.

Major Disaster: CT, WV

Strict Social Dist: CT, RI

Extensions deadlines: CT

Medical shortage: NJ

Extension Stay home: OK

Extension School: RI

Extension Business Closure: RI

Business Relief: NJ, RI

Individual Relief: RI

Unemployment increase:

CT 0%, LA 5 %, NJ 3 %,

NV 11 %, OK 7 %, RI 0%, WI -5 %.

Extension School: CT

Extension Stay home: LA

Strict Social Dist: NJ

Business Relief: WI

Cluster 5:

FL, GA, MI, NE, TN, VA, WV.

School Closures: FL, GA, MI, TN, VA, WV,

Business Closures: WV, MI

Social Distancing Reg: FL, MI, NE, TN, VA, WV,

Business Relief: FL, GA, MI, NE, TN, VA

Individual Relief: TN, VA

Unemployment increase:

FL 600%, GA 650%, MI 180%,

NE 70%, TN 180%, VA 180%,

WV 600%

Stay at home: MI, WV

Shelter in Place: GA

Business Closure: GA, TN

Extension School: GA, WV

Major Disaster: FL

Business Relief: TN

Individual Relief: TN

Unemployment increase:

FL 3.1K%, GA 3K%, MI 1.8K%,

NE 200%, TN 700%, VA 1.6K%,

WV 1.7K%

Stay at home: FL, VA

Shelter in Place: TN

Major Disaster: GA, MI, TN, VA, WV

Strict Social Dist: GA

Extension School: GA, MI

Unemployment increase:

FL -25%, GA 190%, MI 27%,

NE 8%, TN 26%, VA 33%,

WV 0%

Extension School: GA

Extension Stay home: MI

SQI worse

SQI worse

SQI worse

SQI better

SQI better

SQI better

SQI better

March 14-20 March 21-27 March 28-April 3 April 4-10

76 of 83

Hashtag Content Mirrors SQI

(steadily improving states)

76

SQI:

SQI bad SQI better SQI better SQI better

Hashtag:

#

Cluster 7:

IL, NY, MD, AZ, NM, MA.

March 14-20 March 21-27 March 28-April 3 April 4-10

77 of 83

Hashtag Content Mirrors SQI

(steadily declining states)

77

SQI:

SQI better SQI worse SQI worse SQI worse

Hashtag:

#

Cluster 1:

WA, KS, IN, WY, OH, OR, NH

March 14-20 March 21-27 March 28-April 3 April 4-10

78 of 83

78

Content

  • Foundations
    • Why Social Media Data
    • LLMs and their challenges
    • NeuroSymbolic Approach
    • Types of Knowledge Infused Learning and their advantages
  • Neurosymbolic Approach for a health application
    • Use-Case COVID-19
  • Hands-on Session

79 of 83

HANDS-ON Session

  • Modulating word embedding with Zero-shot learning
  • Neologism

79

Complete GitHub Repo:

80 of 83

Conclusion

  • Neurosymbolic AI integrates symbolic reasoning with neural networks to enhance adaptability.
  • Tackling LLM challenges improves efficiency in dynamic, noisy data environments like social media.
  • Applications in health domains showcase impactful use-cases, e.g., COVID-19 analysis.
  • The hands-on session equipped participants with practical skills to implement word embedding modulation and neologism detection.

80

NeuroSymbolic AI

Open Source Gen AI

Instructability

Alignment

Grounding

81 of 83

Conclusion

81

Instructability

  • The capability of AI systems to be taught and guided by humans to cause intentional behaviours.
  • Features:
    • Skill Acquisition
    • Knowledge-Gap management
    • Human-AI Interaction
      • Explainability
      • Interpretability
      • Observability

Grounding

  • The process of establishing meaningful connections between AI representations and the real world, ensuring AI systems understand and interact with their environment effectively.
  • Features:
    • Symbol Grounding
    • Pragmatic Grounding
    • Compositional Grounding

Alignment

  • Ensuring AI systems' goals, actions, and behaviors are consistent with end user expectations, for example human values and norms.
  • Features:
    • Value-based / Ethical Orientation
    • Task Orientation
    • Collaborative Functioning

Image by pngtree.com

Image by kjpargeter on Freepik

Image by gun21awan740843 on vecteezy

Interaction

Feedback

Human

AI

Human-Values

AI Actions

Real World Concepts

Last Layer

(AI Representation)

Commands, Queries and Responses

Corrections and Learnings

Image by pngtree.com

Image by medium

Image by emiltimplaru, santima.studio on vecteezy

Image by oval on clker.com

Image by pngtree.com

82 of 83

82

Primary funding support by NSF Awards #: 2133842, 2335967, 2119654, 2350302, WIPRO, BOSCH, others.

Learn more:

83 of 83

Thank You!

83