RL‑Powered Mental Health Support Bot
Chirasmayee B, Pallavi Bichpuriya, Diya Shah
Hack With CAIR, April 19, 2025
Problem & Motivation
• One‑size‑fits‑all tips feel generic
• Live human coaches don’t scale
High‑Level Solution
System Architecture
Key Technologies
PPO Agent Logic
• State: Mood score (0–1)
• Action: One of eight wellness strategies
• Reward: Mood improvement
• Goal: Maximize cumulative mood increase over episodes
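The state, action, and reward above can be sketched as a small simulated environment. This is a minimal illustration, not the project's code: the class name MoodEnv, the random mood-shift model, the episode length, and the placeholder strategy names (everything except meditation and breathing_ex) are assumptions made for the example.

```python
import random

# Only "meditation" and "breathing_ex" are named in the slides; the other
# six entries are placeholders standing in for the remaining strategies.
STRATEGIES = ["meditation", "breathing_ex"] + [f"strategy_{i}" for i in range(3, 9)]


class MoodEnv:
    """Toy environment: state = mood score in [0, 1], action = strategy index."""

    def __init__(self, seed=0, max_steps=10):
        self.rng = random.Random(seed)
        self.max_steps = max_steps  # assumed episode length
        self.mood = 0.5
        self.steps = 0

    def reset(self):
        """Start a new episode with a random initial mood score."""
        self.mood = self.rng.uniform(0.0, 1.0)
        self.steps = 0
        return self.mood

    def step(self, action):
        """Apply one strategy and return (new_mood, reward, done)."""
        if not 0 <= action < len(STRATEGIES):
            raise ValueError(f"action must be in 0..{len(STRATEGIES) - 1}")
        # Assumption: the user's response is simulated as a random mood shift.
        delta = self.rng.uniform(-0.1, 0.2)
        new_mood = min(1.0, max(0.0, self.mood + delta))
        reward = new_mood - self.mood  # reward = mood improvement
        self.mood = new_mood
        self.steps += 1
        return new_mood, reward, self.steps >= self.max_steps


# One episode with a random policy; a trained PPO agent would choose actions instead.
env = MoodEnv(seed=42)
start_mood = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.rng.randrange(len(STRATEGIES))
    _, reward, done = env.step(action)
    total_reward += reward
print(f"start={start_mood:.2f} end={env.mood:.2f} return={total_reward:.2f}")
```

Because each reward is the change in mood, the episode return equals the final mood minus the starting mood, which is the quantity the agent is trying to maximize.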
Live Demo
• User: “I’m sad.”
• Bot: “I sense you might be feeling sadness. How about a short walk in nature? A little fresh air might help.”
• Mood chart updates
The PPO Algorithm (Formulas)
s_t: the user’s current mood_score (a number between 0 and 1)
a_t: one of the 8 wellness suggestions (meditation, breathing_ex, etc.)
π_θ(a_t | s_t): the probability your new policy assigns to picking suggestion a_t in mood s_t
π_θ_old(a_t | s_t): the same probability under the policy before that update
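These two probabilities enter PPO through their ratio, r_t = π_θ(a_t | s_t) / π_θ_old(a_t | s_t). The sketch below computes the standard PPO clipped surrogate objective from that ratio. The advantage estimates, the clip range epsilon = 0.2, and the function name clipped_surrogate are not given in the slides; they are assumptions taken from the usual PPO formulation.

```python
def clipped_surrogate(new_probs, old_probs, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    new_probs[i]  : pi_theta(a_t | s_t) under the current policy
    old_probs[i]  : pi_theta_old(a_t | s_t) under the pre-update policy
    advantages[i] : advantage estimate for that step (assumed to be given)
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        # Keep the ratio within [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum removes any incentive to move the policy
        # further than the clip range allows in a single update.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two illustrative steps: one suggestion that helped, one that did not.
objective = clipped_surrogate(
    new_probs=[0.30, 0.10],
    old_probs=[0.20, 0.20],
    advantages=[1.0, -1.0],
)
print(round(objective, 3))
```

In the first step the ratio is 1.5, so the term is capped at 1.2 × 1.0. In the second the ratio is 0.5 with a negative advantage, so the clipped term 0.8 × (−1.0) is the smaller one. The batch mean is therefore about 0.2.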
Key Implementation Snippets
Tests & Results
Future Work
Thank You & Q&A