1 of 19

Bhagesh Gaur^⧪, Karan Gupta^⧪, Aseem Srivastava^⧪, Manish Gupta^♠, Md Shad Akhtar^⧪

Assess and Prompt: A Generative RL Framework for Improving Engagement in Online Mental Health Communities

^⧪ Indraprastha Institute of Information Technology Delhi (IIIT Delhi), India

^♠ Microsoft, India

2 of 19

“Over 40% of help-seeking posts on Reddit mental health forums get no response.” (Sharma et al., 2020; Kim et al., 2023)

Even in supportive spaces, silence can deepen isolation.

Why do so many cries for help online go unanswered?

We aim to understand and bridge this communication gap.

3 of 19

Online forums give safe, peer-based spaces for mental health support.
Yet, many posts lack clarity about what happened, how it felt, and what support is needed.
In therapy, expressing these elements is essential to being understood.
We model these as Support Attributes (Event, Effect, Requirement) - signals of help-seeking clarity.

Support-seeking posts often miss key ingredients of help

Clear expression is the bridge between asking for help and receiving it.

4 of 19

Posts without clear ‘support attributes’ fail to elicit engagement

Online help-seeking posts often omit key support cues — what happened, how it felt, and what’s needed.

This lack of “support attributes” leads to lower empathy and response rates.

Prior NLP work focuses on empathy detection or response generation, but not on assessing and improving post clarity.

5 of 19

We shift focus from ‘how to respond’ to ‘how to help users express themselves better.’

6 of 19

Including Event, Effect, and Requirement in a post increases the number of comments it receives

7 of 19

Can a language model identify missing support attributes in a post and prompt the user to express them?

To study this aspect and address the gaps, we propose two major contributions:

A novel dataset, REDDME, along with a taxonomy, CueTaxo, to study engagement in support-seeking posting behavior.
MH-COPILOT, an assistive framework that prompts users about support attributes missing from their posts, for better support seeking in peer communities.

8 of 19

We propose REDDME, a manually annotated corpus of Reddit posts.

The following attributes are annotated with spans (rationales), their intensity levels, and guided questions as per the taxonomy:

Event
Effect
Requirement

Stats:

Total posts: 4760

Average Post Length: 179.62

Total Guided Questions: 7909

Dataset: REDDME

9 of 19

Taxonomy: CueTaxo

10 of 19

MH-COPILOT empowers support-seekers to tell their stories better.

Can we help users express what they need - before they give up asking?

11 of 19

Assess the post: extract Event, Effect, Requirement spans (CSpan), then rate each attribute’s intensity (absent / moderate / present).

Prompt the user: a generator produces guided questions targeted to missing/weak attributes, using a hierarchical taxonomy (CUETAXO).

Learn with RL: a verifier scores each question along multiple dimensions; scores feed a preference-based objective (DPO) to improve the policy.
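The preference-based objective in the Learn step can be illustrated with the standard DPO loss for a single (chosen, rejected) pair of guided questions. This is a minimal sketch, assuming the sequence log-probabilities under the policy and a frozen reference model are already available; the function name, `beta`, and the example numbers are illustrative and not taken from the paper.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (beta is an assumed value)."""
    # How much more the policy likes each answer than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) computed as a numerically stable softplus(-logits).
    z = -logits
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Illustrative numbers: the policy favors the chosen question more than the reference.
loss_neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
loss_aligned = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

When the policy and reference agree, the loss equals log 2; it drops below that as the policy shifts probability toward the preferred question.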

MH-COPILOT: Assess → Prompt → Learn (RL)

12 of 19

[Figure: MH-COPILOT architecture diagram. Components shown:
Assess: POST; Contextual Attribute Span Classifier (CSpan); POST w/ Attribute Spans; Support Attribute Intensity Detection (Intensity Classifier); Attribute Level Intensity.
Prompt: TAXONOMY (Level 1 to Level 5); Taxonomy-based Question Prompt; Suggestive Question Generator (Language Model; LM Layers D_n, D_n-1, D_n-2).
Learn: VERIFIER MODULE (Attribute Intensity, Context Evaluator, Empathy Assessor, Structural Assessor); Reward Computation; Reference Model’s Response Ranking; DPO.
Question templates: “What made you feel <X>?”, “Can you elaborate more on <X>?”, “What can help you overcome <X>?”
Example questions: “What made you feel anxious?”, “Can you elaborate more on how you feel?”, “What can help you overcome your anxiety?”]

CSpan - Identifies and highlights the Support Attributes present in the post, framed as an NER task.
Intensity Classifier - Rates how strongly each attribute is expressed, to identify potential areas of improvement.
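Framing span extraction as NER means each token gets a tag and contiguous tags are merged into spans. The sketch below assumes a BIO tagging scheme with labels EVENT, EFFECT, and REQUIREMENT; the scheme, the label names, and the example post are assumptions for illustration, not details confirmed by the slides.

```python
ATTRIBUTES = ("EVENT", "EFFECT", "REQUIREMENT")  # assumed label names

def bio_to_spans(tokens, tags):
    """Merge BIO tags (e.g. B-EVENT, I-EVENT, O) into (attribute, text) spans."""
    spans, label, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            if label:  # close the span that was open
                spans.append((label, " ".join(buffer)))
            label, buffer = (tag[2:], [token]) if starts_new else (None, [])
        else:  # I- tag continuing the current span
            buffer.append(token)
    if label:
        spans.append((label, " ".join(buffer)))
    return spans

tokens = "I lost my job and feel hopeless".split()
tags = ["O", "B-EVENT", "I-EVENT", "I-EVENT", "O", "B-EFFECT", "I-EFFECT"]
spans = bio_to_spans(tokens, tags)
found = {label for label, _ in spans}
missing = [a for a in ATTRIBUTES if a not in found]
```

Here the post has Event and Effect spans but no Requirement span, which is the kind of gap the later stages would prompt for.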

MH-COPILOT: Assess → Prompt → Learn (RL)

13 of 19

CUETAXO levels encode how complete each attribute is; we include these levels in the LM prompt so generation is attribute-aware.
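One way to make generation attribute-aware is to write the per-attribute level directly into the prompt text. The sketch below is a hypothetical prompt builder: the function name, the prompt wording, and the integer level encoding are assumptions, since the slides do not show the actual prompt template.

```python
def build_prompt(post, levels):
    """Assemble an attribute-aware prompt.

    `levels` maps each support attribute to its taxonomy level
    (hypothetical integer encoding).
    """
    lines = ["Post:", post.strip(), "", "Attribute levels:"]
    for attribute in ("Event", "Effect", "Requirement"):
        lines.append(f"- {attribute}: level {levels[attribute]}")
    lines += [
        "",
        "Ask one guided question for each attribute that is not yet fully expressed.",
    ]
    return "\n".join(lines)

prompt = build_prompt(
    "I can't sleep and I don't know what to do.",
    {"Event": 1, "Effect": 3, "Requirement": 2},
)
```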

MH-COPILOT: Assess → Prompt → Learn (RL)

14 of 19

MH-COPILOT: Assess → Prompt → Learn (RL)

The generator targets only absent/moderate attributes and keeps wording aligned to the taxonomy (e.g., “Can you describe more about the event…?”).
Output format is constrained (JSON schema with event_question, effect_question, requirement_question) to keep structure consistent.
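A constrained output format can be checked before the questions reach the user. This sketch validates generator output against the three keys named above; representing an attribute that needs no question as `null` is an assumption, as are the function name and error messages.

```python
import json

REQUIRED_KEYS = ("event_question", "effect_question", "requirement_question")

def parse_questions(raw):
    """Parse generator output and check it has the three expected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in REQUIRED_KEYS:
        # Assumption: null means the attribute is already present in the post.
        if data[key] is not None and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string or null")
    return {key: data[key] for key in REQUIRED_KEYS}

raw_output = (
    '{"event_question": "Can you describe more about the event?", '
    '"effect_question": null, '
    '"requirement_question": "What kind of support would help you most?"}'
)
questions = parse_questions(raw_output)
to_ask = [q for q in questions.values() if q is not None]
```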

15 of 19

MH-COPILOT: Assess → Prompt → Learn (RL)

Attribute Intensity - category correctness score
Context Evaluator - contextual grounding score
Empathy Assessor - empathy score
Structural Assessor - structure (taxonomy) score
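The four verifier scores can be turned into preference data for DPO by reducing them to one reward per candidate question and pairing the best candidate against the worst. The sketch below assumes an equal-weight mean over scores in [0, 1] and a best-versus-worst pairing rule; both choices, along with the key names and example scores, are illustrative assumptions rather than the paper's reward formula.

```python
from statistics import mean

# Assumed key names for the four verifier dimensions.
SCORE_KEYS = ("attribute_intensity", "context", "empathy", "structure")

def reward(scores):
    """Equal-weight mean of the verifier scores (assumed aggregation)."""
    return mean(scores[key] for key in SCORE_KEYS)

def preference_pair(candidates):
    """Pick the highest- and lowest-reward questions as (chosen, rejected)."""
    ranked = sorted(candidates, key=lambda c: reward(c["scores"]), reverse=True)
    return {"chosen": ranked[0]["question"], "rejected": ranked[-1]["question"]}

candidates = [
    {"question": "What happened?",
     "scores": {"attribute_intensity": 0.5, "context": 0.25, "empathy": 0.25, "structure": 0.5}},
    {"question": "What made you feel anxious before the exam?",
     "scores": {"attribute_intensity": 1.0, "context": 0.75, "empathy": 0.75, "structure": 1.0}},
]
pair = preference_pair(candidates)
```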

16 of 19

Results

17 of 19

Annotators reported MH-COPILOT’s outputs “occasionally surpass gold standard”

Metric	w/o Verifier	w/ Verifier
Empathy (D1)	3.27	3.43
Relevance (D2)	1.82	2.27
Context (D3)	2.19	3.31
Fluency (L3)	3.82	4.02

Human Eval
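The size of each change is easier to compare as a difference between the two columns. The snippet below only recomputes gains from the four rows of the table above; no additional data is introduced.

```python
# (without verifier, with verifier), copied from the human evaluation table.
human_eval = {
    "Empathy (D1)": (3.27, 3.43),
    "Relevance (D2)": (1.82, 2.27),
    "Context (D3)": (2.19, 3.31),
    "Fluency (L3)": (3.82, 4.02),
}

# Absolute gain from adding the verifier, rounded to two decimals.
gains = {metric: round(with_v - without_v, 2)
         for metric, (without_v, with_v) in human_eval.items()}
largest = max(gains, key=gains.get)
```

The gains are 0.16, 0.45, 1.12, and 0.20 respectively, so Context (D3) shows the largest change.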

Verifier + Taxonomy → Quality Improvement Beyond Numbers

18 of 19

Reinforcement via preference learning (Verifier + DPO) produces qualitatively superior outputs.
Combining CUETAXO taxonomy + reward model yields large gains in alignment and clarity.
MH-COPILOT generalizes across LLMs (Gemma-2, Mistral, Phi-3, Llama-3).
Human evaluators confirmed the framework helps posts become clearer and more actionable for peers.

MH-COPILOT moves generative RL from text optimization to social interaction enhancement.

Generative RL can teach models to ask better questions

At its heart, this work is about improving human connection — not just model accuracy.

Through Assess and Prompt, we learned that engagement in online mental health spaces depends not only on empathy in responses but on clarity in self-expression.

Our experiments demonstrate that Generative RL can be re-purposed — beyond reward hacking or stylistic tuning — to encourage prosocial communication behaviors.

By combining:

Therapeutic structure (Event–Effect–Requirement)
Prompt design (CUETAXO taxonomy)
Feedback learning (Verifier-based DPO)

MH-COPILOT consistently produced prompts that made users’ posts clearer, more contextualized, and more actionable.

Quantitatively, this shows up as higher BERTScore and ROUGE scores.

Qualitatively, it shows up in human ratings for empathy, relevance, and clarity — the dimensions that actually matter in mental-health discourse.

The larger takeaway: Generative RL frameworks can be aligned toward socially constructive objectives, not just linguistic ones.

This opens pathways for responsible AI systems that assist rather than replace human expression — helping people find words when it’s hardest to.

19 of 19

Assess and Prompt: A Generative RL Framework for Improving Engagement in Online Mental Health Communities
Bhagesh Gaur, Karan Gupta, Aseem Srivastava, Manish Gupta, Md Shad Akhtar

Scan the QR code to access:

Paper
Code

Thank You