EnhanceMyPrompt: Rewriting Chat Queries for Effective Response Generation from LLMs
Tushar Abhishek*, Manas Jain*, Shishir Hardia, Shreevignesh Suriyanarayanan, Sandra Anil, Rushabh Gandhi, Manish Gupta
Applied Research Paper Track
EnhanceMyPrompt: Rewriting Chat Queries for Effective Response Generation from LLMs. CIKM 2025.
Problem Statement: Short & Ambiguous Chat Queries
Observations: ~70% of Microsoft Copilot queries are broad and short (< 10 words).
Consequences:
Key Question: Can we semi-automatically enhance short queries into specific, well-formed ones with clear intent?
Microsoft Copilot
ChatGPT
Google Gemini
Our Solution: Enhance My Prompt (EMP)
Novel Problem Formulation
Unlike query expansion: We add relevant sub-intents/constraints, not just related terms
Unlike prompt optimization: We enhance with detail, not compress
Unlike prompt engineering: We don't change prompting strategy
Enhance My Prompt System
Example Transformation:
Original: “create an image of rhino playing football”
Enhanced (Level 4): “create a <photo-realistic> image in <bright colors> of a rhinoceros wearing a <red jersey> playing in a <football stadium>”
The user clicks the “Enhance My Prompt” button to invoke the system.
The user can review the enhanced prompt, edit it, or choose among its placeholder options before submitting the query to Copilot.
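The placeholder workflow above can be sketched in a few lines. This is a minimal illustration only, assuming placeholders are the angle-bracketed spans shown in the Level 4 example and that each holds a default value the user may keep or override; the function names `list_placeholders` and `fill_placeholders` are hypothetical and not part of the EMP system.

```python
import re

# Assumption: a placeholder is any <...> span, and its text is the default value.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def list_placeholders(draft):
    """Return the default values of all placeholders, in order of appearance."""
    return PLACEHOLDER.findall(draft)


def fill_placeholders(draft, choices=None):
    """Replace each <default> with the user's choice, or keep the default."""
    choices = choices or {}
    return PLACEHOLDER.sub(
        lambda m: choices.get(m.group(1), m.group(1)), draft
    )


draft = (
    "create a <photo-realistic> image in <bright colors> of a rhinoceros "
    "wearing a <red jersey> playing in a <football stadium>"
)

# The user overrides one placeholder and accepts the other defaults.
final = fill_placeholders(draft, {"red jersey": "blue jersey"})
print(list_placeholders(draft))
print(final)
```

Keeping the default text inside the brackets means a draft is usable even if the user edits nothing, which matches the idea of a draft the user can accept as-is.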
Four Levels of Enhancements
Key Advantages:
Dataset & Enhanced Query Generation
Enhanced Query Generation
Average query sizes (in words) across different levels of enhancement for both datasets using GPT-4T.
Model Training & Architecture
We evaluated the following models:
Training Setup:
Why SLMs?
Evaluation Metrics
To design relevant metrics, we conducted a user study with 10 users from diverse backgrounds and ethnicities worldwide. Based on their feedback, we propose three novel metrics for evaluating enhanced query drafts (all scored using GPT-4 Turbo):
Enhanced Query Draft Quality (EQDQ)
Additional User Effort (AUE)
LLM Response Quality Improvement (LRQI)
EQDQ takes the user query and the enhanced query draft as input and rates 6 dimensions on a 4-point Likert scale.
AUE takes the enhanced query draft and the final enhanced query (after user/agent edits) as input.
LRQI measures how much the LLM’s response improves when the final enhanced query is used, by comparing the response quality against that from the original user query.
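One plausible way to read the LRQI description is as an average gain in judged response quality. The sketch below assumes that a judge (GPT-4 Turbo in the talk) has already assigned a numeric quality score to the response for the original query and to the response for the final enhanced query. The exact scoring formula is not given on the slide, so the difference-of-scores form and the function name `lrqi` are assumptions for illustration.

```python
from statistics import mean


def lrqi(scored_pairs):
    """Mean gain in judged response quality (assumed formulation).

    scored_pairs: iterable of (score_original, score_enhanced) tuples, where
    each score is a judge rating of the LLM response to that query.
    A positive result means enhanced queries yielded better responses;
    a negative result means they yielded worse ones.
    """
    return mean(enhanced - original for original, enhanced in scored_pairs)


# Hypothetical judge scores, for illustration only (not results from the paper).
pairs = [(2.0, 2.5), (3.0, 3.0), (2.5, 2.75)]
print(lrqi(pairs))
```

Under this reading, a negative average is possible whenever the enhanced queries lead to worse responses than the originals.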
Detailed Results - Copilot
Detailed Results - LMSYS+NQ
Key Results: Overall Model Performance
Best Performing Model: Instruction-tuned Phi-3-Mini
Overall EQDQ Scores - Copilot Dataset (Higher is Better):
Model | Level-1 | Level-2 | Level-3 | Level-4 | Average |
Pretrained Phi-3 | 2.57 | 2.59 | 2.60 | 2.60 | 2.59 |
Pretrained LLaMA-3 | 2.79 | 2.84 | 2.82 | 2.84 | 2.82 |
GPT2-Medium (FT) | 2.66 | 2.61 | 2.64 | 2.73 | 2.66 |
T5-Large (FT) | 2.73 | 2.68 | 2.71 | 2.80 | 2.73 |
Phi-3 (IT) | 2.80 | 2.79 | 2.81 | 2.86 | 2.82 |
LLaMA-3 (IT) | 2.77 | 2.75 | 2.79 | 2.84 | 2.79 |
GPT-4T (reference) | 2.81 | 2.80 | 2.82 | 2.87 | 2.83 |
Overall LRQI Scores - Copilot Dataset (Higher is Better):
Model | Level-1 | Level-2 | Level-3 | Level-4 | Average |
Pretrained Phi-3 | 0.06 | 0.08 | 0.05 | 0.05 | 0.06 |
Pretrained LLaMA-3 | -0.01 | -0.08 | -0.07 | -0.14 | -0.08 |
GPT2-Medium (FT) | 0.13 | 0.09 | 0.10 | 0.12 | 0.11 |
T5-Large (FT) | 0.12 | 0.10 | 0.11 | 0.11 | 0.11 |
Phi-3 (IT) | 0.12 | 0.09 | 0.11 | 0.12 | 0.11 |
LLaMA-3 (IT) | 0.13 | 0.08 | 0.09 | 0.11 | 0.10 |
GPT-4T (reference) | 0.13 | 0.09 | 0.11 | 0.13 | 0.12 |
AUE Scores - Copilot Dataset (Lower is Better):
Impact & Practical Insights
Coverage of Future User Intents using best performing model: Instruction-tuned Phi-3-Mini
Experiment: 10K LMSYS conversations with ≥3 turns
Key Finding: EnhanceMyPrompt can predict user intents up to 3 turns ahead in ~23% of conversations
Latency (Phi-3 IT on V100 GPU):
Edit Distance Analysis (Enhanced Draft Query and Enhanced Final Query)
Average inference times for 100 samples on an NVIDIA V100 GPU with a batch size of 1.
Edit distance (in words) between enhanced query draft and final enhanced query.
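A word-level edit distance between a draft and the final query can be computed as sketched below. The slide does not state which edit distance is used, so this assumes standard Levenshtein distance over whitespace-separated tokens (insert, delete, and substitute each cost 1); the function name `word_edit_distance` is hypothetical.

```python
def word_edit_distance(draft, final):
    """Levenshtein distance between two strings, counted in words.

    Assumption: words are whitespace-separated tokens, and insertion,
    deletion, and substitution of a word each cost 1.
    """
    a, b = draft.split(), final.split()
    # prev[j] holds the distance between the first i-1 words of a
    # and the first j words of b.
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete word_a
                curr[j - 1] + 1,     # insert word_b
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


# Changing one word in the draft gives a distance of 1.
print(word_edit_distance("wearing a red jersey", "wearing a blue jersey"))
```

A small distance indicates that the user accepted the draft nearly as generated, while a large one indicates substantial manual rewriting.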
Examples
Enhanced Draft Query using EnhanceMyPrompt models. Query=“can you recommend some investment opportunities?”
Qualitative Analysis
Examples of errors made by our best EnhanceMyPrompt SLM model, instruction-tuned Phi-3.
Conclusion
Thank You
Tushar Abhishek, Manas Jain, Shishir Hardia, Shreevignesh Suriyanarayanan, Sandra Anil, Rushabh Gandhi, Manish Gupta