Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums
Security & Software Engineering Research Lab
Joanna C. S. Santos
joannacss@nd.edu
Mohammed Latif Siddiq*
msiddiq3@nd.edu
Jiahao Zhang* jzhang38@nd.edu
*Equal contribution
Regex-based Denial of Service
(aka Regex DoS or ReDoS)
Some regexes (genuine or not) can take an extremely long time to match (or mismatch) – catastrophic backtracking
How do you know a regex is “bad”?
Regexes that contain ambiguity:
When the regex engine could make two different choices and end up at the same point
Ambiguity commonly occurs as the result of ORs and quantifiers (*, +, {5,10}, etc.).
Examples:
aa → can be parsed in 4 different ways
aaa → can be parsed in 8 different ways
a → can be parsed in 2 different ways
(a|a)
(a|a)*
It is exponentially ambiguous
(# of paths doubles for every additional "a").
How do I know I have a problematic regex?
Regex Equivalence Class Groups
5
Motivation
6
Goal of the Work
RQ1: To what extent LLMs can generate non-vulnerable and correct regexes?
RQ2: What are the characteristics of LLM-generated ReDoS vulnerabilities?
RQ3: What are the characteristics of real-world ReDoS vulnerabilities that were fixed?
RQ4: How are ReDoS discussed by developers?
7
RQ1: ReDoS Generation
8
RQ1: RegEx/ReDoS Generation
Models:
Prompt: “<prompt description here>. Generate a regex for this description:”
Temperature: 0.0, 0.2, 0.4, 0.6, 0.8, 1.0
Number of RegEx Generated: 10
Generated Token Size: 128
9
RQ1: RegEx/ReDoS Generation
Functional Correctness Metrics:
10
RQ1: RegEx/ReDoS Generation
Functional Correctness Metrics:
k= {1, 3, 10}
11
RQ1: RegEx/ReDoS Generation
Functional Correctness Metrics:
k= {1, 3, 10}
12
RQ1: RegEx/ReDoS Generation
Vulnerability Metrics:
Vulnerable@k: Measures the probability of finding an insecure regex within the top k results.
Tool: ReDoSHunter
k= {1, 3, 10}
13
RQ2: ReDoS Characteristics
ReDoS Computational Complexity:
14
RQ2: ReDoS Characteristics
Regex Classification:
Sample from Original and Refined prompts with 95% confidence level and 5% margin of error.
Total Samples: 714
15
RQ3: Analyzing real-world ReDoS
Data Sources:
16
RQ3: Analyzing real-world ReDoS
Comparison with RQ2 Result:
Mann–Whitney U test
Two-tailed test and Significant level, 𝛼 = 0.05
17
RQ4: Developers’ Concerns
Sources:
Total samples: 48 from SO and 426 from GitHub.
18
RQ4: Developers’ Concerns
19
Implications
20
Conclusion
21
Preprint
Source Code
Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums
Mohammed Latif Siddiq, Jiahao Zhang & Joanna C. S. Santos
Managing LLM’s instability
22