1 of 22

Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums

Security & Software Engineering Research Lab

Joanna C. S. Santos

joannacss@nd.edu

Mohammed Latif Siddiq*

msiddiq3@nd.edu

Jiahao Zhang* jzhang38@nd.edu

*Equal contribution

2 of 22

Regex-based Denial of Service

(aka Regex DoS or ReDoS)

Problem:

Some regexes (genuine or not) can take an extremely long time to match (or mismatch) – catastrophic backtracking

Preconditions for a Regex DoS attack

Use of ReDoS regex on an untrusted input
Regex engine uses a slow regex matching algorithm (aka most of the built-in engines on the popular languages)
Regex engine has no resource usage limit

3 of 22

How do you know a regex is “bad”?

Regexes that contain ambiguity:

When the regex engine could make two different choices and end up at the same point

Ambiguity commonly occurs as the result of ORs and quantifiers (*, +, {5,10}, etc.).

Examples:

aa → can be parsed in 4 different ways

aaa → can be parsed in 8 different ways

a → can be parsed in 2 different ways

(a|a)

(a|a)*

It is exponentially ambiguous

(# of paths doubles for every additional "a").

4 of 22

How do I know I have a problematic regex?

Nested quantifiers can be exponentially dangerous

Ex: (a*)*

Quantifying a disjunction can be exponentially dangerous

Ex: (a|a)*

Concatenated quantifiers can be polynomially dangerous

Ex: abc.*def.*

As a special case of the previous one: partial match

Ex: “search this free-form text for an email address”, then the regex engine adds an implicit .*? to the beginning of your regex.

5 of 22

Regex Equivalence Class Groups

6 of 22

Motivation

LLM-based tools are widely used in software development and potentially for RegEx generation.
Developers may focus only on functionally correct RegEx but may be unaware of the ReDoS.
LLM can generate ReDoS-vulnerable RegEx, which may not be the same type as real-world ReDoS-vulnerable RegEx.
Developers may not be aware of ReDoS vulnerabilities and how to tackle them.

7 of 22

Goal of the Work

RQ1: To what extent LLMs can generate non-vulnerable and correct regexes?

RQ2: What are the characteristics of LLM-generated ReDoS vulnerabilities?

RQ3: What are the characteristics of real-world ReDoS vulnerabilities that were fixed?

RQ4: How are ReDoS discussed by developers?

8 of 22

RQ1: ReDoS Generation

9 of 22

RQ1: RegEx/ReDoS Generation

Models:

Fine-tuned T5
Phi-1.5
GPT-3.5-Turbo

Prompt: “<prompt description here>. Generate a regex for this description:”

Temperature: 0.0, 0.2, 0.4, 0.6, 0.8, 1.0

Number of RegEx Generated: 10

Generated Token Size: 128

10 of 22

RQ1: RegEx/ReDoS Generation

Functional Correctness Metrics:

Exact Match
DFA-EQ@k
Pass@k

11 of 22

RQ1: RegEx/ReDoS Generation

Functional Correctness Metrics:

Exact Match
DFA-EQ@k
Pass@k

k= {1, 3, 10}

12 of 22

RQ1: RegEx/ReDoS Generation

Functional Correctness Metrics:

Exact Match
DFA-EQ@k
Pass@k

k= {1, 3, 10}

13 of 22

RQ1: RegEx/ReDoS Generation

Vulnerability Metrics:

Vulnerable@k: Measures the probability of finding an insecure regex within the top k results.

Tool: ReDoSHunter

k= {1, 3, 10}

14 of 22

RQ2: ReDoS Characteristics

ReDoS Computational Complexity:

Exponential:

Exponential Overlapping Disjunction (EOD)
Exponential Overlapping Adjacency (EOA)

Polynomial:

Nested Quantifiers (NQ)
Polynomial Overlapping Adjacency (POA)
Starting with Large Quantifier (SLQ)

15 of 22

RQ2: ReDoS Characteristics

Regex Classification:

Custom Character Class Group (CCC)
Double-Bounded Group (DBB)
Literal Group (LIT)
Lower-Bounded Group (LWB)
Single-Bounded Group (SNG)

Sample from Original and Refined prompts with 95% confidence level and 5% margin of error.

Total Samples: 714

16 of 22

RQ3: Analyzing real-world ReDoS

Data Sources:

SOLA-DA: 34 samples
CVEs: 414 samples from 70 vulnerabilities.

17 of 22

RQ3: Analyzing real-world ReDoS

Comparison with RQ2 Result:

Mann–Whitney U test

Null hypothesis (𝐻₀): the two populations are equal
Alternative hypothesis (𝐻₁): the two populations are not equal.

Two-tailed test and Significant level, 𝛼 = 0.05

18 of 22

RQ4: Developers’ Concerns

Sources:

Stack-overflow: Post containing “regex” and “ReDoS” (case- insensitive search)
GitHub: Pull requests (a) containing the strings “regex” and “redos” (case insensitive); (b) was closed and (c) not done by a bot.

Total samples: 48 from SO and 426 from GitHub.

19 of 22

RQ4: Developers’ Concerns

20 of 22

Implications

Usage of LLMs for Regex Generation
ReDoS generation
ReDoS Discussion in GitHub PR and StackOverflow
Implication for the developers
Implication for the researchers

21 of 22

Conclusion

GPT-3.5-Turbo is more capable functionally correct and less ReDoS-vulnerable ReDoS generator than other models (RQ1).
LLM-generated regexes mainly have polynomial ReDoS vulnerability patterns (RQ2), and it is consistent with the real-world data (RQ3).
developers’ main concern is related to mitigation strategies to remove vulnerable regexes (RQ4).

Preprint

Source Code

Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums

Mohammed Latif Siddiq, Jiahao Zhang & Joanna C. S. Santos

22 of 22

Managing LLM’s instability

Conducting a thorough analysis across a spectrum of temperature settings.
At each temperature level, generating multiple instances.
Making all prompts and corresponding outputs publicly accessible in our replication package.