1 of 22

Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums

Security & Software Engineering Research Lab

Joanna C. S. Santos

joannacss@nd.edu

Mohammed Latif Siddiq*

msiddiq3@nd.edu

Jiahao Zhang* jzhang38@nd.edu

*Equal contribution

2 of 22

Regex-based Denial of Service

(aka Regex DoS or ReDoS)

  • Problem:

Some regexes (genuine or not) can take an extremely long time to match (or mismatch) – catastrophic backtracking

  • Preconditions for a Regex DoS attack
    • Use of ReDoS regex on an untrusted input
    • Regex engine uses a slow regex matching algorithm (aka most of the built-in engines on the popular languages)
    • Regex engine has no resource usage limit

3 of 22

How do you know a regex is “bad”?

Regexes that contain ambiguity:

When the regex engine could make two different choices and end up at the same point

Ambiguity commonly occurs as the result of ORs and quantifiers (*, +, {5,10}, etc.).

Examples:

aa → can be parsed in 4 different ways

aaa → can be parsed in 8 different ways

a → can be parsed in 2 different ways

(a|a)

(a|a)*

It is exponentially ambiguous

(# of paths doubles for every additional "a").

4 of 22

How do I know I have a problematic regex?

  • Nested quantifiers can be exponentially dangerous
    • Ex: (a*)*
  • Quantifying a disjunction can be exponentially dangerous
    • Ex: (a|a)*
  • Concatenated quantifiers can be polynomially dangerous
    • Ex: abc.*def.*
  • As a special case of the previous one: partial match
    • Ex: “search this free-form text for an email address”, then the regex engine adds an implicit .*? to the beginning of your regex.

5 of 22

Regex Equivalence Class Groups

5

6 of 22

Motivation

  • LLM-based tools are widely used in software development and potentially for RegEx generation.
  • Developers may focus only on functionally correct RegEx but may be unaware of the ReDoS.
  • LLM can generate ReDoS-vulnerable RegEx, which may not be the same type as real-world ReDoS-vulnerable RegEx.
  • Developers may not be aware of ReDoS vulnerabilities and how to tackle them.

6

7 of 22

Goal of the Work

RQ1: To what extent LLMs can generate non-vulnerable and correct regexes?

RQ2: What are the characteristics of LLM-generated ReDoS vulnerabilities?

RQ3: What are the characteristics of real-world ReDoS vulnerabilities that were fixed?

RQ4: How are ReDoS discussed by developers?

7

8 of 22

RQ1: ReDoS Generation

8

9 of 22

RQ1: RegEx/ReDoS Generation

Models:

  1. Fine-tuned T5
  2. Phi-1.5
  3. GPT-3.5-Turbo

Prompt: “<prompt description here>. Generate a regex for this description:”

Temperature: 0.0, 0.2, 0.4, 0.6, 0.8, 1.0

Number of RegEx Generated: 10

Generated Token Size: 128

9

10 of 22

RQ1: RegEx/ReDoS Generation

Functional Correctness Metrics:

  1. Exact Match
  2. DFA-EQ@k
  3. Pass@k

10

11 of 22

RQ1: RegEx/ReDoS Generation

Functional Correctness Metrics:

  • Exact Match
  • DFA-EQ@k
  • Pass@k

k= {1, 3, 10}

11

12 of 22

RQ1: RegEx/ReDoS Generation

Functional Correctness Metrics:

  • Exact Match
  • DFA-EQ@k
  • Pass@k

k= {1, 3, 10}

12

13 of 22

RQ1: RegEx/ReDoS Generation

Vulnerability Metrics:

Vulnerable@k: Measures the probability of finding an insecure regex within the top k results.

Tool: ReDoSHunter

k= {1, 3, 10}

13

14 of 22

RQ2: ReDoS Characteristics

ReDoS Computational Complexity:

  • Exponential:
    • Exponential Overlapping Disjunction (EOD)
    • Exponential Overlapping Adjacency (EOA)
  • Polynomial:
    • Nested Quantifiers (NQ)
    • Polynomial Overlapping Adjacency (POA)
    • Starting with Large Quantifier (SLQ)

14

15 of 22

RQ2: ReDoS Characteristics

Regex Classification:

  • Custom Character Class Group (CCC)
  • Double-Bounded Group (DBB)
  • Literal Group (LIT)
  • Lower-Bounded Group (LWB)
  • Single-Bounded Group (SNG)

Sample from Original and Refined prompts with 95% confidence level and 5% margin of error.

Total Samples: 714

15

16 of 22

RQ3: Analyzing real-world ReDoS

Data Sources:

  • SOLA-DA: 34 samples
  • CVEs: 414 samples from 70 vulnerabilities.

16

17 of 22

RQ3: Analyzing real-world ReDoS

Comparison with RQ2 Result:

Mann–Whitney U test

  • Null hypothesis (𝐻0): the two populations are equal
  • Alternative hypothesis (𝐻1): the two populations are not equal.

Two-tailed test and Significant level, 𝛼 = 0.05

17

18 of 22

RQ4: Developers’ Concerns

Sources:

  • Stack-overflow: Post containing “regex” and “ReDoS” (case- insensitive search)
  • GitHub: Pull requests (a) containing the strings “regex” and “redos” (case insensitive); (b) was closed and (c) not done by a bot.

Total samples: 48 from SO and 426 from GitHub.

18

19 of 22

RQ4: Developers’ Concerns

19

20 of 22

Implications

  • Usage of LLMs for Regex Generation
  • ReDoS generation
  • ReDoS Discussion in GitHub PR and StackOverflow
  • Implication for the developers
  • Implication for the researchers

20

21 of 22

Conclusion

  • GPT-3.5-Turbo is more capable functionally correct and less ReDoS-vulnerable ReDoS generator than other models (RQ1).
  • LLM-generated regexes mainly have polynomial ReDoS vulnerability patterns (RQ2), and it is consistent with the real-world data (RQ3).
  • developers’ main concern is related to mitigation strategies to remove vulnerable regexes (RQ4).

21

Preprint

Source Code

Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums

Mohammed Latif Siddiq, Jiahao Zhang & Joanna C. S. Santos

22 of 22

Managing LLM’s instability

  • Conducting a thorough analysis across a spectrum of temperature settings.
  • At each temperature level, generating multiple instances.
  • Making all prompts and corresponding outputs publicly accessible in our replication package.

22