1 of 8

CS458 Natural language Processing

Self-study 3

Minimum Edit Distance

Krishnendu Ghosh

Department of Computer Science & Engineering

Indian Institute of Information Technology Dharwad

2 of 8

Minimum Edit Distance

1. Using nltk

The nltk.edit_distance function computes the MED between two strings.

Example:

from nltk.metrics import edit_distance

# Input strings

str1 = "kitten"

str2 = "sitting"

# Calculate Minimum Edit Distance

med = edit_distance(str1, str2)

print(f"Minimum Edit Distance (NLTK): {med}")

3 of 8

Minimum Edit Distance

2. Using Levenshtein Library

The Levenshtein library provides a faster implementation for computing the MED.

Installation:

pip install python-Levenshtein

Example:

import Levenshtein

# Input strings

str1 = "kitten"

str2 = "sitting"

# Calculate Minimum Edit Distance

med = Levenshtein.distance(str1, str2)

print(f"Minimum Edit Distance (Levenshtein): {med}")

4 of 8

Minimum Edit Distance

Using difflib

The difflib library can also estimate similarity, but it isn't specifically designed for MED. However, it provides close matching tools.

Example:

import difflib

# Input strings

str1 = "kitten"

str2 = "sitting"

# Find similarity ratio (not MED but related)

similarity = difflib.SequenceMatcher(None, str1, str2).ratio()

print(f"Similarity Ratio (difflib): {similarity}")

5 of 8

Minimum Edit Distance

4. Custom Implementation Using Dynamic Programming

You can implement MED manually using dynamic programming for a deeper understanding.

Example:

def minimum_edit_distance(str1, str2):

m, n = len(str1), len(str2)

dp = [[0] * (n + 1) for _ in range(m + 1)]

# Initialize the table

for i in range(m + 1):

dp[i][0] = i

for j in range(n + 1):

dp[0][j] = j

6 of 8

Minimum Edit Distance

# Fill the table

for i in range(1, m + 1):

for j in range(1, n + 1):

if str1[i - 1] == str2[j - 1]:

dp[i][j] = dp[i - 1][j - 1]

else:

dp[i][j] = 1 + min(dp[i - 1][j], # Deletion

dp[i][j - 1], # Insertion

dp[i - 1][j - 1]) # Substitution

return dp[m][n]

# Input strings

str1 = "kitten"

str2 = "sitting"

# Calculate Minimum Edit Distance

med = minimum_edit_distance(str1, str2)

print(f"Minimum Edit Distance (Custom DP): {med}")

7 of 8

Minimum Edit Distance

5. Using rapidfuzz

The rapidfuzz library is a fast, fuzzy string matching library that also provides distance calculations.

Installation:

pip install rapidfuzz

Example:

from rapidfuzz.distance import Levenshtein

# Input strings

str1 = "kitten"

str2 = "sitting"

# Calculate Minimum Edit Distance

med = Levenshtein.distance(str1, str2)

print(f"Minimum Edit Distance (RapidFuzz): {med}")

8 of 8

Thank You