CS458 Natural language Processing
Self-study 3
Minimum Edit Distance
Krishnendu Ghosh
Department of Computer Science & Engineering
Indian Institute of Information Technology Dharwad
Minimum Edit Distance
1. Using nltk
The nltk.edit_distance function computes the MED between two strings.
Example:
from nltk.metrics import edit_distance
# Input strings
str1 = "kitten"
str2 = "sitting"
# Calculate Minimum Edit Distance
med = edit_distance(str1, str2)
print(f"Minimum Edit Distance (NLTK): {med}")
Minimum Edit Distance
2. Using Levenshtein Library
The Levenshtein library provides a faster implementation for computing the MED.
Installation:
pip install python-Levenshtein
Example:
import Levenshtein
# Input strings
str1 = "kitten"
str2 = "sitting"
# Calculate Minimum Edit Distance
med = Levenshtein.distance(str1, str2)
print(f"Minimum Edit Distance (Levenshtein): {med}")
Minimum Edit Distance
Using difflib
The difflib library can also estimate similarity, but it isn't specifically designed for MED. However, it provides close matching tools.
Example:
import difflib
# Input strings
str1 = "kitten"
str2 = "sitting"
# Find similarity ratio (not MED but related)
similarity = difflib.SequenceMatcher(None, str1, str2).ratio()
print(f"Similarity Ratio (difflib): {similarity}")
Minimum Edit Distance
4. Custom Implementation Using Dynamic Programming
You can implement MED manually using dynamic programming for a deeper understanding.
Example:
def minimum_edit_distance(str1, str2):
m, n = len(str1), len(str2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Initialize the table
for i in range(m + 1):
dp[i][0] = i
for j in range(n + 1):
dp[0][j] = j
Minimum Edit Distance
# Fill the table
for i in range(1, m + 1):
for j in range(1, n + 1):
if str1[i - 1] == str2[j - 1]:
dp[i][j] = dp[i - 1][j - 1]
else:
dp[i][j] = 1 + min(dp[i - 1][j], # Deletion
dp[i][j - 1], # Insertion
dp[i - 1][j - 1]) # Substitution
return dp[m][n]
# Input strings
str1 = "kitten"
str2 = "sitting"
# Calculate Minimum Edit Distance
med = minimum_edit_distance(str1, str2)
print(f"Minimum Edit Distance (Custom DP): {med}")
Minimum Edit Distance
5. Using rapidfuzz
The rapidfuzz library is a fast, fuzzy string matching library that also provides distance calculations.
Installation:
pip install rapidfuzz
Example:
from rapidfuzz.distance import Levenshtein
# Input strings
str1 = "kitten"
str2 = "sitting"
# Calculate Minimum Edit Distance
med = Levenshtein.distance(str1, str2)
print(f"Minimum Edit Distance (RapidFuzz): {med}")
Thank You