Edit Distance (Levenshtein Distance)
The Edit Distance problem, also known as Levenshtein Distance, is a fundamental string algorithm that measures the minimum number of single-character edits needed to transform one string into another. This problem perfectly demonstrates Pattern 4: String DP, where we build solutions by considering characters from both strings systematically.
Understanding Edit Distance (Levenshtein Distance)
What is Edit Distance?
Edit Distance answers the question: "What is the minimum number of operations needed to transform string A into string B?"
The three allowed operations are:
- Insert a character (cost: 1)
- Delete a character (cost: 1)
- Replace (substitute) a character (cost: 1)
Edit distance is everywhere in modern computing:
- Spell checkers find the closest correctly-spelled word
- DNA sequence alignment measures genetic similarity
- Git diff computes minimal changes between file versions
- Autocorrect suggests words based on typing errors
- Plagiarism detection identifies similar documents
The Core Insight
The key insight of Edit Distance is that we can break down the problem by considering what happens at the end of both strings:
Given strings word1[0...i] and word2[0...j], we have three choices:
-
Characters match (
word1[i] == word2[j]):- No operation needed! Take the edit distance from
dp[i-1][j-1]
- No operation needed! Take the edit distance from
-
Characters don't match - we must choose the minimum cost operation:
- Replace: Change
word1[i]toword2[j]→ cost =1 + dp[i-1][j-1] - Delete: Remove
word1[i]→ cost =1 + dp[i-1][j] - Insert: Add
word2[j]→ cost =1 + dp[i][j-1]
- Replace: Change
This follows our String DP template perfectly:
dp[i][j] = relationship between word1[0...i-1] and word2[0...j-1]
if word1[i-1] == word2[j-1]:
dp[i][j] = dp[i-1][j-1] // No operation needed
else:
dp[i][j] = 1 + min(
dp[i-1][j-1], // Replace
dp[i-1][j], // Delete
dp[i][j-1] // Insert
)
Visualizing the Operations
Let's understand what each operation means conceptually:
Example: Transform "HORSE" → "ROS"
- Delete 'H': HORSE → ORSE (1 operation)
- Delete 'R': ORSE → OSE (2 operations)
- Replace 'S' with 'R': Wait, this doesn't match our target...
Let's think more carefully about the optimal sequence (we'll compute this properly in the next section):
- Replace 'H' with 'R': HORSE → RORSE
- Delete 'R': RORSE → ROSE
- Delete 'E': ROSE → ROS
Total: 3 operations
But is this optimal? Let's use DP to find out!
Complete Worked Example with Full DP Table
Let's solve: Transform "HORSE" → "ROS"
Step 1: Initialize the DP Table
dp[i][j] = minimum edit distance between word1[0...i-1] and word2[0...j-1]
The table dimensions are (len(word1) + 1) × (len(word2) + 1) = 6 × 4.
Base cases:
dp[0][j]= j (insert j characters to transform empty string to word2[0...j-1])dp[i][0]= i (delete i characters to transform word1[0...i-1] to empty string)
Initial table:
"" R O S
"" 0 1 2 3
H 1
O 2
R 3
S 4
E 5
dp[0][j] = j: To build "ROS" from "" requires 3 insertions: "" → "R" → "RO" → "ROS"dp[i][0] = i: To build "" from "HORSE" requires 5 deletions: "HORSE" → "HORS" → "HOR" → "HO" → "H" → ""
Step 2: Fill the DP Table Cell by Cell
Let's fill each cell methodically, showing the reasoning:
Row 1: Matching 'H' with substrings of "ROS"
Cell dp[1][1]: "H" → "R"
word1[0]= 'H',word2[0]= 'R' → NOT equal- Replace:
1 + dp[0][0]= 1 + 0 = 1 - Delete 'H':
1 + dp[0][1]= 1 + 1 = 2 - Insert 'R':
1 + dp[1][0]= 1 + 1 = 2 - Minimum = 1 (replace 'H' with 'R')
Cell dp[1][2]: "H" → "RO"
- 'H' ≠ 'O'
- Replace:
1 + dp[0][1]= 1 + 1 = 2 - Delete:
1 + dp[0][2]= 1 + 2 = 3 - Insert:
1 + dp[1][1]= 1 + 1 = 2 - Minimum = 2 (replace 'H'→'R', then insert 'O')
Cell dp[1][3]: "H" → "ROS"
- 'H' ≠ 'S'
- Replace:
1 + dp[0][2]= 1 + 2 = 3 - Delete:
1 + dp[0][3]= 1 + 3 = 4 - Insert:
1 + dp[1][2]= 1 + 2 = 3 - Minimum = 3
"" R O S
"" 0 1 2 3
H 1 1 2 3
O 2
R 3
S 4
E 5
Row 2: Matching 'HO' with substrings of "ROS"
Cell dp[2][1]: "HO" → "R"
- 'O' ≠ 'R'
- Replace:
1 + dp[1][0]= 1 + 1 = 2 - Delete:
1 + dp[1][1]= 1 + 1 = 2 - Insert:
1 + dp[2][0]= 1 + 2 = 3 - Minimum = 2
Cell dp[2][2]: "HO" → "RO"
- 'O' == 'O' ✓ Match!
- No operation needed:
dp[1][1]= 1 - Value = 1 (we just need to handle "H" → "R", which costs 1)
Cell dp[2][3]: "HO" → "ROS"
- 'O' ≠ 'S'
- Replace:
1 + dp[1][2]= 1 + 2 = 3 - Delete:
1 + dp[1][3]= 1 + 3 = 4 - Insert:
1 + dp[2][2]= 1 + 1 = 2 - Minimum = 2
"" R O S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3
S 4
E 5
Row 3: Matching 'HOR' with substrings of "ROS"
Cell dp[3][1]: "HOR" → "R"
- 'R' == 'R' ✓ Match!
- No operation:
dp[2][0]= 2 - Value = 2 (delete 'H' and 'O')
Cell dp[3][2]: "HOR" → "RO"
- 'R' ≠ 'O'
- Replace:
1 + dp[2][1]= 1 + 2 = 3 - Delete:
1 + dp[2][2]= 1 + 1 = 2 - Insert:
1 + dp[3][1]= 1 + 2 = 3 - Minimum = 2
Cell dp[3][3]: "HOR" → "ROS"
- 'R' ≠ 'S'
- Replace:
1 + dp[2][2]= 1 + 1 = 2 - Delete:
1 + dp[2][3]= 1 + 2 = 3 - Insert:
1 + dp[3][2]= 1 + 2 = 3 - Minimum = 2
"" R O S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3 2 2 2
S 4
E 5
Row 4: Matching 'HORS' with substrings of "ROS"
Cell dp[4][1]: "HORS" → "R"
- 'S' ≠ 'R'
- Replace:
1 + dp[3][0]= 1 + 3 = 4 - Delete:
1 + dp[3][1]= 1 + 2 = 3 - Insert:
1 + dp[4][0]= 1 + 4 = 5 - Minimum = 3
Cell dp[4][2]: "HORS" → "RO"
- 'S' ≠ 'O'
- Replace:
1 + dp[3][1]= 1 + 2 = 3 - Delete:
1 + dp[3][2]= 1 + 2 = 3 - Insert:
1 + dp[4][1]= 1 + 3 = 4 - Minimum = 3
Cell dp[4][3]: "HORS" → "ROS"
- 'S' == 'S' ✓ Match!
- No operation:
dp[3][2]= 2 - Value = 2 (we solved "HOR" → "RO" with 2 operations)
"" R O S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3 2 2 2
S 4 3 3 2
E 5
Row 5: Matching 'HORSE' with substrings of "ROS"
Cell dp[5][1]: "HORSE" → "R"
- 'E' ≠ 'R'
- Replace:
1 + dp[4][0]= 1 + 4 = 5 - Delete:
1 + dp[4][1]= 1 + 3 = 4 - Insert:
1 + dp[5][0]= 1 + 5 = 6 - Minimum = 4
Cell dp[5][2]: "HORSE" → "RO"
- 'E' ≠ 'O'
- Replace:
1 + dp[4][1]= 1 + 3 = 4 - Delete:
1 + dp[4][2]= 1 + 3 = 4 - Insert:
1 + dp[5][1]= 1 + 4 = 5 - Minimum = 4
Cell dp[5][3]: "HORSE" → "ROS"
- 'E' ≠ 'S'
- Replace:
1 + dp[4][2]= 1 + 3 = 4 - Delete:
1 + dp[4][3]= 1 + 2 = 3 ← This is the minimum! - Insert:
1 + dp[5][2]= 1 + 4 = 5 - Minimum = 3
Final DP Table
"" R O S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3 2 2 2
S 4 3 3 2
E 5 4 4 3
Minimum Edit Distance: 3
The value at dp[5][3] tells us the minimum number of operations to transform "HORSE" into "ROS" is 3.
Step 3: Traceback - Finding the Actual Edits
Now let's trace back to find one optimal sequence of operations. We start at dp[5][3] and work backwards:
Current: dp[5][3] = 3 (HORSE → ROS)
- We got here from dp[4][3] = 2 by DELETE 'E'
- Operation 1: Delete 'E' → "HORS"
Current: dp[4][3] = 2 (HORS → ROS)
- 'S' == 'S', came from dp[3][2] = 2 (no operation)
- Move to dp[3][2]
Current: dp[3][2] = 2 (HOR → RO)
- We got here from dp[2][2] = 1 by DELETE 'R'
- Operation 2: Delete 'R' → "HO"
Current: dp[2][2] = 1 (HO → RO)
- 'O' == 'O', came from dp[1][1] = 1 (no operation)
- Move to dp[1][1]
Current: dp[1][1] = 1 (H → R)
- We got here from dp[0][0] = 0 by REPLACE 'H' with 'R'
- Operation 3: Replace 'H' → 'R' → "RO"
Done! (reached dp[0][0])
Optimal Edit Sequence:
- Replace 'H' with 'R': "HORSE" → "RORSE"
- Delete 'R': "RORSE" → "ROSE"
- Delete 'E': "ROSE" → "ROS"
Wait, that's 3 operations but the edits don't look right. Let me reconsider the traceback more carefully:
Actually, when we trace back, we're working backwards through the strings. Let me redo this properly:
Correct Traceback (starting from the end):
Position (5,3): "HORSE" → "ROS", value = 3
- Came from (4,3) via DELETE 'E': "HORS" → "ROS" (cost 2 + 1 = 3)
Position (4,3): "HORS" → "ROS", value = 2
- 'S' == 'S', came from (3,2) with no operation
Position (3,2): "HOR" → "RO", value = 2
- Came from (2,2) via DELETE 'R': "HO" → "RO" (cost 1 + 1 = 2)
Position (2,2): "HO" → "RO", value = 1
- 'O' == 'O', came from (1,1) with no operation
Position (1,1): "H" → "R", value = 1
- Came from (0,0) via REPLACE: "" → "" then add operations
Forward Edit Sequence:
- Replace 'H' with 'R': "HORSE" → "RORSE"
- Delete 'R' (3rd position): "RORSE" → "ROSE"
- Delete 'E': "ROSE" → "ROS"
The traceback tells us HOW we achieved the minimum distance:
- Each step moves diagonally (match/replace), up (delete), or left (insert)
- Multiple optimal paths may exist
- The DP value tells us the cost is optimal, the traceback shows one way to achieve it
Why This Solution is Optimal
Could we do better than 3 operations? Let's verify:
- "HORSE" has 5 characters, "ROS" has 3 characters
- We must "get rid of" at least 2 characters somehow (5 - 3 = 2)
- Looking at the strings, only 'O' and 'S' are in the right positions
- We need to: fix 'H', deal with 'R', fix 'E' = at least 3 operations
The DP algorithm guarantees we found the optimal solution!
Applications and Variations
Real-World Applications
1. Spell Checking When you type "teh", the spell checker computes edit distance to all dictionary words:
- "teh" → "the" (distance = 2: delete 'h', insert 'h' at correct position... or just SWAP, but that's not in basic edit distance)
- "teh" → "tea" (distance = 1: replace 'h' with 'a')
- "teh" → "ten" (distance = 1: replace 'h' with 'n')
The spell checker suggests words with smallest edit distance.
2. DNA Sequence Alignment Biologists use edit distance to measure genetic similarity:
DNA1: AGGTAB
DNA2: GXTXAYB
Finding the edit distance helps identify mutations, insertions, and deletions in genetic code.
3. Version Control (Git Diff) Git uses a variation of edit distance to compute minimal changes between file versions:
Old: "Hello World"
New: "Hello Beautiful World"
The diff shows: INSERT "Beautiful " at position 6.
4. Autocorrect on Mobile When you type "restaurent" → "restaurant", the phone computes edit distances to all known words and suggests the closest matches.
LeetCode Problem Connection
This is the classic problem we've been solving!
Problem Statement:
Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.
You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Example:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse → rorse (replace 'h' with 'r')
rorse → rose (remove 'r')
rose → ros (remove 'e')
This is exactly what we solved above!
Implementation
Here's the complete Java implementation:
public class EditDistance {
/**
* Computes the minimum edit distance between two strings.
*
* Time Complexity: O(m * n) where m = word1.length(), n = word2.length()
* Space Complexity: O(m * n) for the DP table
*
* @param word1 The source string
* @param word2 The target string
* @return The minimum number of operations to transform word1 into word2
*/
public int minDistance(String word1, String word2) {
int m = word1.length();
int n = word2.length();
// dp[i][j] = min operations to convert word1[0...i-1] to word2[0...j-1]
int[][] dp = new int[m + 1][n + 1];
// Base cases: transforming from/to empty string
for (int i = 0; i <= m; i++) {
dp[i][0] = i; // Delete all characters
}
for (int j = 0; j <= n; j++) {
dp[0][j] = j; // Insert all characters
}
// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
// Characters match - no operation needed
dp[i][j] = dp[i - 1][j - 1];
} else {
// Take minimum of three operations
dp[i][j] = 1 + Math.min(
dp[i - 1][j - 1], // Replace
Math.min(
dp[i - 1][j], // Delete from word1
dp[i][j - 1] // Insert into word1
)
);
}
}
}
return dp[m][n];
}
/**
* Demonstrates the solution with detailed output.
*/
public static void main(String[] args) {
EditDistance ed = new EditDistance();
// Test case 1: Classic example
String word1 = "horse";
String word2 = "ros";
System.out.println("Transform '" + word1 + "' → '" + word2 + "'");
System.out.println("Minimum operations: " + ed.minDistance(word1, word2));
System.out.println();
// Test case 2: Longer strings
word1 = "intention";
word2 = "execution";
System.out.println("Transform '" + word1 + "' → '" + word2 + "'");
System.out.println("Minimum operations: " + ed.minDistance(word1, word2));
System.out.println();
// Test case 3: Edge cases
System.out.println("Empty strings: " + ed.minDistance("", ""));
System.out.println("One empty: " + ed.minDistance("abc", ""));
System.out.println("Identical: " + ed.minDistance("same", "same"));
}
}
Space-Optimized Version
Since we only need the previous row to compute the current row, we can optimize space to O(n):
public int minDistanceOptimized(String word1, String word2) {
int m = word1.length();
int n = word2.length();
// Only keep two rows: previous and current
int[] prev = new int[n + 1];
int[] curr = new int[n + 1];
// Initialize first row
for (int j = 0; j <= n; j++) {
prev[j] = j;
}
// Fill row by row
for (int i = 1; i <= m; i++) {
curr[0] = i; // Base case for this row
for (int j = 1; j <= n; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
curr[j] = prev[j - 1];
} else {
curr[j] = 1 + Math.min(
prev[j - 1], // Replace
Math.min(prev[j], curr[j - 1]) // Delete or Insert
);
}
}
// Swap arrays for next iteration
int[] temp = prev;
prev = curr;
curr = temp;
}
return prev[n];
}
By using rolling arrays, we reduce space from O(m×n) to O(n), while maintaining the same O(m×n) time complexity. This is crucial for very long strings!
Important Variations
1. Minimum ASCII Delete Sum (LeetCode 712)
Instead of counting operations, minimize the ASCII sum of deleted characters:
public int minimumDeleteSum(String s1, String s2) {
int m = s1.length(), n = s2.length();
int[][] dp = new int[m + 1][n + 1];
// Base cases: sum of ASCII values
for (int i = 1; i <= m; i++) {
dp[i][0] = dp[i-1][0] + s1.charAt(i-1);
}
for (int j = 1; j <= n; j++) {
dp[0][j] = dp[0][j-1] + s2.charAt(j-1);
}
// DP transition
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1)) {
dp[i][j] = dp[i-1][j-1]; // No cost when matching
} else {
dp[i][j] = Math.min(
dp[i-1][j] + s1.charAt(i-1), // Delete from s1
dp[i][j-1] + s2.charAt(j-1) // Delete from s2
);
}
}
}
return dp[m][n];
}
2. One Edit Distance (LeetCode 161)
Check if two strings are exactly one edit apart:
public boolean isOneEditDistance(String s, String t) {
int m = s.length(), n = t.length();
// Length difference must be at most 1
if (Math.abs(m - n) > 1) return false;
// Ensure s is the shorter string
if (m > n) return isOneEditDistance(t, s);
for (int i = 0; i < m; i++) {
if (s.charAt(i) != t.charAt(i)) {
if (m == n) {
// Must be a replace operation
return s.substring(i + 1).equals(t.substring(i + 1));
} else {
// Must be an insert operation (delete from t)
return s.substring(i).equals(t.substring(i + 1));
}
}
}
// All characters match - strings differ only if length differs by 1
return m + 1 == n;
}
3. Edit Distance with Limited Operations
Sometimes we have constraints on which operations are allowed:
/**
* Edit distance allowing only insert and delete (no replace).
* This effectively becomes finding the Longest Common Subsequence (LCS)
* and then: insertions + deletions = (m - lcs) + (n - lcs)
*/
public int minDistanceInsertDeleteOnly(String word1, String word2) {
int lcs = longestCommonSubsequence(word1, word2);
return word1.length() + word2.length() - 2 * lcs;
}
private int longestCommonSubsequence(String s1, String s2) {
int m = s1.length(), n = s2.length();
int[][] dp = new int[m + 1][n + 1];
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1)) {
dp[i][j] = dp[i-1][j-1] + 1;
} else {
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
}
return dp[m][n];
}
Key Insights for Interviews
- Off-by-one errors: Remember
dp[i][j]represents substrings ending ati-1andj-1 - Wrong base cases: Empty string transformations must be handled correctly
- Forgetting character matching: When characters match, no operation is needed!
- Traceback direction: Work backwards from
dp[m][n]to reconstruct the path
When to use Edit Distance:
- ✅ Measuring string similarity
- ✅ Spell checking and autocorrect
- ✅ DNA sequence alignment
- ✅ Finding closest matches in a dictionary
- ✅ Version control diff algorithms
Pattern Recognition:
- Two strings → 2D DP table
- Operations that modify strings → consider all possibilities
- Optimal substructure → DP is the right approach
- Need actual edit sequence → implement traceback
Time Complexity: O(m × n) where m and n are string lengths Space Complexity: O(m × n), optimizable to O(n) with rolling arrays
Summary
Edit Distance (Levenshtein Distance) is a fundamental algorithm in Pattern 4: String DP that measures the minimum operations needed to transform one string into another. The key insights are:
- Build solutions bottom-up considering all possible operations
- Characters match → no cost, take diagonal value
- Characters differ → take minimum of (replace, delete, insert)
- Applications everywhere from spell checkers to DNA analysis
Master this pattern and you'll handle a wide range of string transformation problems with confidence!