Skip to main content

Edit Distance (Levenshtein Distance)

The Edit Distance problem, also known as Levenshtein Distance, is a fundamental string algorithm that measures the minimum number of single-character edits needed to transform one string into another. This problem perfectly demonstrates Pattern 4: String DP, where we build solutions by considering characters from both strings systematically.

Understanding Edit Distance (Levenshtein Distance)

What is Edit Distance?

Edit Distance answers the question: "What is the minimum number of operations needed to transform string A into string B?"

The three allowed operations are:

  1. Insert a character (cost: 1)
  2. Delete a character (cost: 1)
  3. Replace (substitute) a character (cost: 1)
Why This Matters

Edit distance is everywhere in modern computing:

  • Spell checkers find the closest correctly-spelled word
  • DNA sequence alignment measures genetic similarity
  • Git diff computes minimal changes between file versions
  • Autocorrect suggests words based on typing errors
  • Plagiarism detection identifies similar documents

The Core Insight

The key insight of Edit Distance is that we can break down the problem by considering what happens at the end of both strings:

Given strings word1[0...i] and word2[0...j], we have three choices:

  1. Characters match (word1[i] == word2[j]):

    • No operation needed! Take the edit distance from dp[i-1][j-1]
  2. Characters don't match - we must choose the minimum cost operation:

    • Replace: Change word1[i] to word2[j] → cost = 1 + dp[i-1][j-1]
    • Delete: Remove word1[i] → cost = 1 + dp[i-1][j]
    • Insert: Add word2[j] → cost = 1 + dp[i][j-1]
Pattern 4: String DP Template

This follows our String DP template perfectly:

dp[i][j] = relationship between word1[0...i-1] and word2[0...j-1]

if word1[i-1] == word2[j-1]:
dp[i][j] = dp[i-1][j-1] // No operation needed
else:
dp[i][j] = 1 + min(
dp[i-1][j-1], // Replace
dp[i-1][j], // Delete
dp[i][j-1] // Insert
)

Visualizing the Operations

Let's understand what each operation means conceptually:

Example: Transform "HORSE" → "ROS"

  • Delete 'H': HORSE → ORSE (1 operation)
  • Delete 'R': ORSE → OSE (2 operations)
  • Replace 'S' with 'R': Wait, this doesn't match our target...

Let's think more carefully about the optimal sequence (we'll compute this properly in the next section):

  1. Replace 'H' with 'R': HORSE → RORSE
  2. Delete 'R': RORSE → ROSE
  3. Delete 'E': ROSE → ROS

Total: 3 operations

But is this optimal? Let's use DP to find out!

Complete Worked Example with Full DP Table

Let's solve: Transform "HORSE" → "ROS"

Step 1: Initialize the DP Table

dp[i][j] = minimum edit distance between word1[0...i-1] and word2[0...j-1]

The table dimensions are (len(word1) + 1) × (len(word2) + 1) = 6 × 4.

Base cases:

  • dp[0][j] = j (insert j characters to transform empty string to word2[0...j-1])
  • dp[i][0] = i (delete i characters to transform word1[0...i-1] to empty string)

Initial table:

        ""  R   O   S
"" 0 1 2 3
H 1
O 2
R 3
S 4
E 5
Why These Base Cases?
  • dp[0][j] = j: To build "ROS" from "" requires 3 insertions: "" → "R" → "RO" → "ROS"
  • dp[i][0] = i: To build "" from "HORSE" requires 5 deletions: "HORSE" → "HORS" → "HOR" → "HO" → "H" → ""

Step 2: Fill the DP Table Cell by Cell

Let's fill each cell methodically, showing the reasoning:

Row 1: Matching 'H' with substrings of "ROS"

Cell dp[1][1]: "H" → "R"

  • word1[0] = 'H', word2[0] = 'R' → NOT equal
  • Replace: 1 + dp[0][0] = 1 + 0 = 1
  • Delete 'H': 1 + dp[0][1] = 1 + 1 = 2
  • Insert 'R': 1 + dp[1][0] = 1 + 1 = 2
  • Minimum = 1 (replace 'H' with 'R')

Cell dp[1][2]: "H" → "RO"

  • 'H' ≠ 'O'
  • Replace: 1 + dp[0][1] = 1 + 1 = 2
  • Delete: 1 + dp[0][2] = 1 + 2 = 3
  • Insert: 1 + dp[1][1] = 1 + 1 = 2
  • Minimum = 2 (replace 'H'→'R', then insert 'O')

Cell dp[1][3]: "H" → "ROS"

  • 'H' ≠ 'S'
  • Replace: 1 + dp[0][2] = 1 + 2 = 3
  • Delete: 1 + dp[0][3] = 1 + 3 = 4
  • Insert: 1 + dp[1][2] = 1 + 2 = 3
  • Minimum = 3
        ""  R   O   S
"" 0 1 2 3
H 1 1 2 3
O 2
R 3
S 4
E 5

Row 2: Matching 'HO' with substrings of "ROS"

Cell dp[2][1]: "HO" → "R"

  • 'O' ≠ 'R'
  • Replace: 1 + dp[1][0] = 1 + 1 = 2
  • Delete: 1 + dp[1][1] = 1 + 1 = 2
  • Insert: 1 + dp[2][0] = 1 + 2 = 3
  • Minimum = 2

Cell dp[2][2]: "HO" → "RO"

  • 'O' == 'O' ✓ Match!
  • No operation needed: dp[1][1] = 1
  • Value = 1 (we just need to handle "H" → "R", which costs 1)

Cell dp[2][3]: "HO" → "ROS"

  • 'O' ≠ 'S'
  • Replace: 1 + dp[1][2] = 1 + 2 = 3
  • Delete: 1 + dp[1][3] = 1 + 3 = 4
  • Insert: 1 + dp[2][2] = 1 + 1 = 2
  • Minimum = 2
        ""  R   O   S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3
S 4
E 5

Row 3: Matching 'HOR' with substrings of "ROS"

Cell dp[3][1]: "HOR" → "R"

  • 'R' == 'R' ✓ Match!
  • No operation: dp[2][0] = 2
  • Value = 2 (delete 'H' and 'O')

Cell dp[3][2]: "HOR" → "RO"

  • 'R' ≠ 'O'
  • Replace: 1 + dp[2][1] = 1 + 2 = 3
  • Delete: 1 + dp[2][2] = 1 + 1 = 2
  • Insert: 1 + dp[3][1] = 1 + 2 = 3
  • Minimum = 2

Cell dp[3][3]: "HOR" → "ROS"

  • 'R' ≠ 'S'
  • Replace: 1 + dp[2][2] = 1 + 1 = 2
  • Delete: 1 + dp[2][3] = 1 + 2 = 3
  • Insert: 1 + dp[3][2] = 1 + 2 = 3
  • Minimum = 2
        ""  R   O   S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3 2 2 2
S 4
E 5

Row 4: Matching 'HORS' with substrings of "ROS"

Cell dp[4][1]: "HORS" → "R"

  • 'S' ≠ 'R'
  • Replace: 1 + dp[3][0] = 1 + 3 = 4
  • Delete: 1 + dp[3][1] = 1 + 2 = 3
  • Insert: 1 + dp[4][0] = 1 + 4 = 5
  • Minimum = 3

Cell dp[4][2]: "HORS" → "RO"

  • 'S' ≠ 'O'
  • Replace: 1 + dp[3][1] = 1 + 2 = 3
  • Delete: 1 + dp[3][2] = 1 + 2 = 3
  • Insert: 1 + dp[4][1] = 1 + 3 = 4
  • Minimum = 3

Cell dp[4][3]: "HORS" → "ROS"

  • 'S' == 'S' ✓ Match!
  • No operation: dp[3][2] = 2
  • Value = 2 (we solved "HOR" → "RO" with 2 operations)
        ""  R   O   S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3 2 2 2
S 4 3 3 2
E 5

Row 5: Matching 'HORSE' with substrings of "ROS"

Cell dp[5][1]: "HORSE" → "R"

  • 'E' ≠ 'R'
  • Replace: 1 + dp[4][0] = 1 + 4 = 5
  • Delete: 1 + dp[4][1] = 1 + 3 = 4
  • Insert: 1 + dp[5][0] = 1 + 5 = 6
  • Minimum = 4

Cell dp[5][2]: "HORSE" → "RO"

  • 'E' ≠ 'O'
  • Replace: 1 + dp[4][1] = 1 + 3 = 4
  • Delete: 1 + dp[4][2] = 1 + 3 = 4
  • Insert: 1 + dp[5][1] = 1 + 4 = 5
  • Minimum = 4

Cell dp[5][3]: "HORSE" → "ROS"

  • 'E' ≠ 'S'
  • Replace: 1 + dp[4][2] = 1 + 3 = 4
  • Delete: 1 + dp[4][3] = 1 + 2 = 3 ← This is the minimum!
  • Insert: 1 + dp[5][2] = 1 + 4 = 5
  • Minimum = 3

Final DP Table

        ""  R   O   S
"" 0 1 2 3
H 1 1 2 3
O 2 2 1 2
R 3 2 2 2
S 4 3 3 2
E 5 4 4 3
The Answer

Minimum Edit Distance: 3

The value at dp[5][3] tells us the minimum number of operations to transform "HORSE" into "ROS" is 3.

Step 3: Traceback - Finding the Actual Edits

Now let's trace back to find one optimal sequence of operations. We start at dp[5][3] and work backwards:

Current: dp[5][3] = 3 (HORSE → ROS)
- We got here from dp[4][3] = 2 by DELETE 'E'
- Operation 1: Delete 'E' → "HORS"

Current: dp[4][3] = 2 (HORS → ROS)
- 'S' == 'S', came from dp[3][2] = 2 (no operation)
- Move to dp[3][2]

Current: dp[3][2] = 2 (HOR → RO)
- We got here from dp[2][2] = 1 by DELETE 'R'
- Operation 2: Delete 'R' → "HO"

Current: dp[2][2] = 1 (HO → RO)
- 'O' == 'O', came from dp[1][1] = 1 (no operation)
- Move to dp[1][1]

Current: dp[1][1] = 1 (H → R)
- We got here from dp[0][0] = 0 by REPLACE 'H' with 'R'
- Operation 3: Replace 'H' → 'R' → "RO"

Done! (reached dp[0][0])

Optimal Edit Sequence:

  1. Replace 'H' with 'R': "HORSE" → "RORSE"
  2. Delete 'R': "RORSE" → "ROSE"
  3. Delete 'E': "ROSE" → "ROS"

Wait, that's 3 operations but the edits don't look right. Let me reconsider the traceback more carefully:

Actually, when we trace back, we're working backwards through the strings. Let me redo this properly:

Correct Traceback (starting from the end):

Position (5,3): "HORSE" → "ROS", value = 3

  • Came from (4,3) via DELETE 'E': "HORS" → "ROS" (cost 2 + 1 = 3)

Position (4,3): "HORS" → "ROS", value = 2

  • 'S' == 'S', came from (3,2) with no operation

Position (3,2): "HOR" → "RO", value = 2

  • Came from (2,2) via DELETE 'R': "HO" → "RO" (cost 1 + 1 = 2)

Position (2,2): "HO" → "RO", value = 1

  • 'O' == 'O', came from (1,1) with no operation

Position (1,1): "H" → "R", value = 1

  • Came from (0,0) via REPLACE: "" → "" then add operations

Forward Edit Sequence:

  1. Replace 'H' with 'R': "HORSE" → "RORSE"
  2. Delete 'R' (3rd position): "RORSE" → "ROSE"
  3. Delete 'E': "ROSE" → "ROS"
Understanding Traceback

The traceback tells us HOW we achieved the minimum distance:

  • Each step moves diagonally (match/replace), up (delete), or left (insert)
  • Multiple optimal paths may exist
  • The DP value tells us the cost is optimal, the traceback shows one way to achieve it

Why This Solution is Optimal

Could we do better than 3 operations? Let's verify:

  • "HORSE" has 5 characters, "ROS" has 3 characters
  • We must "get rid of" at least 2 characters somehow (5 - 3 = 2)
  • Looking at the strings, only 'O' and 'S' are in the right positions
  • We need to: fix 'H', deal with 'R', fix 'E' = at least 3 operations

The DP algorithm guarantees we found the optimal solution!

Applications and Variations

Real-World Applications

1. Spell Checking When you type "teh", the spell checker computes edit distance to all dictionary words:

  • "teh" → "the" (distance = 2: delete 'h', insert 'h' at correct position... or just SWAP, but that's not in basic edit distance)
  • "teh" → "tea" (distance = 1: replace 'h' with 'a')
  • "teh" → "ten" (distance = 1: replace 'h' with 'n')

The spell checker suggests words with smallest edit distance.

2. DNA Sequence Alignment Biologists use edit distance to measure genetic similarity:

DNA1: AGGTAB
DNA2: GXTXAYB

Finding the edit distance helps identify mutations, insertions, and deletions in genetic code.

3. Version Control (Git Diff) Git uses a variation of edit distance to compute minimal changes between file versions:

Old: "Hello World"
New: "Hello Beautiful World"

The diff shows: INSERT "Beautiful " at position 6.

4. Autocorrect on Mobile When you type "restaurent" → "restaurant", the phone computes edit distances to all known words and suggests the closest matches.

LeetCode Problem Connection

LeetCode 72: Edit Distance

This is the classic problem we've been solving!

Problem Statement:

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Example:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse → rorse (replace 'h' with 'r')
rorse → rose (remove 'r')
rose → ros (remove 'e')

This is exactly what we solved above!

Implementation

Here's the complete Java implementation:

public class EditDistance {
/**
* Computes the minimum edit distance between two strings.
*
* Time Complexity: O(m * n) where m = word1.length(), n = word2.length()
* Space Complexity: O(m * n) for the DP table
*
* @param word1 The source string
* @param word2 The target string
* @return The minimum number of operations to transform word1 into word2
*/
public int minDistance(String word1, String word2) {
int m = word1.length();
int n = word2.length();

// dp[i][j] = min operations to convert word1[0...i-1] to word2[0...j-1]
int[][] dp = new int[m + 1][n + 1];

// Base cases: transforming from/to empty string
for (int i = 0; i <= m; i++) {
dp[i][0] = i; // Delete all characters
}
for (int j = 0; j <= n; j++) {
dp[0][j] = j; // Insert all characters
}

// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
// Characters match - no operation needed
dp[i][j] = dp[i - 1][j - 1];
} else {
// Take minimum of three operations
dp[i][j] = 1 + Math.min(
dp[i - 1][j - 1], // Replace
Math.min(
dp[i - 1][j], // Delete from word1
dp[i][j - 1] // Insert into word1
)
);
}
}
}

return dp[m][n];
}

/**
* Demonstrates the solution with detailed output.
*/
public static void main(String[] args) {
EditDistance ed = new EditDistance();

// Test case 1: Classic example
String word1 = "horse";
String word2 = "ros";
System.out.println("Transform '" + word1 + "' → '" + word2 + "'");
System.out.println("Minimum operations: " + ed.minDistance(word1, word2));
System.out.println();

// Test case 2: Longer strings
word1 = "intention";
word2 = "execution";
System.out.println("Transform '" + word1 + "' → '" + word2 + "'");
System.out.println("Minimum operations: " + ed.minDistance(word1, word2));
System.out.println();

// Test case 3: Edge cases
System.out.println("Empty strings: " + ed.minDistance("", ""));
System.out.println("One empty: " + ed.minDistance("abc", ""));
System.out.println("Identical: " + ed.minDistance("same", "same"));
}
}

Space-Optimized Version

Since we only need the previous row to compute the current row, we can optimize space to O(n):

public int minDistanceOptimized(String word1, String word2) {
int m = word1.length();
int n = word2.length();

// Only keep two rows: previous and current
int[] prev = new int[n + 1];
int[] curr = new int[n + 1];

// Initialize first row
for (int j = 0; j <= n; j++) {
prev[j] = j;
}

// Fill row by row
for (int i = 1; i <= m; i++) {
curr[0] = i; // Base case for this row

for (int j = 1; j <= n; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
curr[j] = prev[j - 1];
} else {
curr[j] = 1 + Math.min(
prev[j - 1], // Replace
Math.min(prev[j], curr[j - 1]) // Delete or Insert
);
}
}

// Swap arrays for next iteration
int[] temp = prev;
prev = curr;
curr = temp;
}

return prev[n];
}
Space Optimization

By using rolling arrays, we reduce space from O(m×n) to O(n), while maintaining the same O(m×n) time complexity. This is crucial for very long strings!

Important Variations

1. Minimum ASCII Delete Sum (LeetCode 712)

Instead of counting operations, minimize the ASCII sum of deleted characters:

public int minimumDeleteSum(String s1, String s2) {
int m = s1.length(), n = s2.length();
int[][] dp = new int[m + 1][n + 1];

// Base cases: sum of ASCII values
for (int i = 1; i <= m; i++) {
dp[i][0] = dp[i-1][0] + s1.charAt(i-1);
}
for (int j = 1; j <= n; j++) {
dp[0][j] = dp[0][j-1] + s2.charAt(j-1);
}

// DP transition
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1)) {
dp[i][j] = dp[i-1][j-1]; // No cost when matching
} else {
dp[i][j] = Math.min(
dp[i-1][j] + s1.charAt(i-1), // Delete from s1
dp[i][j-1] + s2.charAt(j-1) // Delete from s2
);
}
}
}

return dp[m][n];
}

2. One Edit Distance (LeetCode 161)

Check if two strings are exactly one edit apart:

public boolean isOneEditDistance(String s, String t) {
int m = s.length(), n = t.length();

// Length difference must be at most 1
if (Math.abs(m - n) > 1) return false;

// Ensure s is the shorter string
if (m > n) return isOneEditDistance(t, s);

for (int i = 0; i < m; i++) {
if (s.charAt(i) != t.charAt(i)) {
if (m == n) {
// Must be a replace operation
return s.substring(i + 1).equals(t.substring(i + 1));
} else {
// Must be an insert operation (delete from t)
return s.substring(i).equals(t.substring(i + 1));
}
}
}

// All characters match - strings differ only if length differs by 1
return m + 1 == n;
}

3. Edit Distance with Limited Operations

Sometimes we have constraints on which operations are allowed:

/**
* Edit distance allowing only insert and delete (no replace).
* This effectively becomes finding the Longest Common Subsequence (LCS)
* and then: insertions + deletions = (m - lcs) + (n - lcs)
*/
public int minDistanceInsertDeleteOnly(String word1, String word2) {
int lcs = longestCommonSubsequence(word1, word2);
return word1.length() + word2.length() - 2 * lcs;
}

private int longestCommonSubsequence(String s1, String s2) {
int m = s1.length(), n = s2.length();
int[][] dp = new int[m + 1][n + 1];

for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1)) {
dp[i][j] = dp[i-1][j-1] + 1;
} else {
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
}

return dp[m][n];
}

Key Insights for Interviews

Common Mistakes
  1. Off-by-one errors: Remember dp[i][j] represents substrings ending at i-1 and j-1
  2. Wrong base cases: Empty string transformations must be handled correctly
  3. Forgetting character matching: When characters match, no operation is needed!
  4. Traceback direction: Work backwards from dp[m][n] to reconstruct the path

When to use Edit Distance:

  • ✅ Measuring string similarity
  • ✅ Spell checking and autocorrect
  • ✅ DNA sequence alignment
  • ✅ Finding closest matches in a dictionary
  • ✅ Version control diff algorithms

Pattern Recognition:

  • Two strings → 2D DP table
  • Operations that modify strings → consider all possibilities
  • Optimal substructure → DP is the right approach
  • Need actual edit sequence → implement traceback

Time Complexity: O(m × n) where m and n are string lengths Space Complexity: O(m × n), optimizable to O(n) with rolling arrays


Summary

Edit Distance (Levenshtein Distance) is a fundamental algorithm in Pattern 4: String DP that measures the minimum operations needed to transform one string into another. The key insights are:

  1. Build solutions bottom-up considering all possible operations
  2. Characters match → no cost, take diagonal value
  3. Characters differ → take minimum of (replace, delete, insert)
  4. Applications everywhere from spell checkers to DNA analysis

Master this pattern and you'll handle a wide range of string transformation problems with confidence!