Edit Distance (Levenshtein Distance)

The Edit Distance problem, also known as Levenshtein Distance, is a fundamental string algorithm that measures the minimum number of single-character edits needed to transform one string into another. This problem perfectly demonstrates Pattern 4: String DP, where we build solutions by considering characters from both strings systematically.

Understanding Edit Distance (Levenshtein Distance)

What is Edit Distance?

Edit Distance answers the question: "What is the minimum number of operations needed to transform string A into string B?"

The three allowed operations are:

Insert a character (cost: 1)
Delete a character (cost: 1)
Replace (substitute) a character (cost: 1)

Why This Matters

Edit distance is everywhere in modern computing:

Spell checkers find the closest correctly-spelled word
DNA sequence alignment measures genetic similarity
Git diff computes minimal changes between file versions
Autocorrect suggests words based on typing errors
Plagiarism detection identifies similar documents

The Core Insight

The key insight of Edit Distance is that we can break down the problem by considering what happens at the end of both strings:

Given strings word1[0...i] and word2[0...j], we have three choices:

Characters match (word1[i] == word2[j]):
- No operation needed! Take the edit distance from dp[i-1][j-1]
Characters don't match - we must choose the minimum cost operation:
- Replace: Change word1[i] to word2[j] → cost = 1 + dp[i-1][j-1]
- Delete: Remove word1[i] → cost = 1 + dp[i-1][j]
- Insert: Add word2[j] → cost = 1 + dp[i][j-1]

Pattern 4: String DP Template

This follows our String DP template perfectly:

dp[i][j] = relationship between word1[0...i-1] and word2[0...j-1]

if word1[i-1] == word2[j-1]:
    dp[i][j] = dp[i-1][j-1]  // No operation needed
else:
    dp[i][j] = 1 + min(
        dp[i-1][j-1],  // Replace
        dp[i-1][j],    // Delete
        dp[i][j-1]     // Insert
    )

Visualizing the Operations

Let's understand what each operation means conceptually:

Example: Transform "HORSE" → "ROS"

Delete 'H': HORSE → ORSE (1 operation)
Delete 'R': ORSE → OSE (2 operations)
Replace 'S' with 'R': Wait, this doesn't match our target...

Let's think more carefully about the optimal sequence (we'll compute this properly in the next section):

Replace 'H' with 'R': HORSE → RORSE
Delete 'R': RORSE → ROSE
Delete 'E': ROSE → ROS

Total: 3 operations

But is this optimal? Let's use DP to find out!

Complete Worked Example with Full DP Table

Let's solve: Transform "HORSE" → "ROS"

Step 1: Initialize the DP Table

dp[i][j] = minimum edit distance between word1[0...i-1] and word2[0...j-1]

The table dimensions are (len(word1) + 1) × (len(word2) + 1) = 6 × 4.

Base cases:

dp[0][j] = j (insert j characters to transform empty string to word2[0...j-1])
dp[i][0] = i (delete i characters to transform word1[0...i-1] to empty string)

Initial table:

        ""  R   O   S
    ""  0   1   2   3
    H   1
    O   2
    R   3
    S   4
    E   5

Why These Base Cases?

dp[0][j] = j: To build "ROS" from "" requires 3 insertions: "" → "R" → "RO" → "ROS"
dp[i][0] = i: To build "" from "HORSE" requires 5 deletions: "HORSE" → "HORS" → "HOR" → "HO" → "H" → ""

Step 2: Fill the DP Table Cell by Cell

Let's fill each cell methodically, showing the reasoning:

Row 1: Matching 'H' with substrings of "ROS"

Cell dp[1][1]: "H" → "R"

word1[0] = 'H', word2[0] = 'R' → NOT equal
Replace: 1 + dp[0][0] = 1 + 0 = 1
Delete 'H': 1 + dp[0][1] = 1 + 1 = 2
Insert 'R': 1 + dp[1][0] = 1 + 1 = 2
Minimum = 1 (replace 'H' with 'R')

Cell dp[1][2]: "H" → "RO"

'H' ≠ 'O'
Replace: 1 + dp[0][1] = 1 + 1 = 2
Delete: 1 + dp[0][2] = 1 + 2 = 3
Insert: 1 + dp[1][1] = 1 + 1 = 2
Minimum = 2 (replace 'H'→'R', then insert 'O')

Cell dp[1][3]: "H" → "ROS"

'H' ≠ 'S'
Replace: 1 + dp[0][2] = 1 + 2 = 3
Delete: 1 + dp[0][3] = 1 + 3 = 4
Insert: 1 + dp[1][2] = 1 + 2 = 3
Minimum = 3

        ""  R   O   S
    ""  0   1   2   3
    H   1   1   2   3
    O   2
    R   3
    S   4
    E   5

Row 2: Matching 'HO' with substrings of "ROS"

Cell dp[2][1]: "HO" → "R"

'O' ≠ 'R'
Replace: 1 + dp[1][0] = 1 + 1 = 2
Delete: 1 + dp[1][1] = 1 + 1 = 2
Insert: 1 + dp[2][0] = 1 + 2 = 3
Minimum = 2

Cell dp[2][2]: "HO" → "RO"

'O' == 'O' ✓ Match!
No operation needed: dp[1][1] = 1
Value = 1 (we just need to handle "H" → "R", which costs 1)

Cell dp[2][3]: "HO" → "ROS"

'O' ≠ 'S'
Replace: 1 + dp[1][2] = 1 + 2 = 3
Delete: 1 + dp[1][3] = 1 + 3 = 4
Insert: 1 + dp[2][2] = 1 + 1 = 2
Minimum = 2

        ""  R   O   S
    ""  0   1   2   3
    H   1   1   2   3
    O   2   2   1   2
    R   3
    S   4
    E   5

Row 3: Matching 'HOR' with substrings of "ROS"

Cell dp[3][1]: "HOR" → "R"

'R' == 'R' ✓ Match!
No operation: dp[2][0] = 2
Value = 2 (delete 'H' and 'O')

Cell dp[3][2]: "HOR" → "RO"

'R' ≠ 'O'
Replace: 1 + dp[2][1] = 1 + 2 = 3
Delete: 1 + dp[2][2] = 1 + 1 = 2
Insert: 1 + dp[3][1] = 1 + 2 = 3
Minimum = 2

Cell dp[3][3]: "HOR" → "ROS"

'R' ≠ 'S'
Replace: 1 + dp[2][2] = 1 + 1 = 2
Delete: 1 + dp[2][3] = 1 + 2 = 3
Insert: 1 + dp[3][2] = 1 + 2 = 3
Minimum = 2

        ""  R   O   S
    ""  0   1   2   3
    H   1   1   2   3
    O   2   2   1   2
    R   3   2   2   2
    S   4
    E   5

Row 4: Matching 'HORS' with substrings of "ROS"

Cell dp[4][1]: "HORS" → "R"

'S' ≠ 'R'
Replace: 1 + dp[3][0] = 1 + 3 = 4
Delete: 1 + dp[3][1] = 1 + 2 = 3
Insert: 1 + dp[4][0] = 1 + 4 = 5
Minimum = 3

Cell dp[4][2]: "HORS" → "RO"

'S' ≠ 'O'
Replace: 1 + dp[3][1] = 1 + 2 = 3
Delete: 1 + dp[3][2] = 1 + 2 = 3
Insert: 1 + dp[4][1] = 1 + 3 = 4
Minimum = 3

Cell dp[4][3]: "HORS" → "ROS"

'S' == 'S' ✓ Match!
No operation: dp[3][2] = 2
Value = 2 (we solved "HOR" → "RO" with 2 operations)

        ""  R   O   S
    ""  0   1   2   3
    H   1   1   2   3
    O   2   2   1   2
    R   3   2   2   2
    S   4   3   3   2
    E   5

Row 5: Matching 'HORSE' with substrings of "ROS"

Cell dp[5][1]: "HORSE" → "R"

'E' ≠ 'R'
Replace: 1 + dp[4][0] = 1 + 4 = 5
Delete: 1 + dp[4][1] = 1 + 3 = 4
Insert: 1 + dp[5][0] = 1 + 5 = 6
Minimum = 4

Cell dp[5][2]: "HORSE" → "RO"

'E' ≠ 'O'
Replace: 1 + dp[4][1] = 1 + 3 = 4
Delete: 1 + dp[4][2] = 1 + 3 = 4
Insert: 1 + dp[5][1] = 1 + 4 = 5
Minimum = 4

Cell dp[5][3]: "HORSE" → "ROS"

'E' ≠ 'S'
Replace: 1 + dp[4][2] = 1 + 3 = 4
Delete: 1 + dp[4][3] = 1 + 2 = 3 ← This is the minimum!
Insert: 1 + dp[5][2] = 1 + 4 = 5
Minimum = 3

Final DP Table

        ""  R   O   S
    ""  0   1   2   3
    H   1   1   2   3
    O   2   2   1   2
    R   3   2   2   2
    S   4   3   3   2
    E   5   4   4   3

The Answer

Minimum Edit Distance: 3

The value at dp[5][3] tells us the minimum number of operations to transform "HORSE" into "ROS" is 3.

Step 3: Traceback - Finding the Actual Edits

Now let's trace back to find one optimal sequence of operations. We start at dp[5][3] and work backwards:

Current: dp[5][3] = 3 (HORSE → ROS)
- We got here from dp[4][3] = 2 by DELETE 'E'
- Operation 1: Delete 'E' → "HORS"

Current: dp[4][3] = 2 (HORS → ROS)
- 'S' == 'S', came from dp[3][2] = 2 (no operation)
- Move to dp[3][2]

Current: dp[3][2] = 2 (HOR → RO)
- We got here from dp[2][2] = 1 by DELETE 'R'
- Operation 2: Delete 'R' → "HO"

Current: dp[2][2] = 1 (HO → RO)
- 'O' == 'O', came from dp[1][1] = 1 (no operation)
- Move to dp[1][1]

Current: dp[1][1] = 1 (H → R)
- We got here from dp[0][0] = 0 by REPLACE 'H' with 'R'
- Operation 3: Replace 'H' → 'R' → "RO"

Done! (reached dp[0][0])

Optimal Edit Sequence:

Replace 'H' with 'R': "HORSE" → "RORSE"
Delete 'R': "RORSE" → "ROSE"
Delete 'E': "ROSE" → "ROS"

Wait, that's 3 operations but the edits don't look right. Let me reconsider the traceback more carefully:

Actually, when we trace back, we're working backwards through the strings. Let me redo this properly:

Correct Traceback (starting from the end):

Position (5,3): "HORSE" → "ROS", value = 3

Came from (4,3) via DELETE 'E': "HORS" → "ROS" (cost 2 + 1 = 3)

Position (4,3): "HORS" → "ROS", value = 2

'S' == 'S', came from (3,2) with no operation

Position (3,2): "HOR" → "RO", value = 2

Came from (2,2) via DELETE 'R': "HO" → "RO" (cost 1 + 1 = 2)

Position (2,2): "HO" → "RO", value = 1

'O' == 'O', came from (1,1) with no operation

Position (1,1): "H" → "R", value = 1

Came from (0,0) via REPLACE: "" → "" then add operations

Forward Edit Sequence:

Replace 'H' with 'R': "HORSE" → "RORSE"
Delete 'R' (3rd position): "RORSE" → "ROSE"
Delete 'E': "ROSE" → "ROS"

Understanding Traceback

The traceback tells us HOW we achieved the minimum distance:

Each step moves diagonally (match/replace), up (delete), or left (insert)
Multiple optimal paths may exist
The DP value tells us the cost is optimal, the traceback shows one way to achieve it

Why This Solution is Optimal

Could we do better than 3 operations? Let's verify:

"HORSE" has 5 characters, "ROS" has 3 characters
We must "get rid of" at least 2 characters somehow (5 - 3 = 2)
Looking at the strings, only 'O' and 'S' are in the right positions
We need to: fix 'H', deal with 'R', fix 'E' = at least 3 operations

The DP algorithm guarantees we found the optimal solution!

Applications and Variations

Real-World Applications

1. Spell Checking When you type "teh", the spell checker computes edit distance to all dictionary words:

"teh" → "the" (distance = 2: delete 'h', insert 'h' at correct position... or just SWAP, but that's not in basic edit distance)
"teh" → "tea" (distance = 1: replace 'h' with 'a')
"teh" → "ten" (distance = 1: replace 'h' with 'n')

The spell checker suggests words with smallest edit distance.

2. DNA Sequence Alignment Biologists use edit distance to measure genetic similarity:

DNA1: AGGTAB
DNA2: GXTXAYB

Finding the edit distance helps identify mutations, insertions, and deletions in genetic code.

3. Version Control (Git Diff) Git uses a variation of edit distance to compute minimal changes between file versions:

Old: "Hello World"
New: "Hello Beautiful World"

The diff shows: INSERT "Beautiful " at position 6.

4. Autocorrect on Mobile When you type "restaurent" → "restaurant", the phone computes edit distances to all known words and suggests the closest matches.

LeetCode Problem Connection

LeetCode 72: Edit Distance

This is the classic problem we've been solving!

Problem Statement:

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

Insert a character

Delete a character

Replace a character

Example:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse → rorse (replace 'h' with 'r')
rorse → rose (remove 'r')
rose → ros (remove 'e')

This is exactly what we solved above!

Implementation

Here's the complete Java implementation:

public class EditDistance {
    /**
     * Computes the minimum edit distance between two strings.
     *
     * Time Complexity: O(m * n) where m = word1.length(), n = word2.length()
     * Space Complexity: O(m * n) for the DP table
     *
     * @param word1 The source string
     * @param word2 The target string
     * @return The minimum number of operations to transform word1 into word2
     */
    public int minDistance(String word1, String word2) {
        int m = word1.length();
        int n = word2.length();

        // dp[i][j] = min operations to convert word1[0...i-1] to word2[0...j-1]
        int[][] dp = new int[m + 1][n + 1];

        // Base cases: transforming from/to empty string
        for (int i = 0; i <= m; i++) {
            dp[i][0] = i;  // Delete all characters
        }
        for (int j = 0; j <= n; j++) {
            dp[0][j] = j;  // Insert all characters
        }

        // Fill the DP table
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    // Characters match - no operation needed
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    // Take minimum of three operations
                    dp[i][j] = 1 + Math.min(
                        dp[i - 1][j - 1],  // Replace
                        Math.min(
                            dp[i - 1][j],   // Delete from word1
                            dp[i][j - 1]    // Insert into word1
                        )
                    );
                }
            }
        }

        return dp[m][n];
    }

    /**
     * Demonstrates the solution with detailed output.
     */
    public static void main(String[] args) {
        EditDistance ed = new EditDistance();

        // Test case 1: Classic example
        String word1 = "horse";
        String word2 = "ros";
        System.out.println("Transform '" + word1 + "' → '" + word2 + "'");
        System.out.println("Minimum operations: " + ed.minDistance(word1, word2));
        System.out.println();

        // Test case 2: Longer strings
        word1 = "intention";
        word2 = "execution";
        System.out.println("Transform '" + word1 + "' → '" + word2 + "'");
        System.out.println("Minimum operations: " + ed.minDistance(word1, word2));
        System.out.println();

        // Test case 3: Edge cases
        System.out.println("Empty strings: " + ed.minDistance("", ""));
        System.out.println("One empty: " + ed.minDistance("abc", ""));
        System.out.println("Identical: " + ed.minDistance("same", "same"));
    }
}

Space-Optimized Version

Since we only need the previous row to compute the current row, we can optimize space to O(n):

public int minDistanceOptimized(String word1, String word2) {
    int m = word1.length();
    int n = word2.length();

    // Only keep two rows: previous and current
    int[] prev = new int[n + 1];
    int[] curr = new int[n + 1];

    // Initialize first row
    for (int j = 0; j <= n; j++) {
        prev[j] = j;
    }

    // Fill row by row
    for (int i = 1; i <= m; i++) {
        curr[0] = i;  // Base case for this row

        for (int j = 1; j <= n; j++) {
            if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                curr[j] = prev[j - 1];
            } else {
                curr[j] = 1 + Math.min(
                    prev[j - 1],  // Replace
                    Math.min(prev[j], curr[j - 1])  // Delete or Insert
                );
            }
        }

        // Swap arrays for next iteration
        int[] temp = prev;
        prev = curr;
        curr = temp;
    }

    return prev[n];
}

Space Optimization

By using rolling arrays, we reduce space from O(m×n) to O(n), while maintaining the same O(m×n) time complexity. This is crucial for very long strings!

Important Variations

1. Minimum ASCII Delete Sum (LeetCode 712)

Instead of counting operations, minimize the ASCII sum of deleted characters:

public int minimumDeleteSum(String s1, String s2) {
    int m = s1.length(), n = s2.length();
    int[][] dp = new int[m + 1][n + 1];

    // Base cases: sum of ASCII values
    for (int i = 1; i <= m; i++) {
        dp[i][0] = dp[i-1][0] + s1.charAt(i-1);
    }
    for (int j = 1; j <= n; j++) {
        dp[0][j] = dp[0][j-1] + s2.charAt(j-1);
    }

    // DP transition
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (s1.charAt(i-1) == s2.charAt(j-1)) {
                dp[i][j] = dp[i-1][j-1];  // No cost when matching
            } else {
                dp[i][j] = Math.min(
                    dp[i-1][j] + s1.charAt(i-1),  // Delete from s1
                    dp[i][j-1] + s2.charAt(j-1)   // Delete from s2
                );
            }
        }
    }

    return dp[m][n];
}

2. One Edit Distance (LeetCode 161)

Check if two strings are exactly one edit apart:

public boolean isOneEditDistance(String s, String t) {
    int m = s.length(), n = t.length();

    // Length difference must be at most 1
    if (Math.abs(m - n) > 1) return false;

    // Ensure s is the shorter string
    if (m > n) return isOneEditDistance(t, s);

    for (int i = 0; i < m; i++) {
        if (s.charAt(i) != t.charAt(i)) {
            if (m == n) {
                // Must be a replace operation
                return s.substring(i + 1).equals(t.substring(i + 1));
            } else {
                // Must be an insert operation (delete from t)
                return s.substring(i).equals(t.substring(i + 1));
            }
        }
    }

    // All characters match - strings differ only if length differs by 1
    return m + 1 == n;
}

3. Edit Distance with Limited Operations

Sometimes we have constraints on which operations are allowed:

/**
 * Edit distance allowing only insert and delete (no replace).
 * This effectively becomes finding the Longest Common Subsequence (LCS)
 * and then: insertions + deletions = (m - lcs) + (n - lcs)
 */
public int minDistanceInsertDeleteOnly(String word1, String word2) {
    int lcs = longestCommonSubsequence(word1, word2);
    return word1.length() + word2.length() - 2 * lcs;
}

private int longestCommonSubsequence(String s1, String s2) {
    int m = s1.length(), n = s2.length();
    int[][] dp = new int[m + 1][n + 1];

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (s1.charAt(i-1) == s2.charAt(j-1)) {
                dp[i][j] = dp[i-1][j-1] + 1;
            } else {
                dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
            }
        }
    }

    return dp[m][n];
}

Key Insights for Interviews

Common Mistakes

Off-by-one errors: Remember dp[i][j] represents substrings ending at i-1 and j-1
Wrong base cases: Empty string transformations must be handled correctly
Forgetting character matching: When characters match, no operation is needed!
Traceback direction: Work backwards from dp[m][n] to reconstruct the path

When to use Edit Distance:

✅ Measuring string similarity
✅ Spell checking and autocorrect
✅ DNA sequence alignment
✅ Finding closest matches in a dictionary
✅ Version control diff algorithms

Pattern Recognition:

Two strings → 2D DP table
Operations that modify strings → consider all possibilities
Optimal substructure → DP is the right approach
Need actual edit sequence → implement traceback

Time Complexity: O(m × n) where m and n are string lengths Space Complexity: O(m × n), optimizable to O(n) with rolling arrays

Summary

Edit Distance (Levenshtein Distance) is a fundamental algorithm in Pattern 4: String DP that measures the minimum operations needed to transform one string into another. The key insights are:

Build solutions bottom-up considering all possible operations
Characters match → no cost, take diagonal value
Characters differ → take minimum of (replace, delete, insert)
Applications everywhere from spell checkers to DNA analysis

Master this pattern and you'll handle a wide range of string transformation problems with confidence!

Understanding Edit Distance (Levenshtein Distance)​

What is Edit Distance?​

The Core Insight​

Visualizing the Operations​

Complete Worked Example with Full DP Table​

Step 1: Initialize the DP Table​

Step 2: Fill the DP Table Cell by Cell​

Row 1: Matching 'H' with substrings of "ROS"​

Row 2: Matching 'HO' with substrings of "ROS"​

Row 3: Matching 'HOR' with substrings of "ROS"​

Row 4: Matching 'HORS' with substrings of "ROS"​

Row 5: Matching 'HORSE' with substrings of "ROS"​

Final DP Table​

Step 3: Traceback - Finding the Actual Edits​

Why This Solution is Optimal​

Applications and Variations​

Real-World Applications​

LeetCode Problem Connection​

Implementation​

Space-Optimized Version​

Important Variations​

Key Insights for Interviews​

Summary​