Longest Common Subsequence: From First Principles

Learning Philosophy: This document teaches you how to think about DP problems, not just memorize solutions. We'll build intuition layer by layer, questioning every assumption.

LAYER 0: The Fundamental Confusion

CRITICAL INSIGHT: Why Two Pointers Fail

You might think: "I'd normally use two pointers and solve it easily."

STOP. This is the first critical insight to examine: Why would two pointers fail here?

Let's test it. Take text1 = "abcde", text2 = "ace":

Start: i=0, j=0 → 'a' == 'a' ✓ → Move both
Now:   i=1, j=1 → 'b' != 'c' → Which pointer do we move?

THE DILEMMA

If you move i: you get "ace" (length 3) ✓
If you move j: you miss the 'c' later and get worse result ✗

The problem: You can't decide greedily. You need to explore both futures.

The Prerequisite Insight

We have overlapping decisions where local greedy choices might block better global outcomes.

This is what makes DP necessary.

LAYER 1: Designer's Brain - Building Intuition

Discovery Step 1: What decisions do we actually make?

At any moment, you're comparing text1[i] vs text2[j]. Let's think chronologically—pretend you're at some position:

text1 = "a b c d e"
         ↑
         i
text2 = "a c e"
         ↑
         j

Case 1: Characters Match (`text1[i] == text2[j]`)

DISCOVERY THOUGHT

Example: 'a' == 'a'

"Great! This character is PART of my common subsequence."

Consequence: I've "used up" both characters. I can't reuse them.

Action: Count it (add 1) and move BOTH pointers forward.

Case 2: Characters Don't Match (`text1[i] != text2[j]`)

DISCOVERY THOUGHT

Example: 'b' != 'c'

"These don't match. One of them won't be in the optimal LCS."

But which one? I don't know! Maybe skipping 'b' leads to better outcome, maybe skipping 'c' does.

Action: I have to try BOTH possibilities and take the better result:

Option A: Skip text1[i], try matching text1[i+1...] with text2[j...]
Option B: Skip text2[j], try matching text1[i...] with text2[j+1...]

Discovery Step 2: Why does this need DP? (The "messy" realization)

Let's trace manually:

SIMPLE CASE

text1 = "abc"
text2 = "abc"

Starting at (0,0):

'a' == 'a' → Take it, go to (1,1)
At (1,1): 'b' == 'b' → Take it, go to (2,2)
At (2,2): 'c' == 'c' → Take it, go to (3,3) (done)

Simple! Linear path.

But now try:

COMPLEX CASE - THE BRANCHING BEGINS

text1 = "abcde"
text2 = "ace"

Starting at (0,0):

'a' == 'a' → Take it, go to (1,1)

At (1,1):

'b' != 'c' → BRANCH!
- Path A: Skip 'b', go to (2,1)
- Path B: Skip 'c', go to (1,2)

At (2,1):

'c' == 'c' → Take it, go to (3,2)

At (1,2):

'b' != 'e' → BRANCH AGAIN!
- Path A1: go to (2,2)
- Path A2: go to (1,3)

THE AHA MOMENT

WAIT. Look at (2,2):

We might reach (2,2) from path A → (2,1) → (2,2)
We might reach (2,2) from path B → (1,2) → (2,2)

Aha! Overlapping subproblems. We're solving (2,2) multiple times.

This is why DP: Cache the answer for each (i,j) pair so we don't recalculate.

LAYER 2: Designer's Brain - The State Space

The KEY Question: What decides 1D vs 2D DP?

THE META-PRINCIPLE

State = minimum info needed to describe your current situation

The question to ask: "What information do I need to answer 'what's the LCS from this point onward?'"

Let's test:

Can I describe state with just `i` (position in text1)?

❌ No! If I'm at text1[2], I need to know where I am in text2 as well.

Example: (i=2, j=1) vs (i=2, j=3) are completely different situations.

Can I describe state with just `j` (position in text2)?

❌ No! Same reasoning.

Do I need both `i` AND `j`?

✅ Yes! (i, j) uniquely describes where I am in BOTH strings.

From (i, j), I can compute LCS of text1[i:] and text2[j:].

The Universal Dimensionality Rule

FRAMEWORK: DP DIMENSIONALITY

# of dimensions in DP = # of independent variables needed to describe state

Examples:

Fibonacci: dp[i] = just need position in sequence (1D)
Coin change: dp[amount] = just need remaining amount (1D)
LCS: dp[i][j] = need position in BOTH strings (2D)
Edit distance: dp[i][j] = need position in BOTH strings (2D)
Knapsack: dp[i][weight] = need item index AND remaining capacity (2D)

LAYER 3: Coder's Brain - Building the Solution

What does `dp[i][j]` represent?

PRECISE DEFINITION

dp[i][j] = length of LCS between:
  • First i characters of text1 (i.e., text1[0...i-1])
  • First j characters of text2 (i.e., text2[0...j-1])

Why this definition?

We want to build from smaller subproblems to larger ones:

dp[0][j] = "LCS of empty string and text2[0...j-1]" = 0
dp[i][0] = "LCS of text1[0...i-1] and empty string" = 0
dp[m][n] = "LCS of entire text1 and text2" = our answer

The Recurrence: Translating Decisions to Math

At dp[i][j], we're comparing text1[i-1] and text2[j-1]:

THE RECURRENCE RELATION

if (text1.charAt(i-1) == text2.charAt(j-1)) {
    // We found a matching character!
    // Add 1 to the LCS of the "previous" state
    dp[i][j] = dp[i-1][j-1] + 1;
}

Why dp[i-1][j-1] + 1?

We're using both characters, so we "consume" them
The LCS length is: (best LCS before these characters) + 1

if (text1.charAt(i-1) != text2.charAt(j-1)) {
    // Characters don't match - try skipping each
    // Option A: Skip text1[i-1], use dp[i-1][j]
    // Option B: Skip text2[j-1], use dp[i][j-1]
    dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}

Why Math.max(...)?

We don't know which skip leads to better outcome
We've already computed both futures (that's DP!)
Take the better one

Complete Implementation

public int longestCommonSubsequence(String text1, String text2) {
    int m = text1.length();
    int n = text2.length();

    // Create DP table with (m+1) x (n+1) dimensions
    int[][] dp = new int[m + 1][n + 1];

    // Fill the DP table
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (text1.charAt(i-1) == text2.charAt(j-1)) {
                // Characters match - take it!
                dp[i][j] = dp[i-1][j-1] + 1;
            } else {
                // Characters don't match - try both skips
                dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
            }
        }
    }

    return dp[m][n];
}

LAYER 4: Edge Cases as Concept Validators

Edge cases aren't just test cases—they validate your understanding of the concept.

Edge Case 1: Empty Strings

TEST CASE

String text1 = "";
String text2 = "abc";

Question: What should dp[0][3] be?

Answer: 0 (LCS of empty and "abc" is empty)

Validation: Our base case handles this ✓

Edge Case 2: Identical Strings

TEST CASE

String text1 = "abc";
String text2 = "abc";

Trace:

dp[1][1]: 'a'=='a' → dp[0][0] + 1 = 1
dp[2][2]: 'b'=='b' → dp[1][1] + 1 = 2
dp[3][3]: 'c'=='c' → dp[2][2] + 1 = 3

Result: 3 ✓

Edge Case 3: No Common Characters

TEST CASE

String text1 = "abc";
String text2 = "def";

Trace: Every comparison fails, we keep taking Math.max(dp[i-1][j], dp[i][j-1]), which propagates 0s.

Result: 0 ✓

Edge Case 4: Tricky Ordering

CRITICAL TEST

String text1 = "abcde";
String text2 = "ecdba";

Question for you: What's the LCS? Trace through mentally.

Answer: "cdb" or "cda" (length 3)

Notice: 'a' appears in both, but using it early doesn't help!

This validates why greedy fails and DP succeeds.

LAYER 5: The Meta-Process (Framework for ALL DP)

THE UNIVERSAL DP THINKING FRAMEWORK

Step 1: Identify if it's DP

Are there overlapping subproblems?
Are there optimal substructure (optimal solution contains optimal sub-solutions)?
Keywords: "longest", "shortest", "count ways", "max/min"

Step 2: Define the state

What's the minimum info to describe where you are?
This determines dimensionality of DP array

Step 3: Define the recurrence

What decisions can I make at this state?
How do those decisions relate to smaller subproblems?
Write it as a formula

Step 4: Handle base cases

What are the smallest valid states?
What are their trivial answers?

Step 5: Determine iteration order

Which states depend on which?
Fill DP table so dependencies are computed first

YOUR TURN: Test Your Understanding

CHALLENGE QUESTIONS

Question 1: Dimensionality Reduction

Why can't we reduce this to 1D DP by processing strings character-by-character?

Think about what information you'd lose. Can you reconstruct the optimal solution with just one dimension?

Question 2: Reconstruction

What would change if we wanted to return the actual LCS string instead of just length?

Hint: You'd need to track "which decision led to dp[i][j]". How would you backtrack?

Question 3: Problem Relationships

Can you see the connection to Edit Distance problem? What's the difference in recurrence?

Both are 2D DP on two strings. But Edit Distance has 3 operations instead of 2 choices...

Question 4: Complexity Analysis

Time complexity is O(m*n). Can we do better? Why or why not?

Think about information theory: do we need to compare every character pair?

Final Thoughts

LEARNING PHILOSOPHY

You're learning the thinking, not the solution.

Ask "why does X lead to Y?" wherever there's a gap!

The goal isn't to memorize this problem—it's to develop the meta-skill of decomposing ANY DP problem into:

State definition
Decision tree
Recurrence relation
Base cases
Iteration order

Complexity Analysis

Metric	Value	Explanation
Time	O(m × n)	We fill m × n table, each cell O(1)
Space	O(m × n)	DP table storage
Space Optimized	O(min(m,n))	Keep only 2 rows (current + previous)

PRACTICE PROGRESSION

Master these in order to build DP intuition:

Longest Common Subsequence (this problem)
Edit Distance (3 operations instead of 2)
Longest Palindromic Subsequence (LCS with reversed string)
Shortest Common Supersequence (inverse problem)
Distinct Subsequences (counting instead of max length)

LAYER 0: The Fundamental Confusion​

The Prerequisite Insight​

LAYER 1: Designer's Brain - Building Intuition​

Discovery Step 1: What decisions do we actually make?​

Case 1: Characters Match (text1[i] == text2[j])​

Case 2: Characters Don't Match (text1[i] != text2[j])​

Discovery Step 2: Why does this need DP? (The "messy" realization)​

LAYER 2: Designer's Brain - The State Space​

The KEY Question: What decides 1D vs 2D DP?​

Can I describe state with just i (position in text1)?​

Can I describe state with just j (position in text2)?​

Do I need both i AND j?​

The Universal Dimensionality Rule​

LAYER 3: Coder's Brain - Building the Solution​

What does dp[i][j] represent?​

The Recurrence: Translating Decisions to Math​

Complete Implementation​

LAYER 4: Edge Cases as Concept Validators​

Edge Case 1: Empty Strings​

Edge Case 2: Identical Strings​

Edge Case 3: No Common Characters​

Edge Case 4: Tricky Ordering​

LAYER 5: The Meta-Process (Framework for ALL DP)​

Step 1: Identify if it's DP​

Step 2: Define the state​

Step 3: Define the recurrence​

Step 4: Handle base cases​

Step 5: Determine iteration order​

YOUR TURN: Test Your Understanding​

Question 1: Dimensionality Reduction​

Question 2: Reconstruction​

Question 3: Problem Relationships​

Question 4: Complexity Analysis​

Final Thoughts​

Complexity Analysis​

Related Problems​

LAYER 0: The Fundamental Confusion

The Prerequisite Insight

LAYER 1: Designer's Brain - Building Intuition

Discovery Step 1: What decisions do we actually make?

Case 1: Characters Match (`text1[i] == text2[j]`)

Case 2: Characters Don't Match (`text1[i] != text2[j]`)

Discovery Step 2: Why does this need DP? (The "messy" realization)

LAYER 2: Designer's Brain - The State Space

The KEY Question: What decides 1D vs 2D DP?

Can I describe state with just `i` (position in text1)?

Can I describe state with just `j` (position in text2)?

Do I need both `i` AND `j`?

The Universal Dimensionality Rule

LAYER 3: Coder's Brain - Building the Solution

What does `dp[i][j]` represent?

The Recurrence: Translating Decisions to Math

Complete Implementation

LAYER 4: Edge Cases as Concept Validators

Edge Case 1: Empty Strings

Edge Case 2: Identical Strings

Edge Case 3: No Common Characters

Edge Case 4: Tricky Ordering

LAYER 5: The Meta-Process (Framework for ALL DP)

Step 1: Identify if it's DP

Step 2: Define the state

Step 3: Define the recurrence

Step 4: Handle base cases

Step 5: Determine iteration order

YOUR TURN: Test Your Understanding

Question 1: Dimensionality Reduction

Question 2: Reconstruction

Question 3: Problem Relationships

Question 4: Complexity Analysis

Final Thoughts

Complexity Analysis

Related Problems