Skip to main content

Longest Common Subsequence: From First Principles

Learning Philosophy: This document teaches you how to think about DP problems, not just memorize solutions. We'll build intuition layer by layer, questioning every assumption.


LAYER 0: The Fundamental Confusion

CRITICAL INSIGHT: Why Two Pointers Fail

You might think: "I'd normally use two pointers and solve it easily."

STOP. This is the first critical insight to examine: Why would two pointers fail here?

Let's test it. Take text1 = "abcde", text2 = "ace":

Start: i=0, j=0 → 'a' == 'a' ✓ → Move both
Now: i=1, j=1 → 'b' != 'c' → Which pointer do we move?
THE DILEMMA
  • If you move i: you get "ace" (length 3) ✓
  • If you move j: you miss the 'c' later and get worse result ✗

The problem: You can't decide greedily. You need to explore both futures.

The Prerequisite Insight

We have overlapping decisions where local greedy choices might block better global outcomes.

This is what makes DP necessary.


LAYER 1: Designer's Brain - Building Intuition

Discovery Step 1: What decisions do we actually make?

At any moment, you're comparing text1[i] vs text2[j]. Let's think chronologically—pretend you're at some position:

text1 = "a b c d e"

i
text2 = "a c e"

j

Case 1: Characters Match (text1[i] == text2[j])

DISCOVERY THOUGHT

Example: 'a' == 'a'

"Great! This character is PART of my common subsequence."

Consequence: I've "used up" both characters. I can't reuse them.

Action: Count it (add 1) and move BOTH pointers forward.

Case 2: Characters Don't Match (text1[i] != text2[j])

DISCOVERY THOUGHT

Example: 'b' != 'c'

"These don't match. One of them won't be in the optimal LCS."

But which one? I don't know! Maybe skipping 'b' leads to better outcome, maybe skipping 'c' does.

Action: I have to try BOTH possibilities and take the better result:

  • Option A: Skip text1[i], try matching text1[i+1...] with text2[j...]
  • Option B: Skip text2[j], try matching text1[i...] with text2[j+1...]

Discovery Step 2: Why does this need DP? (The "messy" realization)

Let's trace manually:

SIMPLE CASE
text1 = "abc"
text2 = "abc"

Starting at (0,0):

  1. 'a' == 'a' → Take it, go to (1,1)
  2. At (1,1): 'b' == 'b' → Take it, go to (2,2)
  3. At (2,2): 'c' == 'c' → Take it, go to (3,3) (done)

Simple! Linear path.

But now try:

COMPLEX CASE - THE BRANCHING BEGINS
text1 = "abcde"
text2 = "ace"

Starting at (0,0):

  • 'a' == 'a' → Take it, go to (1,1)

At (1,1):

  • 'b' != 'c'BRANCH!
    • Path A: Skip 'b', go to (2,1)
    • Path B: Skip 'c', go to (1,2)

At (2,1):

  • 'c' == 'c' → Take it, go to (3,2)

At (1,2):

  • 'b' != 'e'BRANCH AGAIN!
    • Path A1: go to (2,2)
    • Path A2: go to (1,3)
THE AHA MOMENT

WAIT. Look at (2,2):

  • We might reach (2,2) from path A → (2,1) → (2,2)
  • We might reach (2,2) from path B → (1,2) → (2,2)

Aha! Overlapping subproblems. We're solving (2,2) multiple times.

This is why DP: Cache the answer for each (i,j) pair so we don't recalculate.


LAYER 2: Designer's Brain - The State Space

The KEY Question: What decides 1D vs 2D DP?

THE META-PRINCIPLE

State = minimum info needed to describe your current situation

The question to ask: "What information do I need to answer 'what's the LCS from this point onward?'"

Let's test:

Can I describe state with just i (position in text1)?

No! If I'm at text1[2], I need to know where I am in text2 as well.

Example: (i=2, j=1) vs (i=2, j=3) are completely different situations.

Can I describe state with just j (position in text2)?

No! Same reasoning.

Do I need both i AND j?

Yes! (i, j) uniquely describes where I am in BOTH strings.

From (i, j), I can compute LCS of text1[i:] and text2[j:].

The Universal Dimensionality Rule

FRAMEWORK: DP DIMENSIONALITY
# of dimensions in DP = # of independent variables needed to describe state

Examples:

  • Fibonacci: dp[i] = just need position in sequence (1D)
  • Coin change: dp[amount] = just need remaining amount (1D)
  • LCS: dp[i][j] = need position in BOTH strings (2D)
  • Edit distance: dp[i][j] = need position in BOTH strings (2D)
  • Knapsack: dp[i][weight] = need item index AND remaining capacity (2D)

LAYER 3: Coder's Brain - Building the Solution

What does dp[i][j] represent?

PRECISE DEFINITION
dp[i][j] = length of LCS between:
• First i characters of text1 (i.e., text1[0...i-1])
• First j characters of text2 (i.e., text2[0...j-1])

Why this definition?

We want to build from smaller subproblems to larger ones:

  • dp[0][j] = "LCS of empty string and text2[0...j-1]" = 0
  • dp[i][0] = "LCS of text1[0...i-1] and empty string" = 0
  • dp[m][n] = "LCS of entire text1 and text2" = our answer

The Recurrence: Translating Decisions to Math

At dp[i][j], we're comparing text1[i-1] and text2[j-1]:

THE RECURRENCE RELATION
if (text1.charAt(i-1) == text2.charAt(j-1)) {
// We found a matching character!
// Add 1 to the LCS of the "previous" state
dp[i][j] = dp[i-1][j-1] + 1;
}

Why dp[i-1][j-1] + 1?

  • We're using both characters, so we "consume" them
  • The LCS length is: (best LCS before these characters) + 1

if (text1.charAt(i-1) != text2.charAt(j-1)) {
// Characters don't match - try skipping each
// Option A: Skip text1[i-1], use dp[i-1][j]
// Option B: Skip text2[j-1], use dp[i][j-1]
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}

Why Math.max(...)?

  • We don't know which skip leads to better outcome
  • We've already computed both futures (that's DP!)
  • Take the better one

Complete Implementation

public int longestCommonSubsequence(String text1, String text2) {
int m = text1.length();
int n = text2.length();

// Create DP table with (m+1) x (n+1) dimensions
int[][] dp = new int[m + 1][n + 1];

// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (text1.charAt(i-1) == text2.charAt(j-1)) {
// Characters match - take it!
dp[i][j] = dp[i-1][j-1] + 1;
} else {
// Characters don't match - try both skips
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
}

return dp[m][n];
}

LAYER 4: Edge Cases as Concept Validators

Edge cases aren't just test cases—they validate your understanding of the concept.

Edge Case 1: Empty Strings

TEST CASE
String text1 = "";
String text2 = "abc";

Question: What should dp[0][3] be?

Answer: 0 (LCS of empty and "abc" is empty)

Validation: Our base case handles this ✓

Edge Case 2: Identical Strings

TEST CASE
String text1 = "abc";
String text2 = "abc";

Trace:

  • dp[1][1]: 'a'=='a'dp[0][0] + 1 = 1
  • dp[2][2]: 'b'=='b'dp[1][1] + 1 = 2
  • dp[3][3]: 'c'=='c'dp[2][2] + 1 = 3

Result: 3 ✓

Edge Case 3: No Common Characters

TEST CASE
String text1 = "abc";
String text2 = "def";

Trace: Every comparison fails, we keep taking Math.max(dp[i-1][j], dp[i][j-1]), which propagates 0s.

Result: 0 ✓

Edge Case 4: Tricky Ordering

CRITICAL TEST
String text1 = "abcde";
String text2 = "ecdba";

Question for you: What's the LCS? Trace through mentally.

Answer: "cdb" or "cda" (length 3)

Notice: 'a' appears in both, but using it early doesn't help!

This validates why greedy fails and DP succeeds.


LAYER 5: The Meta-Process (Framework for ALL DP)

THE UNIVERSAL DP THINKING FRAMEWORK

Step 1: Identify if it's DP

  • Are there overlapping subproblems?
  • Are there optimal substructure (optimal solution contains optimal sub-solutions)?
  • Keywords: "longest", "shortest", "count ways", "max/min"

Step 2: Define the state

  • What's the minimum info to describe where you are?
  • This determines dimensionality of DP array

Step 3: Define the recurrence

  • What decisions can I make at this state?
  • How do those decisions relate to smaller subproblems?
  • Write it as a formula

Step 4: Handle base cases

  • What are the smallest valid states?
  • What are their trivial answers?

Step 5: Determine iteration order

  • Which states depend on which?
  • Fill DP table so dependencies are computed first

YOUR TURN: Test Your Understanding

CHALLENGE QUESTIONS

Question 1: Dimensionality Reduction

Why can't we reduce this to 1D DP by processing strings character-by-character?

Think about what information you'd lose. Can you reconstruct the optimal solution with just one dimension?


Question 2: Reconstruction

What would change if we wanted to return the actual LCS string instead of just length?

Hint: You'd need to track "which decision led to dp[i][j]". How would you backtrack?


Question 3: Problem Relationships

Can you see the connection to Edit Distance problem? What's the difference in recurrence?

Both are 2D DP on two strings. But Edit Distance has 3 operations instead of 2 choices...


Question 4: Complexity Analysis

Time complexity is O(m*n). Can we do better? Why or why not?

Think about information theory: do we need to compare every character pair?


Final Thoughts

LEARNING PHILOSOPHY

You're learning the thinking, not the solution.

Ask "why does X lead to Y?" wherever there's a gap!

The goal isn't to memorize this problem—it's to develop the meta-skill of decomposing ANY DP problem into:

  1. State definition
  2. Decision tree
  3. Recurrence relation
  4. Base cases
  5. Iteration order

Complexity Analysis

MetricValueExplanation
TimeO(m × n)We fill m × n table, each cell O(1)
SpaceO(m × n)DP table storage
Space OptimizedO(min(m,n))Keep only 2 rows (current + previous)

PRACTICE PROGRESSION

Master these in order to build DP intuition:

  1. Longest Common Subsequence (this problem)
  2. Edit Distance (3 operations instead of 2)
  3. Longest Palindromic Subsequence (LCS with reversed string)
  4. Shortest Common Supersequence (inverse problem)
  5. Distinct Subsequences (counting instead of max length)