Longest Common Subsequence: From First Principles
Learning Philosophy: This document teaches you how to think about DP problems, not just memorize solutions. We'll build intuition layer by layer, questioning every assumption.
LAYER 0: The Fundamental Confusion
You might think: "I'd normally use two pointers and solve it easily."
STOP. This is the first critical insight to examine: Why would two pointers fail here?
Let's test it. Take text1 = "abcde"
, text2 = "ace"
:
Start: i=0, j=0 → 'a' == 'a' ✓ → Move both
Now: i=1, j=1 → 'b' != 'c' → Which pointer do we move?
- If you move i: you get "ace" (length 3) ✓
- If you move j: you miss the 'c' later and get worse result ✗
The problem: You can't decide greedily. You need to explore both futures.
The Prerequisite Insight
We have overlapping decisions where local greedy choices might block better global outcomes.
This is what makes DP necessary.
LAYER 1: Designer's Brain - Building Intuition
Discovery Step 1: What decisions do we actually make?
At any moment, you're comparing text1[i]
vs text2[j]
. Let's think chronologically—pretend you're at some position:
text1 = "a b c d e"
↑
i
text2 = "a c e"
↑
j
Case 1: Characters Match (text1[i] == text2[j]
)
Example: 'a' == 'a'
"Great! This character is PART of my common subsequence."
Consequence: I've "used up" both characters. I can't reuse them.
Action: Count it (add 1) and move BOTH pointers forward.
Case 2: Characters Don't Match (text1[i] != text2[j]
)
Example: 'b' != 'c'
"These don't match. One of them won't be in the optimal LCS."
But which one? I don't know! Maybe skipping 'b' leads to better outcome, maybe skipping 'c' does.
Action: I have to try BOTH possibilities and take the better result:
- Option A: Skip
text1[i]
, try matchingtext1[i+1...]
withtext2[j...]
- Option B: Skip
text2[j]
, try matchingtext1[i...]
withtext2[j+1...]
Discovery Step 2: Why does this need DP? (The "messy" realization)
Let's trace manually:
text1 = "abc"
text2 = "abc"
Starting at (0,0):
'a' == 'a'
→ Take it, go to (1,1)- At (1,1):
'b' == 'b'
→ Take it, go to (2,2) - At (2,2):
'c' == 'c'
→ Take it, go to (3,3) (done)
Simple! Linear path.
But now try:
text1 = "abcde"
text2 = "ace"
Starting at (0,0):
'a' == 'a'
→ Take it, go to (1,1)
At (1,1):
'b' != 'c'
→ BRANCH!- Path A: Skip 'b', go to (2,1)
- Path B: Skip 'c', go to (1,2)
At (2,1):
'c' == 'c'
→ Take it, go to (3,2)
At (1,2):
'b' != 'e'
→ BRANCH AGAIN!- Path A1: go to (2,2)
- Path A2: go to (1,3)
WAIT. Look at (2,2):
- We might reach (2,2) from path A → (2,1) → (2,2)
- We might reach (2,2) from path B → (1,2) → (2,2)
Aha! Overlapping subproblems. We're solving (2,2) multiple times.
This is why DP: Cache the answer for each (i,j) pair so we don't recalculate.
LAYER 2: Designer's Brain - The State Space
The KEY Question: What decides 1D vs 2D DP?
State = minimum info needed to describe your current situation
The question to ask: "What information do I need to answer 'what's the LCS from this point onward?'"
Let's test:
Can I describe state with just i
(position in text1)?
❌ No! If I'm at text1[2]
, I need to know where I am in text2
as well.
Example: (i=2, j=1)
vs (i=2, j=3)
are completely different situations.
Can I describe state with just j
(position in text2)?
❌ No! Same reasoning.
Do I need both i
AND j
?
✅ Yes! (i, j)
uniquely describes where I am in BOTH strings.
From (i, j)
, I can compute LCS of text1[i:]
and text2[j:]
.
The Universal Dimensionality Rule
# of dimensions in DP = # of independent variables needed to describe state
Examples:
- Fibonacci:
dp[i]
= just need position in sequence (1D) - Coin change:
dp[amount]
= just need remaining amount (1D) - LCS:
dp[i][j]
= need position in BOTH strings (2D) - Edit distance:
dp[i][j]
= need position in BOTH strings (2D) - Knapsack:
dp[i][weight]
= need item index AND remaining capacity (2D)
LAYER 3: Coder's Brain - Building the Solution
What does dp[i][j]
represent?
dp[i][j] = length of LCS between:
• First i characters of text1 (i.e., text1[0...i-1])
• First j characters of text2 (i.e., text2[0...j-1])
Why this definition?
We want to build from smaller subproblems to larger ones:
dp[0][j]
= "LCS of empty string and text2[0...j-1]" = 0dp[i][0]
= "LCS of text1[0...i-1] and empty string" = 0dp[m][n]
= "LCS of entire text1 and text2" = our answer
The Recurrence: Translating Decisions to Math
At dp[i][j]
, we're comparing text1[i-1]
and text2[j-1]
:
if (text1.charAt(i-1) == text2.charAt(j-1)) {
// We found a matching character!
// Add 1 to the LCS of the "previous" state
dp[i][j] = dp[i-1][j-1] + 1;
}
Why dp[i-1][j-1] + 1
?
- We're using both characters, so we "consume" them
- The LCS length is: (best LCS before these characters) + 1
if (text1.charAt(i-1) != text2.charAt(j-1)) {
// Characters don't match - try skipping each
// Option A: Skip text1[i-1], use dp[i-1][j]
// Option B: Skip text2[j-1], use dp[i][j-1]
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
Why Math.max(...)
?
- We don't know which skip leads to better outcome
- We've already computed both futures (that's DP!)
- Take the better one
Complete Implementation
public int longestCommonSubsequence(String text1, String text2) {
int m = text1.length();
int n = text2.length();
// Create DP table with (m+1) x (n+1) dimensions
int[][] dp = new int[m + 1][n + 1];
// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (text1.charAt(i-1) == text2.charAt(j-1)) {
// Characters match - take it!
dp[i][j] = dp[i-1][j-1] + 1;
} else {
// Characters don't match - try both skips
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
}
return dp[m][n];
}
LAYER 4: Edge Cases as Concept Validators
Edge cases aren't just test cases—they validate your understanding of the concept.
Edge Case 1: Empty Strings
String text1 = "";
String text2 = "abc";
Question: What should dp[0][3]
be?
Answer: 0 (LCS of empty and "abc" is empty)
Validation: Our base case handles this ✓
Edge Case 2: Identical Strings
String text1 = "abc";
String text2 = "abc";
Trace:
dp[1][1]
:'a'=='a'
→dp[0][0] + 1 = 1
dp[2][2]
:'b'=='b'
→dp[1][1] + 1 = 2
dp[3][3]
:'c'=='c'
→dp[2][2] + 1 = 3
Result: 3 ✓
Edge Case 3: No Common Characters
String text1 = "abc";
String text2 = "def";
Trace: Every comparison fails, we keep taking Math.max(dp[i-1][j], dp[i][j-1])
, which propagates 0s.
Result: 0 ✓
Edge Case 4: Tricky Ordering
String text1 = "abcde";
String text2 = "ecdba";
Question for you: What's the LCS? Trace through mentally.
Answer: "cdb" or "cda" (length 3)
Notice: 'a' appears in both, but using it early doesn't help!
This validates why greedy fails and DP succeeds.
LAYER 5: The Meta-Process (Framework for ALL DP)
Step 1: Identify if it's DP
- Are there overlapping subproblems?
- Are there optimal substructure (optimal solution contains optimal sub-solutions)?
- Keywords: "longest", "shortest", "count ways", "max/min"
Step 2: Define the state
- What's the minimum info to describe where you are?
- This determines dimensionality of DP array
Step 3: Define the recurrence
- What decisions can I make at this state?
- How do those decisions relate to smaller subproblems?
- Write it as a formula
Step 4: Handle base cases
- What are the smallest valid states?
- What are their trivial answers?
Step 5: Determine iteration order
- Which states depend on which?
- Fill DP table so dependencies are computed first
YOUR TURN: Test Your Understanding
Question 1: Dimensionality Reduction
Why can't we reduce this to 1D DP by processing strings character-by-character?
Think about what information you'd lose. Can you reconstruct the optimal solution with just one dimension?
Question 2: Reconstruction
What would change if we wanted to return the actual LCS string instead of just length?
Hint: You'd need to track "which decision led to dp[i][j]". How would you backtrack?
Question 3: Problem Relationships
Can you see the connection to Edit Distance problem? What's the difference in recurrence?
Both are 2D DP on two strings. But Edit Distance has 3 operations instead of 2 choices...
Question 4: Complexity Analysis
Time complexity is O(m*n). Can we do better? Why or why not?
Think about information theory: do we need to compare every character pair?
Final Thoughts
You're learning the thinking, not the solution.
Ask "why does X lead to Y?" wherever there's a gap!
The goal isn't to memorize this problem—it's to develop the meta-skill of decomposing ANY DP problem into:
- State definition
- Decision tree
- Recurrence relation
- Base cases
- Iteration order
Complexity Analysis
Metric | Value | Explanation |
---|---|---|
Time | O(m × n) | We fill m × n table, each cell O(1) |
Space | O(m × n) | DP table storage |
Space Optimized | O(min(m,n)) | Keep only 2 rows (current + previous) |
Related Problems
Master these in order to build DP intuition:
- Longest Common Subsequence (this problem)
- Edit Distance (3 operations instead of 2)
- Longest Palindromic Subsequence (LCS with reversed string)
- Shortest Common Supersequence (inverse problem)
- Distinct Subsequences (counting instead of max length)