How to Compare Two Texts and Find Differences

Learn how text diff algorithms work, when to use line-level vs. word-level comparison, and how to spot hidden changes like whitespace and encoding issues.

The Quick Answer

To compare two texts and find differences, paste them into a text diff tool side by side. Added lines appear in green, removed lines in red, and changed lines in yellow. For code, you can also use diff on the command line or git diff in version control.

The key concept: a diff algorithm finds the longest common subsequence (LCS) of lines shared by both texts, then marks everything else as an addition, removal, or modification.

What Is a Text Diff?

A text diff compares two versions of text and produces a description of what changed. The term comes from the Unix diff command, first released in 1974. Today, diffing is built into version control systems, code editors, word processors, and web-based comparison tools.

A diff answers three questions:

  • What was removed? Lines present in the original but missing from the modified version.
  • What was added? Lines present in the modified version but not in the original.
  • What was changed? Lines that exist in both versions but have different content.

How Diff Algorithms Work

The Longest Common Subsequence (LCS)

Most diff algorithms are based on finding the Longest Common Subsequence — the longest sequence of lines that appear in both texts, in the same order, but not necessarily consecutively.

For example, given:

Text A:

apple
banana
cherry
date

Text B:

apple
blueberry
cherry
elderberry
date

The LCS is: apple, cherry, date — these three lines appear in both texts in the same order. Everything else is either an addition or a removal:

  • banana was removed
  • blueberry was added (before cherry)
  • elderberry was added (before date)
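
The fruit example above can be reproduced with a short dynamic-programming sketch. This is an illustration of the LCS idea only, not how production diff tools are implemented, and the function name lcs_diff is made up for this example.

```python
def lcs_diff(a, b):
    """Return (common, removed, added) for two lists of lines."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, classifying each line as we go.
    common, removed, added = [], [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            removed.append(a[i])
            i += 1
        else:
            added.append(b[j])
            j += 1
    removed.extend(a[i:])  # leftover original lines were removed
    added.extend(b[j:])    # leftover modified lines were added
    return common, removed, added


text_a = ["apple", "banana", "cherry", "date"]
text_b = ["apple", "blueberry", "cherry", "elderberry", "date"]
common, removed, added = lcs_diff(text_a, text_b)
print(common)   # ['apple', 'cherry', 'date']
print(removed)  # ['banana']
print(added)    # ['blueberry', 'elderberry']
```

The table costs memory proportional to the product of the two line counts, which is one reason real tools use faster algorithms such as Myers' diff.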

Line Diff vs. Word Diff

A line diff treats each line as an atomic unit. If any character on a line changes, the entire line is flagged as modified:

- The server runs on port 8080
+ The server runs on port 9090

A word diff (or inline diff) goes deeper, highlighting exactly which words changed within a line:

The server runs on port [8080 → 9090]

Word-level diffs are more useful for prose editing and small code changes, where knowing the exact modification saves time. Line-level diffs are better for structural changes where entire lines are added or removed.
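
As a rough sketch of how a word diff can be produced, the snippet below splits each line into words and compares them with SequenceMatcher from Python's standard difflib module. The function name word_diff and the bracket markers are choices made for this example; real tools render the result in their own ways.

```python
import difflib


def word_diff(old_line, new_line):
    """Render a word-level diff of two lines using bracket markers."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part = " ".join(old_words[i1:i2])
        new_part = " ".join(new_words[j1:j2])
        if tag == "equal":
            parts.append(old_part)
        elif tag == "replace":
            parts.append(f"[{old_part} → {new_part}]")
        elif tag == "delete":
            parts.append(f"[-{old_part}-]")
        else:  # tag == "insert"
            parts.append(f"{{+{new_part}+}}")
    return " ".join(parts)


print(word_diff("The server runs on port 8080",
                "The server runs on port 9090"))
# The server runs on port [8080 → 9090]
```

Splitting on whitespace discards the original spacing, so this sketch suits prose and short lines better than whitespace-sensitive code.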

Myers' Diff Algorithm

The most widely used diff algorithm is Myers' diff (Eugene Myers, 1986). It finds the shortest sequence of edits (insertions and deletions) to transform one text into another. This is the algorithm behind git diff and most Unix diff implementations.

Myers' algorithm runs in O(ND) time, where N is the combined length of both texts and D is the size of the shortest edit script, that is, the number of insertions and deletions needed. When the texts are mostly similar (which is the common case), D is small and the algorithm is very fast.
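
To make the role of D concrete, here is a minimal sketch of the greedy forward pass from Myers' algorithm. It only returns D, the number of insertions and deletions, and does not reconstruct the edit script. The function name shortest_edit_length is made up for this example, and real implementations add refinements that are omitted here.

```python
def shortest_edit_length(a, b):
    """Return the minimum number of insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insertion) or right (deletion), whichever gets further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": matching lines cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


text_a = ["apple", "banana", "cherry", "date"]
text_b = ["apple", "blueberry", "cherry", "elderberry", "date"]
print(shortest_edit_length(text_a, text_b))  # 3
```

The outer loop stops as soon as the end of both texts is reached, so two nearly identical texts finish after only a few rounds regardless of their length.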

Patience Diff

Patience diff is an alternative algorithm that produces more human-readable output for certain types of changes. It first matches unique lines (lines that appear exactly once in each text), then fills in the gaps. This tends to produce better results when large blocks of code are moved or when there are many repeated lines (like closing braces in code).

Git supports patience diff with git diff --patience.
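
The first step described above, matching lines that appear exactly once in each text, can be sketched as follows. This covers only the anchor-finding step. The full algorithm then keeps the anchors that appear in the same order in both texts and diffs the gaps between them. The function name and the sample C-like lines are invented for illustration.

```python
from collections import Counter


def unique_common_lines(a, b):
    """Return (index_in_a, index_in_b, line) for lines occurring exactly once in each text."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {line: j for j, line in enumerate(b)}
    return [
        (i, index_b[line], line)
        for i, line in enumerate(a)
        if count_a[line] == 1 and count_b[line] == 1
    ]


old = ["int f() {", "  return 1;", "}", "int g() {", "  return 2;", "}"]
new = ["int f() {", "  return 1;", "}",
       "int h() {", "  return 3;", "}",
       "int g() {", "  return 2;", "}"]

for anchor in unique_common_lines(old, new):
    print(anchor)
```

The closing braces never become anchors because they repeat, which is why the function signatures end up driving the alignment.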

How to Read Diff Output

Side-by-Side Format

The visual format used by most web-based diff tools and IDEs:

Line  Original (A)   Modified (B)
1     port=8080      port=8080
2     debug=true     debug=false
3     max_conn=100   max_conn=200
4     (absent)       log_level=info

Left column = original. Right column = modified. Colors indicate the type of change.

Unified Diff Format

The standard text format used by git diff, patches, and code review tools:

@@ -1,3 +1,4 @@
 port=8080
-debug=true
-max_connections=100
+debug=false
+max_connections=200
+log_level=info

How to read it:

  • The @@ header shows line ranges: -1,3 means the hunk starts at line 1 of the original and covers 3 lines; +1,4 means it starts at line 1 of the modified version and covers 4 lines
  • Lines starting with a space are unchanged context
  • Lines starting with - were removed from the original
  • Lines starting with + were added in the modified version
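
For readers who want to generate this format themselves, Python's standard difflib module has a unified_diff function. The sketch below feeds it the same configuration lines. The file names original.txt and modified.txt are labels only; no files are read.

```python
import difflib

original = ["port=8080", "debug=true", "max_connections=100"]
modified = ["port=8080", "debug=false", "max_connections=200", "log_level=info"]

diff_lines = list(difflib.unified_diff(
    original,
    modified,
    fromfile="original.txt",  # label for the "---" header line
    tofile="modified.txt",    # label for the "+++" header line
    lineterm="",              # the inputs have no trailing newlines
))
print("\n".join(diff_lines))
```

The output starts with two header lines naming the files, followed by the same @@ -1,3 +1,4 @@ hunk discussed above.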

Context Diff Format

An older format that shows changes with more surrounding context:

*** original.txt
--- modified.txt
***************
*** 1,3 ****
  port=8080
! debug=true
! max_connections=100
--- 1,4 ----
  port=8080
! debug=false
! max_connections=200
+ log_level=info

You will encounter this in older systems, but unified diff has largely replaced it.

Practical Examples

Example 1: Comparing Code Revisions

You have two versions of a function:

Before:

function calculate(a, b) {
  return a + b;
}

After:

function calculate(a, b, operation) {
  if (operation === 'multiply') {
    return a * b;
  }
  return a + b;
}

The diff shows:

  • Line 1: changed (parameter list expanded)
  • Lines 2-4: added (new conditional block)
  • Line 5: unchanged (return a + b;)
  • Line 6: unchanged (closing brace)

Example 2: Spotting Hidden Whitespace

Two lines look identical but diff shows them as different:

Hello World    ← has trailing spaces
Hello World    ← no trailing spaces

This is one of the most common reasons to use a diff tool — invisible differences that the human eye cannot spot. Trailing whitespace, tabs vs. spaces, and Windows (CRLF) vs. Unix (LF) line endings all cause this.
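
One way to see these characters is to swap them for visible symbols before printing. The helper below is a small sketch; the name show_hidden and the particular symbols are arbitrary choices for this example.

```python
def show_hidden(text):
    """Make whitespace visible: space as ·, tab as →, CR as ␍, LF as ␊."""
    table = str.maketrans({" ": "·", "\t": "→", "\r": "␍", "\n": "␊"})
    return text.translate(table)


with_trailing = "Hello World  \r\n"  # two trailing spaces, Windows line ending
without_trailing = "Hello World\n"   # no trailing spaces, Unix line ending

print(show_hidden(with_trailing))     # Hello·World··␍␊
print(show_hidden(without_trailing))  # Hello·World␊
```

Python's built-in repr() gives a similar view of tabs and line endings without any helper, although it leaves spaces as they are.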

Example 3: Verifying Copy-Paste Accuracy

When copying a license key, API token, or configuration snippet, paste both the source and your copy into a diff tool. If the diff shows zero changes, the copy is exact. This catches:

  • Accidentally clipped characters at the start or end
  • Smart quotes replacing straight quotes (common when copying from web pages)
  • Non-breaking spaces replacing regular spaces
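
A quick scripted check for the last two problems might look like the sketch below. The set of suspect characters is an assumption covering only smart quotes and the non-breaking space; extend it to match your own data.

```python
import unicodedata

# Characters that often sneak in when copying from web pages (assumed list).
SUSPECT = {
    "\u00a0",  # no-break space
    "\u2018",  # left single quotation mark
    "\u2019",  # right single quotation mark
    "\u201c",  # left double quotation mark
    "\u201d",  # right double quotation mark
}


def find_suspect_chars(text):
    """Return (position, code point, Unicode name) for each suspect character."""
    return [
        (pos, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for pos, ch in enumerate(text)
        if ch in SUSPECT
    ]


copied = "key=\u201cabc\u201d"  # curly quotes instead of straight ones
for hit in find_suspect_chars(copied):
    print(hit)
# (4, 'U+201C', 'LEFT DOUBLE QUOTATION MARK')
# (8, 'U+201D', 'RIGHT DOUBLE QUOTATION MARK')
```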

Common Pitfalls

1. Whitespace Differences

Whitespace changes are the most common source of confusing diffs:

  • Trailing spaces at the end of lines are invisible but create diffs
  • Tabs vs. spaces look similar but are different characters
  • Line endings — Windows uses \r\n (CRLF), Unix/Mac uses \n (LF)

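Many diff tools offer an "ignore whitespace" option. In Git, use git diff -w to ignore all whitespace changes, git diff --ignore-space-at-eol to ignore whitespace changes at the end of lines, or git diff --ignore-cr-at-eol to ignore a carriage return at the end of a line (the CRLF vs. LF case).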
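
If your tool has no such option, you can normalize both texts yourself before comparing them. The sketch below handles the three cases listed above. The function name and the tab width of 4 are assumptions, and expanding tabs is lossy, so use it only for comparison, not for rewriting files.

```python
def normalize_whitespace(text, tab_width=4):
    """Convert line endings to LF, expand tabs, and strip trailing whitespace."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = (line.expandtabs(tab_width).rstrip() for line in text.split("\n"))
    return "\n".join(lines)


windows_text = "port=8080 \r\n\tdebug=true\r\n"
unix_text = "port=8080\n    debug=true\n"

print(windows_text == unix_text)  # False
print(normalize_whitespace(windows_text) == normalize_whitespace(unix_text))  # True
```

This is narrower than git diff -w, which ignores all whitespace differences, including those inside a line.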

2. Encoding Issues

If one text is UTF-8 and the other is Latin-1, non-ASCII characters such as accented letters and currency symbols are stored as different bytes, so they appear as differences even if the intended content is the same. Always ensure both texts use the same encoding before comparing.
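
A short demonstration of the problem, using the word "café" as sample data:

```python
text = "café"
utf8_bytes = text.encode("utf-8")      # é becomes two bytes: C3 A9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: E9

# The raw bytes differ even though the intended text is the same.
print(utf8_bytes == latin1_bytes)  # False

# Decoding each with its own encoding recovers identical strings.
same_text = utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1")
print(same_text)  # True

# Reading UTF-8 bytes as Latin-1 yields garbled text that a diff flags.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©
```

The practical takeaway is to decode both inputs to text with their correct encodings first, then diff the decoded strings rather than the raw bytes.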

3. Formatting-Only Changes

Running a code formatter (Prettier, Black, gofmt) changes formatting without changing logic, but a diff tool shows every reformatted line as changed. To avoid noise:

  • Format both texts the same way before comparing
  • Use tools that understand structural equivalence (like JSON Diff Viewer for JSON)
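
For JSON, the first suggestion can be as simple as parsing both texts and printing them back out in one canonical layout. A minimal sketch with Python's standard json module (the helper name canonical_json is made up here):

```python
import json


def canonical_json(text):
    """Re-serialize JSON with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2)


compact = '{"port": 8080, "debug": true}'
pretty = """{
  "debug": true,
  "port": 8080
}"""

print(compact == pretty)                                  # False
print(canonical_json(compact) == canonical_json(pretty))  # True
```

Running a line diff on the two canonical strings then shows only real changes to keys and values.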

4. Rich Text vs. Plain Text

Copying from Word, Google Docs, or email clients can introduce invisible formatting — smart quotes, em dashes, non-breaking spaces, and soft hyphens. Always compare plain text, not rich text.

Diff Tools Comparison

Tool                   Type                   Best For
Text Diff Tool         Web-based              Quick comparisons, no install needed
diff (Unix)            Command-line           Scripting, automation, patches
git diff               Command-line           Code versioned in Git
VS Code built-in diff  Editor                 Code editing workflow
Meld                   Desktop app            Visual side-by-side with merge
WinMerge               Desktop app (Windows)  File and directory comparison
colordiff              Command-line           Colored terminal diff output

When to Use a Diff Tool vs. Alternatives

Use a text diff tool when:

  • You have two versions of the same text and want to see what changed
  • You need to verify copy-paste accuracy
  • You are reviewing edits to a document or configuration

Use a structural diff when:

  • Comparing JSON, XML, or other structured data formats where formatting does not matter
  • You need changes expressed as paths (e.g., config.database.port)
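
To illustrate what "changes expressed as paths" means, here is a small sketch that walks two nested dictionaries (for example, parsed JSON) and reports each differing leaf by its dotted path. The function name, the "<missing>" placeholder, and the sample data are all invented for this example; lists are compared as whole values.

```python
MISSING = "<missing>"  # placeholder for a key absent on one side


def path_diff(old, new, prefix=""):
    """Yield (path, old_value, new_value) for each leaf that differs."""
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}.{key}" if prefix else str(key)
        a = old.get(key, MISSING)
        b = new.get(key, MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            yield from path_diff(a, b, path)  # descend into nested objects
        elif a != b:
            yield (path, a, b)


old_cfg = {"config": {"database": {"host": "db", "port": 5432}, "debug": True}}
new_cfg = {"config": {"database": {"host": "db", "port": 5433},
                      "debug": True, "log_level": "info"}}

for change in path_diff(old_cfg, new_cfg):
    print(change)
# ('config.database.port', 5432, 5433)
# ('config.log_level', '<missing>', 'info')
```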

Use version control (Git) when:

  • You need a complete history of changes over time
  • Multiple people are editing the same files
  • You want to be able to revert changes

FAQ

Is text comparison the same as plagiarism detection?

No. A text diff compares two texts you provide and shows the exact differences. Plagiarism detection compares a text against a large database of existing documents to find similar passages. They use fundamentally different algorithms and serve different purposes.

Can I compare binary files with a text diff?

No. Text diff tools work with plain text — human-readable characters separated by line breaks. Binary files (images, PDFs, executables) contain non-text data and require specialized comparison tools.

How do I compare files on the command line?

On macOS/Linux, use the built-in diff command:

diff -u file1.txt file2.txt

The -u flag produces unified diff output. Add --color for colored output on supported systems. For a side-by-side view, use diff -y file1.txt file2.txt.

What does "context lines" mean in a diff?

Context lines are unchanged lines shown around each change to provide surrounding context. By default, git diff shows 3 lines of context. You can change this with git diff -U5 (5 lines) or git diff -U0 (no context, only changes).
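
The effect of the context setting is easy to see in a small experiment. The sketch below uses the n parameter of Python's difflib.unified_diff, which plays the same role as -U in git diff. The ten sample lines and the helper name hunk_body are made up for this example.

```python
import difflib

old = [f"line {i}" for i in range(1, 11)]
new = old.copy()
new[4] = "line 5 (edited)"  # change only the fifth line


def hunk_body(context):
    """Return the diff lines that follow the two file headers and the @@ header."""
    diff = list(difflib.unified_diff(old, new, n=context, lineterm=""))
    return diff[3:]


print(len(hunk_body(3)))  # 8: three context lines on each side plus the -/+ pair
print(hunk_body(0))       # ['-line 5', '+line 5 (edited)']
```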

Why does my diff show everything as changed?

This usually means one of three things: (1) the line endings are different (CRLF vs. LF), (2) the encoding is different (UTF-8 vs. Latin-1), or (3) one text was reformatted (indentation, spacing). Try normalizing line endings and encoding before comparing.

Summary

Text diffing is a fundamental skill for anyone who works with text files — developers, writers, system administrators, and data analysts. The core concepts are simple: split into lines, find what is common, and highlight what is different. Understanding diff output formats (side-by-side, unified) and common pitfalls (whitespace, encoding, formatting) helps you use diff tools effectively and catch changes that matter.
