The Quick Answer
A hash function takes any input and produces a fixed-length string of characters (the "hash"). The same input always gives the same hash, but even a one-character change produces a completely different output. Hash functions are one-way — you cannot reverse a hash to recover the original data.
Which algorithm should you use? SHA-256 for most purposes. Avoid MD5 and SHA-1 for anything security-related.
What Hash Functions Actually Do
Think of a hash function as a fingerprint machine for data. You feed it any input — a single character, a paragraph, an entire file — and it produces a fixed-length "fingerprint." Two key properties make this useful:
- Deterministic: The same input always produces the same hash
- Avalanche effect: Changing even one bit of input changes roughly half the output bits
Here's what that looks like in practice:
| Input | SHA-256 Hash (first 16 chars) |
|---|---|
| hello | 2cf24dba5fb0a30e... |
| Hello | 185f8db32271fe25... |
| hello (trailing space) | 98ea6e4f216f2fb4... |
The three digests look completely unrelated, even though the inputs are nearly identical.
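Both properties can be checked with Python's standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_hex` and `bit_difference` are illustrative and not part of any library.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def bit_difference(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    x = int(sha256_hex(a), 16)
    y = int(sha256_hex(b), 16)
    return bin(x ^ y).count("1")

# Deterministic: hashing the same input twice gives the same digest.
assert sha256_hex("hello") == sha256_hex("hello")

print(sha256_hex("hello")[:16])   # 2cf24dba5fb0a30e
print(sha256_hex("Hello")[:16])   # 185f8db32271fe25

# Avalanche effect: the count should land somewhere near 128 of 256 bits.
print(bit_difference("hello", "Hello"))
```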
The Four Algorithms Compared
MD5 (1992)
- Output: 128 bits (32 hex characters)
- Status: ❌ Cryptographically broken since 2004
- Speed: Very fast
- Still useful for: Non-security checksums, cache keys, deduplication of trusted data
- Do not use for: Passwords, certificates, digital signatures
MD5 was designed by Ronald Rivest and was the standard for years. In 2004, researchers demonstrated practical collision attacks — generating two different files with the same MD5 hash. Today, collisions can be produced in seconds on a laptop.
SHA-1 (1995)
- Output: 160 bits (40 hex characters)
- Status: ❌ Broken: theoretical attacks since 2005, practical collision in 2017 (Google's SHAttered attack)
- Speed: Fast
- Still used in: Git commit identifiers (legacy reasons)
- Do not use for: Security applications
Google and CWI Amsterdam demonstrated the first practical SHA-1 collision in 2017 by creating two different PDF files with identical SHA-1 hashes. Major browsers had already begun rejecting SHA-1 SSL certificates before this.
SHA-256 (2001)
- Output: 256 bits (64 hex characters)
- Status: ✅ Secure — no known practical attacks
- Speed: Moderate in software; hardware-accelerated on many modern CPUs
- Used for: SSL/TLS certificates, Bitcoin blockchain, code signing, file integrity, digital signatures
- Recommended for: Most general-purpose hashing needs
SHA-256 is part of the SHA-2 family, designed by the NSA and published by NIST. Despite its origin, it has been extensively analyzed by the global cryptographic community, and no practical attacks on the full algorithm have been found.
SHA-512 (2001)
- Output: 512 bits (128 hex characters)
- Status: ✅ Secure — no known practical attacks
- Speed: Can be faster than SHA-256 on 64-bit processors
- Used for: High-security applications, some password hashing schemes
- Choose over SHA-256 when: You need a wider security margin or are on 64-bit hardware
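SHA-512 operates on 64-bit words and 1024-bit blocks, so on 64-bit CPUs it is often faster than SHA-256 for large inputs. The main exception is hardware with dedicated SHA-256 instructions, where SHA-256 usually wins. The larger output provides a higher security margin against theoretical future attacks.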
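The output sizes listed for the four algorithms can be confirmed with `hashlib`. This is a small sketch; the function name `digest_lengths` is made up for illustration, and on some restricted (FIPS-mode) Python builds MD5 may not be available.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")

def digest_lengths(data: bytes = b"hello") -> dict:
    """Map each algorithm name to (output bits, hex characters)."""
    result = {}
    for name in ALGORITHMS:
        h = hashlib.new(name, data)
        result[name] = (h.digest_size * 8, len(h.hexdigest()))
    return result

for name, (bits, hex_chars) in digest_lengths().items():
    print(f"{name:7} {bits:4} bits  {hex_chars:4} hex chars")
```

The lengths stay the same no matter how large the input is.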
How Hashing Works (Step by Step)
Here's a simplified view of what happens inside a hash function like SHA-256:
1. Padding
The input message is padded to ensure its length is a multiple of the block size (512 bits for SHA-256). The padding includes the original message length, which prevents certain attacks.
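The padding rule can be sketched in a few lines, assuming the standard SHA-256 layout: one `1` bit, enough `0` bits to leave room for the length, then the message length in bits as a 64-bit big-endian integer. The function name `sha256_pad` is illustrative.

```python
import struct

BLOCK_BYTES = 64  # 512 bits

def sha256_pad(message: bytes) -> bytes:
    """Pad a message so its length is a multiple of 64 bytes."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero-fill until the length is 8 bytes short of a block boundary.
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_BYTES)
    # Append the original length in bits as a 64-bit big-endian integer.
    return padded + struct.pack(">Q", bit_length)

print(len(sha256_pad(b"hello")))     # 64
print(len(sha256_pad(b"a" * 56)))    # 128
```

A 56-byte message needs a second block because the `1` bit and the 8-byte length no longer fit in the first one.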
2. Block Processing
The padded message is split into 512-bit blocks. Each block goes through 64 rounds of operations involving:
- Bitwise rotations and shifts: rearrange bits within 32-bit words
- Logical functions (AND, OR, XOR, NOT): combine bits from multiple words
- Modular addition: add values, wrapping around at 2³²
- Constants: 64 round constants, taken from the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers
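The building blocks in this list can be written out directly. The sketch below is not a full SHA-256 implementation; it only shows the primitive operations and derives the round constants with exact integer arithmetic. The helper names (`rotr`, `add32`, `ch`, `maj`, `first_primes`, `integer_cube_root`) are chosen for illustration.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def add32(*values: int) -> int:
    """Modular addition: the sum wraps around at 2**32."""
    return sum(values) & MASK32

def ch(e: int, f: int, g: int) -> int:
    """'Choose': each bit of e selects the matching bit of f or g."""
    return (e & f) ^ (~e & g & MASK32)

def maj(a: int, b: int, c: int) -> int:
    """'Majority': each output bit is the majority vote of a, b, c."""
    return (a & b) ^ (a & c) ^ (b & c)

def first_primes(count: int) -> list:
    """Return the first `count` primes by trial division."""
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def integer_cube_root(n: int) -> int:
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

# floor(cbrt(p) * 2**32) keeps 32 fractional bits; the mask drops the integer part.
K = [integer_cube_root(p << 96) & MASK32 for p in first_primes(64)]

print(hex(K[0]))               # 0x428a2f98
print(hex(rotr(1, 1)))         # 0x80000000
print(add32(0xFFFFFFFF, 1))    # 0
```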
3. Chaining
Each block's processing starts from the result of the previous block; the first block starts from fixed initial values. The mixing inside each round produces the avalanche effect within a block, and chaining carries it forward, so a changed bit anywhere in the input alters the state for every later block and therefore the final hash.
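`hashlib` does not expose the internal compression function, but the effect of chaining is visible through `update()`: the hash object carries its state from one chunk to the next. A minimal sketch with made-up sample data:

```python
import hashlib

# 138 bytes, so the message spans more than two 64-byte blocks.
message = b"a" * 64 + b"b" * 64 + b"c" * 10

one_shot = hashlib.sha256(message).hexdigest()

# Feed the same data block by block; state carries over between calls.
h = hashlib.sha256()
for start in range(0, len(message), 64):
    h.update(message[start:start + 64])
incremental = h.hexdigest()

# Flip one bit in the first block only; the final hash still changes.
tampered_message = bytes([message[0] ^ 0x01]) + message[1:]
tampered = hashlib.sha256(tampered_message).hexdigest()

print(one_shot == incremental)   # True
print(one_shot == tampered)      # False
```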
4. Output
After all blocks are processed, the internal state (eight 32-bit words for SHA-256) is concatenated to produce the final 256-bit hash, typically displayed as 64 hexadecimal characters.
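The relationship between the eight state words and the hex string can be seen by unpacking a digest produced by `hashlib`. This sketch works backward from the finished digest rather than inspecting the real internal state.

```python
import hashlib
import struct

digest = hashlib.sha256(b"hello").digest()   # 32 raw bytes

# Interpret the digest as eight big-endian 32-bit words.
words = struct.unpack(">8I", digest)

# Concatenating the words as 8 hex digits each rebuilds the familiar string.
hex_string = "".join(f"{word:08x}" for word in words)

print(len(words))            # 8
print(f"{words[0]:08x}")     # 2cf24dba
print(hex_string == digest.hex())   # True
```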
Security Properties
A secure hash function must have three properties:
Pre-image Resistance
Given a hash h, it should be computationally infeasible to find any input m such that hash(m) = h. This is what makes hashing "one-way."
Second Pre-image Resistance
Given an input m1, it should be infeasible to find a different input m2 such that hash(m1) = hash(m2). This ensures you can't forge a different file that matches a known hash.
Collision Resistance
It should be infeasible to find any two different inputs m1 and m2 such that hash(m1) = hash(m2). MD5 and SHA-1 fail this property: practical collisions exist for both. No collisions are known for SHA-256 or SHA-512.
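Output length is what makes collisions hard to find. A generic birthday search needs roughly 2^(n/2) attempts for an n-bit hash, so a hash truncated to 24 bits falls after a few thousand tries, while the full 256 bits would need about 2^128. The sketch below attacks a deliberately weakened hash (SHA-256 cut to 3 bytes); the function names are illustrative.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """SHA-256 cut down to its first n_bytes, to simulate a short hash."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Hash the strings '0', '1', '2', ... until two share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        candidate = str(counter).encode("ascii")
        tag = truncated_hash(candidate, n_bytes)
        if tag in seen:
            return seen[tag], candidate, counter + 1
        seen[tag] = candidate
        counter += 1

m1, m2, attempts = find_collision(3)
print(m1, m2, attempts)
print(truncated_hash(m1, 3) == truncated_hash(m2, 3))                  # True
print(hashlib.sha256(m1).digest() == hashlib.sha256(m2).digest())      # False
```

The two inputs collide on the first 24 bits only; their full SHA-256 digests still differ.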
Practical Uses
File Integrity Verification
When you download software, the publisher often provides a SHA-256 checksum. After downloading, you compute the hash of your downloaded file and compare:
Expected: e3b0c44298fc1c14...
Your file: e3b0c44298fc1c14... ✓ Match — file is intact
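If even one byte was corrupted or tampered with during download, the hashes will not match. This only protects against deliberate tampering if the checksum itself comes from a source you trust, since an attacker who can replace the file may also be able to replace a checksum published next to it.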
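A checksum comparison might look like the sketch below. The file is read in chunks so large downloads do not need to fit in memory. The function names and the throwaway demo file are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 against a published checksum."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())

# Demo: a temporary file stands in for a downloaded file.
contents = b"example download contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    demo_path = tmp.name

published = hashlib.sha256(contents).hexdigest()
intact_ok = matches_checksum(demo_path, published)

with open(demo_path, "ab") as f:
    f.write(b"!")   # simulate a single corrupted byte
corrupted_ok = matches_checksum(demo_path, published)

os.remove(demo_path)
print(intact_ok, corrupted_ok)   # True False
```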
Password Storage
Websites should never store your password in plain text. Instead, they hash it and store the hash. When you log in, the system hashes your entered password and compares the result to the stored hash.
Important: General-purpose hash functions like SHA-256 are too fast for passwords. Attackers can try billions of guesses per second. Use specialized password hashing algorithms (bcrypt, scrypt, Argon2) that are intentionally slow and add random "salt" to each password.
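Of the algorithms named above, scrypt is available in Python's standard library as `hashlib.scrypt` (it requires a Python build linked against OpenSSL 1.1 or later). The sketch below is one way to use it; the cost parameters are assumed values for illustration and should be tuned for real hardware, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration only; tune for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = os.urandom(16)   # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)

salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))  # True
print(verify_password("wrong guess", salt, stored_key))                   # False
```

Because each password gets its own salt, two users with the same password end up with different stored values.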
Digital Signatures
To sign a document:
- Hash the document (producing a small, fixed-size digest)
- Encrypt the hash with the signer's private key
- Attach the encrypted hash as the "signature"
To verify:
- Hash the document independently
- Decrypt the signature with the signer's public key
- Compare the two hashes — if they match, the document hasn't been altered
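Hashing the document first is essential because asymmetric operations are slow on large data and can only process inputs of limited size. Hashing reduces a large document to a small digest that can be signed quickly. The encrypt-and-decrypt description above matches textbook RSA; other schemes such as ECDSA compute and check the signature differently, but they still operate on a digest.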
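The sign and verify steps can be traced with a toy textbook-RSA key. This is only a sketch of the hash-then-sign idea: the key below is far too small to be secure, real signature schemes add padding and do not reduce the digest modulo N, and production code should use a vetted cryptography library. All names are illustrative.

```python
import hashlib

# Toy key built from two known Mersenne primes. NOT secure; illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(document: bytes) -> int:
    """Hash the document, then squeeze the digest into the toy modulus."""
    digest = hashlib.sha256(document).digest()
    return int.from_bytes(digest, "big") % N

def sign(document: bytes) -> int:
    """'Encrypt' the digest with the private key."""
    return pow(digest_as_int(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare digests."""
    return pow(signature, E, N) == digest_as_int(document)

contract = b"Pay Alice 10 coins"
signature = sign(contract)
print(verify(contract, signature))                  # True
print(verify(b"Pay Alice 1000 coins", signature))   # False
```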
Blockchain
In Bitcoin and similar systems:
- Each transaction is hashed to create a transaction ID
- Each block contains the hash of the previous block
- Miners repeatedly hash block data (with a changing "nonce") until the hash meets a difficulty target
This design means changing any transaction in any past block would change that block's hash, which would change the next block's hash, and so on — making tampering immediately detectable.
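A toy chain makes the linking and the nonce search concrete. This sketch is a simplification and not Bitcoin's actual format: real blocks use a binary header, double SHA-256, and a numeric difficulty target, while this version uses a text payload and a "leading zero hex digits" rule. All names are hypothetical.

```python
import hashlib

GENESIS_PREV = "0" * 64

def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    payload = f"{prev_hash}|{data}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def mine(prev_hash: str, data: str, difficulty: int = 3):
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        candidate = block_hash(prev_hash, data, nonce)
        if candidate.startswith(prefix):
            return nonce, candidate
        nonce += 1

def build_chain(records, difficulty: int = 3):
    chain, prev = [], GENESIS_PREV
    for data in records:
        nonce, h = mine(prev, data, difficulty)
        chain.append({"prev": prev, "data": data, "nonce": nonce, "hash": h})
        prev = h
    return chain

def is_valid(chain, difficulty: int = 3) -> bool:
    """Recompute every hash and check each block points at its predecessor."""
    prev = GENESIS_PREV
    for block in chain:
        recomputed = block_hash(block["prev"], block["data"], block["nonce"])
        if (block["prev"] != prev or recomputed != block["hash"]
                or not recomputed.startswith("0" * difficulty)):
            return False
        prev = block["hash"]
    return True

chain = build_chain(["alice->bob:5", "bob->carol:2", "carol->dave:1"])
print(is_valid(chain))            # True

chain[0]["data"] = "alice->bob:500"   # rewrite history in the first block
print(is_valid(chain))            # False
```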
Git Version Control
Git uses SHA-1 (with plans to migrate to SHA-256) to identify every object:
- Commits are identified by hashing the root tree hash, parent commit hash(es), author, committer, timestamps, and commit message
- Files (blobs) are identified by hashing their contents
- Trees are identified by hashing the list of entry names, file modes, and the hashes of the blobs and subtrees they point to
This content-addressing means identical files are automatically deduplicated, and any change to the repository history changes all downstream hashes.
Common Pitfalls
Confusing Hashing with Encryption
Hashing is one-way: input → hash. There is no key, and you cannot get the input back.
Encryption is two-way: input + key → ciphertext → input. You can decrypt with the correct key.
They serve different purposes. Use hashing to verify data. Use encryption to protect data you need to retrieve later.
Using Fast Hashes for Passwords
A modern GPU can compute billions of SHA-256 hashes per second. That's excellent for file checksums but terrible for passwords, because an attacker who obtains the stored hashes can try an enormous number of guesses very quickly.
Password-specific algorithms like bcrypt deliberately slow down hashing (typically 100-1000 ms per hash) and add a random salt to each password, making brute-force and precomputed attacks impractical.
Trusting MD5 for Security
MD5 collisions are trivial to produce. In 2008, researchers used an MD5 collision to create a rogue SSL certificate accepted by all major browsers. If you're using MD5 for anything security-related, migrate to SHA-256.
Ignoring Encoding
The hash of a string depends on how it's encoded, because hash functions operate on bytes, not characters. The text "hello" encoded as UTF-8 produces a different hash than the same text encoded as UTF-16. When comparing hashes across systems, ensure both sides use the same encoding (UTF-8 is the usual choice).
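The difference is easy to reproduce. In this sketch the same five characters are turned into bytes three ways before hashing; the variable names are illustrative.

```python
import hashlib

text = "hello"

utf8_bytes = text.encode("utf-8")          # 5 bytes
utf16_bytes = text.encode("utf-16-le")     # 10 bytes, no byte-order mark
utf16_bom_bytes = text.encode("utf-16")    # Python's "utf-16" codec prepends a BOM

utf8_hash = hashlib.sha256(utf8_bytes).hexdigest()
utf16_hash = hashlib.sha256(utf16_bytes).hexdigest()
utf16_bom_hash = hashlib.sha256(utf16_bom_bytes).hexdigest()

print(utf8_hash[:16])                 # 2cf24dba5fb0a30e
print(utf8_hash == utf16_hash)        # False
print(utf16_hash == utf16_bom_hash)   # False
```

Even two systems that both say "UTF-16" can disagree if one of them includes a byte-order mark.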
When to Use What
| Scenario | Recommended Algorithm |
|---|---|
| File download verification | SHA-256 |
| Password storage | bcrypt, scrypt, or Argon2 (not plain SHA-256) |
| Digital signatures | SHA-256 or SHA-512 |
| Blockchain / cryptocurrency | SHA-256 |
| Non-security checksums | MD5 is acceptable; SHA-256 is better |
| Cache keys / deduplication | MD5 or SHA-256 (use SHA-256 if inputs may be attacker-controlled) |
| HMAC (message authentication) | SHA-256 |
Try It Yourself
Use the hash generator to hash any text and see MD5, SHA-1, SHA-256, and SHA-512 output side by side. Try changing a single character and watch how every hash changes completely.
For password hashing specifically, use the bcrypt generator which includes salting and configurable cost factors.
Further Reading
- What is Base64 Encoding — another common encoding scheme, often confused with hashing
- Password Strength and Entropy — why password length matters more than complexity
- JWT Tokens Explained — how hashing is used in JSON Web Tokens