OpenAI Token Counter
OpenAI Token Counter estimates how many tokens your text will use with GPT-5.x, GPT-4o, GPT-4.1, and other OpenAI models. All current OpenAI models use the o200k_base encoding (~4 characters per token for English).
This counter estimates raw token counts and does not account for prompt caching. OpenAI offers reduced pricing for cached prompt prefixes on supported models -- check the OpenAI pricing page for details.
OpenAI Model Comparison
All current OpenAI models use the o200k_base encoding with a vocabulary of approximately 200,000 tokens.
| Model | Context | Input / 1M | Output / 1M | Encoding |
|---|---|---|---|---|
| GPT-5.3-Codex | ~400K | TBD* | TBD* | o200k_base |
| GPT-5.2 | 400K | $1.75 | $14.00 | o200k_base |
| GPT-5.2-Codex | 400K | $1.75 | $14.00 | o200k_base |
| GPT-5.1 | 400K | $1.25 | $10.00 | o200k_base |
| GPT-5.1-Codex | 400K | $1.25 | $10.00 | o200k_base |
| GPT-5 | 400K | $1.25 | $10.00 | o200k_base |
| GPT-5 Mini | 400K | $0.25 | $2.00 | o200k_base |
| GPT-4o | 128K | $2.50 | $10.00 | o200k_base |
| GPT-4o mini | 128K | $0.15 | $0.60 | o200k_base |
| GPT-4.1 | 1M | $2.00 | $8.00 | o200k_base |
| GPT-4.1 Mini | 1M | $0.40 | $1.60 | o200k_base |
| GPT-4.1 Nano | 1M | $0.10 | $0.40 | o200k_base |
*GPT-5.3-Codex released Feb 5, 2026 -- API pricing not yet announced; currently ChatGPT-only.
How OpenAI Tokenization Works
OpenAI models use Byte Pair Encoding (BPE) to split text into tokens. The current tokenizer is called o200k_base, which has a vocabulary of approximately 200,000 tokens. This is the same encoding used by all GPT-5.x, GPT-4o, and GPT-4.1 models.
What is BPE?
Byte Pair Encoding starts with individual bytes and iteratively merges the most frequent pairs into new tokens. This creates a vocabulary where common words like "the" or "and" are single tokens, while rare words get split into subword pieces.
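The merge loop can be illustrated with a toy trainer (a simplified sketch, not the actual o200k_base algorithm, which operates on raw bytes with a learned 200K-entry merge table; the word list and merge count here are made up for illustration):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Start with each word as a sequence of single-character symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite every word, replacing occurrences of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = bpe_train(["the", "then", "them", "and", "hand"], num_merges=3)
print(merges)  # [('t', 'h'), ('th', 'e'), ('a', 'n')]
```

Frequent sequences like "th" and "the" are merged first, which is why common English words end up as single tokens in the real vocabulary.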
The tiktoken Library
OpenAI provides an open-source Python library called tiktoken for exact token counting. To use it:
```python
import tiktoken

# encoding_for_model resolves "gpt-4o" to the o200k_base encoding
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, world!")
print(len(tokens))  # 4
```
Token Estimation Rules
- English text: ~4 characters per token (1 word = ~1.3 tokens)
- Code: ~3-3.5 characters per token (symbols split into separate tokens)
- CJK text: ~1.5-2 characters per token (a rare character may split into multiple tokens)
- Numbers: each digit or small group is typically 1 token
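These heuristics can be folded into a rough estimator (a hypothetical helper using the character ratios above; only tiktoken gives exact counts):

```python
def estimate_tokens(text: str, kind: str = "english") -> int:
    """Rough token estimate from character count; not a substitute for tiktoken."""
    # Approximate characters-per-token ratios from the rules above.
    chars_per_token = {"english": 4.0, "code": 3.25, "cjk": 1.75}
    return max(1, round(len(text) / chars_per_token[kind]))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 44 chars -> 11
```

Expect the estimate to drift by 10-20% on real text; use tiktoken when the count matters for billing or context limits.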
Frequently Asked Questions
How many tokens does GPT-5 use per word?
GPT-5 uses the o200k_base tokenizer, which averages about 1.3 tokens per English word (or roughly 4 characters per token). Technical terms and code use more tokens per word. Non-English languages, especially CJK, use significantly more tokens.
What encoding does GPT-4o use?
GPT-4o uses the o200k_base encoding with a vocabulary of approximately 200,000 tokens. This is the same encoding used across all current OpenAI models including GPT-5.x and GPT-4.1.
How to count tokens for the OpenAI API?
You can use this free online counter for estimates. For exact counts, use the official tiktoken Python library: pip install tiktoken, then use tiktoken.encoding_for_model('gpt-4o') to get the encoder.
What is tiktoken?
tiktoken is OpenAI's open-source tokenizer library for Python. It provides exact token counts for all OpenAI models. It supports multiple encodings including o200k_base (current models) and cl100k_base (legacy models like GPT-4 Turbo).
What is the difference between GPT-5 and GPT-5.1?
GPT-5.1 is an updated version of GPT-5 with improved reasoning and instruction following. Both use the same o200k_base tokenizer and 400K context window, and have the same input pricing ($1.25/1M tokens).
How much does GPT-5 cost per 1000 tokens?
GPT-5 costs $0.00125 per 1K input tokens ($1.25 per 1M) and $0.01 per 1K output tokens ($10.00 per 1M). GPT-5 Mini is significantly cheaper at $0.00025 per 1K input tokens. Across OpenAI's current lineup, output tokens cost several times more than input tokens.
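The arithmetic above is easy to sketch (a hypothetical helper; the default rates are GPT-5's list prices from the table, which may change):

```python
def openai_cost(input_tokens: int, output_tokens: int,
                in_per_m: float = 1.25, out_per_m: float = 10.00) -> float:
    """Cost in USD given per-1M-token rates (defaults: GPT-5 list prices)."""
    return input_tokens / 1_000_000 * in_per_m + output_tokens / 1_000_000 * out_per_m

print(openai_cost(1000, 1000))  # 0.00125 + 0.01 = 0.01125
```

Passing in the rates from the comparison table lets the same helper cover GPT-5 Mini, GPT-4o, and the GPT-4.1 family.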
Token Counters by Provider
Pricing data as of February 7, 2026. Prices change frequently -- always verify with the official provider documentation: OpenAI | Anthropic | Google Gemini | Groq | Together AI
Privacy & Limitations
- All calculations run entirely in your browser -- nothing is sent to any server.
- Results are estimates and may vary based on actual conditions.
Related Tools
- Llama Token Counter -- Count tokens and estimate costs for Meta Llama 4, 3.3, and open-source LLM models
- AI Token Counter -- Estimate tokens and characters for a prompt
- OpenAI Cost Calculator -- Estimate API cost from token counts
- Claude Token Counter -- Count tokens and estimate costs for Claude Opus 4.6, Sonnet, and Anthropic models
- Gemini Token Counter -- Count tokens and estimate costs for Google Gemini 3 Pro, 2.5 Pro and Flash models