Generate Robots.txt
Create a robots.txt file to control how search engines crawl your website. Choose a preset or customize your rules.
What is robots.txt?
A robots.txt file is a plain text file placed at your website's root directory (e.g., https://example.com/robots.txt) that tells search engine crawlers which parts of your site they can and cannot access. It follows the Robots Exclusion Protocol, a standard that well-behaved crawlers follow before indexing your content.
How robots.txt works
When a search engine crawler (like Googlebot) visits your site, it first checks for a robots.txt file. The file contains rules that specify:
- User-agent: Which crawler the rules apply to (* means all crawlers)
- Disallow: Paths the crawler should not access
- Allow: Paths the crawler can access (useful for exceptions)
- Sitemap: Location of your XML sitemap
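Python's standard-library urllib.robotparser implements this matching logic, which makes it handy for checking how a rule set will be interpreted before you deploy it. A minimal sketch (the rules and URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory rule set instead of fetching a live file.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# Paths under /admin/ are blocked; everything else stays crawlable.
print(rp.can_fetch("*", "https://example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```

In production you would call rp.set_url(...) and rp.read() to fetch the live file instead of parsing lines by hand.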
Common robots.txt examples
Allow all crawlers (most websites)
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Block all crawlers (staging sites)
User-agent: *
Disallow: /
Block specific directories
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /
Sitemap: https://example.com/sitemap.xml
Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
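If you are generating the file programmatically rather than by hand, a small helper can assemble groups like the ones above. This is just an illustrative sketch (build_robots_txt is a hypothetical helper, not part of any library):

```python
def build_robots_txt(groups, sitemap=None):
    """Assemble robots.txt text from (user_agent, [(directive, path), ...]) pairs."""
    lines = []
    for agent, rules in groups:
        lines.append(f"User-agent: {agent}")
        lines.extend(f"{directive}: {path}" for directive, path in rules)
        lines.append("")  # blank line separates rule groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip() + "\n"

# Recreate a cut-down "block AI training bots" example:
text = build_robots_txt(
    [("GPTBot", [("Disallow", "/")]),
     ("*", [("Allow", "/")])],
    sitemap="https://example.com/sitemap.xml",
)
print(text)
```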
Robots.txt best practices
- Always place robots.txt in your root directory — It won't work in subdirectories
- Include your sitemap — Helps crawlers discover all your pages efficiently
- Don't block CSS or JavaScript — Search engines need these to render your pages properly
- Use specific paths — Be precise about what you block to avoid accidentally hiding important content
- Test your robots.txt — Use Google Search Console's robots.txt tester to verify your rules
- Don't rely on it for security — Robots.txt is public and only a suggestion; use proper authentication for sensitive data
Common mistakes to avoid
- Blocking your entire site accidentally — Forgetting to remove Disallow: / after launching
- Blocking important resources — Preventing crawlers from accessing CSS, JS, or images needed for rendering
- Expecting robots.txt to hide pages — Disallowed URLs can still appear in search results if linked from other sites
- Using robots.txt for sensitive data — Anyone can read your robots.txt file
- Incorrect path syntax — Paths must start with / and are case-sensitive on most servers
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file is a plain text file placed in your website's root directory that tells search engine crawlers which pages they can or cannot access. It follows the Robots Exclusion Protocol and is read by crawlers before they index your site.
Where do I put my robots.txt file?
The robots.txt file must be placed in the root directory of your website. For example, if your site is https://example.com, the file should be accessible at https://example.com/robots.txt. It will not work if placed in a subdirectory.
Does robots.txt block pages from appearing in Google?
No. Robots.txt only prevents crawling, not indexing. If other sites link to a disallowed page, Google may still index its URL (without content). To fully block a page from search results, use a noindex meta tag or X-Robots-Tag HTTP header instead.
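The X-Robots-Tag header is set by your web server, not by robots.txt. As a rough sketch of the idea, here is a tiny Python server that marks every response noindex (the handler, markup, and paths are all illustrative):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Tell crawlers to drop this page from their index entirely.
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        # The same effect in markup: <meta name="robots" content="noindex">
        self.wfile.write(b"<html><body>hidden page</body></html>")

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    tag = resp.headers["X-Robots-Tag"]
print(tag)  # noindex, nofollow
server.shutdown()
```

In practice you would configure this header in nginx, Apache, or your application framework rather than a hand-rolled server.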
What is the difference between Allow and Disallow?
Disallow tells crawlers not to access a specific path. Allow explicitly permits access, which is useful when you want to allow a specific page within a disallowed directory. For example: Disallow: /private/ combined with Allow: /private/public-page.html.
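This exception pattern can be checked with urllib.robotparser. One caveat: Python's parser applies rules in file order (first match wins), unlike Google's most-specific-rule behavior, so in this sketch the Allow line is listed before the broader Disallow:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Allow the exception first, then disallow the rest of the directory.
rp.parse([
    "User-agent: *",
    "Allow: /private/public-page.html",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/private/public-page.html"))  # True
print(rp.can_fetch("*", "https://example.com/private/secret.html"))       # False
```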
Do all bots follow robots.txt?
No. Robots.txt is a voluntary protocol. Legitimate search engines like Google, Bing, and DuckDuckGo follow it, but malicious bots often ignore it. Do not rely on robots.txt for security—use proper authentication and access controls for sensitive content.
How do I block AI training bots with robots.txt?
Add specific user-agent rules for AI crawlers. Common AI bot user-agents include GPTBot (OpenAI), Google-Extended (Google AI), anthropic-ai, and CCBot. Example: User-agent: GPTBot followed by Disallow: / will block OpenAI's crawler.
Can I use wildcards in robots.txt?
Yes. Most modern crawlers support * as a wildcard (matches any sequence of characters) and $ to indicate end-of-URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf.
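Python's built-in urllib.robotparser does not implement these wildcards, but the matching logic itself is simple enough to sketch with a regular expression (robots_path_matches is a hypothetical helper for illustration):

```python
import re

def robots_path_matches(pattern: str, path: str) -> bool:
    """Sketch of REP wildcard matching: * matches any character
    sequence; a trailing $ anchors the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except *, which becomes ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex += "$" if anchored else ""
    return re.match(regex, path) is not None

print(robots_path_matches("/*.pdf$", "/files/report.pdf"))      # True
print(robots_path_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(robots_path_matches("/admin", "/admin/settings"))         # True (prefix match)
```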
How long does it take for robots.txt changes to take effect?
Search engines cache robots.txt files, typically for up to 24 hours. Changes may not be reflected immediately. You can request a refresh in Google Search Console under the robots.txt tester tool.
Is robots.txt case-sensitive?
The directives (User-agent, Disallow, Allow) are case-insensitive, but paths are matched exactly as specified. Since most web servers treat URLs as case-sensitive, your robots.txt paths should match your actual URL casing.
Should I block /wp-admin/ on WordPress sites?
Blocking /wp-admin/ is common practice but not strictly necessary since WordPress already requires authentication. However, you should allow /wp-admin/admin-ajax.php as some themes and plugins use it for frontend functionality.
Privacy & Limitations
- All calculations run entirely in your browser -- nothing is sent to any server.
- Results are estimates and may vary based on actual conditions.
Related Tools
- UTM Link Builder -- Build URLs with UTM tracking parameters
- Meta Tag Generator -- Generate common SEO meta tags for a page
- Color Contrast Checker -- Check contrast ratio between two colors
- CSS Minifier -- Minify CSS by removing comments and whitespace for smaller file sizes