Robots.txt Generator - Create Robots.txt Files for SEO

Generate a robots.txt file for your website


Create a robots.txt file to control how search engines crawl your website. Choose a preset or customize your rules.


What is robots.txt?

A robots.txt file is a plain text file placed at your website's root directory (e.g., https://example.com/robots.txt) that tells search engine crawlers which parts of your site they can and cannot access. It follows the Robots Exclusion Protocol, a standard that well-behaved crawlers follow before indexing your content.

How robots.txt works

When a search engine crawler (like Googlebot) visits your site, it first checks for a robots.txt file. The file contains rules that specify:

  • User-agent: Which crawler the rules apply to (* means all crawlers)
  • Disallow: Paths the crawler should not access
  • Allow: Paths the crawler can access (useful for exceptions)
  • Sitemap: Location of your XML sitemap
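These rules can be evaluated programmatically. Python's standard library ships `urllib.robotparser`, which parses a robots.txt and answers whether a given crawler may fetch a URL. A minimal sketch (note that this parser's matching is simpler than Google's; it treats wildcards literally, for instance):

```python
from urllib import robotparser

# A robots.txt with one rule group, similar to the examples below.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))       # True
```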

Common robots.txt examples

Allow all crawlers (most websites)

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Block all crawlers (staging sites)

User-agent: *
Disallow: /

Block specific directories

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /

Sitemap: https://example.com/sitemap.xml

Block AI training bots

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
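Since robots.txt is plain text, generating one like the examples above is straightforward. A minimal Python sketch of what a generator emits (the `build_robots_txt` helper and its rule structure are illustrative, not part of any library):

```python
def build_robots_txt(groups, sitemap=None):
    """groups: list of (user_agent, [(directive, path), ...]) pairs."""
    lines = []
    for agent, rules in groups:
        lines.append(f"User-agent: {agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line separates rule groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

robots = build_robots_txt(
    [("GPTBot", [("Disallow", "/")]),
     ("*", [("Allow", "/")])],
    sitemap="https://example.com/sitemap.xml",
)
print(robots)
```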

Robots.txt best practices

  • Always place robots.txt in your root directory — It won't work in subdirectories
  • Include your sitemap — Helps crawlers discover all your pages efficiently
  • Don't block CSS or JavaScript — Search engines need these to render your pages properly
  • Use specific paths — Be precise about what you block to avoid accidentally hiding important content
  • Test your robots.txt — Use Google Search Console's robots.txt report (which replaced the older robots.txt Tester) to verify your rules
  • Don't rely on it for security — Robots.txt is public and only a suggestion; use proper authentication for sensitive data

Common mistakes to avoid

  • Blocking your entire site accidentally — Forgetting to remove Disallow: / after launching
  • Blocking important resources — Preventing crawlers from accessing CSS, JS, or images needed for rendering
  • Expecting robots.txt to hide pages — Disallowed URLs can still appear in search results if linked from other sites
  • Using robots.txt for sensitive data — Anyone can read your robots.txt file
  • Incorrect path syntax — Paths must start with / and are case-sensitive on most servers

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file placed in your website's root directory that tells search engine crawlers which pages they can or cannot access. It follows the Robots Exclusion Protocol and is read by crawlers before they index your site.

Where do I put my robots.txt file?

The robots.txt file must be placed in the root directory of your website. For example, if your site is https://example.com, the file should be accessible at https://example.com/robots.txt. It will not work if placed in a subdirectory.

Does robots.txt block pages from appearing in Google?

No. Robots.txt only prevents crawling, not indexing. If other sites link to a disallowed page, Google may still index its URL (without content). To fully block a page from search results, use a noindex meta tag or X-Robots-Tag HTTP header instead.
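For reference, the meta-tag mechanism looks like this (it goes in the page's head; the header variant is set in your server or application config):

```html
<!-- Keeps the page out of search results even when other sites link to it.
     Crawlers must be ABLE to fetch the page to see this tag, so do not
     also Disallow the page in robots.txt. -->
<meta name="robots" content="noindex">
```

The equivalent HTTP response header is `X-Robots-Tag: noindex`, which is useful for non-HTML resources such as PDFs.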

What is the difference between Allow and Disallow?

Disallow tells crawlers not to access a specific path. Allow explicitly permits access, which is useful when you want to allow a specific page within a disallowed directory. For example: Disallow: /private/ combined with Allow: /private/public-page.html.
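Under Google's documented precedence, the most specific (longest) matching rule wins, and a tie goes to Allow. A minimal Python sketch of that rule, using prefix matching only (no wildcards; the `is_allowed` helper is illustrative):

```python
def is_allowed(path, rules):
    """rules: list of (directive, path_prefix) pairs from one user-agent group."""
    best_len, best_directive = -1, "Allow"  # no matching rule: allowed by default
    for directive, prefix in rules:
        if path.startswith(prefix):
            n = len(prefix)
            # Longest match wins; on a tie, Allow beats Disallow.
            if n > best_len or (n == best_len and directive == "Allow"):
                best_len, best_directive = n, directive
    return best_directive == "Allow"

rules = [("Disallow", "/private/"), ("Allow", "/private/public-page.html")]
print(is_allowed("/private/secret.html", rules))       # False
print(is_allowed("/private/public-page.html", rules))  # True
```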

Do all bots follow robots.txt?

No. Robots.txt is a voluntary protocol. Legitimate search engines like Google, Bing, and DuckDuckGo follow it, but malicious bots often ignore it. Do not rely on robots.txt for security—use proper authentication and access controls for sensitive content.

How do I block AI training bots with robots.txt?

Add specific user-agent rules for AI crawlers. Common AI bot user-agents include GPTBot (OpenAI), Google-Extended (Google AI), anthropic-ai, and CCBot. Example: User-agent: GPTBot followed by Disallow: / will block OpenAI's crawler.

Can I use wildcards in robots.txt?

Yes. Most modern crawlers support * as a wildcard (matches any sequence of characters) and $ to indicate end-of-URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf.
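Conceptually, the two wildcards translate directly to a regular expression: * becomes .* and a trailing $ becomes an end anchor. A small Python sketch (the `rule_to_regex` helper name is illustrative):

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")          # $ means end-of-URL
    core = pattern[:-1] if anchored else pattern
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile(body + ("$" if anchored else ""))

pdf_rule = rule_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))      # True: rule matches
print(bool(pdf_rule.match("/files/report.pdf?v=2")))  # False: not end of URL
```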

How long does it take for robots.txt changes to take effect?

Search engines cache robots.txt files, typically for up to 24 hours, so changes may not take effect immediately. In Google Search Console, the robots.txt report shows when Google last fetched your file and lets you request a recrawl.

Is robots.txt case-sensitive?

The directives (User-agent, Disallow, Allow) are case-insensitive, but paths are matched exactly as specified. Since most web servers treat URLs as case-sensitive, your robots.txt paths should match your actual URL casing.

Should I block /wp-admin/ on WordPress sites?

Blocking /wp-admin/ is common practice but not strictly necessary since WordPress already requires authentication. However, you should allow /wp-admin/admin-ajax.php as some themes and plugins use it for frontend functionality.
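Put together, a common WordPress configuration looks like:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```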

Privacy & Limitations

  • This generator runs entirely in your browser -- nothing is sent to any server.
  • Robots.txt is advisory: compliant crawlers honor it, but it cannot enforce access control.


