Generate Robots.txt
Create a robots.txt file to control how search engines crawl your website. Choose a preset or customize your rules.
What is robots.txt?
A robots.txt file is a plain text file placed at your website's root directory (e.g., https://example.com/robots.txt) that tells search engine crawlers which parts of your site they can and cannot access. It follows the Robots Exclusion Protocol, a standard that well-behaved crawlers follow before indexing your content.
How robots.txt works
When a search engine crawler (like Googlebot) visits your site, it first checks for a robots.txt file. The file contains rules that specify:
- User-agent: Which crawler the rules apply to (* means all crawlers)
- Disallow: Paths the crawler should not access
- Allow: Paths the crawler can access (useful for exceptions)
- Sitemap: Location of your XML sitemap
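Python's standard-library urllib.robotparser implements this matching logic, which makes it handy for checking how a rule set will be interpreted before you deploy it. A minimal sketch (the rules and URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory rule set instead of fetching a live file.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# Paths under /admin/ are blocked; everything else stays crawlable.
print(rp.can_fetch("*", "https://example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```

In production you would call rp.set_url(...) and rp.read() to fetch the live file instead of parsing lines by hand.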
Common robots.txt examples
Allow all crawlers (most websites)
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Block all crawlers (staging sites)
User-agent: *
Disallow: /
Block specific directories
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /
Sitemap: https://example.com/sitemap.xml
Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
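If you are generating the file programmatically rather than by hand, a small helper can assemble groups like the ones above. This is just an illustrative sketch (build_robots_txt is a hypothetical helper, not part of any library):

```python
def build_robots_txt(groups, sitemap=None):
    """Assemble robots.txt text from (user_agent, [(directive, path), ...]) pairs."""
    lines = []
    for agent, rules in groups:
        lines.append(f"User-agent: {agent}")
        lines.extend(f"{directive}: {path}" for directive, path in rules)
        lines.append("")  # blank line separates rule groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip() + "\n"

# Recreate a cut-down "block AI training bots" example:
text = build_robots_txt(
    [("GPTBot", [("Disallow", "/")]),
     ("*", [("Allow", "/")])],
    sitemap="https://example.com/sitemap.xml",
)
print(text)
```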
Robots.txt best practices
- Always place robots.txt in your root directory — It won't work in subdirectories
- Include your sitemap — Helps crawlers discover all your pages efficiently
- Don't block CSS or JavaScript — Search engines need these to render your pages properly
- Use specific paths — Be precise about what you block to avoid accidentally hiding important content
- Test your robots.txt — Use Google Search Console's robots.txt tester to verify your rules
- Don't rely on it for security — Robots.txt is public and only a suggestion; use proper authentication for sensitive data
Common mistakes to avoid
- Blocking your entire site accidentally — Forgetting to remove Disallow: / after launching
- Blocking important resources — Preventing crawlers from accessing CSS, JS, or images needed for rendering
- Expecting robots.txt to hide pages — Disallowed URLs can still appear in search results if linked from other sites
- Using robots.txt for sensitive data — Anyone can read your robots.txt file
- Incorrect path syntax — Paths must start with / and are case-sensitive on most servers
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file is a plain text file placed in your website's root directory that tells search engine crawlers which pages they can or cannot access. It follows the Robots Exclusion Protocol and is read by crawlers before they index your site.
Where do I put my robots.txt file?
The robots.txt file must be placed in the root directory of your website. For example, if your site is https://example.com, the file should be accessible at https://example.com/robots.txt. It will not work if placed in a subdirectory.
Does robots.txt block pages from appearing in Google?
No. Robots.txt only prevents crawling, not indexing. If other sites link to a disallowed page, Google may still index its URL (without content). To fully block a page from search results, use a noindex meta tag or X-Robots-Tag HTTP header instead.
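The X-Robots-Tag header is set by your web server, not by robots.txt. As a rough sketch of the idea, here is a tiny Python server that marks every response noindex (the handler, markup, and paths are all illustrative):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Tell crawlers to drop this page from their index entirely.
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        # The same effect in markup: <meta name="robots" content="noindex">
        self.wfile.write(b"<html><body>hidden page</body></html>")

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    tag = resp.headers["X-Robots-Tag"]
print(tag)  # noindex, nofollow
server.shutdown()
```

In practice you would configure this header in nginx, Apache, or your application framework rather than a hand-rolled server.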
What is the difference between Allow and Disallow?
Disallow tells crawlers not to access a specific path. Allow explicitly permits access, which is useful when you want to allow a specific page within a disallowed directory. For example: Disallow: /private/ combined with Allow: /private/public-page.html.
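This exception pattern can be checked with urllib.robotparser. One caveat: Python's parser applies rules in file order (first match wins), unlike Google's most-specific-rule behavior, so in this sketch the Allow line is listed before the broader Disallow:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Allow the exception first, then disallow the rest of the directory.
rp.parse([
    "User-agent: *",
    "Allow: /private/public-page.html",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/private/public-page.html"))  # True
print(rp.can_fetch("*", "https://example.com/private/secret.html"))       # False
```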
Do all bots follow robots.txt?
No. Robots.txt is a voluntary protocol. Legitimate search engines like Google, Bing, and DuckDuckGo follow it, but malicious bots often ignore it. Do not rely on robots.txt for security—use proper authentication and access controls for sensitive content.
How do I block AI training bots with robots.txt?
Add specific user-agent rules for AI crawlers. Common AI bot user-agents include GPTBot (OpenAI), Google-Extended (Google AI), anthropic-ai, and CCBot. Example: User-agent: GPTBot followed by Disallow: / will block OpenAI's crawler.
Can I use wildcards in robots.txt?
Yes. Most modern crawlers support * as a wildcard (matches any sequence of characters) and $ to indicate end-of-URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf.
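Python's built-in urllib.robotparser does not implement these wildcards, but the matching logic itself is simple enough to sketch with a regular expression (robots_path_matches is a hypothetical helper for illustration):

```python
import re

def robots_path_matches(pattern: str, path: str) -> bool:
    """Sketch of REP wildcard matching: * matches any character
    sequence; a trailing $ anchors the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except *, which becomes ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex += "$" if anchored else ""
    return re.match(regex, path) is not None

print(robots_path_matches("/*.pdf$", "/files/report.pdf"))      # True
print(robots_path_matches("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(robots_path_matches("/admin", "/admin/settings"))         # True (prefix match)
```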
How long does it take for robots.txt changes to take effect?
Search engines cache robots.txt files, typically for up to 24 hours. Changes may not be reflected immediately. You can request a refresh in Google Search Console under the robots.txt tester tool.
Is robots.txt case-sensitive?
The directives (User-agent, Disallow, Allow) are case-insensitive, but paths are matched exactly as specified. Since most web servers treat URLs as case-sensitive, your robots.txt paths should match your actual URL casing.
Should I block /wp-admin/ on WordPress sites?
Blocking /wp-admin/ is common practice but not strictly necessary since WordPress already requires authentication. However, you should allow /wp-admin/admin-ajax.php as some themes and plugins use it for frontend functionality.
Privacy & Limitations
- All calculations run entirely in your browser -- nothing is sent to any server.
- Results are estimates and may vary based on actual conditions.
Related Tools
- UTM Link Builder -- Build URLs with UTM tracking parameters
- Meta Tag Generator -- Generate common SEO meta tags for a page
- Color Contrast Checker -- Check contrast ratio between two colors
- CSS Minifier -- Minify CSS by removing comments and whitespace for smaller file sizes