robots.txt is a crawler-instruction file for your site root. It helps compliant bots understand which paths to crawl and which to skip.
If you need a working starting point, this version is safe for most sites:
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Sitemap: https://example.com/sitemap.xml
Put it at: https://example.com/robots.txt
Quick Answer
- robots.txt is a text file at the domain root.
- It controls crawling, not security or true access control.
- Use User-agent, Disallow, Allow, and Sitemap directives.
- Keep public content crawlable, including CSS and JavaScript assets.
- Test changes after each deploy to avoid accidental de-indexing.
What a robots.txt File Does (and Does Not Do)
This file answers one question: "May this crawler fetch this path?"
It does:
- Provide crawl guidance to compliant crawlers.
- Reduce crawl budget waste on low-value URLs.
- Help discovery with a sitemap URL.
It does not:
- Hide sensitive content.
- Guarantee that a URL will never appear in search.
- Stop malicious scraping bots.
If a page must stay private, use authentication and server-side access controls.
robots.txt Syntax
1. User-agent
Defines which crawler a rule block targets.
User-agent: * # All compliant crawlers
User-agent: Googlebot # A specific crawler
2. Disallow
Blocks crawler access to matching paths.
Disallow: /private/ # Block a directory
Disallow: /file.html # Block a file
Disallow: / # Block everything
Disallow: # Block nothing
3. Allow
Creates a path exception inside a broader disallow rule.
Disallow: /images/
Allow: /images/public/
4. Sitemap
Gives crawlers the absolute URL of your sitemap.
Sitemap: https://example.com/sitemap.xml
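These directives can be sanity-checked locally before you publish anything. A minimal sketch using Python's standard-library urllib.robotparser, with placeholder example.com URLs (note that this parser treats Disallow values as simple path prefixes):

```python
from urllib import robotparser

# A rule set parsed directly from text, without any network fetch.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/users"))   # False
print(rp.can_fetch("*", "https://example.com/products/hat"))  # True
```

In a real check you would call rp.set_url("https://example.com/robots.txt") followed by rp.read() to fetch the live file instead of parsing a string.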
Common robots.txt Examples (Copy/Paste)
Allow Everything
User-agent: *
Disallow:
Block One Directory
User-agent: *
Disallow: /admin/
Block Multiple Paths
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/
Disallow: /cart/
Block Specific File Types
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
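The * wildcard and $ end-anchor are extensions from RFC 9309 honored by major crawlers such as Googlebot; they are not part of the original 1994 convention, and Python's urllib.robotparser does not interpret them. A hand-rolled matcher sketch showing how these two patterns behave (function name and sample paths are made up):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern against a URL path.

    '*' matches any run of characters; a trailing '$' anchors the
    match at the end of the path. Everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal segments, join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/files/report.pdf"))       # True
print(robots_pattern_matches("/*.pdf$", "/files/report.pdf?dl=1"))  # False
print(robots_pattern_matches("/admin/", "/admin/users"))            # True (prefix)
```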
Let One Crawler In, Block Others
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
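You can confirm that the group selection works as intended: a compliant parser applies the most specific matching User-agent group, falling back to the * group. A quick check with Python's standard-library parser (bot names and URL are placeholders):

```python
from urllib import robotparser

rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot matches its own group; every other bot falls back to "*".
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
print(rp.can_fetch("OtherBot", "https://example.com/page"))   # False
```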
Block Everything During Staging
User-agent: *
Disallow: /
How to Create robots.txt Step by Step
Step 1: Decide what should be crawled
Usually crawl:
- Main pages
- Product or content detail pages
- CSS and JavaScript files required for rendering
Usually disallow:
- Admin areas
- Internal search result paths
- Session/parameter duplicates if they create crawl bloat
Step 2: Write a minimal first version
Start simple and expand only when needed:
User-agent: *
Disallow: /admin/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml
Step 3: Upload to root
It must resolve at /robots.txt on each host you control:
- https://example.com/robots.txt
- https://blog.example.com/robots.txt (separate file)
Step 4: Test and monitor
After publishing:
- Confirm the file is reachable in the browser.
- Validate syntax in crawler tooling.
- Watch crawl logs after changes to ensure key pages are still fetched.
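The monitoring step can be partly automated with a smoke test that fails the deploy when a key page becomes blocked. A sketch, assuming a made-up list of must-crawl URLs (in practice you would fetch the live /robots.txt rather than embed the rules):

```python
from urllib import robotparser

# The rules you just deployed.
ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Pages that must never be blocked; fail loudly on a regression.
must_stay_crawlable = [
    "https://example.com/",
    "https://example.com/products/widget",
]
for url in must_stay_crawlable:
    assert rp.can_fetch("*", url), f"robots.txt now blocks {url}"

print("robots.txt smoke test passed")
```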
What to Disallow vs What to Keep Crawlable
| URL Type | Typical Decision | Why |
|---|---|---|
| /admin/, /account/ | Disallow | Low-value for search and user-specific |
| /cart/, /checkout/ | Disallow | Transactional and private state |
| Internal search URLs | Disallow | Often thin or duplicative |
| CSS and JS assets | Allow | Needed for rendering and quality evaluation |
| Primary content pages | Allow | These should be discovered and indexed |
Common Mistakes That Hurt SEO
Blocking all crawlers by accident
This rule can remove discovery for your whole site:
User-agent: *
Disallow: /
Use it only for private staging environments, not production.
Trying to use robots.txt as security
Anyone can read the file. Do not list sensitive endpoints expecting secrecy.
Using wrong path patterns
Rules are prefix matches, so slashes matter: Disallow: /admin blocks /admin, /admin/, and /administrator, while Disallow: /admin/ blocks only that directory and its contents. Test both intended and edge-case URLs after every change.
Forgetting subdomains need separate files
example.com and blog.example.com are distinct hosts for robots rules.
Confusing crawling with indexing
Disallow can stop fetches, but URL-only entries may still appear if other pages link to them.
FAQ
What is a robots.txt file in plain language?
It is a public instruction file for web crawlers. It tells compliant bots which paths they may crawl.
Where exactly do I put robots.txt?
At the root of each host: https://yourdomain.com/robots.txt.
Can robots.txt remove a page from search results?
Not reliably. It controls crawling, while indexing decisions also depend on links and other signals, so a disallowed URL can still appear as a URL-only result.
Should I disallow /api/?
If API URLs are not intended for search and add crawl noise, disallowing can be reasonable. Keep public docs pages crawlable.
Should I block CSS and JS files?
Usually no. Rendering assets should remain crawlable for accurate content evaluation.
Is robots.txt case-sensitive?
Yes, for paths. Directive names like Disallow are case-insensitive, but path values are case-sensitive: Disallow: /Private/ does not block /private/.
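Python's standard-library parser exhibits the same case-sensitive path matching, which makes the point easy to demonstrate (the paths here are made up):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /Private/"])

# The rule matches its exact casing only.
print(rp.can_fetch("*", "https://example.com/Private/doc"))  # False
print(rp.can_fetch("*", "https://example.com/private/doc"))  # True
```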
Can I block bad bots with robots.txt?
Not reliably. Malicious bots can ignore it.
Should I include my sitemap URL?
Yes. Include an absolute sitemap URL to improve discovery.
How often should I review robots.txt?
Review after URL structure changes, migration projects, or major SEO audits.
What is the safest starter robots.txt for a small site?
A minimal block for admin/private paths plus a sitemap line is often enough. Add more rules only when you have a clear crawl-control reason.