How to Create a robots.txt File (With Examples and Common Mistakes)

Learn what a robots.txt file is, how to write one correctly, and which rules help crawlers understand your site.

robots.txt is a plain-text crawler-instruction file served from your site root. It tells compliant bots which paths they may crawl and which to skip.

If you need a working starting point, this version is safe for most sites:

User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: https://example.com/sitemap.xml

Put it at: https://example.com/robots.txt

Quick Answer

  • robots.txt is a text file at the domain root.
  • It controls crawling, not security or true access control.
  • Use User-agent, Disallow, Allow, and Sitemap directives.
  • Keep public content crawlable, including CSS and JavaScript assets.
  • Test changes after each deploy to avoid accidental de-indexing.

What a robots.txt File Does (and Does Not Do)

This file answers one question: "May this crawler fetch this path?"

It does:

  • Provide crawl guidance to compliant crawlers.
  • Reduce crawl budget waste on low-value URLs.
  • Help discovery with a sitemap URL.

It does not:

  • Hide sensitive content.
  • Guarantee that a URL will never appear in search.
  • Stop malicious scraping bots.

If a page must stay private, use authentication and server-side access controls.

robots.txt Syntax

1. User-agent

Defines which crawler a rule block targets.

User-agent: *         # All compliant crawlers
User-agent: Googlebot # A specific crawler

2. Disallow

Blocks crawler access to matching paths.

Disallow: /private/   # Block a directory
Disallow: /file.html  # Block a file
Disallow: /           # Block everything
Disallow:             # Block nothing

3. Allow

Creates a path exception inside a broader disallow rule.

Disallow: /images/
Allow: /images/public/
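
When Allow and Disallow both match a URL, modern crawlers apply the most specific rule: the longest matching pattern wins, and ties go to Allow. A minimal Python sketch of that precedence using simple prefix matching (is_allowed is a hypothetical helper for illustration, not any crawler's real API):

```python
def is_allowed(path: str, rules) -> bool:
    # rules: (directive, pattern) pairs evaluated by prefix matching.
    # RFC 9309 precedence: the longest matching pattern wins,
    # and ties break in favor of Allow.
    # A path that matches no rule is allowed by default.
    best = None  # (pattern length, allowed?)
    for directive, pattern in rules:
        if path.startswith(pattern):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("disallow", "/images/"), ("allow", "/images/public/")]
print(is_allowed("/images/public/logo.png", rules))  # True
print(is_allowed("/images/private/x.png", rules))    # False
```

The tuple comparison does the work: a longer match always beats a shorter one, and at equal length the Allow entry (True) sorts above the Disallow entry (False).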

4. Sitemap

Gives crawlers the absolute URL of your sitemap.

Sitemap: https://example.com/sitemap.xml
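
Python's standard library includes a reference parser you can use to sanity-check how these directives read. A small sketch, assuming Python 3.8+ for site_maps(); note the stdlib parser uses simple prefix, first-match semantics, so it will not reproduce every crawler's wildcard or longest-match behavior:

```python
from urllib import robotparser

text = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(text.splitlines())

# Blocked path vs. unmatched path (unmatched defaults to allowed).
print(rp.can_fetch("ExampleBot", "https://example.com/private/notes"))  # False
print(rp.can_fetch("ExampleBot", "https://example.com/about"))          # True
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```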

Common robots.txt Examples (Copy/Paste)

Allow Everything

User-agent: *
Disallow:

Block One Directory

User-agent: *
Disallow: /admin/

Block Multiple Paths

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/
Disallow: /cart/

Block Specific File Types

User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$

Let One Crawler In, Block Others

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
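
You can confirm which group a given bot falls into using the stdlib parser. A sketch; real crawlers pick the single most specific matching User-agent group, which this file's structure makes unambiguous:

```python
from urllib import robotparser

text = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(text.splitlines())

# Googlebot matches its named group; every other bot
# falls through to the catch-all group and is blocked.
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
print(rp.can_fetch("OtherBot", "https://example.com/page"))   # False
```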

Block Everything During Staging

User-agent: *
Disallow: /

How to Create robots.txt Step by Step

Step 1: Decide what should be crawled

Usually crawl:

  • Main pages
  • Product or content detail pages
  • CSS and JavaScript files required for rendering

Usually disallow:

  • Admin areas
  • Internal search result paths
  • Session/parameter duplicates if they create crawl bloat

Step 2: Write a minimal first version

Start simple and expand only when needed:

User-agent: *
Disallow: /admin/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml

Step 3: Upload to root

It must resolve at /robots.txt on each host you control:

  • https://example.com/robots.txt
  • https://blog.example.com/robots.txt (separate file)

Step 4: Test and monitor

After publishing:

  1. Confirm the file is reachable in the browser.
  2. Validate syntax in crawler tooling.
  3. Watch crawl logs after changes to ensure key pages are still fetched.
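
The offline part of this checklist can be scripted with the stdlib parser. A sketch, where the rule text and URL lists are placeholders to replace with your own:

```python
from urllib import robotparser

# Rules about to be deployed (placeholder content).
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# URLs that must stay crawlable vs. URLs that should be blocked.
must_crawl = ["https://example.com/", "https://example.com/products/widget"]
must_block = ["https://example.com/admin/", "https://example.com/checkout/cart"]

assert all(rp.can_fetch("Googlebot", u) for u in must_crawl)
assert not any(rp.can_fetch("Googlebot", u) for u in must_block)
print("robots.txt spot-check passed")
```

Running a check like this in CI after each deploy catches the classic accident of pushing a staging Disallow: / to production.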

What to Disallow vs What to Keep Crawlable

URL Type                Typical Decision   Why
/admin/, /account/      Disallow           Low-value for search and user-specific
/cart/, /checkout/      Disallow           Transactional and private state
Internal search URLs    Disallow           Often thin or duplicative
CSS and JS assets       Allow              Needed for rendering and quality evaluation
Primary content pages   Allow              These should be discovered and indexed

Common Mistakes That Hurt SEO

Blocking all crawlers by accident

This rule can remove discovery for your whole site:

User-agent: *
Disallow: /

Use it only for private staging environments, not production.

Trying to use robots.txt as security

Anyone can read the file. Do not list sensitive endpoints expecting secrecy.

Using wrong path patterns

Rules are prefix matches, so slashes matter: Disallow: /admin also blocks /admin-tools and /administrator, while Disallow: /admin/ blocks only that directory. Test intended URLs and edge cases before and after deploying.

Forgetting subdomains need separate files

example.com and blog.example.com are distinct hosts for robots rules.

Confusing crawling with indexing

Disallow stops fetches, but a blocked URL can still appear in results as a URL-only entry if other pages link to it. To keep a page out of the index, let it be crawled and apply a noindex directive instead.

FAQ

What is a robots.txt file in plain language?

It is a public instruction file for web crawlers. It tells compliant bots which paths they may crawl.

Where exactly do I put robots.txt?

At the root of each host: https://yourdomain.com/robots.txt.

Can robots.txt remove a page from search results?

Not reliably. It controls crawling, and a disallowed URL can still be indexed from external links. To remove a page, use a noindex directive (which requires the page to stay crawlable) or your search engine's removal tools.

Should I disallow /api/?

If API URLs are not intended for search and add crawl noise, disallowing can be reasonable. Keep public docs pages crawlable.

Should I block CSS and JS files?

Usually no. Rendering assets should remain crawlable for accurate content evaluation.

Is robots.txt case-sensitive?

Yes, for paths: Disallow: /Admin/ does not block /admin/. Directive names such as user-agent are case-insensitive.

Can I block bad bots with robots.txt?

Not reliably. Malicious bots can ignore it.

Should I include my sitemap URL?

Yes. Include an absolute sitemap URL to improve discovery.

How often should I review robots.txt?

Review after URL structure changes, migration projects, or major SEO audits.

What is the safest starter robots.txt for a small site?

A minimal block for admin/private paths plus a sitemap line is often enough. Add more rules only when you have a clear crawl-control reason.

Generate Your robots.txt

Create a correctly formatted robots.txt file and copy it to your site root.