XML Sitemaps Explained — Structure, Best Practices, and Common Mistakes

Learn how XML sitemaps work, when you need one, and how to create, deploy, and maintain sitemap.xml files for better search engine discovery.

The Quick Answer

A sitemap.xml is a file that tells search engines which pages on your site exist and when they were last updated:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-05</lastmod>
  </url>
</urlset>

Place it at yoursite.com/sitemap.xml, reference it in robots.txt, and submit it to Google Search Console. That's the essentials.

The rest of this guide covers the full format, optional fields, deployment, and mistakes to avoid.


Why Sitemaps Exist

Search engines discover pages by following links. But not every page is well-linked:

  • New pages may have no inbound links yet
  • Orphan pages aren't linked from your navigation
  • Deep pages sit many clicks from the homepage
  • JavaScript-rendered content may not expose links to crawlers

A sitemap provides a direct list of URLs for crawlers to check. It does not guarantee indexing; it only makes discovery more reliable.

Key point: A sitemap is a suggestion, not a command. Search engines decide independently what to crawl and index.


Sitemap XML Format

Required Elements

Every valid sitemap needs:

Element                                                      | Description
<?xml version="1.0" encoding="UTF-8"?>                       | XML declaration
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> | Root element with namespace
<url>                                                        | Container for each URL entry
<loc>                                                        | Full URL including protocol (https://)
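
A quick way to catch malformed <loc> values before they reach the file is to check each one up front. The sketch below is only an illustration: isValidLoc is a hypothetical helper name, and it assumes the sitemaps.org rule that a <loc> value must be an absolute URL shorter than 2,048 characters.

```javascript
// Returns true when `loc` is usable as a <loc> value: an absolute
// http(s) URL shorter than 2,048 characters.
function isValidLoc(loc) {
  if (typeof loc !== "string" || loc.length >= 2048) return false;
  try {
    const parsed = new URL(loc); // throws on relative or malformed input
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}
```

Relative paths such as /about and scheme-less strings such as example.com/about are rejected because the URL constructor cannot parse them without a base.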

Optional Elements

Element      | Description                         | Used by Google?
<lastmod>    | Last modification date (YYYY-MM-DD) | Yes, if accurate
<changefreq> | Expected change frequency           | Largely ignored
<priority>   | Relative importance (0.0–1.0)       | Largely ignored

A Complete Example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-05</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2025-11-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/blog/xml-sitemaps</loc>
    <lastmod>2026-02-05</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>

The lastmod Tag — The One That Matters

Of the three optional tags, lastmod is the only one Google actively uses. But it only works if it's accurate.

Good use of lastmod

Set it to the date the page content was actually last modified:

<lastmod>2026-01-15</lastmod>

Bad use of lastmod

Setting today's date on every page every time you regenerate the sitemap:

<!-- Don't do this — makes lastmod meaningless -->
<lastmod>2026-02-05</lastmod> <!-- on every URL, every day -->

When every URL has the same recent date, search engines cannot tell which pages actually changed. The signal becomes noise.
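
One rough way to spot this problem in your own output is to check whether every entry carries the same date. This is a heuristic sketch, not a rule search engines publish: hasUniformLastmod and the { loc, lastmod } entry shape are assumptions made for illustration, and a small site whose pages really did all change on one day would trigger it too.

```javascript
// Flags a sitemap whose entries all carry the same <lastmod> date.
// `entries` is a hypothetical array of { loc, lastmod } objects.
function hasUniformLastmod(entries, minEntries = 2) {
  const dates = entries
    .map((entry) => entry.lastmod)
    .filter(Boolean)
    .map((value) => value.slice(0, 10)); // compare on the YYYY-MM-DD part only
  return dates.length >= minEntries && new Set(dates).size === 1;
}
```

Raising minEntries (for example to 20) makes the check less likely to fire on small sitemaps.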

Accepted date formats

  • 2026-02-05 (date only — most common)
  • 2026-02-05T10:30:00+00:00 (full W3C datetime)
  • 2026-02-05T10:30:00Z (UTC)
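
If lastmod values come from several sources, a small format check can catch stray values such as 02/05/2026. The sketch below covers only the three forms listed above (the W3C datetime profile allows a few more), and it validates the calendar date but not the time-of-day range. LASTMOD_PATTERN and isAcceptedLastmod are hypothetical names.

```javascript
// Matches YYYY-MM-DD, optionally followed by Thh:mm:ss and a "Z" or "+hh:mm" offset.
const LASTMOD_PATTERN =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2}))?$/;

function isAcceptedLastmod(value) {
  if (!LASTMOD_PATTERN.test(value)) return false;
  // Reject impossible calendar dates such as 2026-02-30.
  const [year, month, day] = value.slice(0, 10).split("-").map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}
```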

Sitemap Limits

Constraint               | Limit
URLs per sitemap         | 50,000
File size (uncompressed) | 50 MB
Gzipped files            | Accepted (.xml.gz)
Protocol                 | Must match site protocol
Scope                    | Same domain/subdomain only

If your site exceeds these limits, use a sitemap index file.
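
The limits above translate into two simple checks when generating files programmatically. This is a minimal sketch; chunkUrls and fitsSizeLimit are hypothetical helper names, and the size check assumes the XML is encoded as UTF-8.

```javascript
const MAX_URLS_PER_SITEMAP = 50000;
const MAX_SITEMAP_BYTES = 50 * 1024 * 1024; // 50 MB, uncompressed

// Splits a URL list into groups that each fit in one sitemap file.
function chunkUrls(urls, size = MAX_URLS_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// True when the serialized XML stays within the uncompressed size limit.
function fitsSizeLimit(xml) {
  return Buffer.byteLength(xml, "utf8") <= MAX_SITEMAP_BYTES;
}
```

Each chunk becomes its own sitemap file, and the files are then listed in a sitemap index.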


Sitemap Index Files

A sitemap index references multiple sitemap files:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-05</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-01-28</lastmod>
  </sitemap>
</sitemapindex>

When to use a sitemap index

  • Site has more than 50,000 URLs
  • You want to organize sitemaps by content type (pages, blog, products)
  • Different sections update at different frequencies
  • You want to track indexing per section in Search Console
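
Generating the index itself follows the same pattern as the XML shown earlier in this section. The sketch below is one possible approach; buildSitemapIndex, escapeXml, and the { loc, lastmod } input shape are assumptions made for illustration.

```javascript
// Escape the five characters that are special in XML text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// `sitemaps` is a hypothetical array of { loc, lastmod } objects; lastmod is optional.
function buildSitemapIndex(sitemaps) {
  const items = sitemaps.map(({ loc, lastmod }) => {
    const lastmodLine = lastmod ? `\n    <lastmod>${lastmod}</lastmod>` : "";
    return `  <sitemap>\n    <loc>${escapeXml(loc)}</loc>${lastmodLine}\n  </sitemap>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    "</sitemapindex>",
  ].join("\n");
}
```

Escaping matters here because sitemap URLs that contain query strings often include an ampersand, which is not valid as a bare character in XML.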

Deployment Checklist

1. Create your sitemap

Use the Sitemap XML Generator or generate it programmatically from your CMS or build process.

2. Upload to your root directory

https://yoursite.com/sitemap.xml

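The sitemap can technically live at any URL, but the root is the convention and the easiest for crawlers to find. Under the sitemaps.org protocol, location also determines scope: a sitemap at /blog/sitemap.xml may by default only list URLs under /blog/, so a root-level file is the one that can cover the whole site.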

3. Reference in robots.txt

Add this line to your robots.txt:

Sitemap: https://yoursite.com/sitemap.xml

This is how crawlers auto-discover your sitemap without you needing to submit it manually.
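
To confirm the declaration is in place, you can read the Sitemap lines back out of a robots.txt file. This is a simplified sketch of that lookup, not how any particular crawler implements it; findSitemapUrls is a hypothetical helper that takes the file's text as a string.

```javascript
// Returns every URL declared on a "Sitemap:" line (the field name is case-insensitive).
function findSitemapUrls(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim()) // drop trailing comments
    .filter((line) => /^sitemap\s*:/i.test(line))
    .map((line) => line.slice(line.indexOf(":") + 1).trim());
}
```

A robots.txt file may contain more than one Sitemap line, which is why the function returns an array.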

4. Submit to search engines

Submit the sitemap URL in Google Search Console and Bing Webmaster Tools.

5. Monitor

Check back after a few days. Search Console shows:

  • How many URLs were discovered
  • How many were indexed
  • Any errors (404s, redirects, blocked pages)

Common Mistakes

1. Including non-canonical URLs

If a page has <link rel="canonical" href="..."> pointing to a different URL, only include the canonical URL in your sitemap. Mixed signals confuse crawlers.

2. Including URLs that return errors

Only include URLs that return HTTP 200. Remove URLs that:

  • Return 404 (not found)
  • Return 301/302 (redirects)
  • Return 5xx (server errors)
  • Are blocked by robots.txt
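
If you already have crawl results for your URLs, the filter above is a few lines of code. The sketch assumes a hypothetical input shape of { url, status, blockedByRobots } produced by your own checker; partitionByStatus and describeProblem are illustrative names.

```javascript
// Splits checked URLs into those safe to list and those to remove, with a reason.
function partitionByStatus(checks) {
  const keep = [];
  const drop = [];
  for (const check of checks) {
    if (check.status === 200 && !check.blockedByRobots) {
      keep.push(check.url);
    } else {
      drop.push({ url: check.url, reason: describeProblem(check) });
    }
  }
  return { keep, drop };
}

function describeProblem({ status, blockedByRobots }) {
  if (blockedByRobots) return "blocked by robots.txt";
  if (status >= 300 && status < 400) return "redirect";
  if (status === 404) return "not found";
  if (status >= 500) return "server error";
  return `unexpected status ${status}`;
}
```

Keeping the reason alongside each dropped URL makes it easier to fix the underlying page or redirect rather than only hiding it from the sitemap.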

3. Inconsistent trailing slashes

/about and /about/ are technically different URLs. If your server treats them differently, pick one and be consistent. If it doesn't matter, still pick one for the sitemap.
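
One way to enforce a single convention is to normalize every URL before it is written. The sketch below is an assumption-laden illustration: normalizeTrailingSlash and dedupeNormalized are hypothetical helpers, the "always"/"never" policy names are invented here, and paths whose last segment contains a dot are treated as files and left alone.

```javascript
// policy: "always" adds a trailing slash, "never" removes it.
function normalizeTrailingSlash(href, policy = "never") {
  const url = new URL(href);
  const path = url.pathname;
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  // Leave the site root and file-like paths (e.g. /report.pdf) untouched.
  if (path === "/" || lastSegment.includes(".")) return url.href;
  if (policy === "always" && !path.endsWith("/")) url.pathname = path + "/";
  if (policy === "never" && path.endsWith("/")) url.pathname = path.replace(/\/+$/, "");
  return url.href;
}

// Normalizes a list and removes the duplicates that normalization exposes.
function dedupeNormalized(hrefs, policy = "never") {
  return [...new Set(hrefs.map((href) => normalizeTrailingSlash(href, policy)))];
}
```

Whichever policy you choose should match the form your server actually serves with HTTP 200, so the sitemap does not list URLs that redirect.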

4. Stale sitemaps

A sitemap generated once and never updated gradually becomes useless. New pages aren't listed, deleted pages return 404s, and lastmod dates are wrong.

Automate it. Most frameworks and CMS platforms can generate sitemaps at build time or on a schedule.

5. Listing pages you don't want indexed

If a page has <meta name="robots" content="noindex">, don't include it in your sitemap. Including noindex pages wastes crawl budget and creates contradictory signals.
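
Mistakes 1 and 5 can be screened together when page metadata is available. This sketch assumes a hypothetical array of { url, canonical, robotsMeta } records gathered from your CMS or a crawl; selectIndexablePages is an illustrative name, and the canonical comparison is a plain string match that expects both values in the same normalized form.

```javascript
// Keeps pages that are their own canonical URL and are not marked noindex.
function selectIndexablePages(pages) {
  return pages.filter((page) => {
    const isCanonical = !page.canonical || page.canonical === page.url;
    const isNoindex = /\bnoindex\b/i.test(page.robotsMeta || "");
    return isCanonical && !isNoindex;
  });
}
```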

6. Fake lastmod dates

Setting all lastmod dates to today or to the same date makes the field meaningless. Only set lastmod when you know the actual last-modified date.


Generating Sitemaps Automatically

Static site generators

Most static site generators and frameworks (Eleventy, Hugo, Next.js, Gatsby, Astro) either ship with sitemap support or have plugins that generate sitemap.xml at build time from your pages.

CMS platforms

WordPress, Drupal, and other CMS platforms typically include sitemap generation or have plugins for it. WordPress has had built-in sitemap support since version 5.5.

Custom generation

For dynamic sites, generate sitemaps programmatically by querying your database for published URLs and writing the XML:

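// Simplified Node.js example.
// getPublishedUrls() stands in for your own data access and is assumed to
// resolve to objects shaped like { href, updatedAt } (updatedAt: Date or null).

// Escape the five characters that are special in XML text.
function escapeXml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

async function buildSitemap() {
  const urls = await getPublishedUrls();
  let xml = '<?xml version="1.0" encoding="UTF-8"?>\n';
  xml += '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n';
  for (const url of urls) {
    xml += '  <url>\n';
    xml += `    <loc>${escapeXml(url.href)}</loc>\n`;
    if (url.updatedAt) {
      // toISOString() is always UTC; keep only the YYYY-MM-DD part
      xml += `    <lastmod>${url.updatedAt.toISOString().split('T')[0]}</lastmod>\n`;
    }
    xml += '  </url>\n';
  }
  xml += '</urlset>\n';
  return xml;
}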

Run this on a schedule (daily or on content change) and write the output to your public directory.
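
When writing the file, it helps to avoid a window in which a crawler could fetch a half-written sitemap. A common approach, sketched here under the assumption that the temporary file and the target sit on the same filesystem (where a rename is typically atomic), is to write first and rename second. writeSitemap and the publicDir parameter are hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Writes to a temporary file first, then renames it into place.
function writeSitemap(xml, publicDir) {
  const target = path.join(publicDir, "sitemap.xml");
  const temp = `${target}.tmp`;
  fs.writeFileSync(temp, xml, "utf8");
  fs.renameSync(temp, target);
  return target; // path of the published file
}
```

A scheduled job can then call this with the generated XML string and the path to your public directory.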


Sitemap vs. robots.txt

These two files are complementary, not interchangeable:

File        | Purpose                                                  | Mechanism
robots.txt  | Controls access: which URLs crawlers may request         | Restrict / allow
sitemap.xml | Suggests discovery: which URLs you want crawlers to find | Recommend

A common setup:

  • robots.txt blocks admin pages, staging URLs, and duplicate content
  • sitemap.xml lists all public, canonical, indexable pages
  • robots.txt references the sitemap location
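
The robots.txt half of that setup can be generated alongside the sitemap. This is a minimal sketch that emits a single rule group for all crawlers; buildRobotsTxt and its { disallow, sitemapUrl } options are hypothetical, and the paths in the usage note are placeholders.

```javascript
// Builds a robots.txt with one "User-agent: *" group and a Sitemap declaration.
function buildRobotsTxt({ disallow = [], sitemapUrl }) {
  const lines = ["User-agent: *"];
  for (const prefix of disallow) lines.push(`Disallow: ${prefix}`);
  if (disallow.length === 0) lines.push("Disallow:"); // empty value allows everything
  lines.push("", `Sitemap: ${sitemapUrl}`);
  return lines.join("\n") + "\n";
}
```

For example, buildRobotsTxt({ disallow: ["/admin/", "/staging/"], sitemapUrl: "https://yoursite.com/sitemap.xml" }) produces two Disallow lines followed by the Sitemap line.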

You can use the Robots.txt Generator to create both files together.


Image and Video Sitemaps

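Google supports XML namespace extensions for images and videos. Each extension's prefix must be declared on the <urlset> root element.

Image sitemap example

Google reads only <image:loc> inside <image:image>; older tags such as <image:caption> are deprecated.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/gallery</loc>
    <image:image>
      <image:loc>https://example.com/images/photo1.jpg</image:loc>
    </image:image>
  </url>
</urlset>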

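Video sitemap example

In addition to the thumbnail, title, and description, Google requires either <video:content_loc> or <video:player_loc>. The media file URL here is a placeholder.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/videos/tutorial</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
      <video:title>Getting Started Tutorial</video:title>
      <video:description>A step-by-step walkthrough.</video:description>
      <video:content_loc>https://example.com/videos/tutorial.mp4</video:content_loc>
    </video:video>
  </url>
</urlset>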

These extensions help media-heavy sites surface content in Google Images and Google Video search.


Quick Reference

What                      | Value
Max URLs per file         | 50,000
Max file size             | 50 MB (uncompressed)
Accepted compression      | gzip (.xml.gz)
Required tag              | <loc>
Most useful optional tag  | <lastmod>
Standard location         | /sitemap.xml
Declaration in robots.txt | Sitemap: https://yoursite.com/sitemap.xml
Protocol                  | Must match your site (https)
Scope                     | Same host only

Next Steps

  1. Generate your sitemap with the Sitemap XML Generator
  2. Create a robots.txt with the Robots.txt Generator and add your sitemap reference
  3. Submit to search engines via Google Search Console and Bing Webmaster Tools
  4. Set up auto-generation so your sitemap stays current as your site grows
