Search engines have been explicit about what they do and don't use from your sitemap, but most tutorials still recommend spending time on fields that Google has ignored for years. This guide focuses on what actually matters in 2026, based on Google's own documentation and real-world crawl behaviour, so you can stop optimising the wrong things.
What Google Actually Uses From Your Sitemap
<loc> — The URL (required, used)
The only truly required field. Every URL you want Google to know about belongs here. Use the canonical version — always HTTPS, always the exact URL you'd want to appear in search results. If your site redirects http:// to https://, never list the http version. If www redirects to non-www, list non-www. Consistency here is critical.
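As a rough illustration of the consistency rule above, the following sketch rewrites any variant of a URL to a single HTTPS form on one preferred host. The function name canonical_loc and the preferred_host parameter are hypothetical, and the sketch assumes your site serves every page on one host with no custom port.

```python
from urllib.parse import urlsplit, urlunsplit


def _bare(host: str) -> str:
    """Strip a leading 'www.' so www and non-www hosts compare equal."""
    return host[4:] if host.startswith("www.") else host


def canonical_loc(url: str, preferred_host: str) -> str:
    """Return the HTTPS form of `url` on `preferred_host` (hypothetical helper).

    Assumes the site lives on a single host; the fragment and any port are dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if _bare(host) != _bare(preferred_host.lower()):
        raise ValueError(f"{url} does not belong to {preferred_host}")
    # Always emit https, the preferred host, and at least "/" as the path.
    return urlunsplit(("https", preferred_host, parts.path or "/", parts.query, ""))
```

Running every URL through a helper like this before writing it to <loc> is one way to keep http:// or www variants out of the file.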
<lastmod> — Last Modified Date (optional, used when accurate)
Google uses lastmod to help decide which URLs to re-crawl and when. The operative phrase is "when accurate". Google cross-checks your declared lastmod against what it observes when it fetches the page, such as the HTTP Last-Modified header and the content itself. If you set every URL's lastmod to today's date each time the sitemap is regenerated (a common mistake), Google learns to ignore it.
Use lastmod correctly:
- Update it only when the page content meaningfully changes.
- Updating a copyright year in a footer or changing a button colour is not a meaningful change.
- Publishing a new blog post, updating product pricing, or significantly revising a guide is.
- Use ISO 8601 format: 2026-06-15 or 2026-06-15T10:30:00+00:00.
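One way to follow these rules is to derive lastmod from a hash of the page's main content rather than from the build time. The sketch below is a minimal example under that assumption. The names content_fingerprint and next_lastmod are hypothetical, and it assumes you can extract the meaningful body text (without footer, navigation, or styling) and store the previous hash and lastmod somewhere.

```python
import hashlib
from datetime import datetime, timezone


def content_fingerprint(main_content: str) -> str:
    """Hash only the meaningful body text, with whitespace normalised."""
    normalised = " ".join(main_content.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def next_lastmod(stored_hash, stored_lastmod, main_content, now=None):
    """Return (hash, lastmod). lastmod only moves when the content hash changes."""
    new_hash = content_fingerprint(main_content)
    if new_hash == stored_hash and stored_lastmod is not None:
        # Nothing meaningful changed: keep the previously published date.
        return stored_hash, stored_lastmod
    now = now or datetime.now(timezone.utc)
    # Timezone-aware ISO 8601 timestamp, e.g. 2026-06-15T10:30:00+00:00
    return new_hash, now.replace(microsecond=0).isoformat()
```

Because the footer and layout are excluded from the hashed text, a new copyright year or button colour leaves the hash, and therefore lastmod, unchanged.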
What Google Ignores (Stop Wasting Time Here)
<priority> — Ignored by Google
Google officially does not use priority. The field was designed to signal relative importance, but it was so widely abused (most sites just set everything to 1.0) that the signal became meaningless. Don't spend time fine-tuning these values. Include the field if your CMS generates it automatically, but don't expect it to affect crawl order or rankings.
<changefreq> — Ignored by Google
Same story. changefreq was intended to hint how often a page changes. Google ignores it in favour of its own crawl frequency signals. Bing may use it as a soft signal, but it's not worth optimising for either. Set it to something sensible if you want, but don't treat it as a ranking lever.
URL Quality Rules
Your sitemap is a direct signal to Google about which pages you believe are worth indexing. Including low-quality pages is counterproductive.
Only include indexable URLs
Never include a URL that has a noindex directive, is blocked by robots.txt, or requires authentication. Sending conflicting signals (include in sitemap but noindex the page) confuses crawlers and wastes budget.
Never include redirects or 404s
Every URL in your sitemap should return HTTP 200. If a URL redirects to another, list the destination, not the redirect source. If a URL returns 404, remove it from the sitemap. A sitemap full of dead URLs wastes crawl requests and can lead search engines to trust the file less.
Use canonical URLs
If you use rel="canonical" on your pages, the URL in your sitemap should match the canonical exactly. Mismatches between your sitemap URL and the canonical tag confuse crawlers.
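The three rules in this section can be expressed as a small audit function. This is a sketch rather than a complete crawler: the PageCheck record is a hypothetical shape that your own fetching code would fill in with the HTTP status, whether a noindex directive was found, and the page's rel="canonical" target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageCheck:
    """Result of fetching one sitemap URL (hypothetical shape)."""
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None


def sitemap_problems(page: PageCheck) -> list[str]:
    """List the reasons a URL should not be in the sitemap as-is."""
    problems = []
    if 300 <= page.status < 400:
        problems.append("redirects: list the destination instead")
    elif page.status != 200:
        problems.append(f"returns HTTP {page.status}: remove it")
    if page.noindex:
        problems.append("noindex: conflicts with sitemap inclusion")
    if page.canonical is not None and page.canonical != page.url:
        problems.append(f"canonical mismatch: page points to {page.canonical}")
    return problems
```

An empty list means the URL passes all three checks. Note that the canonical comparison is an exact string match, so a trailing-slash difference is reported as a mismatch.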
Size and Structure
- A single sitemap file can contain up to 50,000 URLs and must be under 50 MB uncompressed.
- If you exceed either limit, split into multiple sitemaps and use a sitemap index file to reference them all.
- You can compress sitemaps with gzip (sitemap.xml.gz) to reduce server bandwidth and fetch time.
- Serve your sitemap with the correct content type: application/xml or text/xml. A sitemap served as text/html will fail to parse in Google Search Console.
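Splitting a large URL list and generating the index file might look like the sketch below. The function names, the sitemap-N.xml naming scheme, and the base_url parameter are assumptions for illustration. The code uses only Python's standard library and returns the files in memory, leaving writing, gzip compression, and content-type headers to your own deployment.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed


def build_urlset(entries):
    """entries: iterable of (loc, lastmod or None). Returns XML bytes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemaps(entries, base_url, max_urls=MAX_URLS):
    """Chunk entries into sitemap files and return {filename: xml_bytes},
    including a sitemap index under the (assumed) name sitemap.xml."""
    entries = list(entries)
    files = {}
    for start in range(0, len(entries), max_urls):
        name = f"sitemap-{start // max_urls + 1}.xml"
        xml = build_urlset(entries[start:start + max_urls])
        if len(xml) > MAX_BYTES:
            raise ValueError(f"{name} exceeds 50 MB; use a smaller max_urls")
        files[name] = xml
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/{name}"
    files["sitemap.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files
```

The index is generated even when everything fits in one file, which keeps the public sitemap URL stable as the site grows. That is a design choice of this sketch, not a requirement of the protocol.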
Reference Your Sitemap in robots.txt
Add this line to your robots.txt:
Sitemap: https://yourdomain.com/sitemap.xml
Well-behaved crawlers read robots.txt before crawling, so they can discover your sitemap without you having to submit it. It's not a substitute for Google Search Console submission, but it helps lesser-known crawlers (and AI bots like ChatGPT Search and Perplexity) find your content.
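To confirm the directive is picked up the way a crawler would read it, you can parse your robots.txt with Python's standard urllib.robotparser module. The helper name sitemaps_from_robots is hypothetical, and the sketch takes the file's text as input rather than fetching it.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []
```

For a file containing the line above, the function returns a one-element list with https://yourdomain.com/sitemap.xml. An empty list means the Sitemap directive is missing or malformed.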
Keep It Current
A stale sitemap is worse than no sitemap. If your sitemap lists pages that have been deleted or redirected, you're actively directing Google toward dead ends. Set up automated sitemap generation so it updates whenever your content changes — or run a scheduled crawl and health check so you catch issues before they accumulate.
The goal is simple: every URL in your sitemap returns 200, reflects current content, and has an accurate lastmod. Everything else is secondary.