Generating a sitemap is the starting point, not the finish line. The URLs inside it need to be healthy: returning HTTP 200, serving real content, and pointing to the canonical version of each page. A sitemap full of 404s, redirect chains, and other broken URLs wastes crawl budget and can signal to search engines that your site is poorly maintained. This guide covers how to audit sitemap health systematically and which problems to fix first.
What "Sitemap Health" Actually Means
A healthy sitemap has two properties:
- Every URL it lists returns HTTP 200 — no 404s, no redirects, no 500 errors.
- Every URL it lists is worth indexing — real content, no noindex tags, not blocked by robots.txt, not a duplicate.
A health score is shorthand for the first property: the percentage of your sitemap URLs that return a clean HTTP 200 response. A score of 100% means every URL in your sitemap resolves to live content. As a rule of thumb, anything below 90% is worth investigating. Below 80%, a large share of your crawl budget is being spent on dead pages, and search engines may start deprioritising your site.
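To make the definition concrete, here is a minimal sketch of the calculation. The function names `health_score` and `verdict` are illustrative, not part of any particular tool, and treating an empty sitemap as a score of 0 is an assumption.

```python
from typing import Iterable


def health_score(status_codes: Iterable[int]) -> float:
    """Percentage of sitemap URLs that answered with a plain HTTP 200."""
    codes = list(status_codes)
    if not codes:
        return 0.0  # assumption: an empty sitemap scores 0 instead of raising
    healthy = sum(1 for code in codes if code == 200)
    return 100.0 * healthy / len(codes)


def verdict(score: float) -> str:
    """Map a score onto the 90% and 80% thresholds described above."""
    if score < 80:
        return "urgent"
    if score < 90:
        return "investigate"
    return "ok"
```

For example, a sitemap with ten URLs where one returns 404 scores 90%, which sits exactly on the "ok" side of the first threshold.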
Method 1 — Google Search Console (Free)
After submitting your sitemap, Google Search Console shows you:
- Submitted vs Indexed count — the gap tells you how many pages Google found but chose not to index.
- Sitemap errors — parse failures, fetch errors, or URLs Google couldn't access.
Go to Indexing → Pages for a full breakdown of why specific URLs are or aren't indexed. This tells you about indexing decisions but doesn't show you HTTP status codes for every sitemap URL.
Method 2 — Crawl Your Sitemap URLs
The most thorough approach is to crawl every URL in your sitemap and check its HTTP status code. This is exactly what a health check tool does — it visits each URL in your sitemap, records what the server returns, and gives you a status report. After a crawl completes you can:
- Filter to show only 4xx errors and see every dead page in one list.
- Filter to show only 3xx redirects and catch URLs that need updating in the sitemap.
- Filter to show 5xx errors and catch server-side issues.
- Search by URL path to focus on a specific section of the site.
Start with the "Issues only" view — it filters out all the healthy 200s so you see just the problems, making a large site manageable to audit.
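If you want to see what such a crawl involves, the sketch below uses only the Python standard library. It is a rough illustration, not how any specific health check tool works: the helper names are made up, it assumes a plain URL sitemap (not a sitemap index), and it sends HEAD requests, which some servers answer differently from GET.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a URL sitemap."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for one URL, or 0 if the request failed."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses arrive here
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, timeout, refused connection


def issues_only(results: dict[str, int]) -> dict[str, list[str]]:
    """Group every non-200 URL by status class, like an 'Issues only' view."""
    issues: dict[str, list[str]] = {}
    for url, status in results.items():
        if status == 200:
            continue
        bucket = f"{status // 100}xx" if 300 <= status < 600 else "other"
        issues.setdefault(bucket, []).append(url)
    return issues
```

A full run would build `results = {url: fetch_status(url) for url in sitemap_urls(xml_text)}` and pass it to `issues_only`. For a large site you would likely also want rate limiting and retries, which this sketch leaves out.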
Understanding HTTP Status Codes in Your Sitemap
| Status | Meaning | Action |
|---|---|---|
| 200 OK | Page is live and accessible | No action needed |
| 301 / 302 | Page redirects elsewhere | Update sitemap to list the destination URL |
| 404 Not Found | Page no longer exists | Restore the page, redirect it, or remove the URL from the sitemap |
| 410 Gone | Page permanently removed | Remove from sitemap immediately |
| 403 Forbidden | Server is blocking access | Check auth rules — public pages should return 200 |
| 500 Server Error | Backend error | Fix the server issue, then re-crawl to verify |
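The table can also be expressed as a small lookup, which is handy when annotating crawl results in a script. This is only a sketch: the `recommended_action` name is hypothetical, and the fallback for codes the table does not list (for example 307 or 503) is an assumption that treats them like their nearest listed relative.

```python
ACTIONS = {
    200: "No action needed",
    301: "Update sitemap to list the destination URL",
    302: "Update sitemap to list the destination URL",
    403: "Check auth rules; public pages should return 200",
    404: "Restore the page, redirect it, or remove the URL",
    410: "Remove from sitemap immediately",
    500: "Fix the server issue, then re-crawl to verify",
}


def recommended_action(status: int) -> str:
    """Look up the suggested action for a status code from the table."""
    if status in ACTIONS:
        return ACTIONS[status]
    # Assumption: unlisted 3xx/5xx codes get the same advice as 301/500.
    fallback = {3: ACTIONS[301], 5: ACTIONS[500]}
    return fallback.get(status // 100, "Investigate manually")
```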
What to Fix First: Priority Order
- 404s with inbound backlinks. These are the most damaging — external sites are pointing to a dead page, wasting link equity. Find them in Google Search Console under Links → Top linked pages, cross-reference with your 404 list, and set up 301 redirects to the best matching live page.
- 404s linked from your own navigation or homepage. These are crawled constantly and send a consistent low-quality signal. Fix or remove the link immediately.
- Redirect chains (URL A → B → C). Each hop adds latency and can lose a fraction of link equity. Flatten chains to a single direct redirect.
- All remaining 404s and redirects. Work through them systematically. Batch similar URLs (e.g. all /old-blog/... paths) to handle them efficiently.
- 500 errors. Fix the server issue, then re-crawl to verify the page now returns 200.
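The priority order above can be sketched as a sort key, together with a helper that flattens redirect chains. Everything here is illustrative: the `Issue` fields (`backlinks`, `linked_from_nav`, `hops`) are hypothetical and would have to be filled in from your own backlink report and crawl data.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    url: str
    status: int
    backlinks: int = 0             # hypothetical: from a backlink report
    linked_from_nav: bool = False  # hypothetical: from an internal-link check
    hops: int = 0                  # redirects followed before the final page


def priority(issue: Issue) -> int:
    """Lower number means fix sooner, following the order of the list above."""
    is_redirect = 300 <= issue.status < 400
    if issue.status == 404 and issue.backlinks > 0:
        return 0
    if issue.status == 404 and issue.linked_from_nav:
        return 1
    if is_redirect and issue.hops > 1:
        return 2
    if issue.status == 404 or is_redirect:
        return 3
    if issue.status >= 500:
        return 4
    return 5


def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every redirect source directly at its final destination."""
    flattened = {}
    for start, target in redirects.items():
        seen = {start}
        while target in redirects:  # the target itself redirects again
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[start] = target
    return flattened
```

Sorting with `sorted(issues, key=priority)` gives a work queue in the recommended order, and `flatten_redirects({"/a": "/b", "/b": "/c"})` maps both `/a` and `/b` straight to `/c`.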
How Often to Audit
For actively updated sites (blog, e-commerce, news), run a health check at least monthly. For more static sites, quarterly is usually sufficient. After any major change — a CMS migration, a URL restructure, a large content deletion — always run a health check immediately before and after.
The best setup is auto-refresh with health monitoring: schedule a crawl weekly or monthly, and check the health score each time. A score that drops suddenly is an early warning that something broke — catching it in your next crawl is far better than waiting for Google to flag it in Search Console weeks later.
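One simple way to spot a sudden drop is to compare each crawl's score with the previous one. The sketch below assumes you keep a list of past scores in crawl order; the 5-point threshold is an arbitrary illustrative default, not a recommendation from any tool.

```python
def sudden_drops(history: list[float], threshold: float = 5.0) -> list[int]:
    """Return indexes of crawls whose score fell by at least `threshold` points."""
    return [
        i
        for i in range(1, len(history))
        if history[i - 1] - history[i] >= threshold
    ]
```

With scores of 98.0, 97.5, 88.0, and 88.5 across four crawls, only the third crawl (index 2) is flagged, since it fell 9.5 points from the one before.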