Validate every URL in your sitemap
Fetch your XML sitemap, check every URL for broken links, redirects, noindex directives, and canonical mismatches — all in real time. Supports sitemap index files with nested child sitemaps.
What is an XML sitemap?
An XML sitemap is a structured file that lists every important URL on your website.
Search engine crawlers — Googlebot, Bingbot, and others — read your sitemap to discover
pages they might not find through normal link following. Each entry can optionally include
metadata like lastmod (last modified date), changefreq (update
frequency), and priority (relative importance within your site).
The sitemap format is defined by the Sitemaps.org protocol and is supported by Google, Bing, Yahoo, and Ask. Sitemaps are one of the most reliable ways to ensure all of your content is crawlable, especially for large sites, new sites with few inbound links, or sites with deep navigation hierarchies.
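For reference, here is a minimal sketch that builds a small sitemap with Python's standard xml.etree.ElementTree module. The build_sitemap helper and its dict-based input are hypothetical and exist only for illustration. The urlset/url/loc structure, the namespace, and the optional lastmod, changefreq and priority elements come from the Sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace required by the Sitemaps.org protocol for <urlset>.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize as the default xmlns instead of "ns0:"


def build_sitemap(entries: List[Dict[str, object]]) -> bytes:
    """Serialize entries into a UTF-8 <urlset> document (hypothetical helper).

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = str(entry["loc"])
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([{"loc": "https://example.com/", "lastmod": "2024-01-15"}]).decode("utf-8"))
```

Only loc is mandatory; the optional fields are hints that search engines may weigh or ignore.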
Why validate your sitemap?
A sitemap with errors can undermine your SEO. When the sitemap you submit to Google Search Console contains broken URLs, redirect chains, or noindex pages, you waste crawl budget and send conflicting signals to search engines.
Search engines have limited time to crawl your site. Listing broken or redirecting URLs in your sitemap spends crawl budget on URLs that deliver no indexable content.
A URL in your sitemap that also has a noindex directive sends contradictory signals. Google's official guidance is to remove noindex pages from your sitemap.
When a sitemap URL's canonical tag points to a different URL, you're telling search engines the submitted URL is not the authoritative version — defeating the purpose of the sitemap entry.
Redirects in your sitemap mean every crawl visit goes through at least two HTTP requests. Update sitemap URLs to their final destinations to eliminate unnecessary hops.
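Finding the final destination for a redirecting sitemap entry can be automated. The sketch below is illustrative only: resolve_final_url is a hypothetical helper, and it assumes a hypothetical fetch callable that performs one request with automatic redirects disabled and returns (status code, Location header or None). It follows the chain, resolves relative Location values, and stops on loops or after max_hops.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch signature: one request, no automatic redirect following.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_final_url(url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects from url and return (final_url, number_of_hops).

    Raises RuntimeError on a redirect loop or when max_hops is exceeded.
    """
    seen = {url}
    current = url
    hops = 0
    while True:
        status, location = fetch(current)
        if status not in REDIRECT_CODES or location is None:
            return current, hops  # first non-redirect response is the final URL
        hops += 1
        if hops > max_hops:
            raise RuntimeError(f"more than {max_hops} redirects starting at {url}")
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, location)
        if current in seen:
            raise RuntimeError(f"redirect loop detected at {current}")
        seen.add(current)
```

If the hop count is greater than zero, the final URL is the one to write back into the sitemap.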
Common sitemap issues detected
- 404 Not Found (Broken)
- The URL returns a 404 or 410 HTTP status. The page no longer exists. Remove it from your sitemap and add a redirect if the content moved.
- 3xx Redirect
- The URL redirects to another location. Update the sitemap entry to the final destination URL so crawlers don't waste a request following the redirect.
- Noindex Detected
- The page has a noindex directive via <meta name="robots"> or the X-Robots-Tag response header. Remove it from your sitemap: listing it asks search engines to index a page while the page itself instructs them not to.
- Canonical Mismatch
- The page's <link rel="canonical"> tag points to a different URL than the sitemap entry. This signals that the page is a duplicate; the canonical URL is the one that belongs in your sitemap.
- Network Error
- The URL timed out or returned a network-level error (DNS failure, connection refused, SSL error). These are treated as errors since the page is unreachable by crawlers.
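A minimal classifier for these categories, using only Python's standard html.parser, might look like the following. It is a sketch, not the validator's implementation: classify, HeadSignals, has_noindex, the label strings, and the convention that status=None means the fetch failed at the network level are all hypothetical. The canonical check compares exact strings after resolving relative hrefs, so a production tool would likely normalize URLs further.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class HeadSignals(HTMLParser):
    """Collects <meta name="robots"> content values and the first <link rel="canonical"> href."""

    def __init__(self) -> None:
        super().__init__()
        self.robots: List[str] = []
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots.append(a.get("content", ""))
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href") or None


def has_noindex(directive_values: List[str]) -> bool:
    """True if any comma-separated directive is noindex (or none, which implies noindex)."""
    for value in directive_values:
        for token in value.split(","):
            # X-Robots-Tag may prefix directives with a user agent, e.g. "googlebot: noindex".
            if token.split(":")[-1].strip().lower() in ("noindex", "none"):
                return True
    return False


def classify(sitemap_url: str, status: Optional[int], headers: Dict[str, str], html: str) -> List[str]:
    """Return issue labels for one sitemap URL; an empty list means no issue was found."""
    if status is None:
        return ["network_error"]  # hypothetical convention: the fetch itself failed
    if 300 <= status < 400:
        return ["redirect"]
    if status >= 400:
        return ["broken"]
    parser = HeadSignals()
    parser.feed(html)
    # HTTP header names are case-insensitive, so match X-Robots-Tag in any casing.
    header_values = [v for k, v in headers.items() if k.lower() == "x-robots-tag"]
    issues: List[str] = []
    if has_noindex(parser.robots + header_values):
        issues.append("noindex")
    if parser.canonical is not None:
        # Resolve a relative canonical href against the page URL before comparing.
        if urljoin(sitemap_url, parser.canonical) != sitemap_url:
            issues.append("canonical_mismatch")
    return issues
```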
Sitemap best practices
- Max 50,000 URLs per sitemap file: larger sitemaps must be split using a sitemap index.
- Max 50 MB uncompressed: gzip (.xml.gz) reduces transfer size for large sitemaps, but the 50 MB limit applies to the uncompressed file.
- Include only canonical, indexable URLs: no noindex pages, no redirecting URLs, no paths blocked by robots.txt.
- Keep lastmod accurate: use the page's actual last modified date, not today's date. Inaccurate lastmod values can waste crawl budget and may lead search engines to ignore the field.
- Submit to Google Search Console and Bing Webmaster Tools; do not rely on the Sitemap: directive in robots.txt alone.
- Use HTTPS URLs: if your site is served over HTTPS, all sitemap URLs must use https://.
- Use trailing slashes consistently: decide on a URL format and use it everywhere to avoid duplicate content.
- Re-validate after major deployments: CMS migrations, URL structure changes, and server moves frequently introduce sitemap errors.
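Several of these rules can be checked before you submit a sitemap. This sketch uses a hypothetical lint_sitemap helper and assumes you already have the list of loc values and the file's uncompressed size in bytes. It flags the 50,000-URL and 50 MB (52,428,800-byte) limits, URLs not using the site's scheme, and mixed trailing-slash usage. The slash check is a heuristic: paths such as /report.pdf legitimately lack a trailing slash and will count as "without".

```python
from typing import List
from urllib.parse import urlsplit

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, measured uncompressed


def lint_sitemap(urls: List[str], uncompressed_size: int, site_scheme: str = "https") -> List[str]:
    """Return human-readable warnings for one sitemap file (hypothetical helper)."""
    warnings: List[str] = []
    if len(urls) > MAX_URLS:
        warnings.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit; split with a sitemap index")
    if uncompressed_size > MAX_BYTES:
        warnings.append(f"{uncompressed_size} bytes exceeds the {MAX_BYTES}-byte uncompressed limit")
    wrong_scheme = [u for u in urls if urlsplit(u).scheme != site_scheme]
    if wrong_scheme:
        warnings.append(f"{len(wrong_scheme)} URL(s) not using {site_scheme}://")
    # Heuristic: ignore the site root, then flag a mix of paths with and without a trailing slash.
    paths = [urlsplit(u).path for u in urls]
    non_root = [p for p in paths if p not in ("", "/")]
    with_slash = sum(1 for p in non_root if p.endswith("/"))
    if 0 < with_slash < len(non_root):
        warnings.append("mixed trailing-slash usage")
    return warnings
```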
Frequently asked questions
- An XML sitemap is a file that lists all important URLs on your website, helping search engines discover and crawl your content efficiently. It is most valuable for large sites, sites with deep navigation, and new sites with few inbound links.
- Broken URLs, redirect chains, and noindex pages in your sitemap waste crawl budget and confuse search engines. Validating your sitemap ensures every submitted URL is accessible, canonical, and indexable — maximising the return on Googlebot's limited crawl visits to your site.
- The validator checks every URL for: broken links (4xx/5xx HTTP responses), redirects (3xx responses), noindex directives (meta robots tag and X-Robots-Tag header), canonical mismatches (canonical points to a different URL), network errors, and XML parse errors for malformed sitemap files.
- A sitemap index is an XML file that references multiple child sitemaps instead of individual URLs. Large sites use sitemap indexes to split their URLs across multiple files, since each sitemap file supports a maximum of 50,000 URLs. The validator automatically follows sitemap index files and validates all child sitemaps up to 10 levels deep.
- Yes. POST /api/v1/sitemap-validator/validate with your sitemap URL returns a validationId and a WebSocket URL. Connect via WebSocket to stream sitemap:progress frames in real time. Use GET /api/v1/sitemap-validator/validation/:id/urls to retrieve paginated results. See the API documentation for full details.
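The sitemap index traversal described above can be sketched as a recursive walk. This is a minimal illustration, not the validator's code: collect_urls and its fetch callable (URL in, raw XML bytes out) are hypothetical, and the depth counting is an assumption (the starting file is level 0). Malformed XML surfaces as xml.etree.ElementTree.ParseError. For files fetched from untrusted hosts, the Python documentation recommends a hardened parser such as defusedxml.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], bytes],
    max_depth: int = 10,
    _depth: int = 0,
    _seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return every page <loc> reachable from a sitemap or sitemap index file."""
    if _depth > max_depth:
        raise RuntimeError(f"sitemap nesting exceeds {max_depth} levels at {sitemap_url}")
    if _seen is None:
        _seen = set()
    if sitemap_url in _seen:
        return []  # already visited: guards against index files that reference each other
    _seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))  # raises ET.ParseError on malformed XML
    if root.tag == NS + "sitemapindex":
        urls: List[str] = []
        for loc in root.iterfind(f"{NS}sitemap/{NS}loc"):
            child = (loc.text or "").strip()
            if child:
                urls.extend(collect_urls(child, fetch, max_depth, _depth + 1, _seen))
        return urls
    if root.tag == NS + "urlset":
        locs = ((loc.text or "").strip() for loc in root.iterfind(f"{NS}url/{NS}loc"))
        return [loc for loc in locs if loc]
    raise ValueError(f"unexpected root element {root.tag!r} in {sitemap_url}")
```

In practice, fetch would wrap an HTTP GET and decompress .xml.gz files before parsing.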
Related tools
- Link Checker: crawl every link on your site, not just what's in the sitemap
- SEO Site Audit: score every page for SEO health after fixing sitemap issues
- Page Speed Inspector: check TTFB and performance for any URL from your sitemap