How I Found 47 Broken Links Before My Migration Went Live (And Saved Myself a Week of Cleanup)

Sunny Pal Singh · 8 min read

A first-person walkthrough of using ByteWaveNetwork's Link Checker to audit 312 links before a site migration — covering broken vs blocked vs ERR links, canonical detection, and a pre-migration checklist.

Three days before a client's e-commerce site was supposed to go live on its new domain, I ran a link audit. The previous agency had done one too — I could see the Screaming Frog export in the shared Drive. It showed zero broken links.

I ran ByteWaveNetwork's Link Checker anyway. It found 47.

The difference wasn't tool quality — it was what each tool counted as "broken." Screaming Frog's export marked 14 links as OK that were actually returning 403 Forbidden from Cloudflare. Six more were network-level failures (DNS timeout) with no HTTP status at all. And another 8 had been silently redirected to a staging URL that no longer existed.

That distinction — broken vs blocked vs ERR — saved us from launching with a site full of dead ends that Google would have started crawling on day one.

Key Takeaways

  • 403/429 responses are blocked, not broken — they need a different fix (CDN config, not content)
  • Network-level failures (ERR) have no HTTP status — they're infrastructure problems, not 404s
  • A 3-hop redirect chain adds latency and can dilute PageRank even if the final destination is valid
  • Canonical detection shows where redirects actually point — not just where they claim to
  • Running an audit 3+ days before migration gives you time to actually fix what you find

Why Most Tools Get This Wrong

The industry-standard behavior for most link checkers is to treat any 4xx or 5xx response as broken. That makes the results look clean and the tool look thorough. The problem is that 403 and 429 are categorically different from 404.

A 404 means the resource doesn't exist. Fix: create the page, restore the content, or redirect to something relevant.

A 403 means the server got the request and refused it, usually because of bot detection, IP blocking, or a WAF rule. In an audit, that almost always means your crawler is being blocked by a CDN or rate limiter. Fix: adjust the CDN or WAF configuration for sites you control, or verify the link manually. The link itself is probably fine. A real user would load it without trouble.

A 429 is the same category — rate-limited. You hit the server too fast. Again, not a broken link.

When these get lumped together as "broken," you end up chasing phantom issues. I've seen teams spend hours hunting down "broken" links on LinkedIn and Twitter, only to discover that those platforms block automated crawlers by returning 403. The links work perfectly for human visitors.

The Six Status Types — And What to Do With Each

Status | What it means | HTTP code | What to do
Broken | Resource doesn't exist or server error | 4xx / 5xx (except 403 / 429) | Fix the URL, restore content, or add a redirect
Blocked | Server refused the crawler (bot detection or rate limiting) | 403 / 429 | Verify manually in a browser; usually fine for users
Redirect | URL redirects to another location | 301 / 302 / 307 / 308 | Check chain length; update source link to final URL
Slow | Response exceeded threshold (default 3s) | 2xx (but slow) | Investigate server, hosting, or CDN configuration
Skipped | Not crawled (mailto:, tel:, javascript:, anchors) | N/A | Usually fine; verify mailto addresses are correct
ERR | Network failure, no HTTP response | None | Check DNS, SSL cert, firewall; this is an infrastructure problem

The ERR category is the one that trips people up most often. When a link returns an ERR, it means the crawler never received an HTTP response at all: no headers, no status code, nothing. The cause is a DNS failure, a refused connection, an SSL handshake failure, or the server timing out or dropping the connection. Generic tools that only report HTTP status codes often don't surface these at all. They just silently skip them.
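To make the six-way split concrete, here is a minimal Python sketch of how a crawler result could be sorted into the categories from the table. The function name, the `status=None` convention for "no HTTP response", and the extra "OK" label for healthy links are my own assumptions for illustration; this is not ByteWaveNetwork's actual implementation.

```python
from typing import Optional

SLOW_THRESHOLD_SECONDS = 3.0  # the default "Slow" threshold quoted in the table
SKIPPED_PREFIXES = ("mailto:", "tel:", "javascript:", "#")


def classify_link(url: str, status: Optional[int], elapsed: float = 0.0) -> str:
    """Return a category label; status=None means no HTTP response arrived."""
    if url.lower().startswith(SKIPPED_PREFIXES):
        return "Skipped"      # never crawled, so no status to inspect
    if status is None:
        return "ERR"          # network-level failure, no status code
    if status in (403, 429):
        return "Blocked"      # checked before the generic 4xx rule below
    if status in (301, 302, 307, 308):
        return "Redirect"
    if status >= 400:
        return "Broken"       # remaining 4xx plus all 5xx
    if 200 <= status < 300 and elapsed > SLOW_THRESHOLD_SECONDS:
        return "Slow"
    return "OK"               # hypothetical label for healthy links


print(classify_link("https://example.com/missing", 404))   # Broken
print(classify_link("https://example.com/profile", 403))   # Blocked
print(classify_link("https://example.com/gone", None))     # ERR
```

The order of the checks carries the point of the section: 403 and 429 are tested before the generic 4xx rule, so they never fall into the Broken bucket.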

Canonical Detection: The Hidden Layer

Link checkers that only check HTTP status are solving half the problem. A page can return 200 OK while simultaneously telling Google to index a completely different URL.

ByteWaveNetwork's Link Checker adds a third dimension: canonical state. Every crawled page gets one of three canonical labels:

  • Self (✓) — the canonical points back to this page's own URL. This is what you want for pages you intend to rank.
  • Other (↗) — the canonical points to a different URL. This page is deferring authority to another URL. Intentional for filtered/paginated pages; problematic if you didn't mean it.
  • None (—) — no canonical tag at all. Google will infer one, but it might not pick the URL you'd choose.

On the migration I mentioned above, 23 pages had canonical tags still pointing to the old domain. The 301 redirects were in place, so users were fine. But the canonical was actively telling Google to index the old domain's URLs — the exact ones we were trying to retire. Without canonical detection in the audit, we would have shipped that and spent three months confused about why the old domain kept appearing in search results.
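The three canonical labels can be sketched with Python's standard library alone. This is an illustrative approximation, not the tool's code: the helper names are hypothetical, the normalization is deliberately simple (lowercased scheme and host, empty path treated as "/"), and it only looks at the `<link rel="canonical">` tag, not at HTTP headers.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class _CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.href = attr["href"]


def _normalize(url: str) -> str:
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}{query}"


def canonical_state(page_url: str, html: str) -> str:
    """Return "Self", "Other", or "None" for a crawled page."""
    finder = _CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return "None"
    target = urljoin(page_url, finder.href)  # resolve relative canonicals
    return "Self" if _normalize(target) == _normalize(page_url) else "Other"


page = "https://new.example.com/shoes"
stale = '<head><link rel="canonical" href="https://old.example.com/shoes"></head>'
print(canonical_state(page, stale))  # Other
```

The `stale` example mirrors the migration problem described above: the page loads fine on the new domain while its canonical still names the old one.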

Redirect Chains Are Slower Than You Think

A redirect chain sounds benign. A URL redirects to another URL, which redirects to the final URL. Two hops. No big deal, right?

Here's what actually happens: each hop adds a full HTTP round-trip — typically 100–300ms per hop depending on server response time. For a chain of three hops on a server averaging 180ms per hop, you've added over half a second of latency before the page even starts loading. Multiply that across your most-linked internal URLs and it becomes a real TTFB problem.

The SEO impact is more subtle. Google has said that PageRank passes through 301 redirects, but many SEOs still treat a long chain as a risk rather than a guaranteed lossless transfer. Either way, Google's own documentation advises against long redirect chains, so update links to point directly to the final URL rather than relying on them.

Quick check: In the ByteWaveNetwork Link Checker results, switch to the "Redirect" tab and look for any rows showing chain lengths > 1. Sort by response time to see which redirect chains are costing you the most latency.
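The hop arithmetic above is easy to check in code. The sketch below walks a redirect map (source URL to target URL) to find the full chain, then multiplies hops by an assumed per-hop cost. The map, the paths, and the 10-hop safety cap are hypothetical; a real audit would build the map from crawl results.

```python
from typing import Dict, List

MAX_HOPS = 10  # assumed safety cap so a redirect loop cannot run forever


def resolve_chain(start: str, redirects: Dict[str, str]) -> List[str]:
    """Follow start through the redirect map; return every URL visited, in order."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= MAX_HOPS:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # redirect loop: stop instead of cycling
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def hop_count(chain: List[str]) -> int:
    return len(chain) - 1


def added_latency_ms(chain: List[str], per_hop_ms: float = 180.0) -> float:
    """Rough latency added by the chain, at one round-trip per hop."""
    return hop_count(chain) * per_hop_ms


redirects = {"/old": "/interim", "/interim": "/interim-2", "/interim-2": "/final"}
chain = resolve_chain("/old", redirects)
print(chain[-1])                # /final  (the URL the source link should use)
print(hop_count(chain))         # 3
print(added_latency_ms(chain))  # 540.0
```

At 180ms per hop, three hops come to 540ms, which is the "over half a second" figure above. Pointing the source link at `chain[-1]` removes all of it.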

ERR vs Broken: Different Root Causes, Different Fixes

I want to spend a moment on the ERR/Broken distinction because it changes your remediation workflow entirely.

Broken links (4xx) are content problems. The URL was valid at some point but the content was moved, deleted, or the URL structure changed. Your fix is a content or redirect operation: restore the page, create a 301 to the new location, or remove the link if the content is genuinely gone.

ERR links are infrastructure problems. Possible causes:

  • DNS resolution failure — the domain doesn't resolve, has been sold, or DNS is misconfigured
  • Connection refused — the server is down, the port is closed, or a firewall is blocking the request
  • SSL handshake failure — the certificate is expired, self-signed without trust, or the hostname doesn't match
  • Connection timeout — the server accepted the TCP connection but never responded

None of these are fixed by updating a link target in your CMS. They require investigation at the infrastructure level — checking DNS records, verifying SSL expiry dates, or contacting the site owner of an external link.
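If you script your own checks, the four ERR causes map fairly directly onto exceptions in Python's standard library. The sketch below is one way to do that mapping, under my own assumptions: the function name and labels are hypothetical, and the `reason` unwrapping is there because `urllib` wraps the underlying error in a `URLError`.

```python
import socket
import ssl


def classify_network_error(exc: BaseException) -> str:
    """Map a network-level exception to one of the ERR root causes."""
    # urllib.error.URLError carries the underlying exception in .reason
    reason = getattr(exc, "reason", None)
    if isinstance(reason, BaseException):
        exc = reason
    if isinstance(exc, socket.gaierror):
        return "DNS resolution failure"
    if isinstance(exc, ssl.SSLError):
        return "SSL handshake failure"  # includes certificate verification errors
    if isinstance(exc, ConnectionRefusedError):
        return "Connection refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "Connection timeout"
    return "Other network failure"


print(classify_network_error(ConnectionRefusedError()))  # Connection refused
print(classify_network_error(TimeoutError()))            # Connection timeout
```

Each label points at a different owner: DNS records, the certificate, the firewall or server process, or the remote site's operator.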

Pre-Migration Link Audit: A 7-Step Checklist

After running this process on a dozen migrations, here's the checklist I now run every time. Start this at least 72 hours before go-live — you want time to actually fix what you find.

  • Run a full crawl on the staging environment before DNS cutover. Staging URLs will be different — set the base URL accordingly and exclude external links from the broken count.
  • Fix all Broken (4xx/5xx) internal links first. These are your highest-priority items — real 404s that users and crawlers will hit immediately.
  • Manually verify all Blocked (403) internal links in a browser. If they load fine for users, document them as known-blocked-by-CDN. If they're actually broken for users too, fix them.
  • Flatten redirect chains longer than 1 hop. Update source links to point directly to the final URL. This improves TTFB and preserves PageRank.
  • Audit canonical tags on all pages in the "Other" canonical state. Verify that each one is intentional. Any page you want indexed should have a self-referential canonical.
  • Investigate all ERR links. Check DNS, SSL, and server availability for each affected domain. Remove or replace links to domains that are permanently offline.
  • Run a second audit after fixes. Don't trust a static export — run a fresh crawl against the final staging build to verify your fixes are actually live before DNS cutover.
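The middle five steps of the checklist amount to a triage order, which can be sketched as a small routing function. Everything about the row shape here is an assumption for illustration (the `status`, `internal`, `hops`, and `canonical` keys are hypothetical, not the tool's export schema); adapt the field names to whatever your audit produces.

```python
from typing import Dict, Iterable, List, Optional


def fix_step(row: dict) -> Optional[str]:
    """Return the checklist step a result belongs to, or None if no action is needed."""
    status = row.get("status")
    if status == "Broken" and row.get("internal"):
        return "2. Fix broken internal link"
    if status == "Blocked" and row.get("internal"):
        return "3. Verify blocked link in a browser"
    if status == "Redirect" and row.get("hops", 0) > 1:
        return "4. Flatten redirect chain"
    if row.get("canonical") == "Other":
        return "5. Confirm canonical is intentional"
    if status == "ERR":
        return "6. Investigate infrastructure"
    return None


def build_worklist(rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Group URLs by checklist step, in checklist order."""
    worklist: Dict[str, List[str]] = {}
    for row in rows:
        step = fix_step(row)
        if step is not None:
            worklist.setdefault(step, []).append(row["url"])
    return dict(sorted(worklist.items()))


rows = [
    {"url": "/blog/a", "status": "Redirect", "hops": 3, "internal": True},
    {"url": "/products/x", "status": "Broken", "internal": True},
    {"url": "https://vendor.example.net/", "status": "ERR", "internal": False},
]
for step, urls in build_worklist(rows).items():
    print(step, urls)
```

A row that matches more than one rule lands in the earliest step, which keeps the list in the same priority order as the checklist.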

How ByteWaveNetwork Compares to Alternatives

Feature ByteWaveNetwork Screaming Frog Ahrefs Site Audit Sitebulb
Price Free £259/yr (paid) $99+/mo (paid) $13.50+/mo (paid)
Broken vs Blocked distinction ✓ (separate tabs) Partial (manual filter)
ERR (network failure) category ✓ (separate tab) Grouped with broken Varies
Canonical detection ✓ (self/other/none)
Noindex detection
Real-time streaming ✓ (WebSocket) ✗ (completes then shows) ✗ (cloud queue)
No install required ✓ (browser-based) ✗ (desktop app) ✗ (desktop app)
CSV/JSON export

The honest take: Screaming Frog and Sitebulb are more feature-rich for large-scale ongoing audits (10,000+ page sites with scheduled recrawls). But for a pre-migration audit, a quick site health check, or a client deliverable, ByteWaveNetwork is faster to start and doesn't require a license key or desktop install.

The real differentiator is the broken/blocked/ERR three-way split. Once you've worked with that distinction, collapsing it back into a single "error" bucket feels like losing information.

What the Audit Actually Looked Like

To make this concrete: I pasted the client's staging URL into ByteWaveNetwork's Link Checker, set depth to 4, left concurrency at 5, and disabled robots.txt checking (since I was auditing my own client's site). The crawl ran for about 4 minutes and returned 312 links total.

The results table streamed in real time: I could see broken links appearing as they were found rather than waiting for the entire crawl to finish. By the time the crawl completed, I had a clear picture:

  • 47 broken (4xx/5xx) — all internal, all in a legacy product section that had been restructured
  • 14 blocked (403) — all external social media links (expected)
  • 6 ERR — three links to a vendor that had gone out of business
  • 23 with "Other" canonical — all pointing back to the old domain
  • 8 redirect chains — 3 hops or more, all in the blog section

The CSV export went into the shared Drive. The client's dev team knocked out the 47 broken links in a day — they were all missing 301 redirects from the URL restructure. The 23 canonical issues took another hour to patch. The vendor links got removed. We launched on schedule.
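If you want the same kind of summary from an exported CSV, a few lines of Python will tally it. The column names (`url`, `status`, `canonical`) and the sample rows below are made up for illustration; check the header row of your actual export and adjust accordingly.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Illustrative sample only; column names are assumed, not the tool's documented schema.
SAMPLE_EXPORT = """url,status,canonical
https://staging.example.com/products/old-widget,Broken,
https://staging.example.com/blog/post-1,Redirect,Other
https://twitter.com/example,Blocked,
https://vendor.example.net/,ERR,
https://staging.example.com/,OK,Self
"""


def summarize(csv_text: str) -> Tuple[Counter, List[str]]:
    """Count rows per status and list URLs whose canonical points elsewhere."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_status = Counter(row["status"] for row in rows)
    other_canonicals = [row["url"] for row in rows if row["canonical"] == "Other"]
    return by_status, other_canonicals


by_status, other_canonicals = summarize(SAMPLE_EXPORT)
print(dict(by_status))
print(other_canonicals)
```

Running the same tally on the export from the second crawl gives a quick before-and-after comparison for the client.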

Run Your Own Pre-Migration Audit

Free, no signup, real-time streaming. Paste your URL and get broken, blocked, and ERR links sorted into separate tabs — plus canonical detection on every crawled page.

Try the Link Checker Free →

Sunny Pal Singh

Fellow · Technical Director — AI Infrastructure, Cloud Orchestration & Network Automation

Sunny is a Fellow and Technical Director specialising in AI infrastructure, cloud orchestration, and network automation. With hands-on depth across AWS, Azure, GCP, Red Hat OpenStack, and OpenShift, he leads high-performing teams of architects and engineers building transformative solutions at scale. He built ByteWaveNetwork to bring the same engineering rigour to everyday web tooling.

Affiliate disclosure: Some links on this page may be affiliate links. We only mention tools we've personally used and have an honest opinion about. Affiliate revenue helps keep ByteWaveNetwork's tools free and maintained. We are not paid by any of the tools compared in this article for favorable coverage.
