
404 Auto-Heal Suggestions

The 404 Capture Log tells you which paths are bleeding traffic. The Auto-Healer tells you what to do about them.

How it works

┌──────────────────────────┐
│ Daily cron (04:00)       │
└──────────┬───────────────┘
           ▼
┌──────────────────────────┐
│ Pull unresolved 404s     │  filtered by min_hits (default ≥3)
└──────────┬───────────────┘
           ▼
┌──────────────────────────┐
│ Build live URL corpus    │  from url_rewrite where redirect_type=0
└──────────┬───────────────┘
           ▼
┌──────────────────────────┐
│ FuzzyMatcher scores      │  3 strategies: slug, exact_token, fuzzy
└──────────┬───────────────┘
           ▼
┌──────────────────────────┐
│ Persist if score ≥       │  default min_confidence = 0.6
│ min_confidence           │  into byte8_seosuite_404_suggestion
└──────────┬───────────────┘
           ▼
┌──────────────────────────┐
│ Auto-approve if score    │  default auto_approve_at = 1.01 (off)
│ ≥ auto_approve_at        │  if on, applies via RedirectManager
└──────────────────────────┘

The RedirectManager auto-marks the originating 404 row as resolved when the redirect is created — so the auto-healer won't keep generating suggestions for the same path on subsequent runs.
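The flow above can be sketched as a single-store loop. This is a hypothetical illustration, not the plugin's code. `NotFoundLog`, `Suggestion`, `run_auto_heal` and the pluggable `score` callback are invented names, and setting `resolved = True` stands in for RedirectManager creating the 301.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NotFoundLog:
    log_id: int
    path: str
    hits: int
    resolved: bool = False


@dataclass
class Suggestion:
    log_id: int
    target: str
    score: float
    status: str  # "pending" or "applied" in this sketch


def run_auto_heal(
    logs: List[NotFoundLog],
    corpus: List[str],
    score: Callable[[str, str], float],
    min_hits: int = 3,
    min_confidence: float = 0.6,
    auto_approve_at: float = 1.01,
    budget: int = 50,
) -> List[Suggestion]:
    """Hypothetical single-store auto-heal run."""
    # Unresolved 404s with enough hits, capped at the per-run budget.
    # Input order is kept; the plugin's real ordering is not documented.
    candidates = [e for e in logs if not e.resolved and e.hits >= min_hits][:budget]
    suggestions: List[Suggestion] = []
    for log in candidates:
        # Walk the whole corpus once and keep the best-scoring live URL.
        best: Optional[str] = None
        best_score = 0.0
        for url in corpus:
            s = score(log.path, url)
            if s > best_score:
                best, best_score = url, s
        if best is None or best_score < min_confidence:
            continue  # below min_confidence: nothing is persisted
        if best_score >= auto_approve_at:
            # Stand-in for RedirectManager: creating the redirect also
            # marks the originating 404 row as resolved.
            log.resolved = True
            suggestions.append(Suggestion(log.log_id, best, best_score, "applied"))
        else:
            suggestions.append(Suggestion(log.log_id, best, best_score, "pending"))
    return suggestions
```

Applied logs are flagged resolved, so a second call skips them. This matches the RedirectManager behaviour described above.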

Match strategies

Each candidate live URL is scored by the highest-ranked strategy that applies to it. A given 404 walks every entry in the corpus once and keeps the best-scoring match.

slug — last-segment exact match (~0.85+ confidence)

The last segment of the 404'd path exactly matches the last segment of the live URL (the .html suffix is ignored, as the example shows).

404:  /old-shop/acme-pro-runner.html
Live: /shop/footwear/acme-pro-runner.html
└─ both end in "acme-pro-runner" → slug match

Strong signal — products and CMS pages typically keep their slug across URL reorganisations even when the path changes.

Base confidence 0.85, blended with token Jaccard similarity to reach up to ~1.0.
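Here is a minimal sketch of the slug strategy. It assumes paths are lowercased, a trailing .html is dropped, and tokens are split on any non-alphanumeric character. The blend `base + (1 - base) × Jaccard` is also an assumption, chosen so that a perfect token overlap reaches 1.0; the plugin's exact formula isn't documented here. `slug_score` and its helpers are hypothetical names.

```python
import re
from typing import Optional, Set


def tokens(path: str) -> Set[str]:
    """Lowercase, drop a trailing .html, split on non-alphanumerics (assumed)."""
    path = path.lower()
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return {t for t in re.split(r"[^a-z0-9]+", path) if t}


def last_segment(path: str) -> str:
    """Final path segment without the .html suffix (assumed normalisation)."""
    seg = path.lower().rstrip("/").rsplit("/", 1)[-1]
    return seg[: -len(".html")] if seg.endswith(".html") else seg


def slug_score(path_404: str, live_url: str, base: float = 0.85) -> Optional[float]:
    """Score for the slug strategy, or None when the last segments differ."""
    slug = last_segment(path_404)
    if not slug or slug != last_segment(live_url):
        return None
    a, b = tokens(path_404), tokens(live_url)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    # Assumed blend: lift the base score toward 1.0 in proportion to Jaccard.
    return base + (1.0 - base) * jaccard
```

The two example paths share 4 of their 6 distinct tokens (shop, acme, pro, runner), so this sketch scores them 0.85 + 0.15 × 4/6 = 0.95.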

exact_token — 404 tokens fully contained in live tokens (~0.7+)

Every token in the 404'd path is also present in the live URL. Catches reorganisations where the same words appear in a deeper path.

404:  /running-shoes
Live: /shop/footwear/running-shoes/all
└─ "running" + "shoes" both present → exact_token match

Base confidence 0.7, blended with token Jaccard similarity.
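The containment test is simply a set-subset check. The sketch below uses the same assumed tokeniser and blend as the slug example: it lowercases, strips .html, splits on non-alphanumerics, and computes `base + (1 - base) × Jaccard`. `exact_token_score` is a hypothetical name.

```python
import re
from typing import Optional, Set


def tokens(path: str) -> Set[str]:
    """Lowercase, drop a trailing .html, split on non-alphanumerics (assumed)."""
    path = path.lower()
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return {t for t in re.split(r"[^a-z0-9]+", path) if t}


def exact_token_score(path_404: str, live_url: str, base: float = 0.7) -> Optional[float]:
    """Score when every 404 token also appears in the live URL, else None."""
    a, b = tokens(path_404), tokens(live_url)
    if not a or not a <= b:  # <= is Python's subset test for sets
        return None
    # a is non-empty here, so the union is non-empty too.
    jaccard = len(a & b) / len(a | b)
    return base + (1.0 - base) * jaccard
```

For /running-shoes against /shop/footwear/running-shoes/all, Jaccard is 2/5. The sketch therefore scores 0.7 + 0.3 × 0.4 = 0.82.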

fuzzy — Jaccard + similar_text (≥0.6 by default)

Catch-all. Tokenises both paths and computes their Jaccard similarity (intersection / union). It then blends that with PHP's similar_text percentage, weighting Jaccard at 60% and similar_text at 40%. Candidates with a Jaccard below 0.2 are discarded early to avoid expensive similar_text calls.

404:  /sneakers
Live: /shop/sneeker
└─ Jaccard ~0.5, similar_text ~0.7 → fuzzy 0.6 × 0.5 + 0.4 × 0.7 ≈ 0.58 (below the default threshold, skipped)

Tunable via the min_confidence setting: a lower value yields more suggestions at the cost of a higher false-positive rate.
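Below is a sketch of the fuzzy scorer. It includes a Python port of the algorithm behind PHP's similar_text: find the longest common substring, then recurse on the remainders to its left and right. Two choices here are assumptions: the tokeniser, and running similar_text on the lowercased full paths. Because the plugin's tokeniser isn't documented, the Jaccard values this sketch produces may not match the figures above. All function names are hypothetical.

```python
import re
from typing import Optional, Set


def tokens(path: str) -> Set[str]:
    """Lowercase, drop a trailing .html, split on non-alphanumerics (assumed)."""
    path = path.lower()
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return {t for t in re.split(r"[^a-z0-9]+", path) if t}


def _similar_chars(a: str, b: str) -> int:
    """Matching-character count, computed the way PHP's similar_text does."""
    if not a or not b:
        return 0
    best = pos_a = pos_b = 0
    # Find the first longest common substring.
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best:
                best, pos_a, pos_b = k, i, j
    if best == 0:
        return 0
    # Recurse on the parts left and right of that substring.
    return (best
            + _similar_chars(a[:pos_a], b[:pos_b])
            + _similar_chars(a[pos_a + best:], b[pos_b + best:]))


def similar_text_ratio(a: str, b: str) -> float:
    """similar_text's percentage, expressed as a 0-1 fraction."""
    total = len(a) + len(b)
    return 2.0 * _similar_chars(a, b) / total if total else 0.0


def blend(jaccard: float, similar: float) -> float:
    """60% Jaccard, 40% similar_text."""
    return 0.6 * jaccard + 0.4 * similar


def fuzzy_score(path_404: str, live_url: str, min_jaccard: float = 0.2) -> Optional[float]:
    a, b = tokens(path_404), tokens(live_url)
    union = a | b
    if not union:
        return None
    jaccard = len(a & b) / len(union)
    if jaccard < min_jaccard:
        return None  # early exit: similar_text is never called
    return blend(jaccard, similar_text_ratio(path_404.lower(), live_url.lower()))
```

`blend(0.5, 0.7)` reproduces the 0.58 from the example above, which falls short of the 0.6 default. The port's O(n³) substring search is fine for short URL paths.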

Configuration

Stores → Configuration → SEO Suite → Redirects & 404 capture

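| Field | Config key | Default | Notes |
|---|---|---|---|
| Enable 404 auto-heal cron | auto_heal_active | No | Master switch (per store) |
| Min hits before auto-heal considers a 404 | auto_heal_min_hits | 3 | Filters out one-off mistakes |
| Min confidence to persist a suggestion | auto_heal_min_confidence | 0.6 | Below this, no suggestion is stored at all |
| Auto-approve threshold | auto_heal_auto_approve_at | 1.01 | At or above = applied immediately; 1.01 = never auto-approve |
| Cron budget per run | auto_heal_budget_per_run | 50 | Max 404 logs scanned per run |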

auto_heal_active only takes effect when log_404 is enabled: if no 404s are being captured, there is nothing to auto-heal.
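The gating rule and the 1.01 sentinel can be captured in a small settings object. `AutoHealConfig` is hypothetical. Its field names follow the config keys used on this page, and the other defaults come from the table. The default for log_404 isn't stated here, so it is assumed off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoHealConfig:
    log_404: bool = False  # default not documented on this page; assumed off
    auto_heal_active: bool = False
    auto_heal_min_hits: int = 3
    auto_heal_min_confidence: float = 0.6
    auto_heal_auto_approve_at: float = 1.01
    auto_heal_budget_per_run: int = 50

    @property
    def enabled(self) -> bool:
        # auto_heal_active has no effect unless 404 capture is on.
        return self.log_404 and self.auto_heal_active

    @property
    def can_auto_approve(self) -> bool:
        # Confidence tops out at 1.0, so any threshold above it disables auto-approval.
        return self.auto_heal_auto_approve_at <= 1.0
```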

Admin grid

Marketing → SEO Suite → 404 Auto-Heal Suggestions

Default sort: confidence DESC, so the highest-confidence matches surface first.

| Column | Notes |
|---|---|
| Confidence | 0.0–1.0 |
| 404'd path | The original captured path |
| Suggested target | The fuzzy-matched live URL |
| Target type | product / category / cms-page / custom |
| Strategy | Which matcher fired |
| Status | pending / approved / rejected / applied / failed |
| Batch | auto-heal-YYYYMMDD-HHMMSS |

Toolbar: Run Auto-Heal Now triggers an inline run for admins without cron access.

Per-row actions on pending rows:

  • Approve & Apply → calls RedirectManager → creates 301 → marks suggestion applied
  • Reject → marks rejected, cron won't re-suggest the same target

Mass actions: Approve & Apply, Reject, Delete.
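The per-row actions imply a small status machine. In this hypothetical sketch, `next_status` and the action names `"approve_apply"` and `"reject"` are invented, and the intermediate approved status is skipped. Mapping a failed redirect to failed is an assumption based on the Status column.

```python
def next_status(status: str, action: str, redirect_created: bool = True) -> str:
    """Hypothetical status transition for a per-row action on the grid."""
    if status != "pending":
        raise ValueError(f"{action!r} is only offered on pending rows, got {status!r}")
    if action == "reject":
        return "rejected"
    if action == "approve_apply":
        # RedirectManager either creates the 301 or the row is marked failed.
        return "applied" if redirect_created else "failed"
    raise ValueError(f"unknown action {action!r}")
```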

CLI

bin/magento seosuite:redirect:auto-heal           # all enabled stores, config budget
bin/magento seosuite:redirect:auto-heal -s 1      # specific store
bin/magento seosuite:redirect:auto-heal -l 10     # cap suggestions this run

Output:

Batch auto-heal-20260426-143000 — scanned 47, suggested 31, applied 0, skipped 16, errors 0.
Review at Marketing → SEO Suite → 404 Auto-Heal Suggestions

The command exits with code 1 only when every suggestion attempt fails with an error and no suggestions are produced, which makes it usable as a CI check.
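Below is a sketch of the batch label and exit-code rule, reading "all attempts fail" as "at least one error and zero suggestions". That reading is an interpretation, and `batch_id` and `exit_code` are hypothetical names.

```python
from datetime import datetime


def batch_id(started_at: datetime) -> str:
    """Batch label in the auto-heal-YYYYMMDD-HHMMSS format used by the grid."""
    return started_at.strftime("auto-heal-%Y%m%d-%H%M%S")


def exit_code(suggested: int, errors: int) -> int:
    """1 only when something errored and nothing was suggested; 0 otherwise."""
    return 1 if errors > 0 and suggested == 0 else 0
```

The sample run above (suggested 31, errors 0) exits 0. A run with no errors and nothing to suggest also exits 0, so an empty queue does not fail CI.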

Hands-off ops

For a fully automated pipeline:

log_404 = Yes
auto_heal_active = Yes
auto_heal_min_hits = 5
auto_heal_min_confidence = 0.75
auto_heal_auto_approve_at = 0.9

This config will auto-apply any suggestion at confidence ≥ 0.9 (slug matches with strong token overlap), queue suggestions between 0.75 and 0.9 for human review, and ignore everything below 0.75. The 5-hit minimum filters out long-tail noise.
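The three bands of this config reduce to two comparisons. This is a minimal sketch: `classify`, `tally` and the band names are hypothetical, and the defaults are set to the 0.75 / 0.9 values above.

```python
from typing import Dict, Iterable


def classify(score: float, min_confidence: float = 0.75, auto_approve_at: float = 0.9) -> str:
    if score < min_confidence:
        return "ignored"   # never persisted
    if score >= auto_approve_at:
        return "applied"   # applied immediately via RedirectManager
    return "pending"       # queued for human review


def tally(scores: Iterable[float], **thresholds: float) -> Dict[str, int]:
    """Count how a batch of scores would split across the three bands."""
    counts = {"ignored": 0, "pending": 0, "applied": 0}
    for s in scores:
        counts[classify(s, **thresholds)] += 1
    return counts
```

With `auto_approve_at=1.01`, nothing can reach the "applied" band, which gives the conservative posture described next.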

For a more conservative posture, set auto_approve_at = 1.01 and review every suggestion in the queue before applying.

Idempotency

Re-runs are cheap and safe:

  • 404 logs that already have a non-rejected suggestion against the same target → skipped
  • Rejected suggestions stay rejected; the cron won't re-suggest the same target for the same 404
  • New 404s discovered between runs get fresh suggestions

The unique constraint (log_id, suggested_target) enforces this at the DB layer.
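The idempotency rules fall out of the unique pair. This sketch uses SQLite's `INSERT OR IGNORE` as a stand-in for whatever the module does on MySQL. Apart from the table name and the (log_id, suggested_target) pair, the columns are simplified and hypothetical, and so are `make_db` and `suggest`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical columns; only the UNIQUE pair comes from the docs.
    conn.execute(
        "CREATE TABLE byte8_seosuite_404_suggestion ("
        " suggestion_id INTEGER PRIMARY KEY,"
        " log_id INTEGER NOT NULL,"
        " suggested_target TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " UNIQUE (log_id, suggested_target))"
    )
    return conn


def suggest(conn: sqlite3.Connection, log_id: int, target: str) -> bool:
    """Insert a pending suggestion; False if the (log, target) pair already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO byte8_seosuite_404_suggestion"
        " (log_id, suggested_target) VALUES (?, ?)",
        (log_id, target),
    )
    return cur.rowcount == 1
```

A rejected row keeps its status because the repeat insert is ignored rather than overwriting it. A different target for the same 404 is still allowed.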

What gets suggested

The corpus is built from url_rewrite rows where redirect_type = 0 — i.e. real, live URLs serving real content. Existing redirects (redirect_type = 301|302) are excluded so the auto-healer never suggests redirecting to another redirect.

Coverage:

  • All product URLs (current url_rewrite entries)
  • All category URLs
  • All CMS page URLs
  • Custom URL rewrites you've created (e.g. landing pages with non-standard URLs)

Out of scope: URLs that aren't in url_rewrite (e.g. controller routes you've added without rewrites). Those need manual redirects.
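The corpus query and the per-run cache mentioned under Performance can be sketched against a simplified url_rewrite table. The sketch uses SQLite in place of MySQL and only a subset of Magento's columns. `build_corpus` and `CorpusCache` are hypothetical names.

```python
import sqlite3
from typing import Dict, List


def build_corpus(conn: sqlite3.Connection, store_id: int) -> List[str]:
    """Live URLs only: redirect_type = 0 excludes existing 301/302 rows."""
    rows = conn.execute(
        "SELECT request_path FROM url_rewrite"
        " WHERE store_id = ? AND redirect_type = 0 ORDER BY request_path",
        (store_id,),
    ).fetchall()
    return [row[0] for row in rows]


class CorpusCache:
    """Per-run cache: at most one query per store, reused for every 404."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._by_store: Dict[int, List[str]] = {}
        self.queries = 0

    def get(self, store_id: int) -> List[str]:
        if store_id not in self._by_store:
            self.queries += 1
            self._by_store[store_id] = build_corpus(self._conn, store_id)
        return self._by_store[store_id]
```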

Performance

  • Corpus is built per-store and cached in-memory for the cron run (one DB query per store)
  • Each 404 path costs O(corpus_size) integer-set intersections, plus one similar_text call for each candidate whose Jaccard is ≥ 0.2
  • A typical run of 50 404s against a 5000-URL corpus completes in ~1–3 seconds

If your corpus is huge (>50k URLs), drop auto_heal_budget_per_run to 20–30 to keep the cron under a minute.
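The "integer-set intersection" cost suggests tokens are interned to integer IDs once per run. Below is a sketch of that idea combined with the 0.2 Jaccard pre-filter. Only the survivors would go on to the more expensive similar_text step. `TokenIndex`, `prefilter` and the tokeniser are hypothetical.

```python
import re
from typing import Dict, FrozenSet, List, Tuple


class TokenIndex:
    """Maps each distinct token to a small int so paths become int sets."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def intern(self, path: str) -> FrozenSet[int]:
        words = {t for t in re.split(r"[^a-z0-9]+", path.lower()) if t}
        # setdefault assigns the next free id only to tokens not seen before.
        return frozenset(self._ids.setdefault(w, len(self._ids)) for w in words)


def prefilter(index: TokenIndex, path_404: str,
              corpus: List[Tuple[str, FrozenSet[int]]],
              min_jaccard: float = 0.2) -> List[str]:
    """Corpus URLs whose token Jaccard clears the early-discard cutoff."""
    q = index.intern(path_404)
    survivors = []
    for url, ids in corpus:
        union = len(q | ids)
        if union and len(q & ids) / union >= min_jaccard:
            survivors.append(url)
    return survivors
```

The corpus is interned once, for example as `[(url, index.intern(url)) for url in urls]`. Each 404 then costs one pass of set operations over those precomputed frozensets.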

Limitations

  • Doesn't crawl external content (e.g. Search Console's "discovered URLs") — only matches against URLs Magento knows about
  • Doesn't read Magento's URL key history (a renamed product's old url_key isn't preserved), but the OOS Rules Engine and the manual Redirects grid both create entries the matcher will respect
  • No cross-language matching (a de_DE 404 won't match an en_GB URL by default since locales typically have different slugs)

Roadmap

  • v2.11+: regex redirects (matcher-aware)
  • v2.11+: integrate with Index Budget Audit so audit findings can suggest the same fixes