404 Auto-Heal Suggestions
The 404 Capture Log tells you which paths are bleeding traffic. The Auto-Healer tells you what to do about them.
How it works
┌──────────────────────────┐
│ Daily cron (04:00) │
└──────────┬───────────────┘
↓
┌──────────────────────────┐
│ Pull unresolved 404s │ filtered by min_hits (default ≥3)
└──────────┬───────────────┘
↓
┌──────────────────────────┐
│ Build live URL corpus │ from url_rewrite where redirect_type=0
└──────────┬───────────────┘
↓
┌──────────────────────────┐
│ FuzzyMatcher scores │ 3 strategies: slug, exact_token, fuzzy
└──────────┬───────────────┘
↓
┌──────────────────────────┐
│ Persist if score ≥ │ default min_confidence = 0.6
│ min_confidence │ into byte8_seosuite_404_suggestion
└──────────┬───────────────┘
↓
┌──────────────────────────┐
│ Auto-approve if score │ default auto_approve_at = 1.01 (off)
│ ≥ auto_approve_at │ if on, applies via RedirectManager
└──────────────────────────┘
The RedirectManager auto-marks the originating 404 row as resolved when the redirect is created — so the auto-healer won't keep generating suggestions for the same path on subsequent runs.
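The flow above can be sketched in a few lines. This is an illustrative Python sketch, not the module's PHP code: `Log404`, `Suggestion`, `run_auto_heal` and the `score_fn` callback are hypothetical names, and the cron budget and the RedirectManager call are left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Log404:
    path: str
    hits: int


@dataclass
class Suggestion:
    path_404: str
    target: str
    score: float
    status: str  # "pending" or "applied"


def run_auto_heal(
    logs: List[Log404],
    corpus: List[str],
    score_fn: Callable[[str, str], float],  # hypothetical stand-in for FuzzyMatcher
    min_hits: int = 3,
    min_confidence: float = 0.6,
    auto_approve_at: float = 1.01,
) -> List[Suggestion]:
    suggestions = []
    for log in logs:
        if log.hits < min_hits:
            continue  # one-off 404s are filtered out
        # Walk the whole corpus once and keep the best-scoring live URL.
        best = max(((score_fn(log.path, url), url) for url in corpus), default=None)
        if best is None or best[0] < min_confidence:
            continue  # nothing is persisted below min_confidence
        score, target = best
        # In the real module an "applied" row would go through RedirectManager.
        status = "applied" if score >= auto_approve_at else "pending"
        suggestions.append(Suggestion(log.path, target, score, status))
    return suggestions
```

With the default `auto_approve_at` of 1.01, no score can reach the threshold, so every persisted suggestion stays `pending`.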
Match strategies
Each candidate live URL is scored by the strongest strategy that applies to it. Each 404 walks every entry in the corpus once and keeps the best match.
slug — last-segment exact match (~0.85+ confidence)
The 404's last URL segment matches the live URL's last URL segment exactly.
404: /old-shop/acme-pro-runner.html
Live: /shop/footwear/acme-pro-runner.html
└─ both end in "acme-pro-runner" → slug match
Strong signal — products and CMS pages typically keep their slug across URL reorganisations even when the path changes.
Base confidence 0.85, blended with token Jaccard for up to ~1.0.
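A minimal sketch of the slug check, in Python for illustration only. The tokenisation rules and the blend formula (`0.85 + 0.15 × Jaccard`) are assumptions chosen so the score runs from 0.85 up to 1.0; the module's actual blend may differ. `tokens`, `slug` and `slug_score` are hypothetical names.

```python
import re
from typing import Optional


def tokens(path: str) -> set:
    # Assumed tokenisation: lowercase, drop a trailing file extension,
    # then split on anything that is not a letter or digit.
    stem = re.sub(r"\.[a-z0-9]+$", "", path.lower())
    return set(re.findall(r"[a-z0-9]+", stem))


def slug(path: str) -> str:
    # Last URL segment without its extension.
    last = path.rstrip("/").rsplit("/", 1)[-1].lower()
    return re.sub(r"\.[a-z0-9]+$", "", last)


def slug_score(path_404: str, live_path: str) -> Optional[float]:
    if not slug(path_404) or slug(path_404) != slug(live_path):
        return None  # strategy does not apply
    a, b = tokens(path_404), tokens(live_path)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    # Assumed blend: base 0.85, the remaining 0.15 scaled by token overlap.
    return 0.85 + 0.15 * jaccard
```

For the example above, four of the six distinct tokens are shared (Jaccard ≈ 0.67), so under this assumed blend the score comes out at about 0.95.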
exact_token — 404 tokens fully contained in live tokens (~0.7+)
Every token in the 404'd path is also present in the live URL. Catches reorganisations where the same words appear in a deeper path.
404: /running-shoes
Live: /shop/footwear/running-shoes/all
└─ "running" + "shoes" both present → exact_token match
Base 0.7, blended with Jaccard.
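The containment test is a subset check on token sets. The sketch below is illustrative Python with a hypothetical `exact_token_score` helper; the tokeniser and the blend (`0.7 + 0.3 × Jaccard`) are assumptions, not the module's confirmed formula.

```python
import re
from typing import Optional


def tokens(path: str) -> set:
    # Assumed tokenisation: lowercase, drop a trailing file extension,
    # then split on anything that is not a letter or digit.
    stem = re.sub(r"\.[a-z0-9]+$", "", path.lower())
    return set(re.findall(r"[a-z0-9]+", stem))


def exact_token_score(path_404: str, live_path: str) -> Optional[float]:
    a, b = tokens(path_404), tokens(live_path)
    if not a or not a <= b:
        return None  # at least one 404 token is missing from the live URL
    # When a is a subset of b, Jaccard reduces to |a| / |b|.
    jaccard = len(a) / len(b)
    # Assumed blend: base 0.7, the remaining 0.3 scaled by token overlap.
    return 0.7 + 0.3 * jaccard
```

For the example above, 2 of the live URL's 5 tokens are shared (Jaccard 0.4), giving roughly 0.82 under this assumed blend.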
fuzzy — Jaccard + similar_text (≥0.6 by default)
Catch-all. Tokenises both paths, computes Jaccard similarity (intersection / union), and blends it with PHP's similar_text percentage, weighted 60% Jaccard / 40% similar_text. Candidates below 0.2 Jaccard are discarded early to avoid expensive similar_text calls.
404: /sneakers
Live: /shop/sneeker
└─ Jaccard ~0.5, similar_text ~0.7 → fuzzy ~0.58 (below default threshold, skipped)
Tunable via min_confidence config — lower = more suggestions, higher false-positive rate.
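The blend is a weighted sum, and the example's arithmetic is 0.6 × 0.5 + 0.4 × 0.7 = 0.58. The Python sketch below is an illustration under assumptions: `similar_chars` is a port of the longest-common-substring recursion that PHP's similar_text is documented to use, the tokeniser is a guess, and all function names are hypothetical.

```python
import re
from typing import Optional


def tokens(path: str) -> set:
    # Assumed tokenisation; the module's own rules may differ.
    stem = re.sub(r"\.[a-z0-9]+$", "", path.lower())
    return set(re.findall(r"[a-z0-9]+", stem))


def similar_chars(a: str, b: str) -> int:
    # Find the longest common substring, then recurse on what lies
    # to its left and to its right, summing the matched lengths.
    best_len = best_i = best_j = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    return (
        best_len
        + similar_chars(a[:best_i], b[:best_j])
        + similar_chars(a[best_i + best_len:], b[best_j + best_len:])
    )


def similar_ratio(a: str, b: str) -> float:
    # similar_text's percentage, expressed as 0.0 to 1.0.
    total = len(a) + len(b)
    return 2 * similar_chars(a, b) / total if total else 0.0


def blend(jaccard: float, similar: float) -> float:
    return 0.6 * jaccard + 0.4 * similar


def fuzzy_score(path_404: str, live_path: str, min_jaccard: float = 0.2) -> Optional[float]:
    a, b = tokens(path_404), tokens(live_path)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    if jaccard < min_jaccard:
        return None  # early discard: skip the costly character comparison
    return blend(jaccard, similar_ratio(path_404.lower(), live_path.lower()))
```

The early discard matters because the character comparison is far more expensive than a set intersection, and most corpus entries share no tokens with a given 404.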
Configuration
Stores → Configuration → SEO Suite → Redirects & 404 capture
| Field | Default | Notes |
|---|---|---|
| Enable 404 auto-heal cron | No | Master switch (per store) |
| Min hits before auto-heal considers a 404 | 3 | Filters out one-off mistakes |
| Min confidence to persist a suggestion | 0.6 | Below this = no suggestion at all |
| Auto-approve threshold | 1.01 | At or above = applied immediately. 1.01 = never auto-approve |
| Cron budget per run | 50 | Max 404 logs scanned per run |
auto_heal_active is gated by log_404 being enabled — there's no point auto-healing if no 404s are being captured.
Admin grid
Marketing → SEO Suite → 404 Auto-Heal Suggestions
Default sort: confidence DESC so the highest-confidence matches surface first.
| Column | Notes |
|---|---|
| Confidence | 0.0 – 1.0 |
| 404'd path | The original captured path |
| Suggested target | The fuzzy-matched live URL |
| Target type | product / category / cms-page / custom |
| Strategy | Which matcher fired |
| Status | pending / approved / rejected / applied / failed |
| Batch | auto-heal-YYYYMMDD-HHMMSS |
Toolbar: Run Auto-Heal Now triggers an inline run for admins without cron access.
Per-row actions on pending rows:
- Approve & Apply → calls RedirectManager → creates 301 → marks suggestion applied
- Reject → marks rejected, cron won't re-suggest the same target
Mass actions: Approve & Apply, Reject, Delete.
CLI
bin/magento seosuite:redirect:auto-heal # all enabled stores, config budget
bin/magento seosuite:redirect:auto-heal -s 1 # specific store
bin/magento seosuite:redirect:auto-heal -l 10 # cap suggestions this run
Output:
Batch auto-heal-20260426-143000 — scanned 47, suggested 31, applied 0, skipped 16, errors 0.
Review at Marketing → SEO Suite → 404 Auto-Heal Suggestions
Exit code is 1 only when every suggestion attempt fails with an error and no suggestions are produced, which makes the command usable as a CI check.
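If a CI job needs more detail than the exit code, the summary line can be parsed. This Python sketch assumes the output keeps the exact wording shown above; `SUMMARY_RE` and `parse_summary` are hypothetical helpers, not part of the module.

```python
import re
from typing import Optional

# Matches the summary line printed at the end of a run (format assumed stable).
SUMMARY_RE = re.compile(
    r"Batch (?P<batch>auto-heal-\d{8}-\d{6}).*?"
    r"scanned (?P<scanned>\d+), suggested (?P<suggested>\d+), "
    r"applied (?P<applied>\d+), skipped (?P<skipped>\d+), errors (?P<errors>\d+)"
)


def parse_summary(line: str) -> Optional[dict]:
    match = SUMMARY_RE.search(line)
    if match is None:
        return None  # not a summary line
    result = {k: int(v) for k, v in match.groupdict().items() if k != "batch"}
    result["batch"] = match.group("batch")
    return result
```

A pipeline could, for example, fail the build when `errors` is non-zero even though the command itself exited 0.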
Hands-off ops
For a fully automated pipeline:
log_404 = Yes
auto_heal_active = Yes
auto_heal_min_hits = 5
auto_heal_min_confidence = 0.75
auto_heal_auto_approve_at = 0.9
This config will auto-apply any suggestion at confidence ≥ 0.9 (slug matches with strong token overlap), queue suggestions between 0.75 and 0.9 for human review, and ignore everything below 0.75. The 5-hit minimum filters out long-tail noise.
For a more conservative posture, set auto_heal_auto_approve_at = 1.01 (the default, which disables auto-approval) and review every suggestion in the queue before applying.
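The two thresholds split the score range into three bands. A minimal Python sketch of that triage, using the hands-off values as defaults (`triage` is a hypothetical name; `None` stands for "no suggestion persisted"):

```python
from typing import Optional


def triage(
    score: float,
    min_confidence: float = 0.75,
    auto_approve_at: float = 0.9,
) -> Optional[str]:
    if score >= auto_approve_at:
        return "applied"  # redirect created immediately
    if score >= min_confidence:
        return "pending"  # queued for human review
    return None  # below min_confidence: nothing is stored


# Scores top out at 1.0, so a threshold of 1.01 can never be reached
# and every persisted suggestion stays pending.
conservative = triage(1.0, auto_approve_at=1.01)
```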
Idempotency
Re-runs are cheap and safe:
- 404 logs that already have a non-rejected suggestion against the same target → skipped
- Rejected suggestions stay rejected; the cron won't re-suggest the same target for the same 404
- New 404s discovered between runs get fresh suggestions
The unique constraint (log_id, suggested_target) enforces this at the DB layer.
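The effect of that constraint can be shown with an in-memory SQLite table standing in for byte8_seosuite_404_suggestion. This is a sketch under assumptions: the real table lives in Magento's database and almost certainly has more columns, and the `suggest` helper and `status` default are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE suggestion (
        log_id           INTEGER NOT NULL,
        suggested_target TEXT    NOT NULL,
        status           TEXT    NOT NULL DEFAULT 'pending',
        UNIQUE (log_id, suggested_target)
    )
    """
)


def suggest(log_id: int, target: str) -> bool:
    """Insert a suggestion; return False if the pair already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO suggestion (log_id, suggested_target) VALUES (?, ?)",
        (log_id, target),
    )
    return cur.rowcount == 1
```

Because a rejected row still occupies its `(log_id, suggested_target)` slot, a later run cannot re-insert the same pair, while a different target for the same 404 is still allowed.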
What gets suggested
The corpus is built from url_rewrite rows where redirect_type = 0 — i.e. real, live URLs serving real content. Existing redirects (redirect_type = 301|302) are excluded so the auto-healer never suggests redirecting to another redirect.
Coverage:
- All product URLs (current url_rewrite entries)
- All category URLs
- All CMS page URLs
- Custom URL rewrites you've created (e.g. landing pages with non-standard URLs)
Out of scope: URLs that aren't in url_rewrite (e.g. controller routes you've added without rewrites). Those need manual redirects.
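The corpus filter amounts to one query per store. The Python sketch below runs it through the standard `sqlite3` module purely for illustration; `build_corpus` is a hypothetical helper, and the column names (`request_path`, `entity_type`, `redirect_type`, `store_id`) are those of Magento's url_rewrite table as I understand it.

```python
import sqlite3
from typing import Dict


def build_corpus(conn: sqlite3.Connection, store_id: int) -> Dict[str, str]:
    """Map each live request path to its entity type for one store."""
    rows = conn.execute(
        """
        SELECT request_path, entity_type
        FROM url_rewrite
        WHERE redirect_type = 0   -- live URLs only; 301/302 rows are redirects
          AND store_id = ?
        """,
        (store_id,),
    )
    return {path: entity_type for path, entity_type in rows}
```

Excluding rows with a non-zero `redirect_type` is what keeps the matcher from proposing a redirect whose target is itself a redirect.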
Performance
- Corpus is built per-store and cached in-memory for the cron run (one DB query per store)
- Each 404 path costs O(corpus_size) integer-set intersection operations, plus at most one similar_text call when Jaccard ≥ 0.2
- A typical run of 50 404s against a 5000-URL corpus completes in ~1–3 seconds
If your corpus is huge (>50k URLs), drop auto_heal_budget_per_run to 20–30 to keep the cron under a minute.
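One way to get integer-set intersections is to intern each token as a small integer when the corpus is built, so the per-404 loop compares sets of ints rather than strings. This Python sketch is a guess at the general technique, not the module's implementation; `TokenIndex` and `jaccard` are hypothetical names.

```python
import re
from typing import Dict, FrozenSet


class TokenIndex:
    """Assigns each distinct token a stable integer ID."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, path: str) -> FrozenSet[int]:
        # New tokens get the next free ID; known tokens reuse theirs.
        return frozenset(
            self._ids.setdefault(token, len(self._ids))
            for token in re.findall(r"[a-z0-9]+", path.lower())
        )


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


index = TokenIndex()
# Encode the corpus once per run, then reuse the sets for every 404.
corpus = {url: index.encode(url) for url in ["/shop/footwear/running-shoes/all"]}
overlap = jaccard(index.encode("/running-shoes"), corpus["/shop/footwear/running-shoes/all"])
```

Encoding the corpus once and reusing it across all 404s in the run is what keeps the cost linear in corpus size per 404.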
Limitations
- Doesn't crawl external content (e.g. Search Console's "discovered URLs") — only matches against URLs Magento knows about
- Doesn't track Magento's URL key history (a renamed product's old url_key isn't preserved), but the OOS Rules Engine and the manual Redirects grid both create entries the matcher will respect
- No cross-language matching (a de_DE 404 won't match an en_GB URL by default since locales typically have different slugs)
Roadmap
- v2.11+: regex redirects (matcher-aware)
- v2.11+: integrate with Index Budget Audit so audit findings can suggest the same fixes