404 Capture Log
An observer records every 404 your storefront serves, aggregates them by (request_path, store_id), and lets editors create redirects for the ones that matter most without leaving the admin.
Why capture 404s
Most stores have hidden 404 traffic from:
- Old URLs from a pre-Magento site that still get linked to
- Blog post URLs that changed when the content was rewritten
- Product URLs from a catalog migration where url_keys changed
- Broken deep links from external sites
Without a capture system you only learn about these from sporadic Search Console reports. With one, you see every hit, sorted by frequency, with referrer + user-agent context.
Configuration
Stores → Configuration → SEO Suite → Redirects & 404 capture
| Field | Default |
|---|---|
| Capture 404s into the log | No |
| 404 log retention (days) | 90 |
| Additional ignore patterns | (empty — defaults already cover sensible noise) |
Per-store overrides supported.
Default ignore patterns
The capture observer skips paths matching:
/static/*, /media/*, /pub/static/*, /pub/media/*,
/api/*, /rest/*, /graphql*,
*.map, *.ico, *.txt, *.xml,
/favicon*, /apple-touch-icon*
These are static-asset 404s, robots.txt probes, sitemap requests, sourcemap fetches — noise that would dwarf your real 404 traffic. Add additional site-specific patterns via the config field (one per line, * wildcard supported).
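To get a feel for how wildcard patterns like these behave, here is a minimal Python sketch. It assumes shell-style matching in which * matches any run of characters, including /. The extension's own matcher is not described on this page and may differ. The function names should_ignore and parse_extra_patterns and the sample pattern /wp-* are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Default patterns as listed above.
DEFAULT_IGNORE = [
    "/static/*", "/media/*", "/pub/static/*", "/pub/media/*",
    "/api/*", "/rest/*", "/graphql*",
    "*.map", "*.ico", "*.txt", "*.xml",
    "/favicon*", "/apple-touch-icon*",
]


def parse_extra_patterns(config_value: str) -> list[str]:
    """Split the config field into patterns: one per line, blank lines dropped."""
    return [line.strip() for line in config_value.splitlines() if line.strip()]


def should_ignore(path: str, extra: list[str] | None = None) -> bool:
    """Return True when the path matches any default or extra pattern.

    Assumption: case-sensitive, shell-style matching where * also spans '/'.
    """
    patterns = DEFAULT_IGNORE + (extra or [])
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

With these assumptions, should_ignore("/robots.txt") is true because of the *.txt pattern, while should_ignore("/old-product-page.html") is false, so that path would be captured.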
How aggregation works
The observer doesn't insert one row per 404 hit. Instead, it runs an INSERT … ON DUPLICATE KEY UPDATE keyed on (request_path, store_id):
- First hit → new row with hit_count = 1
- Subsequent hits → hit_count incremented, last_seen updated, latest referrer and user_agent overwritten
- query_string overwritten on each hit (last-wins)
So a path that's hit 1000 times produces one row with hit_count = 1000, not 1000 rows.
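The aggregation behaviour can be reproduced in a few lines. This is a sketch only: it uses Python's built-in sqlite3 module, so the upsert is written in SQLite's ON CONFLICT … DO UPDATE form rather than MySQL's ON DUPLICATE KEY UPDATE. The table name capture_log and the exact column types are assumptions; only the column names come from this page.

```python
import sqlite3

# Hypothetical table; the unique key on (request_path, store_id) drives the upsert.
SCHEMA = """
CREATE TABLE capture_log (
    entity_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    request_path TEXT NOT NULL,
    store_id     INTEGER NOT NULL,
    query_string TEXT,
    referrer     TEXT,
    user_agent   TEXT,
    hit_count    INTEGER NOT NULL DEFAULT 1,
    first_seen   TEXT NOT NULL,
    last_seen    TEXT NOT NULL,
    UNIQUE (request_path, store_id)
)
"""

# SQLite spelling of the upsert; MySQL would use ON DUPLICATE KEY UPDATE.
UPSERT = """
INSERT INTO capture_log
    (request_path, store_id, query_string, referrer, user_agent,
     hit_count, first_seen, last_seen)
VALUES (?, ?, ?, ?, ?, 1, ?, ?)
ON CONFLICT (request_path, store_id) DO UPDATE SET
    hit_count    = hit_count + 1,
    last_seen    = excluded.last_seen,
    query_string = excluded.query_string,
    referrer     = excluded.referrer,
    user_agent   = excluded.user_agent
"""


def record_hit(conn, path, store_id, query, referrer, user_agent, now):
    """Insert a new row on the first hit, otherwise bump the existing one."""
    conn.execute(UPSERT, (path, store_id, query, referrer, user_agent, now, now))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_hit(conn, "/old-page.html", 1, "", "https://a.example/", "UA-1", "2024-01-01 10:00:00")
record_hit(conn, "/old-page.html", 1, "ref=x", "https://b.example/", "UA-2", "2024-01-02 11:00:00")
record_hit(conn, "/old-page.html", 2, "", "", "UA-3", "2024-01-02 12:00:00")

rows = conn.execute(
    "SELECT store_id, hit_count, query_string, referrer, first_seen, last_seen "
    "FROM capture_log ORDER BY store_id"
).fetchall()
```

Three hits produce two rows: store 1 has hit_count = 2 with the later query string and referrer, and first_seen is untouched because the update clause never assigns it. Store 2 gets its own row because the key includes store_id.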
Admin grid
Marketing → SEO Suite → 404 Capture Log
Default sort: hit_count DESC so the worst offenders surface first.
| Column | Notes |
|---|---|
| ID | Internal entity_id |
| Hits | Total since first_seen for this path |
| Path | The 404'd request_path |
| Query string | Last query string seen |
| Store | Store ID where the 404 occurred |
| Last referrer | Most recent referrer header |
| Last UA | Most recent user-agent (truncated to 510 chars) |
| Resolved | Yes when a redirect has been created for this path |
| First seen / Last seen | Timestamps |
Per-row action: Create redirect — only shown when not resolved. Lands you on the redirect form with request_path pre-populated.
Workflow: from 404 to 301
- Editor opens the 404 Capture Log → sees /old-product-page.html with 247 hits, last hit yesterday
- Clicks Create redirect → redirect form opens with request_path = old-product-page.html
- Editor fills in target_path = new-product-page.html, picks 301
- Save → RedirectManager creates the rewrite + sidecar row + marks the 404 entry as resolved
- Future 404s for this path cease (request now serves a 301)
- Existing 404 row stays in the grid as resolved=Yes for audit history
Daily purge cron
byte8_seosuite_purge_404_log runs on the schedule 30 3 * * * (daily at 03:30). It purges entries with last_seen < (now - retention_days). Multi-store installs use the largest retention configured across stores, so no store loses data another store wanted to keep.
If the feature is disabled on every store, the cron exits without touching any rows.
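The purge rule can be sketched as a small Python function. StoreConfig and purge_cutoff are hypothetical names, not part of the module. The sketch also assumes the largest retention is taken across all stores, enabled or not, which is one reading of the rule above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoreConfig:
    """Hypothetical per-store settings mirroring the config fields on this page."""
    capture_enabled: bool
    retention_days: int


def purge_cutoff(stores: list[StoreConfig], now: datetime) -> datetime | None:
    """Return the last_seen cutoff, or None when the purge should be skipped."""
    if not any(store.capture_enabled for store in stores):
        return None  # feature disabled on every store: touch no rows
    # Assumption: the largest retention across all stores wins.
    longest = max(store.retention_days for store in stores)
    return now - timedelta(days=longest)


stores = [StoreConfig(True, 30), StoreConfig(True, 90)]
cutoff = purge_cutoff(stores, datetime(2024, 6, 30, 3, 30))
```

Rows with last_seen earlier than cutoff would be deleted. With retentions of 30 and 90 days, the 90-day value applies to both stores, so the store configured for 30 days keeps its rows longer than its own setting asks for.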
Performance
Each 404 hit triggers one INSERT … ON DUPLICATE KEY UPDATE against an indexed table. On a healthy database this is sub-millisecond — invisible to the response cycle. The observer wraps the call in try/catch so a database failure can never break the 404 page itself (logged to system.log instead).
For very high-traffic 404s (e.g. a viral broken link receiving 100 hits/sec), you'd want to either:
- Add the path to the ignore list (if it's truly unfixable noise), or
- Create the redirect immediately so the 404 stops happening
Privacy
The log stores: requested path, query string, store ID, referrer header, user-agent header. No IP addresses, no session/cookie data. Referrer and user-agent both cap at 510 chars to avoid table bloat from absurdly long values.
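As an illustration of that field list, the following Python sketch builds a log entry from a request and drops everything else. The function name build_log_entry and the dictionary shape are assumptions made for the example; the field names and the 510-character cap come from this page.

```python
MAX_HEADER_CHARS = 510  # cap stated above for referrer and user-agent


def build_log_entry(path: str, query_string: str, store_id: int, headers: dict) -> dict:
    """Keep only the logged fields; every other header is ignored."""
    return {
        "request_path": path,
        "query_string": query_string,
        "store_id": store_id,
        "referrer": headers.get("Referer", "")[:MAX_HEADER_CHARS],
        "user_agent": headers.get("User-Agent", "")[:MAX_HEADER_CHARS],
    }


entry = build_log_entry(
    "/old-product-page.html",
    "utm_source=mail",
    1,
    {
        "Referer": "https://example.com/",
        "User-Agent": "X" * 2000,           # oversized value gets truncated
        "Cookie": "PHPSESSID=abc",          # never copied into the entry
        "X-Forwarded-For": "203.0.113.7",   # never copied into the entry
    },
)
```

The resulting entry has exactly five keys, and the 2000-character user-agent is cut to 510 characters.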
Under GDPR, this typically falls under "legitimate interest" (operating a website) and doesn't require consent, but consult your DPO. The 90-day retention default is conservative; lower it if your privacy posture requires it.
Next
- Redirects Manager — what happens when you click Create redirect
- CSV format — for bulk-importing 404 → 301 mappings from existing data