404 Capture Log

An observer records the 404s your storefront serves (known noise paths excluded), aggregates them by (request_path, store_id), and lets editors create redirects for the ones that matter most without leaving the admin.

Why capture 404s

Most stores have hidden 404 traffic from:

  • Old URLs from a pre-Magento site that still get linked to
  • Blog post URLs that changed when the content was rewritten
  • Product URLs from a catalog migration where url_keys changed
  • Broken deep links from external sites

Without a capture system you only learn about these from occasional Search Console reports. With one, you see every captured path, sorted by hit count, with the most recent referrer and user-agent for context.

Configuration

Stores → Configuration → SEO Suite → Redirects & 404 capture

Field | Default
Capture 404s into the log | No
404 log retention (days) | 90
Additional ignore patterns | (empty; the defaults already cover sensible noise)

Per-store overrides are supported.

Default ignore patterns

The capture observer skips paths matching:

/static/*, /media/*, /pub/static/*, /pub/media/*,
/api/*, /rest/*, /graphql*,
*.map, *.ico, *.txt, *.xml,
/favicon*, /apple-touch-icon*

These cover static-asset 404s, robots.txt probes, sitemap requests, and sourcemap fetches: noise that would dwarf your real 404 traffic. Add site-specific patterns via the config field (one per line, * wildcard supported).
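As a rough illustration of how such wildcard patterns could be evaluated, here is a minimal Python sketch. It is not the module's code: the function names are hypothetical, and it assumes that * matches any run of characters (including /) and that a pattern must match the whole request path.

```python
import re

# Default patterns as listed above; the matching semantics are an assumption.
DEFAULT_IGNORE_PATTERNS = [
    "/static/*", "/media/*", "/pub/static/*", "/pub/media/*",
    "/api/*", "/rest/*", "/graphql*",
    "*.map", "*.ico", "*.txt", "*.xml",
    "/favicon*", "/apple-touch-icon*",
]


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a '*' wildcard pattern into a regex (assumed semantics)."""
    # Escape every character, then let each '*' match any run of characters.
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def parse_extra_patterns(config_value: str) -> list[str]:
    """Split the 'Additional ignore patterns' value: one pattern per line."""
    return [line.strip() for line in config_value.splitlines() if line.strip()]


def is_ignored(path: str, extra_patterns: str = "") -> bool:
    """Return True when the path matches a default or site-specific pattern."""
    patterns = DEFAULT_IGNORE_PATTERNS + parse_extra_patterns(extra_patterns)
    # fullmatch anchors the pattern at both ends of the path.
    return any(compile_pattern(p).fullmatch(path) for p in patterns)
```

Under these assumptions, is_ignored("/robots.txt") is true because of the *.txt default, while a path such as /wp-login.php is only skipped once a pattern like /wp-* is added to the config field.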

How aggregation works

Instead of inserting one row per 404 hit, the observer runs an INSERT … ON DUPLICATE KEY UPDATE keyed on (request_path, store_id):

  • First hit → new row with hit_count = 1
  • Subsequent hits → hit_count incremented, last_seen updated, latest referrer and user_agent overwritten
  • query_string overwritten on each hit (last-wins)

So a path that's hit 1000 times produces one row with hit_count = 1000, not 1000 rows.
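The aggregation can be tried out with a small Python sketch against SQLite, whose ON CONFLICT … DO UPDATE clause plays the role of MySQL's ON DUPLICATE KEY UPDATE. The table name, column types, and the record_hit helper are hypothetical; only the column names mentioned in this page and the 510-character cap come from the documentation.

```python
import sqlite3

# Hypothetical table layout for illustration; not the module's real schema.
SCHEMA = """
CREATE TABLE capture_log (
    entity_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    request_path TEXT NOT NULL,
    store_id     INTEGER NOT NULL,
    query_string TEXT,
    referrer     TEXT,
    user_agent   TEXT,
    hit_count    INTEGER NOT NULL DEFAULT 1,
    first_seen   TEXT NOT NULL,
    last_seen    TEXT NOT NULL,
    UNIQUE (request_path, store_id)
)
"""

# First hit inserts a row; later hits bump hit_count and overwrite the
# last-wins columns. first_seen is never touched by the update branch.
UPSERT = """
INSERT INTO capture_log
    (request_path, store_id, query_string, referrer, user_agent,
     first_seen, last_seen)
VALUES (:path, :store, :query, :referrer, :ua, :now, :now)
ON CONFLICT (request_path, store_id) DO UPDATE SET
    hit_count    = hit_count + 1,
    last_seen    = excluded.last_seen,
    query_string = excluded.query_string,
    referrer     = excluded.referrer,
    user_agent   = excluded.user_agent
"""

MAX_HEADER_CHARS = 510  # cap for referrer and user-agent values


def record_hit(conn, path, store, query, referrer, ua, now):
    """Record one 404 hit as a single upsert statement."""
    conn.execute(UPSERT, {
        "path": path,
        "store": store,
        "query": query,
        "referrer": (referrer or "")[:MAX_HEADER_CHARS],
        "ua": (ua or "")[:MAX_HEADER_CHARS],
        "now": now,
    })
```

Calling record_hit repeatedly for the same path and store leaves a single row whose hit_count grows, while the same path on a different store gets its own row.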

Admin grid

Marketing → SEO Suite → 404 Capture Log

Default sort: hit_count DESC so the worst offenders surface first.

Column | Notes
ID | Internal entity_id
Hits | Total since first_seen for this path
Path | The 404'd request_path
Query string | Last query string seen
Store | Store ID where the 404 occurred
Last referrer | Most recent referrer header
Last UA | Most recent user-agent (truncated to 510 chars)
Resolved | Yes when a redirect has been created for this path
First seen / Last seen | Timestamps

Per-row action: Create redirect, shown only while the entry is unresolved. It opens the redirect form with request_path pre-populated.

Workflow: from 404 to 301

  1. Editor opens the 404 Capture Log → sees /old-product-page.html with 247 hits, last hit yesterday
  2. Clicks Create redirect → redirect form opens with request_path = old-product-page.html
  3. Editor fills in target_path = new-product-page.html, picks 301
  4. Save → RedirectManager creates the rewrite and its sidecar row, and marks the 404 entry as resolved
  5. Future 404s for this path cease (request now serves a 301)
  6. Existing 404 row stays in the grid as resolved=Yes for audit history

Daily purge cron

byte8_seosuite_purge_404_log runs daily at 03:30 (cron expression 30 3 * * *). It purges entries with last_seen < (now - retention_days). Multi-store installs use the largest retention configured across stores, so no store loses data another store wanted to keep.

If the feature is disabled on every store, the cron exits without touching any rows.
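The cutoff rule can be sketched in a few lines of Python. The purge_cutoff function and the shape of store_settings are hypothetical, and the sketch assumes that only stores with capture enabled contribute their retention value, which this page does not spell out.

```python
from datetime import datetime, timedelta
from typing import Optional


def purge_cutoff(store_settings: dict, now: datetime) -> Optional[datetime]:
    """Return the last_seen cutoff, or None when capture is off everywhere.

    store_settings maps store_id -> (capture_enabled, retention_days); this
    shape is a stand-in for the per-store configuration values.
    """
    # Assumption: only stores with capture enabled are considered.
    retentions = [days for enabled, days in store_settings.values() if enabled]
    if not retentions:
        return None  # disabled on every store: the purge touches no rows
    # The largest retention wins, so no store loses data early.
    return now - timedelta(days=max(retentions))
```

Rows with last_seen earlier than the returned cutoff would be deleted; a None result corresponds to the cron exiting without touching any rows.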

Performance

Each 404 hit triggers one INSERT … ON DUPLICATE KEY UPDATE against an indexed table. On a healthy database this typically takes well under a millisecond, so it adds no noticeable latency to the response. The observer wraps the call in try/catch, so a database failure cannot break the 404 page itself; the error is written to system.log instead.
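The fail-safe wrapping follows a common pattern, shown here as a Python sketch rather than the module's PHP. The capture_404_safely function and the record_hit callable are hypothetical, and the logger merely stands in for system.log.

```python
import logging

logger = logging.getLogger("system")  # stand-in for Magento's system.log


def capture_404_safely(record_hit, path: str, store_id: int) -> bool:
    """Call record_hit, logging and swallowing any failure.

    Returns True when the hit was recorded and False when the write failed.
    Either way the caller goes on to render the 404 page.
    """
    try:
        record_hit(path, store_id)
        return True
    except Exception:
        # Log the full traceback, but never let it reach the response.
        logger.exception("404 capture failed for %s (store %s)", path, store_id)
        return False
```

The key design choice is that the capture is best-effort: losing one log entry is preferable to turning a 404 page into a 500 error.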

For very high-traffic 404s (e.g. a viral broken link receiving 100 hits/sec), you'd want to either:

  • Add the path to the ignore list (if it's truly unfixable noise), or
  • Create the redirect immediately so the 404 stops happening

Privacy

The log stores the requested path, query string, store ID, referrer header, and user-agent header. It stores no IP addresses and no session or cookie data. Referrer and user-agent values are both capped at 510 chars to avoid table bloat from absurdly long values.

Under GDPR, this typically falls under "legitimate interest" (operating a website) and doesn't require consent, but consult your DPO. The 90-day retention default is conservative; lower it if your privacy posture requires it.

Next

  • Redirects Manager — what happens when you click Create redirect
  • CSV format — for bulk-importing 404 → 301 mappings from existing data