Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

System Update

When Three Data Sources Die in One Session: Building Pipeline Resilience

The FotMob API died (404), its CDN returned 403, and Sofascore blocked requests from the server. We fixed GK PSxG by switching to the CDN's underscore URL format, built a Sofascore warm standby (270K shots in Supabase), and added Discord alerting. Every critical data need now has at least 2 sources, except GK PSxG.

The Question

In one session, three data sources died: FotMob /api/ endpoints (404), data.fotmob.com CDN (403 from server), Sofascore API (403 from datacenter IPs). How resilient is our data pipeline, and what happens when sources break?

What We Found

Every critical data need now has at least 2 sources, except GK PSxG.

| Data Need | Primary Source | Backup | Automated |
|---|---|---|---|
| Shot x,y (Big 5) | Understat scraper | StatsBomb open data | Local |
| Shot x,y (non-Big-5) | FotMob page scraping | Sofascore (local) | **Server cron** ✅ |
| Match-level xG | Sofascore → Supabase | FotMob match-level cache | Local → server |
| GK PSxG | data.fotmob.com CDN | **NONE** ⚠️ | Server |
| Pinnacle odds | The Odds API | football-data.co.uk | Server |
| Match results | FootyStats API | football-data.co.uk | Server |

What Was Fixed

FotMob API is dead. All /api/ endpoints return 404 (permanent, not temporary). But FotMob page routes (/matches/{slug}) work from everywhere — including Hetzner. This is how we get shot data now.

data.fotmob.com CDN confusion: The CDN blocks some URL formats but not others. expected_goals.json → 200. goals_prevented.json → 403. _goals_prevented.json (with underscore) → 200. The GK PSxG fix uses the underscore format and works.
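
Because the CDN accepts one file-name variant and rejects another, a scraper can probe the variants in order instead of hard-coding one. The sketch below is a minimal illustration, not the project's actual code: `first_working`, `candidate_names`, and the base URL passed in are hypothetical, and only the two file-name shapes (with and without the leading underscore) come from the observations above.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, including 4xx/5xx codes.

    Connection-level failures (DNS, timeouts) still raise URLError.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def candidate_names(stat: str) -> list[str]:
    """File-name variants to try: underscore-prefixed first, then plain."""
    return [f"_{stat}.json", f"{stat}.json"]


def first_working(
    base_url: str,
    stat: str,
    status_of: Callable[[str], int] = http_status,
) -> Optional[str]:
    """Return the first candidate URL that answers 200, or None if all fail."""
    for name in candidate_names(stat):
        url = f"{base_url.rstrip('/')}/{name}"
        if status_of(url) == 200:
            return url
    return None
```

Passing `status_of` as a parameter keeps the probing logic testable without network access. A `None` result is a natural trigger for an alert, since it means every known format has stopped working.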

Sofascore blocks datacenter IPs but works locally. The Sofascore → Supabase pipeline runs from the local Mac, pushes to Supabase, and the server cron reads from Supabase. 270K shots + 10K matches in the warm standby.

FotMob page scraping from Hetzner: Confirmed working (200, full __NEXT_DATA__). Unlike Sofascore, FotMob doesn't block datacenter IPs on their page routes. This is why the shot scraper runs as a server cron.
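
Next.js pages embed their server-side payload as JSON in a `<script id="__NEXT_DATA__">` tag, which is what makes page scraping viable here. The following is a rough sketch of pulling that payload out of fetched HTML. The function name `extract_next_data` is hypothetical, and where the shots sit inside the returned JSON is not specified in this post, so that lookup is left to the caller.

```python
import json
import re
from typing import Any, Optional

# Matches the script tag Next.js uses for its embedded page payload.
_NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[Any]:
    """Return the parsed __NEXT_DATA__ payload, or None if the tag is absent."""
    match = _NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))
```

Returning `None` when the tag is missing lets the caller tell "page layout changed" apart from "page had no shots", which matters when deciding whether to raise an alert.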

What Was Built

| Component | Purpose |
|---|---|
| Server cron (09:30 UTC) | FotMob shot scraping, daily, 20 leagues |
| Alerting wrapper | Discord #red-alert when 0 matches fetched |
| Sofascore warm standby | 270K shots in Supabase, activatable in 5 min |
| GK PSxG fix | CDN with numeric tournament IDs + underscore prefix |
| `--incremental` flag | Only scrapes last 3 days, skips cached |
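
An alerting wrapper of the kind listed above might look roughly like this. It is a sketch under assumptions: `run_with_alert` and `post_to_discord` are hypothetical names, the webhook URL is supplied by the caller, and the message wording is invented. The only part taken from Discord itself is that a webhook accepts a JSON POST with a `content` field.

```python
import json
import urllib.request
from typing import Callable


def post_to_discord(webhook_url: str, message: str) -> None:
    """Send a plain-text message to a Discord webhook."""
    body = json.dumps({"content": message}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        # An explicit User-Agent is set here as a precaution (assumption).
        headers={"Content-Type": "application/json", "User-Agent": "pipeline-alert/1.0"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def run_with_alert(
    job_name: str,
    scrape: Callable[[], list],
    notify: Callable[[str], None],
) -> list:
    """Run a scrape job and notify when it returns zero matches."""
    matches = scrape()
    if not matches:
        notify(f"{job_name}: 0 matches fetched")
    return matches
```

In a cron job, `notify` would be something like `lambda msg: post_to_discord(WEBHOOK_URL, msg)`, with `WEBHOOK_URL` read from the environment. Keeping the notifier injectable means the zero-match logic can be tested without posting to a real channel.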

What This Means

The biggest remaining risk is GK PSxG with no backup. The CDN format could change at any time. FotMob league stats pages have GK data in __NEXT_DATA__ (verified) — this is the identified backup path but not yet built.

Key principle established: Every critical data need requires 2 providers and 1 automated backup. The cost of maintaining a warm standby (Sofascore alongside FotMob) is negligible compared with discovering that your pipeline has been silently failing for 2 weeks.
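
The two-provider principle can be expressed as an ordered fallback chain: try the primary, and fall through to the backup when it errors or returns nothing. This is a minimal sketch, not the pipeline's real code. `fetch_with_fallback` is a hypothetical helper, and the source names and fetch functions are supplied by the caller.

```python
from typing import Callable, Optional, Sequence


def fetch_with_fallback(
    sources: Sequence[tuple[str, Callable[[], list]]],
) -> tuple[Optional[str], list]:
    """Try each (name, fetch) pair in order and return the first non-empty result.

    A source counts as failed if it raises or returns an empty list.
    Returns (None, []) when every source fails.
    """
    for name, fetch in sources:
        try:
            rows = fetch()
        except Exception:
            # Treat any error (404, 403, parse failure) as "source is down".
            continue
        if rows:
            return name, rows
    return None, []
```

Returning the name of the source that answered makes it easy to log or alert when the backup is carrying the load, so a dead primary does not go unnoticed.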

What's Next

  1. Build GK PSxG backup (FotMob page scraping for league stats)
  2. Monthly Sofascore health check
  3. Automated data quality alerts: "did we get xG for >90% of yesterday's matches?"
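
The data quality check in item 3 could be sketched as follows. The field names `home_xg` and `away_xg` and the function names are assumptions made for illustration. Only the 90% threshold comes from the list above.

```python
def xg_coverage(matches: list[dict]) -> float:
    """Fraction of matches that have xG for both teams.

    An empty list counts as full coverage (assumption): with no matches there
    is nothing to cover, and the zero-match alert handles that case separately.
    """
    if not matches:
        return 1.0
    with_xg = sum(
        1
        for m in matches
        if m.get("home_xg") is not None and m.get("away_xg") is not None
    )
    return with_xg / len(matches)


def coverage_ok(matches: list[dict], threshold: float = 0.90) -> bool:
    """True when xG coverage is strictly above the threshold."""
    return xg_coverage(matches) > threshold
```

A daily job would load yesterday's matches, call `coverage_ok`, and send an alert when it returns `False`.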