Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

Signal Test | REJECTED

Shot-Level xG for Variance Regression: More Data, Same Nothing

Re-tested shot-level xG variance signal with 3.5x more data (3,489 matches, 3-4 seasons). Marginal ROI unchanged at +0.1% (bootstrap p=0.47). Shot-level and match-level xG produce equivalent regression candidates at 10-match window granularity. Hypothesis falsified: data volume was not the bottleneck.

  • Marginal ROI: +0.1% (unchanged)
  • Shot Data: 3,489 matches (3.5x more)
  • Bootstrap p: 0.47 (not significant)
  • Gates: 7/10 (failed 5, 8, 9)


We hypothesized that FotMob shot-level xG would produce better regression candidates than match-level xG for non-Big-5 leagues. The first test showed a +0.1% marginal ROI with only one season of shot data. We backfilled to 3-4 seasons (3,489 matches, 59 league files) and re-ran the full pipeline.

Result: still +0.1%. The hypothesis is dead.

The Question

Our variance regression filter flags teams whose actual goals deviate from expected goals over a 10-match window. The "expected goals" input comes from either:

  • Match-level xG (FootyStats aggregate per match)
  • Shot-level xG (FotMob per-shot coordinates, summed to match level)
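As a rough illustration of the filter described above, here is a minimal sketch. It assumes the flag is driven by the summed gap between goals scored and xG over the last 10 matches, compared against the 3.0 threshold mentioned later in this post. The function name, the input shape, and the symmetric over/under flagging are hypothetical, not the production implementation.

```python
from collections import deque
from typing import Iterable, List, Tuple

WINDOW = 10          # sliding window length in matches (from the post)
GAP_THRESHOLD = 3.0  # gap threshold from the post; how it is applied here is an assumption


def regression_flags(matches: Iterable[Tuple[float, float]],
                     window: int = WINDOW,
                     threshold: float = GAP_THRESHOLD) -> List[int]:
    """Return one flag per match for a single team.

    `matches` is a chronological sequence of (goals_for, xg_for) pairs.
    +1 = scoring well above xG (candidate to regress down),
    -1 = scoring well below xG (candidate to regress up),
     0 = no flag, including every match before the window is full.
    The flag at index i uses matches up to and including i, so it would
    apply to the team's *next* fixture.
    """
    recent = deque(maxlen=window)  # per-match (goals - xG) for the last `window` matches
    flags = []
    for goals, xg in matches:
        recent.append(goals - xg)
        if len(recent) < window:
            flags.append(0)
            continue
        gap = sum(recent)
        if gap >= threshold:
            flags.append(1)
        elif gap <= -threshold:
            flags.append(-1)
        else:
            flags.append(0)
    return flags
```

Swapping the xG source only changes the `xg_for` values fed in; the window and threshold stay the same, so the two sources produce different flags only when their 10-match xG sums differ enough to move a team across the threshold.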

The theory: shot-level xG captures shot *quality* independent of finishing luck. Match-level xG bakes in finishing outcomes (shots on target correlate with goals), making the gap between actual and expected less predictive of regression. In offline validation, shot-level xG identified regression candidates with 82.1% accuracy vs 75.6% for match-level.

First test (1 season, ~1K matches): marginal ROI +0.1%, bootstrap p=0.43. We attributed this to shallow team histories — only 10-15 matches per team. The fix: backfill historical data so teams have 30-40 match histories.

What We Found

| Metric | Previous (1 season) | New (3-4 seasons) |
| --- | --- | --- |
| Shot data matches | ~1,000 | 3,489 |
| Base entry-adj ROI | +7.8% | +7.8% |
| Shot entry-adj ROI | +7.9% | +7.9% |
| Marginal entry-adj ROI | +0.1% | +0.1% |
| Bootstrap p | 0.43 | 0.47 |
| Bootstrap CI | | [-1.5%, +1.6%] |
| Walk-forward folds | 12/12 | 12/12 |
| Gates passed | 8/10 | 7/10 |

The marginal is identical. The bootstrap p actually *worsened* slightly, from 0.43 to 0.47. The confidence interval of [-1.5%, +1.6%] straddles zero almost symmetrically.
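For readers unfamiliar with the bootstrap figures above, a minimal sketch of one way to produce them follows. It assumes a flat-stake ROI, resampling of individual bets with replacement, a one-sided p-value defined as the share of resamples with marginal ROI at or below zero, and a 95% percentile interval. The `Bet` layout and function names are hypothetical; the actual pipeline may resample differently (for example by match or by fold).

```python
import random
from typing import List, Sequence, Tuple

# (return per unit stake, selected by base filter, selected by shot-level filter)
Bet = Tuple[float, bool, bool]


def roi(returns: Sequence[float]) -> float:
    """Flat-stake ROI: mean return per unit staked (0.0 for an empty set)."""
    return sum(returns) / len(returns) if returns else 0.0


def marginal_roi(bets: Sequence[Bet]) -> float:
    """ROI of the shot-level bet set minus ROI of the base bet set."""
    base = [r for r, in_base, _ in bets if in_base]
    shot = [r for r, _, in_shot in bets if in_shot]
    return roi(shot) - roi(base)


def bootstrap_marginal(bets: Sequence[Bet],
                       n_resamples: int = 2000,
                       seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Return (one-sided p-value, 95% percentile interval) for the marginal ROI."""
    rng = random.Random(seed)
    n = len(bets)
    draws: List[float] = sorted(
        marginal_roi([bets[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Share of resamples in which the shot-level set did no better than base.
    p_value = sum(d <= 0 for d in draws) / n_resamples
    lo = draws[int(0.025 * n_resamples)]
    hi = draws[int(0.975 * n_resamples) - 1]
    return p_value, (lo, hi)
```

Under this definition a p near 0.5 means the shot-level set beat the base set in only about half of the resamples, which is what a coin flip would produce.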

The Nuance

Shot-level xG does change which bets get made — 613 added, 1,675 removed. But neither group is systematically better or worse:

| Bet Group | N | CLV | Entry-adj ROI |
| --- | --- | --- | --- |
| Added (shot-only finds) | 613 | +11.7% | +3.8% |
| Removed (base-only finds) | 1,675 | +11.9% | +5.6% |
| Shared (both agree) | 14,492 | | |

The removed bets actually have slightly better entry-adj ROI than the added ones. Shot-level xG isn't finding *better* regression candidates — it's finding *different* ones of roughly equal quality.
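The published figures can be cross-checked with a little arithmetic. The sketch below assumes flat stakes, so that ROI averages linearly across bet groups, and it works from the rounded values in the tables, so the results are approximate. The ROI of the shared bets is not reported; the value computed here is only what the other numbers imply under that assumption.

```python
# Figures as published in the tables (rounded), ROI in percent.
n_added, roi_added = 613, 3.8
n_removed, roi_removed = 1675, 5.6
n_shared = 14492
base_roi = 7.8

n_base = n_shared + n_removed  # bets placed by the base filter
n_shot = n_shared + n_added    # bets placed by the shot-level filter

# Back out the shared-bet ROI implied by the base ROI (flat-stake assumption).
shared_roi = (base_roi * n_base - roi_removed * n_removed) / n_shared

# Recombine shared + added bets to get the implied shot-level ROI.
shot_roi = (shared_roi * n_shared + roi_added * n_added) / n_shot
marginal = shot_roi - base_roi

print(f"base bets: {n_base}, shot bets: {n_shot}")
print(f"implied shared ROI: {shared_roi:.2f}%")
print(f"implied shot ROI:   {shot_roi:.2f}%")
print(f"implied marginal:   {marginal:+.2f} points")
```

Under these assumptions the shared bets come out a little above +8%, so both the added and the removed groups sit below the shared core. Dropping 1,675 bets at +5.6% lifts the average slightly, adding 613 bets at +3.8% pulls it back down, and the net lands within rounding of the reported +0.1%.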

Per-league breakdown

A few leagues move by a point or more, but in both directions:

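| League | Delta ROI | Direction |
| --- | --- | --- |
| Brazil A | +2.4% | Better |
| Scottish Prem | +1.5% | Better |
| Greek Super | +1.0% | Better |
| National League | +0.9% | Better |
| UCL | -1.4% | Worse |
| EPL | -1.1% | Worse |
| Ligue 1 | -1.1% | Worse |
| League Two | -0.8% | Worse |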

No systematic pattern. The winners and losers roughly cancel out.

By season

| Season | Delta ROI |
| --- | --- |
| 2022 (calendar) | +4.2% |
| 2015-16 | +1.8% |
| 2025 (calendar) | +1.8% |
| 2024 (calendar) | -2.3% |
| 2017-18 | -0.8% |

Also mixed, with no trend over time.

Why It Didn't Work

The mechanism sounded right but the implementation couldn't capture it. Three reasons:

  1. The 10-match window is too coarse. The variance filter uses a sliding window of 10 matches. At this granularity, the difference between shot-level and match-level xG averages out: both produce similar 10-match expectedGF sums. The theoretical advantage of shot-level xG (removing finishing luck) only matters if finishing luck persists within a 10-match window, and at the team level it mostly doesn't.
  2. Match-level xG already captures enough. FootyStats match-level xG correlates with goals at only 0.35, which is "noisy" compared to shot-level. But for regression detection, what matters is the *gap* between expected and actual. A noisier xG estimate means bigger gaps, which means more aggressive regression flags. The noise isn't random: it's correlated with the factors (shot quality, finishing variance) that drive regression.
  3. The 82.1% vs 75.6% offline validation didn't translate. The offline test measured regression *accuracy* (did the flagged team regress?). The backtest measures *profitability* (did betting on that regression make money?). Accuracy and profitability are different: a more accurate filter that removes profitable bets while adding unprofitable ones is worse for the portfolio even if its regression predictions are more correct.

What This Means

The fotmob-shot-xg-variance signal is parked permanently. We tested it twice:

  • Once with 1 season of shot data → +0.1% marginal
  • Once with 3-4 seasons of shot data → +0.1% marginal

The data volume hypothesis is falsified. There's no iteration path that changes the fundamental problem: at the 10-match window granularity, shot-level and match-level xG are functionally equivalent for regression detection.

The shot data itself remains valuable for other purposes (finishing persistence research, set piece analysis, xG model training). But as a variance regression input, it doesn't move the needle.

What's Next

Nothing for this signal. It joins the graveyard.

The broader lesson: higher-precision inputs don't automatically produce better outputs when the downstream filter is coarse. The variance regression filter's 10-match window and 3.0 gap threshold are too blunt to benefit from shot-level granularity. If we ever revisit variance regression methodology (continuous scoring instead of binary threshold, match-weighted windows, team-specific baselines), shot-level xG could be re-evaluated — but that's a different signal entirely.

REJECTED | Signal: fotmob-shot-xg-variance | 2026-04-07