April 7, 2026|Signal Test|REJECTED

Shot-Level xG for Variance Regression: More Data, Same Nothing

Re-tested shot-level xG variance signal with 3.5x more data (3,489 matches, 3-4 seasons). Marginal ROI unchanged at +0.1% (bootstrap p=0.47). Shot-level and match-level xG produce equivalent regression candidates at 10-match window granularity. Hypothesis falsified: data volume was not the bottleneck.

Marginal ROI: +0.1% (unchanged)
Shot Data: 3,489 matches (3.5x more)
Bootstrap p: 0.47 (not significant)
Gates: 7/10 (failed 5, 8, 9)

We hypothesized that FotMob shot-level xG would produce better regression candidates than match-level xG for non-Big-5 leagues. The first test showed +0.1% marginal with only one season of shot data. We backfilled to 3-4 seasons (3,489 matches, 59 league files) and re-ran the full pipeline.

Result: still +0.1%. The hypothesis is dead.

The Question

Our variance regression filter flags teams whose actual goals deviate from expected goals over a 10-match window. The "expected goals" input comes from either:

Match-level xG (FootyStats aggregate per match)
Shot-level xG (FotMob per-shot coordinates, summed to match level)
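As a rough illustration of the filter described above, the sketch below flags a team whenever the gap between actual goals and xG over the trailing 10 matches reaches a threshold. The 10-match window and the 3.0 threshold come from this write-up; the function name `flag_regression_candidates`, the `(goals_for, xg_for)` tuple layout, and the use of an absolute (two-sided) gap measured in goals are assumptions made for illustration, not the production implementation.

```python
from collections import deque

WINDOW = 10          # matches per sliding window (from the write-up)
GAP_THRESHOLD = 3.0  # gap threshold from the write-up; assumed to be in goals


def flag_regression_candidates(matches, window=WINDOW, threshold=GAP_THRESHOLD):
    """Return (match_index, gap) wherever the trailing-window gap between
    actual goals and xG reaches the threshold in either direction.

    `matches` is a chronological list of (goals_for, xg_for) tuples for one
    team. The tuple layout and two-sided threshold are illustrative assumptions.
    """
    recent = deque(maxlen=window)  # oldest match drops out automatically
    flags = []
    for i, (goals, xg) in enumerate(matches):
        recent.append((goals, xg))
        if len(recent) < window:
            continue  # not enough history to fill the window yet
        gap = sum(g for g, _ in recent) - sum(x for _, x in recent)
        if abs(gap) >= threshold:
            flags.append((i, round(gap, 2)))
    return flags


# A team scoring 2 per match on 1.0 xG is flagged once the window fills.
print(flag_regression_candidates([(2, 1.0)] * 10))  # [(9, 10.0)]
```

Swapping the match-level xG column for summed shot-level xG changes only the `xg_for` values fed into this function, which is why the two inputs can be compared on the same pipeline.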

The theory: shot-level xG captures shot *quality* independent of finishing luck. Match-level xG bakes in finishing outcomes (shots on target correlate with goals), making the gap between actual and expected less predictive of regression. In offline validation, shot-level xG identified regression candidates with 82.1% accuracy vs 75.6% for match-level.

First test (1 season, ~1K matches): marginal ROI +0.1%, bootstrap p=0.43. We attributed this to shallow team histories — only 10-15 matches per team. The fix: backfill historical data so teams have 30-40 match histories.

What We Found

Metric	Previous (1 season)	New (3-4 seasons)
Shot data matches	~1,000	3,489
Base entry-adj ROI	+7.8%	+7.8%
Shot entry-adj ROI	+7.9%	+7.9%
Marginal entry-adj ROI	+0.1%	+0.1%
Bootstrap p	0.43	0.47
Bootstrap CI	—	[-1.5%, +1.6%]
Walk-forward folds	12/12	12/12
Gates passed	8/10	7/10

The marginal is identical. The bootstrap p actually *worsened* slightly, from 0.43 to 0.47. The confidence interval of [-1.5%, +1.6%] is centered almost exactly on zero.
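For readers unfamiliar with how a bootstrap p-value and interval on a marginal ROI can be produced, here is a minimal sketch. It assumes each bet is stored as `(unit_return, in_base, in_shot)` and that the p-value is the one-sided share of resamples with marginal ROI at or below zero. The write-up does not state the exact bootstrap procedure used, so the function names, the record layout, and the p-value definition are all assumptions.

```python
import random


def _roi(returns):
    """Mean profit per unit stake; 0.0 for an empty portfolio."""
    return sum(returns) / len(returns) if returns else 0.0


def marginal_roi(bets):
    """ROI of the shot-level portfolio minus ROI of the base portfolio.

    Each bet is (unit_return, in_base, in_shot); shared bets carry both flags.
    This record layout is a hypothetical one chosen for the sketch.
    """
    base = [r for r, in_base, _ in bets if in_base]
    shot = [r for r, _, in_shot in bets if in_shot]
    return _roi(shot) - _roi(base)


def bootstrap_marginal(bets, n_boot=2000, seed=0):
    """Resample bets with replacement and summarise the marginal ROI.

    Returns (point_estimate, p_value, (ci_low, ci_high)). The p-value is the
    share of resamples whose marginal ROI is <= 0 (one-sided, an assumption);
    the interval is the 2.5th to 97.5th percentile of the resampled marginals.
    """
    rng = random.Random(seed)
    diffs = sorted(
        marginal_roi(rng.choices(bets, k=len(bets))) for _ in range(n_boot)
    )
    p_value = sum(d <= 0 for d in diffs) / n_boot
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    return marginal_roi(bets), p_value, ci
```

Resampling whole bets, rather than each portfolio separately, keeps the shared bets paired, so the large overlap between the two portfolios does not inflate the spread of the marginal.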

The Nuance

Shot-level xG does change which bets get made — 613 added, 1,675 removed. But neither group is systematically better or worse:

Bet Group	N	CLV	Entry-adj ROI
Added (shot-only finds)	613	+11.7%	+3.8%
Removed (base-only finds)	1,675	+11.9%	+5.6%
Shared (both agree)	14,492	—	—

The removed bets actually have slightly better entry-adj ROI than the added ones. Shot-level xG isn't finding *better* regression candidates — it's finding *different* ones of roughly equal quality.
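The added / removed / shared split above is a plain set comparison between the two portfolios. A minimal sketch, assuming each bet has a hashable ID and a unit-stake return (the names `partition_bets` and `group_roi` are hypothetical):

```python
def partition_bets(base_ids, shot_ids):
    """Split bets into those only the shot-level filter finds (added),
    those only the base filter finds (removed), and those both find (shared)."""
    base, shot = set(base_ids), set(shot_ids)
    return {"added": shot - base, "removed": base - shot, "shared": base & shot}


def group_roi(bet_ids, unit_returns):
    """Mean unit-stake return over a group of bets; None for an empty group."""
    if not bet_ids:
        return None
    return sum(unit_returns[b] for b in bet_ids) / len(bet_ids)


groups = partition_bets(base_ids=["a", "b", "c"], shot_ids=["b", "c", "d"])
print(sorted(groups["added"]), sorted(groups["removed"]), sorted(groups["shared"]))
# ['d'] ['a'] ['b', 'c']
```

With the counts in the table, the base portfolio holds 14,492 + 1,675 = 16,167 bets and the shot-level portfolio holds 14,492 + 613 = 15,105, so roughly 90% of base bets and 96% of shot-level bets are shared.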

Per-league breakdown

A few leagues show real movement, but the direction is mixed:

League	Delta ROI	Direction
Brazil A	+2.4%	Better
Scottish Prem	+1.5%	Better
Greek Super	+1.0%	Better
National League	+0.9%	Better
UCL	-1.4%	Worse
EPL	-1.1%	Worse
Ligue 1	-1.1%	Worse
League Two	-0.8%	Worse

No systematic pattern. The winners and losers roughly cancel out.

By season

Season	Delta ROI
2022 (calendar)	+4.2%
2015-16	+1.8%
2025 (calendar)	+1.8%
2024 (calendar)	-2.3%
2017-18	-0.8%

Also mixed, with no trend over time.

Why It Didn't Work

The mechanism sounded right but the implementation couldn't capture it. Three reasons:

10-match window is too coarse. The variance filter uses a sliding window of 10 matches. At this granularity, the difference between shot-level and match-level xG averages out. Both produce similar 10-match expectedGF sums. The theoretical advantage of shot-level xG (removing finishing luck) only matters if finishing luck is persistent within a 10-match window — and at the team level, it mostly isn't.

Match-level xG already captures enough. FootyStats match-level xG correlates with goals at 0.35, which is noisy compared to shot-level xG. But for regression detection, what matters is the *gap* between expected and actual goals. A noisier xG estimate produces bigger gaps, and bigger gaps trip the regression flag more often. That noise isn't random: it's correlated with the factors (shot quality, finishing variance) that drive regression.

The 82.1% vs 75.6% offline validation didn't translate. The offline test measured regression *accuracy* (did the flagged team regress?). The backtest measures *profitability* (did betting on that regression make money?). Accuracy and profitability are different — a more accurate filter that removes profitable bets while adding unprofitable ones is worse for the portfolio even if its regression predictions are more correct.
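A toy example makes the accuracy-versus-profitability gap concrete. Every number below is hypothetical, and the sketch assumes a simplified world in which a bet wins exactly when the flagged team regresses. The point is only that a filter can be right more often while earning less, for instance when its correct calls come at shorter prices.

```python
def accuracy(bets):
    """Share of flagged teams that actually regressed."""
    return sum(regressed for regressed, _ in bets) / len(bets)


def roi(bets):
    """Unit-stake ROI, assuming a bet wins exactly when the team regresses."""
    profit = sum((odds - 1.0) if regressed else -1.0 for regressed, odds in bets)
    return profit / len(bets)


# Hypothetical bets: (team_regressed, decimal_odds)
looser_filter = [(True, 2.5), (True, 2.5), (True, 2.5), (False, 2.5)]
sharper_filter = [(True, 1.25)] * 4 + [(False, 1.25)]

print(accuracy(looser_filter), roi(looser_filter))    # 0.75 0.875
print(accuracy(sharper_filter), roi(sharper_filter))  # 0.8 0.0
```

The second filter is more accurate (80% vs 75%) yet breaks even, while the first returns 87.5% on stakes, because the market price already reflects most of what the sharper filter knows.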

What This Means

The fotmob-shot-xg-variance signal is parked permanently. We tested it twice:

Once with 1 season of shot data → +0.1% marginal
Once with 3-4 seasons of shot data → +0.1% marginal

The data volume hypothesis is falsified. There's no iteration path that changes the fundamental problem: at the 10-match window granularity, shot-level and match-level xG are functionally equivalent for regression detection.

The shot data itself remains valuable for other purposes (finishing persistence research, set piece analysis, xG model training). But as a variance regression input, it doesn't move the needle.

What's Next

Nothing for this signal. It joins the graveyard.

The broader lesson: higher-precision inputs don't automatically produce better outputs when the downstream filter is coarse. The variance regression filter's 10-match window and 3.0 gap threshold are too blunt to benefit from shot-level granularity. If we ever revisit variance regression methodology (continuous scoring instead of binary threshold, match-weighted windows, team-specific baselines), shot-level xG could be re-evaluated — but that's a different signal entirely.

REJECTED|Signal: fotmob-shot-xg-variance|2026-04-07