April 15, 2026|Signal Test|REJECTED

Testing Shot-Level xG as a Variance Filter Input: Why One Season Wasn't Enough

Our v3 shot-level xG model scored 368K Sofascore shots across 24 leagues, but the A/B test against baseline showed no marginal impact (+0.0% CLV, +0.1% entry-adj ROI). Root cause: only 1 of 12 backtest seasons is affected. Infrastructure stays; retest after 2+ seasons.

Marginal CLV: +0.0% (no detectable signal)

Bets: 18,750 (vs 19,703 baseline)

Coverage: 1/12 seasons affected

We built a v3 xG model, scored 368K Sofascore shots across 24 leagues, and deployed match-level xG files for 19 non-Big-5 leagues that previously had none. The question: does having proper shot-level xG improve betting outcomes?

The Question

The variance filter detects teams over/underperforming relative to expected goals. For Big-5 leagues, it uses FotMob/Understat match-level xG. For non-Big-5 leagues, it previously fell back to MI model lambdas (derived from devigged odds) — a weaker signal.

We now have actual shot-level xG from our v3 model for these 19 leagues. The hypothesis: replacing the lambda fallback with real xG should produce better regression candidates and better bets.
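As a rough illustration of the source swap being tested, the sketch below prefers match-level shot xG when a file exists and falls back to the odds-derived lambda otherwise, then sums goals minus expected goals over a window. The names `MatchRecord`, `expected_goals`, and `overperformance` are hypothetical, and the real variance filter may window, weight, or threshold this differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchRecord:
    goals: int                 # goals the team actually scored
    shot_xg: Optional[float]   # match-level xG from shot data, None if no file exists
    model_lambda: float        # fallback: expected goals derived from devigged odds


def expected_goals(match: MatchRecord) -> float:
    """Prefer shot-level xG; fall back to the odds-derived lambda (assumed rule)."""
    return match.shot_xg if match.shot_xg is not None else match.model_lambda


def overperformance(matches: List[MatchRecord]) -> float:
    """Total goals minus total expected goals over the window.

    Positive means the team scored more than expected, which is the kind of
    run a regression-based filter would flag.
    """
    return sum(m.goals - expected_goals(m) for m in matches)


# Illustrative numbers only, not taken from the backtest.
window = [
    MatchRecord(goals=2, shot_xg=1.2, model_lambda=1.5),   # uses shot xG
    MatchRecord(goals=1, shot_xg=None, model_lambda=1.4),  # falls back to lambda
]
residual = overperformance(window)
```

The A/B test only changes which branch of `expected_goals` fires for the 19 non-Big-5 leagues; everything downstream of the residual is the same in both configs.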

What We Found

Result: REJECTED — zero detectable signal.

Metric	Baseline	Shot xG	Delta
Bets	19,703	18,750	-953
CLV	+11.7%	+11.7%	+0.0%
Closing ROI	-3.4%	-3.3%	+0.1%
Entry-adj ROI	+6.9%	+6.9%	+0.1%

The shot-level xG source reshuffled 3,221 bets (added 1,134, removed 2,087) but produced identical aggregate performance. The added bets had +12.3% CLV (slightly above baseline), the removed bets had +11.6% CLV (slightly below) — both within noise.
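The figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported CLV values are per-bet averages (an assumption; the post does not say how CLV is aggregated) and works from the rounded numbers, so it is a consistency check rather than a reproduction of the backtest.

```python
# Rounded figures as reported in the post.
baseline_bets, baseline_clv = 19_703, 11.7   # CLV in percent
added_bets, added_clv = 1_134, 12.3
removed_bets, removed_clv = 2_087, 11.6

# Bet count after the reshuffle: drop the removed bets, add the new ones.
new_bets = baseline_bets - removed_bets + added_bets

# Bet-weighted CLV of the new set, assuming CLV is a simple per-bet average.
clv_sum = (
    baseline_bets * baseline_clv
    - removed_bets * removed_clv
    + added_bets * added_clv
)
new_clv = clv_sum / new_bets

print(new_bets)           # 18750
print(round(new_clv, 1))  # 11.7
```

Under that assumption the swap moves aggregate CLV by only a few hundredths of a point, which disappears at one decimal place.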

The Nuance

This isn't really a test of "does shot-level xG help?" — it's a test of "does shot-level xG help when it only covers 8% of the backtest window?"

The v3 xG files cover 2025-26 only, while the backtest spans 2014-2026. Only the final season is affected; in the 11 prior seasons, both configs are identical. Any marginal signal from that one season is diluted by 11 seasons of data where nothing changed.
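To make the dilution concrete, here is a back-of-the-envelope sketch. It assumes bets are spread evenly across seasons, which is a simplification, and the 1.0pp effect is a hypothetical number chosen for illustration, not a measured one.

```python
def aggregate_delta(season_effect_pp: float, affected: int, total: int) -> float:
    """Aggregate metric change when only `affected` of `total` seasons differ.

    Assumes equal bet volume per season, so each season carries equal weight.
    """
    return season_effect_pp * affected / total


def required_season_effect(aggregate_pp: float, affected: int, total: int) -> float:
    """In-season effect needed to move the aggregate by `aggregate_pp`."""
    return aggregate_pp * total / affected


# A hypothetical 1.0pp CLV lift confined to the one covered season.
diluted = aggregate_delta(1.0, affected=1, total=12)

# Effect needed in the covered season to move the 12-season aggregate by 0.1pp.
needed = required_season_effect(0.1, affected=1, total=12)

print(f"{diluted:.3f}pp aggregate from a 1.0pp in-season lift")
print(f"{needed:.1f}pp in-season lift needed for a 0.1pp aggregate move")
```

With 1 of 12 seasons covered, an in-season effect has to be roughly twelve times larger than the aggregate change it produces, so small real effects round to zero in the headline table.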

This is a data coverage problem, not a signal quality problem. The earlier FotMob A/B test (which had multi-season coverage) was rejected for a different reason: better xG regression detection didn't translate to better bets. That finding stands. But this test can't confirm or deny whether Sofascore v3 xG specifically helps non-Big-5 leagues because the coverage window is too narrow.

What Didn't Work

We also tried three other paths to improve the variance filter in this sprint:

Path A (solver priors): Dead. Market odds already embed the best available team strength signal. Elo warm-starts made predictions worse.
Path B (Marcel early-season): Dead. 0.0pp marginal — Marcel data covers 2/12 seasons, same dilution problem.
H9 (finishing multiplier xG): Overfits in walk-forward. Low-confidence multipliers helped more than high-confidence ones — classic in-sample artifact.

What This Means

The v3 xG deployment stays in place. It is correct infrastructure even if we can't measure its betting impact yet. Once the scorer is wired into the daily cron, the files will stay current. As more seasons accumulate, the coverage fraction grows and the signal (if it exists) becomes detectable.

The variance filter change is not deployed: the filter configuration is unchanged, and shot-level xG remains available but is not wired in as the primary variance source.

What's Next

Wire the Sofascore v3 scorer into the daily cron so match-xG files stay current
Retest after 2+ full seasons of coverage (summer 2027)
Focus shifts to Marcel Phase D/E (injury impact) — context for human decision-making on /picks, not a signal that needs gate approval

REJECTED|Signal: fotmob-shot-xg-variance|2026-04-15