Testing Shot-Level xG as a Variance Filter Input: Why One Season Wasn't Enough
Our v3 shot-level xG model scored 368K Sofascore shots across 24 leagues, but the A/B test against baseline showed zero marginal impact (+0.0% CLV, +0.1% entry-adjusted ROI). Root cause: only 1 of 12 backtest seasons was affected. The infrastructure stays; retest after 2+ seasons.
We built a v3 xG model, scored 368K Sofascore shots across 24 leagues, and deployed match-level xG files for 19 non-Big-5 leagues that previously had none. The question: does having proper shot-level xG improve betting outcomes?
The Question
The variance filter detects teams over/underperforming relative to expected goals. For Big-5 leagues, it uses FotMob/Understat match-level xG. For non-Big-5 leagues, it previously fell back to MI model lambdas derived from devigged odds, a weaker signal than match-level xG.
We now have actual shot-level xG from our v3 model for these 19 leagues. The hypothesis: replacing the lambda fallback with real xG should produce better regression candidates and better bets.
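The source selection described above can be sketched as follows. This is a minimal illustration, not the production filter: the league codes, the `TeamMatch` fields, the `expected_goals` / `performance_gap` / `is_regression_candidate` helpers and the 3.0-goal threshold are all hypothetical. It only shows where the A/B switch sits. With `use_shot_xg=False`, non-Big-5 leagues compare goals against odds-derived lambdas. With `use_shot_xg=True`, they compare against v3 match xG whenever a file exists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical league labels; the real filter's identifiers are not shown in the post.
BIG5 = {"ENG1", "ESP1", "ITA1", "GER1", "FRA1"}


@dataclass
class TeamMatch:
    league: str
    goals: int
    match_xg: Optional[float]  # FotMob/Understat (Big-5) or v3 shot xG (others); None if missing
    model_lambda: float        # expected goals implied by devigged odds (hypothetical field)


def expected_goals(m: TeamMatch, use_shot_xg: bool) -> float:
    """Choose the 'expected' side of the comparison for one team-match."""
    if m.match_xg is not None and (m.league in BIG5 or use_shot_xg):
        return m.match_xg
    # Baseline behaviour for non-Big-5 leagues: fall back to odds-derived lambdas.
    return m.model_lambda


def performance_gap(matches: list[TeamMatch], use_shot_xg: bool) -> float:
    """Goals minus expected goals over a window; positive means overperforming."""
    return sum(m.goals - expected_goals(m, use_shot_xg) for m in matches)


def is_regression_candidate(matches: list[TeamMatch], use_shot_xg: bool,
                            threshold: float = 3.0) -> bool:
    """Flag a team whose cumulative gap exceeds a hypothetical threshold in either direction."""
    return abs(performance_gap(matches, use_shot_xg)) >= threshold


# Illustrative three-match window for one non-Big-5 team (made-up numbers).
window = [
    TeamMatch("NED1", 2, 1.1, 1.6),
    TeamMatch("NED1", 3, 1.4, 1.8),
    TeamMatch("NED1", 2, 0.9, 1.5),
]
print(round(performance_gap(window, use_shot_xg=False), 2))  # 2.1
print(round(performance_gap(window, use_shot_xg=True), 2))   # 3.6
```

In this made-up window only the shot-xG config crosses the threshold, so switching the expected-goals source changes which teams (and therefore which bets) enter the candidate pool. That is the kind of reshuffling the A/B test measured.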
What We Found
Result: REJECTED — zero detectable signal.
| Metric | Baseline | Shot xG | Delta |
|---|---|---|---|
| Bets | 19,703 | 18,750 | -953 |
| CLV | +11.7% | +11.7% | +0.0% |
| Closing ROI | -3.4% | -3.3% | +0.1% |
| Entry-adj ROI | +6.9% | +6.9% | +0.1% |
The shot-level xG source reshuffled 3,221 bets (added 1,134, removed 2,087) but produced identical aggregate performance. The added bets had +12.3% CLV (slightly above baseline) and the removed bets had +11.6% CLV (slightly below); both differences are within noise.
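The aggregate CLV barely moves because both swapped subsets sit close to the baseline mean. A quick consistency check, assuming the reported CLV is a simple per-bet average (the post does not say whether it is stake-weighted) and using the rounded figures from the table; `reshuffled_mean` is a hypothetical helper:

```python
def reshuffled_mean(base_n: int, base_mean: float,
                    removed_n: int, removed_mean: float,
                    added_n: int, added_mean: float) -> tuple[int, float]:
    """Bet count and mean metric after removing one subset of bets and adding another.

    Assumes the metric is an unweighted per-bet average.
    """
    total = base_n * base_mean - removed_n * removed_mean + added_n * added_mean
    n = base_n - removed_n + added_n
    return n, total / n


n, clv = reshuffled_mean(19_703, 11.7, 2_087, 11.6, 1_134, 12.3)
print(n, round(clv, 2))  # 18750 11.75
```

The result, about 11.75, still displays as +11.7% at one decimal, matching the table: removing 2,087 bets at +11.6% and adding 1,134 at +12.3% shifts the pooled mean by roughly 0.05pp. Because the inputs are already rounded, this only confirms that the reported figures are mutually consistent.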
The Nuance
This isn't really a test of "does shot-level xG help?" It's a test of "does shot-level xG help when it covers only 8% of the backtest window?"
The v3 xG files cover 2025-26 only, while the backtest spans 2014-2026, so only the final season is affected. In the 11 prior seasons, both configs are identical. Whatever marginal signal exists is diluted across 12 seasons of data, 11 of which did not change at all.
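The dilution can be put in numbers. Because the 11 earlier seasons are identical in both configs, the pooled delta is roughly the affected season's delta scaled by that season's share of all bets. The sketch below assumes bets are spread evenly across the 12 seasons (share = 1/12); the real share depends on 2025-26 bet volume, which the post does not report. The 1.2pp season-level figure is purely illustrative, and both helpers are hypothetical.

```python
def pooled_delta(season_delta: float, affected_share: float) -> float:
    """Aggregate change when only a fraction of the bets can change at all."""
    return affected_share * season_delta


def implied_season_delta(observed_pooled_delta: float, affected_share: float) -> float:
    """Invert the dilution: season-level change needed to produce the pooled change."""
    return observed_pooled_delta / affected_share


share = 1 / 12  # assumption: bets spread evenly across the 12 backtest seasons
print(round(pooled_delta(1.2, share), 2))          # 0.1
print(round(implied_season_delta(0.1, share), 2))  # 1.2
```

Under that assumption, a 1.2pp ROI improvement confined to 2025-26 would show up as only about 0.1pp in the pooled backtest, the same size as the measured entry-adjusted ROI delta and too small to separate from noise.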
This is a data coverage problem, not a signal quality problem. The earlier FotMob A/B test (which had multi-season coverage) was rejected for a different reason: better xG regression detection didn't translate to better bets. That finding stands. But this test can't confirm or deny whether Sofascore v3 xG specifically helps non-Big-5 leagues because the coverage window is too narrow.
What Didn't Work
We also tried three other paths to improve the variance filter in this sprint:
- Path A (solver priors): Dead. Market odds already embed the best available team strength signal. Elo warm-starts made predictions worse.
- Path B (Marcel early-season): Dead. 0.0pp marginal impact; Marcel data covers only 2 of 12 seasons, the same dilution problem.
- H9 (finishing multiplier xG): Overfits in walk-forward validation. Low-confidence multipliers helped more than high-confidence ones, a classic in-sample artifact.
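Walk-forward evaluation fits parameters (here, the finishing multipliers) only on seasons before the one being scored, then rolls forward one season at a time. A minimal expanding-window splitter, assuming season labels like `2014-15` and a hypothetical `min_train` of three seasons; the post does not describe the actual harness:

```python
from typing import Iterator, Sequence


def walk_forward_splits(seasons: Sequence[str],
                        min_train: int = 3) -> Iterator[tuple[list[str], str]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Each test season is scored with parameters fitted only on earlier seasons,
    so anything fitted in-sample cannot leak into the evaluation.
    """
    for i in range(min_train, len(seasons)):
        yield list(seasons[:i]), seasons[i]


# The 12 backtest seasons, 2014-15 through 2025-26.
seasons = [f"{y}-{(y + 1) % 100:02d}" for y in range(2014, 2026)]

for train, test in walk_forward_splits(seasons):
    # Fit multipliers on `train`, then evaluate them on `test` only.
    print(f"train {train[0]}..{train[-1]} -> test {test}")
```

An overfit multiplier looks good when it is fitted and scored on the same seasons but degrades on each held-out test season, which is the pattern reported for H9.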
What This Means
The v3 xG deployment stays in place: it's correct infrastructure even if we can't measure its betting impact yet. Once the scorer is wired into the daily cron, the files will stay current. As more seasons accumulate, the coverage fraction grows and the signal (if it exists) should become detectable.
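How fast the coverage fraction grows, assuming the backtest keeps its 2014-15 start and each new season adds one covered season. The `coverage_fraction` helper and the fixed count of 11 uncovered seasons are assumptions for illustration:

```python
def coverage_fraction(covered_seasons: int, uncovered_seasons: int = 11) -> float:
    """Share of backtest seasons with v3 shot xG, assuming the window start stays fixed."""
    return covered_seasons / (covered_seasons + uncovered_seasons)


for covered in (1, 2, 3):
    print(covered, f"{coverage_fraction(covered):.0%}")
# 1 8%
# 2 15%
# 3 21%
```

Two covered seasons, the retest point planned for summer 2027, would roughly double the affected share from about 8% to about 15%.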
Variance filter change: not deployed. The variance filter configuration is unchanged; shot-level xG remains available but is not wired in as the primary variance source.
What's Next
- Wire the Sofascore v3 scorer into the daily cron so match-xG files stay current
- Retest after 2+ full seasons of coverage (summer 2027)
- Focus shifts to Marcel Phase D/E (injury impact), which provides context for human decision-making on /picks rather than a signal that needs gate approval