Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

model-update | ACCEPTED

xG Weight A/B Test: +3.99pp OVERS CLV at xgWeight=0.2

Walk-forward backtest across 26 leagues shows xgWeight=0.2 beating control on three of four markets: OVERS +3.99pp, UNDERS +3.31pp, SIDES +1.18pp. AH is flat (-0.17pp). On those three markets the gain rises monotonically with xgWeight, with no plateau by 0.2. Deploying to production.

The Question

The MI Bivariate Poisson solver has an xgWeight parameter controlling how much the walk-forward rating calibration leans on expected goals vs actual goals. At xgWeight=0, the solver ignores xG entirely — ratings come from match outcomes and odds. At xgWeight=0.2, each team's xG performance contributes 20% of the signal to the rating update.
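One way to picture the parameter is as a convex blend of actual goals and xG before the rating update. The sketch below is only an illustration under that assumption: the linear form, the `TeamMatchObservation` type, and the `blendedScoringSignal` helper are hypothetical names, not the solver's actual code.

```typescript
// Hypothetical sketch: blend actual goals with expected goals (xG).
// The linear blend and every name here are assumptions for illustration,
// not the production solver's implementation.

interface TeamMatchObservation {
  goals: number; // goals actually scored
  xg: number;    // expected goals from shot data
}

/** Convex blend: xgWeight = 0 returns goals, xgWeight = 1 returns xG. */
function blendedScoringSignal(obs: TeamMatchObservation, xgWeight: number): number {
  if (xgWeight < 0 || xgWeight > 1) {
    throw new RangeError(`xgWeight must be in [0, 1], got ${xgWeight}`);
  }
  return (1 - xgWeight) * obs.goals + xgWeight * obs.xg;
}

// Illustrative match: one goal scored from 2.4 xG.
const narrowWin: TeamMatchObservation = { goals: 1, xg: 2.4 };

console.log(blendedScoringSignal(narrowWin, 0).toFixed(2));   // 1.00
console.log(blendedScoringSignal(narrowWin, 0.2).toFixed(2)); // 1.28
```

Under this assumed form, a weight of 0.2 nudges the scoring signal toward the xG figure without letting it dominate the observed result.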

We've always known xG contains information the market hasn't fully priced. The question was whether injecting it into the solver — rather than using it only as a downstream filter — would improve CLV, and if so, how much and at what weight.

The Test

Walk-forward backtest across all 26 leagues, 2014-2025, weekly re-solve, 3-day embargo, 5% edge threshold. Four arms (one control, three treatments):

| Treatment | xgWeight | Config hash |
|---|---|---|
| Control | 0 | 36f89051 / b49dca45 |
| Treat C | 0.05 | f201fabc / fd86694a |
| Treat B | 0.1 | 723e6b3b / a66774ed |
| Treat A | 0.2 | 90bcc4cf / 3be2a87d |

Each treatment required its own full set of walk-forward solver snapshots — ~5,000-12,000 per treatment, generated on a dedicated cloud compute VM over 5 days. Each backtest evaluated ~258,000 bets per treatment.
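The mechanics of a weekly re-solve with an embargo and an edge threshold can be sketched roughly as follows. This is a minimal illustration, not the backtest's real code: it assumes "embargo" means the snapshot's training data must end at least 3 days before kickoff, and that edge is `modelProb * decimalOdds - 1`. The `Snapshot` type, `snapshotFor`, `edge`, and `qualifies` are hypothetical names, and the dates are made up.

```typescript
// Hypothetical sketch of walk-forward snapshot selection and edge filtering.
// Assumptions: the embargo applies to the end of the training data, and
// edge is the expected value per unit stake at the offered decimal odds.

const DAY_MS = 24 * 60 * 60 * 1000;
const EMBARGO_DAYS = 3;
const EDGE_THRESHOLD = 0.05;

interface Snapshot {
  id: string;
  dataThroughMs: number; // timestamp of the last match included in the fit
}

/** Latest snapshot whose data ends at least EMBARGO_DAYS before kickoff. */
function snapshotFor(kickoffMs: number, snapshots: Snapshot[]): Snapshot | undefined {
  const cutoff = kickoffMs - EMBARGO_DAYS * DAY_MS;
  let best: Snapshot | undefined;
  for (const s of snapshots) {
    if (s.dataThroughMs <= cutoff && (best === undefined || s.dataThroughMs > best.dataThroughMs)) {
      best = s;
    }
  }
  return best;
}

/** Expected value per unit stake at the offered decimal odds. */
function edge(modelProb: number, decimalOdds: number): number {
  return modelProb * decimalOdds - 1;
}

function qualifies(modelProb: number, decimalOdds: number): boolean {
  return edge(modelProb, decimalOdds) >= EDGE_THRESHOLD;
}

// Weekly snapshots (hypothetical dates).
const weekly: Snapshot[] = [
  { id: "wk1", dataThroughMs: Date.UTC(2024, 0, 1) },
  { id: "wk2", dataThroughMs: Date.UTC(2024, 0, 8) },
  { id: "wk3", dataThroughMs: Date.UTC(2024, 0, 15) },
];

// Kickoff on Jan 17: wk3 ends only 2 days earlier, so wk2 is used.
console.log(snapshotFor(Date.UTC(2024, 0, 17), weekly)?.id); // wk2
console.log(qualifies(0.55, 2.0)); // true  (edge 0.10)
console.log(qualifies(0.51, 2.0)); // false (edge 0.02)
```

The point of the embargo in this sketch is that no bet is ever priced by a snapshot that has seen data from the days immediately before kickoff.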

The Results

CLV by market

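| Market | Control | xg=0.05 | xg=0.1 | **xg=0.2** | Best vs Control |
|---|---|---|---|---|---|
| SIDES | +8.94% | +9.11% | +9.46% | **+10.12%** | **+1.18pp** |
| UNDERS | +12.61% | +14.41% | +15.22% | **+15.92%** | **+3.31pp** |
| OVERS | +10.66% | +13.70% | +14.18% | **+14.65%** | **+3.99pp** |
| AH | +9.97% | +9.66% | +9.63% | +9.80% | -0.17pp |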

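xgWeight=0.2 wins on every market except AH (where all treatments are statistically indistinguishable). On the other three markets the effect increases monotonically with xgWeight and has not plateaued by 0.2, although on OVERS and UNDERS the largest single step is the first one (0 to 0.05).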
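The lift and monotonicity claims can be checked directly from the CLV table. The snippet below is a small verification sketch using only the published figures; the helper names are illustrative.

```typescript
// CLV (%) by market at xgWeight = 0, 0.05, 0.1, 0.2, copied from the CLV table.
const clvByMarket: Record<string, number[]> = {
  SIDES: [8.94, 9.11, 9.46, 10.12],
  UNDERS: [12.61, 14.41, 15.22, 15.92],
  OVERS: [10.66, 13.7, 14.18, 14.65],
  AH: [9.97, 9.66, 9.63, 9.8],
};

/** True when each value is strictly greater than the one before it. */
function isStrictlyIncreasing(values: number[]): boolean {
  return values.every((v, i) => i === 0 || v > (values[i - 1] as number));
}

/** Last arm minus first arm, in percentage points, rounded to 2 decimals. */
function liftVsControl(values: number[]): number {
  const first = values[0];
  const last = values[values.length - 1];
  if (first === undefined || last === undefined) {
    throw new Error("empty series");
  }
  return Number((last - first).toFixed(2));
}

for (const [market, series] of Object.entries(clvByMarket)) {
  console.log(market, liftVsControl(series), isStrictlyIncreasing(series));
}
// SIDES 1.18 true
// UNDERS 3.31 true
// OVERS 3.99 true
// AH -0.17 false
```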

What the numbers mean

+3.99pp on OVERS is the headline. The Control model found +10.66% CLV on Over 2.5 bets. With xgWeight=0.2, that jumps to +14.65% — a 37% relative improvement in edge detection on the same market, same odds, same time period.

+3.31pp on UNDERS and +1.18pp on SIDES are smaller but consistent. The xG signal helps across the board — it's not a one-market artifact.

AH is flat at ~9.6-10.0% CLV regardless of xgWeight. This makes sense: Asian Handicap is Pinnacle's deepest, sharpest market, so its odds likely already impound most of the information that xG carries. That leaves the solver's xG contribution little to add beyond what Pinnacle's AH line already knows.

How xG changes what the model sees

The most interesting finding isn't the CLV delta — it's the shift in bet selection:

| Market | Control bets | xg=0.2 bets | Change |
|---|---|---|---|
| SIDES | 64,137 | 77,861 | +21% more |
| UNDERS | 46,627 | 32,312 | **-31% fewer** |
| OVERS | 14,563 | 31,027 | **+113% more** |
| AH | 17,971 | 17,454 | flat |

Control fires 3.2x as many UNDERS as OVERS. With xgWeight=0.2, the ratio is nearly 1:1.
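The percentage changes and the UNDERS-to-OVERS ratios follow from the bet counts alone. A short sketch of that arithmetic, with illustrative helper names:

```typescript
// Bet counts per market, copied from the bet-selection table.
const betCounts = {
  control: { SIDES: 64137, UNDERS: 46627, OVERS: 14563, AH: 17971 },
  xg02: { SIDES: 77861, UNDERS: 32312, OVERS: 31027, AH: 17454 },
};

/** Percentage change from before to after, rounded to the nearest whole percent. */
function pctChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

/** Number of UNDERS bets fired per OVERS bet, rounded to 2 decimals. */
function undersPerOver(arm: { UNDERS: number; OVERS: number }): number {
  return Number((arm.UNDERS / arm.OVERS).toFixed(2));
}

console.log(pctChange(betCounts.control.SIDES, betCounts.xg02.SIDES));   // 21
console.log(pctChange(betCounts.control.UNDERS, betCounts.xg02.UNDERS)); // -31
console.log(pctChange(betCounts.control.OVERS, betCounts.xg02.OVERS));   // 113
console.log(undersPerOver(betCounts.control)); // 3.2
console.log(undersPerOver(betCounts.xg02));    // 1.04
```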

What's happening: without xG, the solver sees a team that won 1-0 as "good defense, mediocre attack." With xG, it sees "1-0 but their xG was 2.4 — this team is actually scoring below expectation." That shifts ratings toward what the team *should* be producing, not what they happened to produce. The downstream bet selection flips from Under (regression to low scoring) to Over (regression to high xG process).

Over bets at xgWeight=0.2 average +14.65% CLV, while Control's Under bets averaged +12.61%. On those pool averages, the xG signal isn't just finding more edge, it's finding it in a *better* place.

The Nuance

ROI is negative across all treatments (-3.6% to -5.7%). This is measuring against Pinnacle closing odds — the sharpest benchmark available. Negative ROI at Pinnacle is expected and unremarkable. Our earlier BetExplorer validation showed +20.1pp ROI premium at soft books (bet365, BetMGM, Stake) on the exact same model with the exact same bets. The CLV metric is what determines model quality; ROI is a function of execution venue.
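To see how CLV and ROI can point in different directions, here is a minimal sketch. The post does not spell out either formula, so this assumes one common definition of each: CLV as taken odds divided by closing odds minus 1, and ROI as settled profit over total stake. The `SettledBet` type and the three sample bets are hypothetical.

```typescript
// Hypothetical sketch: CLV and ROI measure different things.
// Assumed definitions (not confirmed by the post):
//   CLV = takenOdds / closingOdds - 1
//   ROI = total profit / total stake

interface SettledBet {
  takenOdds: number;   // decimal odds at placement
  closingOdds: number; // decimal closing odds at the benchmark book
  won: boolean;
  stake: number;
}

function clv(bet: SettledBet): number {
  return bet.takenOdds / bet.closingOdds - 1;
}

function meanClv(bets: SettledBet[]): number {
  return bets.reduce((sum, b) => sum + clv(b), 0) / bets.length;
}

function roi(bets: SettledBet[]): number {
  const staked = bets.reduce((sum, b) => sum + b.stake, 0);
  const profit = bets.reduce(
    (sum, b) => sum + (b.won ? b.stake * (b.takenOdds - 1) : -b.stake),
    0,
  );
  return profit / staked;
}

// Made-up sample: every bet beat the close, but two of three lost.
const sample: SettledBet[] = [
  { takenOdds: 2.2, closingOdds: 2.0, won: false, stake: 1 },
  { takenOdds: 2.1, closingOdds: 1.9, won: true, stake: 1 },
  { takenOdds: 1.95, closingOdds: 1.8, won: false, stake: 1 },
];

console.log(meanClv(sample).toFixed(3)); // 0.096
console.log(roi(sample).toFixed(3));     // -0.300
```

In this toy sample the price taken was better than the close on every bet, yet the realized return is negative. CLV depends only on prices, while ROI also depends on outcomes and on where the bets are settled.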

The monotonic curve invites going higher. xgWeight=0.2 > 0.1 > 0.05 > 0 with no sign of reversing. Should we test 0.3 or 0.4? Probably, but with diminishing practical returns. The jump from 0 to 0.2 on OVERS CLV is +3.99pp. A hypothetical 0.2 to 0.4 jump would add another ~4pp only if the curve stayed linear at the 0-to-0.2 average rate, and the OVERS steps already shrink from +3.04pp (0 to 0.05) to +0.47pp (0.1 to 0.2), so the realistic gain is probably smaller. We'll test it, but xg=0.2 is ready to deploy now.
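The extrapolation arithmetic can be made explicit. The sketch below takes the OVERS figures from the results table and compares two naive projections for a 0.2 to 0.4 step: one using the average slope over 0 to 0.2, one using only the last tested step. Both are back-of-envelope projections, not results.

```typescript
// OVERS CLV (%) at each tested xgWeight, copied from the results table.
const testedWeights = [0, 0.05, 0.1, 0.2];
const oversClvPct = [10.66, 13.7, 14.18, 14.65];

const round2 = (x: number): number => Number(x.toFixed(2));

/** CLV gain in percentage points between each pair of adjacent weights. */
function stepGains(clv: number[]): number[] {
  return clv.slice(1).map((value, i) => round2(value - (clv[i] as number)));
}

/** Slope in pp per unit of xgWeight between two indices of the series. */
function slope(w: number[], clv: number[], from: number, to: number): number {
  const w0 = w[from];
  const w1 = w[to];
  const c0 = clv[from];
  const c1 = clv[to];
  if (w0 === undefined || w1 === undefined || c0 === undefined || c1 === undefined) {
    throw new RangeError("index out of range");
  }
  return (c1 - c0) / (w1 - w0);
}

const extraWeight = 0.4 - 0.2;
const averageSlope = slope(testedWeights, oversClvPct, 0, 3);  // over 0 -> 0.2
const lastStepSlope = slope(testedWeights, oversClvPct, 2, 3); // over 0.1 -> 0.2

console.log(stepGains(oversClvPct));              // [ 3.04, 0.48, 0.47 ]
console.log(round2(averageSlope * extraWeight));  // 3.99
console.log(round2(lastStepSlope * extraWeight)); // 0.94
```

The gap between the two projections (roughly 4pp versus roughly 1pp) is the reason to run the 0.3 and 0.4 arms rather than extrapolate.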

26 leagues, not 6. Previous xG work was limited to Big 5 leagues (Understat coverage). This test runs across all 26 leagues, including ones where xG data comes from FotMob shot scraping and FootyStats aggregates. The signal holds up on the wider set: the +1.18pp SIDES lift includes non-Big-5 leagues where xG data quality is lower. Whether better xG data produces a bigger lift is a testable follow-up.

What We're Deploying

xgWeight=0.2 goes into production solver config. Every league's walk-forward solver re-calibration now includes 20% weight on xG performance. The change affects:

  • lib/mi-model/solver.ts — solver config default
  • data/mi-params/latest/*.json — live ratings (will shift on next solve)
  • /picks page — bet selections will shift toward more OVERS, fewer UNDERS, more SIDES edge
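For readers picturing the config change, a default plus a guard might look something like the sketch below. It is purely illustrative: the actual shape of `lib/mi-model/solver.ts` is not shown in this post, so `SolverConfig`, `DEFAULT_SOLVER_CONFIG`, `resolveSolverConfig`, and the [0, 1] range check are all assumptions.

```typescript
// Hypothetical sketch of a solver config default with validation.
// None of these names are taken from the real lib/mi-model/solver.ts.

interface SolverConfig {
  xgWeight: number; // share of the rating-update signal taken from xG
}

const DEFAULT_SOLVER_CONFIG: Readonly<SolverConfig> = { xgWeight: 0.2 };

/** Merge overrides onto the default and reject weights outside [0, 1] (assumed range). */
function resolveSolverConfig(overrides: Partial<SolverConfig> = {}): SolverConfig {
  const xgWeight = overrides.xgWeight ?? DEFAULT_SOLVER_CONFIG.xgWeight;
  if (!Number.isFinite(xgWeight) || xgWeight < 0 || xgWeight > 1) {
    throw new RangeError(`xgWeight must be in [0, 1], got ${xgWeight}`);
  }
  return { xgWeight };
}

console.log(resolveSolverConfig().xgWeight);                // 0.2
console.log(resolveSolverConfig({ xgWeight: 0 }).xgWeight); // 0 (control behaviour)
```

Keeping the control value reachable through an override is one way to make a rollback a config change rather than a code change.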

No disruption to paper trading: existing bets keep their original odds and edge. New bets from the updated solver are expected to carry the higher CLV seen in the backtest.

Infrastructure Note

This test required ~60,000 walk-forward solver snapshots across 4 treatments and 26 leagues. We ran into every failure mode a distributed compute system can produce: OOM cascades taking down production, auto-restart loops amplifying crashes, stale code on remote VMs, solver-cache key mismatches from data updates, and result files deleted by git clean operations.

The cloud-lab guardian system — a version-controlled roadmap + submit-time validator + auto-advance ticker — kept the work on track once we stopped trying to run heavy jobs on the production host. Full retro in docs/specs/compute-host-isolation-spec.md.