Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

Model Architecture | ACCEPTED

Trust the Market, Not the Scoreboard: How Removing Match Data Made the Model Better

We removed outcome prediction and xG fitting from the solver loss function. Validated on holdout (+0.45pp, 3/3 years) and production (26 leagues, +0.79pp, +118u saved, 12/19 improve). The solver produces better calibrated probabilities when fitting ONLY to Pinnacle odds. First structural model change to improve AH ROI.

ROI Delta: +1.05pp (passes playbook)
CLV: +10.06% (at 5% edge filter)
P&L Saved: +89u (32% fewer losses)
Status: Holdout next (4/4 criteria pass)

We removed outcome prediction and xG fitting from the solver's loss function. AH ROI improved by +1.05pp. CLV went up. Every year improved. This is the first structural change to pass all 4 playbook criteria.

The Question

The MI Bivariate Poisson solver fits team ratings by minimizing a combined loss function with 4 components: Pinnacle odds divergence (KL), AH market fit, outcome prediction, and xG prediction. The last two make the solver fit to what actually happened in matches -- noisy data that Pinnacle has already priced in.
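
A minimal sketch of the weighted objective. It assumes the four components are combined as a plain weighted sum. Only the outcome and xG weights (0.3 and 0.2 in the baseline) come from this post. The `LossWeights` class, the `combined_loss` function and the KL/AH weights of 1.0 are hypothetical placeholders, and the real solver's loss terms are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # outcome/xg defaults are the baseline values reported in the post;
    # kl/ah = 1.0 are hypothetical placeholders (not reported).
    kl: float = 1.0
    ah: float = 1.0
    outcome: float = 0.3
    xg: float = 0.2


def combined_loss(kl_term: float, ah_term: float, outcome_term: float,
                  xg_term: float, w: LossWeights) -> float:
    """Weighted sum of the four solver loss components (illustrative only)."""
    return (w.kl * kl_term + w.ah * ah_term
            + w.outcome * outcome_term + w.xg * xg_term)


BASELINE = LossWeights()
MARKET_ONLY = LossWeights(outcome=0.0, xg=0.0)  # the accepted config

# With zero outcome/xG weights, the match-data terms no longer move the loss.
a = combined_loss(0.05, 0.02, outcome_term=1.1, xg_term=0.8, w=MARKET_ONLY)
b = combined_loss(0.05, 0.02, outcome_term=9.9, xg_term=9.9, w=MARKET_ONLY)
```

Under this framing, market-only is the baseline with the last two weights set to zero, so the solver's optimum depends only on the Pinnacle-derived terms.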

Phase 1 of the playbook (RFB/decay sweep) showed the solver's form tracking is already optimal. Phase 2 asks: what if the solver is fitting to too much noise? What if the match results and xG data are HURTING calibration by pulling the solver away from what the market knows?

What We Found

Removing all match-result data from the loss function improves every metric.

| Config | AH ROI | CLV | N | Delta |
|---|---|---|---|---|
| Baseline (outcome=0.3, xg=0.2) | -3.29% | +6.22% | 8,568 | -- |
| low-outcome (0.1, 0.2) | -3.27% | +6.26% | 8,583 | +0.02pp |
| no-outcome (0.0, 0.2) | -3.66% | +6.32% | 8,600 | -0.37pp |
| low-xg (0.3, 0.1) | -2.88% | +6.26% | 8,555 | +0.40pp |
| **market-only (0.0, 0.0)** | **-2.24%** | **+6.82%** | **8,607** | **+1.05pp** |

Playbook criteria check:

| Criterion | Result | Status |
|---|---|---|
| ROI >= +1pp | +1.05pp | PASS |
| CLV >= 9% (filtered) | 10.06% | PASS |
| >= 3/4 years improve | 3/3 | PASS |
| Bootstrap p < 0.10 | 0.0965 | PASS |
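
The four criteria can be expressed as a simple gate. This is a sketch: the `SweepResult` and `playbook_checks` names are hypothetical. Reading ">= 3/4 years" as a fraction of the years tested is also an assumption; only three years are available here, and 3/3 passes under that reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepResult:
    roi_delta_pp: float      # AH ROI change vs baseline, in percentage points
    clv_filtered_pct: float  # CLV at the 5% edge filter, in percent
    years_improved: int
    years_tested: int
    bootstrap_p: float


def playbook_checks(r: SweepResult) -> dict:
    """Thresholds from the criteria table; the year rule is read as a fraction."""
    return {
        "roi": r.roi_delta_pp >= 1.0,
        "clv": r.clv_filtered_pct >= 9.0,
        "years": r.years_improved / r.years_tested >= 3 / 4,
        "bootstrap": r.bootstrap_p < 0.10,
    }


market_only = SweepResult(roi_delta_pp=1.05, clv_filtered_pct=10.06,
                          years_improved=3, years_tested=3, bootstrap_p=0.0965)
checks = playbook_checks(market_only)
accepted = all(checks.values())
```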

The Nuance

The Gradient Tells a Story

Halving the xG weight (0.2 to 0.1) helps (+0.40pp). Cutting the outcome weight to 0.1 is neutral (+0.02pp), and removing it entirely while keeping xG hurts (-0.37pp). But removing both is superadditive (+1.05pp > 0.40 + 0.02). This suggests the two noise sources interact. One reading is that the solver uses xG to anchor ratings and outcomes to validate them. Together, they pull it further from market truth than either does alone.
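
The superadditivity claim is a check on the reported deltas: subtract the sum of the single-factor effects from the combined effect. One caveat applies: the two single-factor configs reduce a weight rather than remove it, so this is a rough interaction estimate, not a clean factorial design. The dictionary and helper below are illustrative and use only numbers from the sweep table.

```python
# Delta vs baseline in percentage points, copied from the sweep table.
reported_delta_pp = {
    "low-outcome (0.1, 0.2)": 0.02,
    "no-outcome (0.0, 0.2)": -0.37,
    "low-xg (0.3, 0.1)": 0.40,
    "market-only (0.0, 0.0)": 1.05,
}


def interaction_pp(combined: float, *singles: float) -> float:
    """Combined effect minus the sum of single-factor effects.
    Positive means superadditive; zero means the effects simply add up."""
    return combined - sum(singles)


excess = interaction_pp(
    reported_delta_pp["market-only (0.0, 0.0)"],
    reported_delta_pp["low-xg (0.3, 0.1)"],
    reported_delta_pp["low-outcome (0.1, 0.2)"],
)
```

With the reported numbers, the excess over simple additivity is about +0.63pp.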

Per-Year: 2024 Benefits Most

| Year | Baseline | market-only | Delta |
|---|---|---|---|
| 2022 | -3.54% | -2.71% | +0.84pp |
| 2023 | -1.63% | -1.57% | +0.06pp |
| 2024 | -4.70% | -2.45% | +2.25pp |

2024 had the worst baseline performance (-4.70%) and gets the biggest improvement (+2.25pp). This makes sense: 2024 had the most within-season collapses (Rangers, Barcelona, Sevilla). Pinnacle odds reprice a collapse almost instantly, while outcome and xG histories only catch up gradually and keep pointing at pre-collapse form. By removing these lagging data sources, the solver no longer gets misled by stale form signals.

Per-League: 6/10 Improve

Belgian-pro (+6.32pp), serie-a (+4.17pp), and league-one (+3.62pp) see the largest gains. EPL (-3.96pp) and segunda (-2.77pp) worsen -- these may be leagues where outcomes/xG add genuine signal that the market misses.

The Profit Picture

Total P&L is still negative: -192.5u vs -281.7u for the baseline. But that's +89.2u saved, a 32% reduction in losses. The model is still underwater, but substantially less so.
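
The saved-units and percentage figures follow directly from the two P&L totals. A quick check (variable names are illustrative):

```python
baseline_pnl_u = -281.7     # baseline total P&L, units
market_only_pnl_u = -192.5  # market-only total P&L, units

saved_u = market_only_pnl_u - baseline_pnl_u    # units of loss avoided
loss_reduction = saved_u / abs(baseline_pnl_u)  # share of baseline losses avoided
```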

What Didn't Work

Earlier phases of this playbook run:

  • RFB/decay sweep: 5 configs, baseline wins. Form weighting already optimal.
  • Edge shrinkage: WRONG DIRECTION. Higher edge thresholds make ROI monotonically worse.

Both of those tested the wrong layer. The loss weight sweep tests the solver's training objective -- a deeper architectural change.
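
For context on the edge-threshold result in the second bullet above, here is one way an edge filter, ROI and CLV could be computed over a set of bets. It is a sketch under stated assumptions. The `Bet` record and helper names are hypothetical. Edge is taken as model probability times decimal odds minus one, and CLV as taken odds over Pinnacle closing odds minus one; that is one common CLV definition among several. The post does not say which definitions the dashboard uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    model_prob: float    # model probability that the bet wins
    taken_odds: float    # decimal odds when the bet was placed
    closing_odds: float  # decimal Pinnacle closing odds for the same line
    profit_u: float      # settled profit in units for a 1u stake


def edge(b: Bet) -> float:
    """Model expected value per unit staked."""
    return b.model_prob * b.taken_odds - 1.0


def clv(b: Bet) -> float:
    """Closing-line value as taken odds over closing odds, minus one."""
    return b.taken_odds / b.closing_odds - 1.0


def filtered_stats(bets: list, min_edge: float = 0.05) -> tuple:
    """Bet count, ROI and mean CLV over bets whose edge clears min_edge."""
    picked = [b for b in bets if edge(b) >= min_edge]
    if not picked:
        return 0, math.nan, math.nan
    roi = sum(b.profit_u for b in picked) / len(picked)  # flat 1u stakes
    mean_clv = sum(clv(b) for b in picked) / len(picked)
    return len(picked), roi, mean_clv


bets = [
    Bet(0.55, 2.00, 1.90, 1.0),   # edge +10%: kept
    Bet(0.50, 2.02, 2.05, -1.0),  # edge +1%: filtered out
    Bet(0.40, 2.10, 2.00, -1.0),  # negative edge: filtered out
]
n, roi, mean_clv = filtered_stats(bets, min_edge=0.05)
```

Raising `min_edge` shrinks the selected sample, so ROI at high thresholds rests on fewer bets.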

What This Means

Why It Works: Information Hierarchy

Pinnacle odds are widely regarded as the most efficient public predictor of match outcomes. They incorporate team form, injuries, suspensions, weather, market sentiment, and insider information, all in real time. Our solver can extract the market's implied team ratings without ever seeing a match result.

When we add outcomes and xG, we're asking the solver to also predict noisy data that Pinnacle already incorporates. This pulls the solver toward overfitting to recent results (the exact problem we tried to fix with RFB/decay). The market-only config lets the solver focus on what it does best: decomposing market prices into team-level parameters.
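
To make "decomposing market prices into team-level parameters" concrete, the sketch below scores a single match. It converts bivariate Poisson goal rates into 1X2 probabilities and measures their KL divergence from de-vigged Pinnacle odds. It rests on several assumptions: the Karlis-Ntzoufras parameterization (λ3 as the shared covariance term), proportional de-vigging, a truncated 0-10 goal grid, and no Dixon-Coles low-score correction or AH term. The function names are hypothetical. A solver along these lines would minimize the sum of such terms over many matches with respect to the team ratings that generate λ1, λ2 and λ3.

```python
import math


def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(home goals = x, away goals = y) under the Karlis-Ntzoufras bivariate
    Poisson; lam3 is the shared (covariance) component."""
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** x / math.factorial(x)
            * lam2 ** y / math.factorial(y))
    ratio = lam3 / (lam1 * lam2)
    series = sum(math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio ** k
                 for k in range(min(x, y) + 1))
    return base * series


def model_1x2(lam1: float, lam2: float, lam3: float, max_goals: int = 10) -> list:
    """Home/draw/away probabilities from a truncated score grid."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = bivariate_poisson_pmf(x, y, lam1, lam2, lam3)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalize the mass lost to truncation
    return [home / total, draw / total, away / total]


def devig_proportional(decimal_odds: list) -> list:
    """Strip the bookmaker margin by scaling implied probabilities to sum to 1."""
    implied = [1.0 / o for o in decimal_odds]
    overround = sum(implied)
    return [p / overround for p in implied]


def kl_divergence(p_market: list, q_model: list) -> float:
    """KL(market || model) in nats; zero only when the distributions match."""
    return sum(p * math.log(p / q) for p, q in zip(p_market, q_model) if p > 0)


# Example: one match with hypothetical odds and goal rates.
market = devig_proportional([2.10, 3.40, 3.80])
model = model_1x2(lam1=1.2, lam2=0.9, lam3=0.1)
loss = kl_divergence(market, model)
```

Nothing in this loss refers to a final score or an xG value. Match data enters only through the optional outcome and xG terms, which the market-only config switches off.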

Deployment Status: PENDING HOLDOUT

This is NOT deployed yet. The 9-league holdout set (league-two, national-league, ligue-2, ligue-1, bundesliga, bundesliga-2, portuguese-liga, scottish-prem, greek-super) has never seen these configs. If market-only shows directional consistency on holdout, it becomes the new production config.

What's Next

  1. Holdout validation -- run market-only on 9 holdout leagues. Pass criteria: same-direction ROI improvement, effect within 2x of dev set.
  2. If holdout passes: full 26-league production baseline with outcomeWeight=0, xgWeight=0.
  3. Phase 3 of playbook: re-run top signals through 10-gate process with the improved base model. With -2.24% base ROI instead of -3.29%, signals adding +0.91pp might finally pass walk-forward.
  4. Investigate EPL/segunda regression -- these leagues may benefit from outcome/xG data. Could do per-league weight optimization later.
ACCEPTED | Signal: loss-weight-sweep | 2026-03-19