Trust the Market, Not the Scoreboard: How Removing Match Data Made the Model Better
We removed outcome prediction and xG fitting from the solver loss function. Validated on holdout (+0.45pp, 3/3 years) and production (26 leagues, +0.79pp, +118u saved, 12/19 improve). The solver produces better calibrated probabilities when fitting ONLY to Pinnacle odds. First structural model change to improve AH ROI.
Trust the Market, Not the Scoreboard: How Removing Match Data Made the Model Better
We removed outcome prediction and xG fitting from the solver's loss function. AH ROI improved by +1.05pp. CLV went up. Every year improved. This is the first structural change to pass all 4 playbook criteria.
The Question
The MI Bivariate Poisson solver fits team ratings by minimizing a combined loss function with 4 components: Pinnacle odds divergence (KL), AH market fit, outcome prediction, and xG prediction. The last two make the solver fit to what actually happened in matches -- noisy data that Pinnacle has already priced in.
Phase 1 of the playbook (RFB/decay sweep) showed the solver's form tracking is already optimal. Phase 2 asks: what if the solver is fitting to too much noise? What if the match results and xG data are HURTING calibration by pulling the solver away from what the market knows?
What We Found
Removing all match-result data from the loss function improves every metric.
| Config | AH ROI | CLV | N | Delta |
|---|---|---|---|---|
| Baseline (outcome=0.3, xg=0.2) | -3.29% | +6.22% | 8,568 | -- |
| low-outcome (0.1, 0.2) | -3.27% | +6.26% | 8,583 | +0.02pp |
| no-outcome (0.0, 0.2) | -3.66% | +6.32% | 8,600 | -0.37pp |
| low-xg (0.3, 0.1) | -2.88% | +6.26% | 8,555 | +0.40pp |
| **market-only (0.0, 0.0)** | **-2.24%** | **+6.82%** | **8,607** | **+1.05pp** |
Playbook criteria check:
| Criterion | Result | Status |
|---|---|---|
| ROI >= +1pp | +1.05pp | PASS |
| CLV >= 9% (filtered) | 10.06% | PASS |
| >= 3/4 years improve | 3/3 | PASS |
| Bootstrap p < 0.10 | 0.0965 | PASS |
The Nuance
The Gradient Tells a Story
Removing xG alone helps (+0.40pp). Removing outcomes alone is neutral (+0.02pp). But removing both is superadditive (+1.05pp > 0.40 + 0.02). This means the two noise sources interact -- the solver uses xG to anchor ratings, then outcomes to validate them, and the combination pulls it further from market truth than either alone.
Per-Year: 2024 Benefits Most
| Year | Baseline | market-only | Delta |
|---|---|---|---|
| 2022 | -3.54% | -2.71% | +0.84pp |
| 2023 | -1.63% | -1.57% | +0.06pp |
| 2024 | -4.70% | -2.45% | +2.25pp |
2024 had the worst baseline performance (-4.70%) and gets the biggest improvement (+2.25pp). This makes sense: 2024 had the most within-season collapses (Rangers, Barcelona, Sevilla), and those collapses show up in outcomes and xG but NOT in Pinnacle odds (which adjust instantly). By removing the lagging data sources, the solver doesn't get misled by stale form signals.
Per-League: 6/10 Improve
Belgian-pro (+6.32pp), serie-a (+4.17pp), and league-one (+3.62pp) see the largest gains. EPL (-3.96pp) and segunda (-2.77pp) worsen -- these may be leagues where outcomes/xG add genuine signal that the market misses.
The Profit Picture
Still negative: -192.5u vs -281.7u baseline. But that's +89.2u saved -- a 32% reduction in losses. The model is still underwater, but significantly less so.
What Didn't Work
The earlier phases this session:
- RFB/decay sweep: 5 configs, baseline wins. Form weighting already optimal.
- Edge shrinkage: WRONG DIRECTION. Higher edge thresholds make ROI monotonically worse.
Both of those tested the wrong layer. The loss weight sweep tests the solver's training objective -- a deeper architectural change.
What This Means
Why It Works: Information Hierarchy
Pinnacle odds are the most efficient predictor of match outcomes. They incorporate team form, injuries, suspensions, weather, market sentiment, and insider information -- all in real-time. Our solver can extract the market's implied team ratings without ever seeing a match result.
When we add outcomes and xG, we're asking the solver to also predict noisy data that Pinnacle already incorporates. This pulls the solver toward overfitting to recent results (the exact problem we tried to fix with RFB/decay). The market-only config lets the solver focus on what it does best: decomposing market prices into team-level parameters.
Deployment Status: PENDING HOLDOUT
This is NOT deployed yet. The 9-league holdout set (league-two, national-league, ligue-2, ligue-1, bundesliga, bundesliga-2, portuguese-liga, scottish-prem, greek-super) has never seen these configs. If market-only shows directional consistency on holdout, it becomes the new production config.
What's Next
- Holdout validation -- run market-only on 9 holdout leagues. Pass criteria: same-direction ROI improvement, effect within 2x of dev set.
- If holdout passes: full 26-league production baseline with outcomeWeight=0, xgWeight=0.
- Phase 3 of playbook: re-run top signals through 10-gate process with the improved base model. With -2.24% base ROI instead of -3.29%, signals adding +0.91pp might finally pass walk-forward.
- Investigate EPL/segunda regression -- these leagues may benefit from outcome/xG data. Could do per-league weight optimization later.