March 19, 2026|Model Architecture|ACCEPTED

Trust the Market, Not the Scoreboard: How Removing Match Data Made the Model Better

We removed outcome prediction and xG fitting from the solver loss function. Validated on holdout (+0.45pp, 3/3 years) and production (26 leagues, +0.79pp, +118u saved, 12/19 improve). The solver produces better calibrated probabilities when fitting ONLY to Pinnacle odds. First structural model change to improve AH ROI.

ROI Delta

+1.05pp

passes playbook

CLV

+10.06%

at 5% edge filter

P&L Saved

+89u

32% less losses

Status

Holdout Next

4/4 criteria pass

Trust the Market, Not the Scoreboard: How Removing Match Data Made the Model Better

We removed outcome prediction and xG fitting from the solver's loss function. AH ROI improved by +1.05pp. CLV went up. Every year improved. This is the first structural change to pass all 4 playbook criteria.

The Question

The MI Bivariate Poisson solver fits team ratings by minimizing a combined loss function with 4 components: Pinnacle odds divergence (KL), AH market fit, outcome prediction, and xG prediction. The last two make the solver fit to what actually happened in matches -- noisy data that Pinnacle has already priced in.

Phase 1 of the playbook (RFB/decay sweep) showed the solver's form tracking is already optimal. Phase 2 asks: what if the solver is fitting to too much noise? What if the match results and xG data are HURTING calibration by pulling the solver away from what the market knows?

What We Found

Removing all match-result data from the loss function improves every metric.

Config	AH ROI	CLV	N	Delta
Baseline (outcome=0.3, xg=0.2)	-3.29%	+6.22%	8,568	--
low-outcome (0.1, 0.2)	-3.27%	+6.26%	8,583	+0.02pp
no-outcome (0.0, 0.2)	-3.66%	+6.32%	8,600	-0.37pp
low-xg (0.3, 0.1)	-2.88%	+6.26%	8,555	+0.40pp
market-only (0.0, 0.0)	-2.24%	+6.82%	8,607	+1.05pp

Playbook criteria check:

Criterion	Result	Status
ROI >= +1pp	+1.05pp	PASS
CLV >= 9% (filtered)	10.06%	PASS
>= 3/4 years improve	3/3	PASS
Bootstrap p < 0.10	0.0965	PASS

The Nuance

The Gradient Tells a Story

Removing xG alone helps (+0.40pp). Removing outcomes alone is neutral (+0.02pp). But removing both is superadditive (+1.05pp > 0.40 + 0.02). This means the two noise sources interact -- the solver uses xG to anchor ratings, then outcomes to validate them, and the combination pulls it further from market truth than either alone.

Per-Year: 2024 Benefits Most

Year	Baseline	market-only	Delta
2022	-3.54%	-2.71%	+0.84pp
2023	-1.63%	-1.57%	+0.06pp
2024	-4.70%	-2.45%	+2.25pp

2024 had the worst baseline performance (-4.70%) and gets the biggest improvement (+2.25pp). This makes sense: 2024 had the most within-season collapses (Rangers, Barcelona, Sevilla), and those collapses show up in outcomes and xG but NOT in Pinnacle odds (which adjust instantly). By removing the lagging data sources, the solver doesn't get misled by stale form signals.

Per-League: 6/10 Improve

Belgian-pro (+6.32pp), serie-a (+4.17pp), and league-one (+3.62pp) see the largest gains. EPL (-3.96pp) and segunda (-2.77pp) worsen -- these may be leagues where outcomes/xG add genuine signal that the market misses.

The Profit Picture

Still negative: -192.5u vs -281.7u baseline. But that's +89.2u saved -- a 32% reduction in losses. The model is still underwater, but significantly less so.

What Didn't Work

The earlier phases this session:

RFB/decay sweep: 5 configs, baseline wins. Form weighting already optimal.
Edge shrinkage: WRONG DIRECTION. Higher edge thresholds make ROI monotonically worse.

Both of those tested the wrong layer. The loss weight sweep tests the solver's training objective -- a deeper architectural change.

What This Means

Why It Works: Information Hierarchy

Pinnacle odds are the most efficient predictor of match outcomes. They incorporate team form, injuries, suspensions, weather, market sentiment, and insider information -- all in real-time. Our solver can extract the market's implied team ratings without ever seeing a match result.

When we add outcomes and xG, we're asking the solver to also predict noisy data that Pinnacle already incorporates. This pulls the solver toward overfitting to recent results (the exact problem we tried to fix with RFB/decay). The market-only config lets the solver focus on what it does best: decomposing market prices into team-level parameters.

Deployment Status: PENDING HOLDOUT

This is NOT deployed yet. The 9-league holdout set (league-two, national-league, ligue-2, ligue-1, bundesliga, bundesliga-2, portuguese-liga, scottish-prem, greek-super) has never seen these configs. If market-only shows directional consistency on holdout, it becomes the new production config.

What's Next

Holdout validation -- run market-only on 9 holdout leagues. Pass criteria: same-direction ROI improvement, effect within 2x of dev set.
If holdout passes: full 26-league production baseline with outcomeWeight=0, xgWeight=0.
Phase 3 of playbook: re-run top signals through 10-gate process with the improved base model. With -2.24% base ROI instead of -3.29%, signals adding +0.91pp might finally pass walk-forward.
Investigate EPL/segunda regression -- these leagues may benefit from outcome/xG data. Could do per-league weight optimization later.

ACCEPTEDSignal: loss-weight-sweep|2026-03-19