From Binary Filters to Stake Sizing: Finding the Right Shape for Enrichment Data

Multi-source xG enrichment data failed in four attempts to use it for bet selection (binary filters, two lambda modifiers, and a contradiction filter), none of which helped. Reframing it as continuous stake sizing produced +0.45% to +0.63% in-sample sizing lift, improved Sharpe, and passed 3/3 walk-forward folds. The data was always informative; we were just using it wrong.

The Question

We have multi-source xG enrichment data — when 2-3 independent xG models agree a team is regressing, we should be more confident in our edge. But how do we use it? We tested it as binary filters (killed all bets), lambda modifiers (added bad bets), and contradiction filters (removed good bets). All failed. What's the right shape for continuous confidence data?

What We Tried (And Why It Failed)

| Approach | Effect | Problem |
|---|---|---|
| Binary filters (5 signals) | N=0 or no effect | Data only covers ~20% of matches, so filters killed everything |
| Lambda modifier (symmetric) | +1,042 bets, -0.42pp | Boosting underperformers added low-quality bets |
| Lambda modifier (asymmetric) | +661 bets, -0.29pp | Same problem, slightly less bad |
| Contradiction filter | -1,053 bets, -0.30pp | Removed bets the model already handles correctly |

Four approaches, same conclusion: cramming continuous confidence into binary decisions destroys value.

The Insight

The enrichment data doesn't tell us which bets to take — the existing model already answers that. It tells us how much to bet. When 3 xG sources agree on regression, size up. When they disagree, size down. When no data exists, use flat stakes. Same bet set, different allocation.

What We Built

A shifted logistic function maps multi-source confidence to a stake multiplier:

| Confidence | Sources | Multiplier |
|---|---|---|
| 0.8 | 3 | ×1.31 (size up) |
| 0.5 | 2 | ×1.00 (neutral) |
| 0.2 | 2 | ×0.79 (size down) |
| | <2 | ×1.00 (pass-through) |

No bet is added or removed. No coverage gap penalty. Just weighted allocation.
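The post does not give the exact formula, so the following is only a minimal sketch of one shifted logistic that has the properties described: bounded by a floor and a ceiling, neutral at confidence 0.5, and pass-through below 2 sources. The function name, the `steepness` parameter, and its default value are assumptions for illustration, and the output is not expected to reproduce the table values exactly.

```python
import math


def stake_multiplier(confidence, n_sources, min_mult=0.7, max_mult=1.5, steepness=6.0):
    """Map a confidence score in [0, 1] to a stake multiplier.

    Assumed form: a logistic curve between min_mult and max_mult, shifted so
    that confidence 0.5 maps to exactly 1.0. `steepness` is a hypothetical
    parameter, not a value taken from the post.
    """
    if not (min_mult < 1.0 < max_mult):
        raise ValueError("need min_mult < 1 < max_mult")
    # Pass-through: with fewer than 2 sources there is nothing to adjust.
    if n_sources < 2:
        return 1.0
    # Fraction of the [min_mult, max_mult] range at which the multiplier is 1.0.
    neutral = (1.0 - min_mult) / (max_mult - min_mult)
    # Logit of that fraction: the shift that makes the curve cross 1.0 at 0.5.
    offset = math.log(neutral / (1.0 - neutral))
    z = steepness * (confidence - 0.5) + offset
    return min_mult + (max_mult - min_mult) / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for conf, sources in [(0.8, 3), (0.5, 2), (0.2, 2), (0.9, 1)]:
        print(conf, sources, round(stake_multiplier(conf, sources), 3))
```

Because the curve only rescales stakes, the bet set is untouched: every bet the model selected is still placed, just with a different weight.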

Parameter Sweep Results

We tested 6 parameter configs (2 minMult values × 3 maxMult values) on the best-known portfolio:

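| minMult | maxMult | Sizing Lift | Sharpe Improvement | Drawdown Change |
|---|---|---|---|---|
| 0.7 | 1.3 | +0.53% | +0.008 | -69pp |
| 0.7 | 1.4 | +0.59% | +0.009 | -75pp |
| 0.7 | 1.5 | +0.63% | +0.010 | -81pp |
| 0.8 | 1.3 | +0.45% | +0.007 | -58pp |
| 0.8 | 1.4 | +0.51% | +0.007 | -65pp |
| 0.8 | 1.5 | +0.56% | +0.008 | -71pp |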

Every config improves all three metrics — sizing lift positive, Sharpe improved, drawdown reduced. The wider the range (lower floor, higher ceiling), the stronger the effect. Monotonic and consistent.
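"Sizing lift" is not defined explicitly in this post. One plausible reading, assumed here, is the difference between the ROI of the weighted stakes and the ROI of flat stakes on the same bet set. The sketch below uses that definition; the function names and the four-bet toy data are made up for illustration.

```python
def roi(stakes, unit_returns):
    """Total profit divided by total amount staked.

    unit_returns[i] is the profit per unit staked on bet i
    (e.g. +1.0 for a win at decimal odds 2.0, -1.0 for a loss).
    """
    staked = sum(stakes)
    profit = sum(s * r for s, r in zip(stakes, unit_returns))
    return profit / staked


def sizing_lift(multipliers, unit_returns):
    """ROI with weighted stakes minus ROI with flat stakes, same bets."""
    flat = [1.0] * len(unit_returns)
    return roi(multipliers, unit_returns) - roi(flat, unit_returns)


if __name__ == "__main__":
    # Toy data (assumed): two wins and two losses at even money.
    unit_returns = [1.0, -1.0, 1.0, -1.0]
    # Sized up on the first win, sized down on the first loss, flat elsewhere.
    multipliers = [1.3, 0.8, 1.0, 1.0]
    print(f"sizing lift: {sizing_lift(multipliers, unit_returns):+.4f}")
```

Under this definition, flat multipliers give a lift of exactly zero, so any nonzero lift comes purely from allocation, not from selection.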

Walk-Forward Validation

We ran three expanding-window folds: each trains on all earlier seasons and tests on the next one. On each fold, the 6-config parameter grid is swept on the training set, the config with the best Sharpe is selected, and that config is then applied unchanged to the test set.

| Fold | Train N | Test N | Train Lift | Test Lift | Params Selected |
|---|---|---|---|---|---|
| → 2022-23 | 3,711 | 2,182 | +1.45% | **+12.58%** | min=0.7 max=1.5 |
| → 2023-24 | 5,893 | 1,918 | +2.19% | **+0.52%** | min=0.7 max=1.5 |
| → 2024-25 | 7,811 | 1,967 | +1.59% | **+2.12%** | min=0.7 max=1.5 |

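3/3 folds are positive out of sample (OOS). The same parameter config wins every fold: the widest range (min=0.7, max=1.5), meaning the signal is strongest when the sizing function has the most room to differentiate. The optimal params are stable across time, which argues against overfitting to any one period. Because the winner sits on the edge of the grid, wider ranges remain untested.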
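The walk-forward procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the table: the bet representation `(confidence, n_sources, unit_return)`, the per-bet Sharpe definition, the `multiplier` curve, and its steepness `k` are all hypothetical.

```python
import math
from itertools import product
from statistics import mean, pstdev

# The 6-config grid from the post: (min_mult, max_mult).
GRID = list(product((0.7, 0.8), (1.3, 1.4, 1.5)))


def multiplier(conf, n_sources, lo, hi, k=6.0):
    """Assumed shifted logistic: 1.0 at conf=0.5, pass-through below 2 sources."""
    if n_sources < 2:
        return 1.0
    p = (1.0 - lo) / (hi - lo)
    z = k * (conf - 0.5) + math.log(p / (1.0 - p))
    return lo + (hi - lo) / (1.0 + math.exp(-z))


def sharpe(bets, lo, hi):
    """Mean over standard deviation of per-bet P&L (an assumed definition)."""
    pnl = [multiplier(c, n, lo, hi) * r for c, n, r in bets]
    sd = pstdev(pnl)
    return mean(pnl) / sd if sd > 0 else 0.0


def lift(bets, lo, hi):
    """Weighted-stake ROI minus flat-stake ROI on the same bets."""
    stakes = [multiplier(c, n, lo, hi) for c, n, _ in bets]
    weighted = sum(s * r for s, (_, _, r) in zip(stakes, bets)) / sum(stakes)
    return weighted - mean(r for _, _, r in bets)


def walk_forward(seasons, n_test_folds=3):
    """seasons: list of per-season bet lists, oldest first."""
    if len(seasons) <= n_test_folds:
        raise ValueError("need at least one training season before the first fold")
    results = []
    for i in range(len(seasons) - n_test_folds, len(seasons)):
        train = [bet for season in seasons[:i] for bet in season]  # expanding window
        test = seasons[i]
        # Select on train only; the test season never influences the choice.
        best = max(GRID, key=lambda cfg: sharpe(train, *cfg))
        results.append({"params": best, "train_lift": lift(train, *best),
                        "test_lift": lift(test, *best)})
    return results
```

The important property is that parameter selection sees only the training seasons, so the test lift in each fold is a genuine out-of-sample number.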

What This Means

All four success criteria pass:

  1. Sizing lift > 0 — every in-sample config and every OOS fold
  2. Sharpe improved — risk-adjusted return better in all configs
  3. Drawdown not worse — reduced in all configs
  4. Walk-forward positive — 3/3 OOS folds

The multi-source xG enrichment data is informative, but its natural shape is continuous confidence, not a binary filter. Forcing it into on/off decisions destroyed the signal every time. Using it for what it actually measures (how confident we are in the edge) produces consistent improvements across time periods.

The key design insight: when data is missing, pass through. The roughly 80% of bets without multi-source coverage get flat stakes; only the roughly 20% with 2+ xG sources get adjusted, so there is no coverage penalty. The logistic function saturates at the tails, so under the widest config tested, any single miscalibrated confidence score can move a stake no further than [0.7×, 1.5×].

What's Next

  1. Wire enrichment multiplier into the paper trade sizing engine (composes with existing Kelly/regime/timing multipliers)
  2. Shadow-log for 2 weeks alongside existing flat sizing
  3. Activate after confirming live lift matches backtest
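As a rough sketch of how steps 1 and 2 might fit together, the enrichment multiplier can be computed and logged next to the live stake without affecting it. Everything here is an assumption for illustration: the record type, the function name, and the choice to compose multipliers by simple multiplication are not taken from the existing sizing engine.

```python
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    bet_id: str
    live_stake: float    # what is actually placed under existing sizing
    shadow_stake: float  # what enrichment sizing would have placed


def size_bet(bet_id, base_stake, other_multipliers, enrichment_multiplier):
    """Compose existing multipliers into the live stake, and record the
    enrichment-adjusted stake alongside it without changing the live one."""
    live = base_stake
    for m in other_multipliers:  # e.g. Kelly, regime, timing (hypothetical order)
        live *= m
    return ShadowRecord(bet_id, live, live * enrichment_multiplier)


if __name__ == "__main__":
    record = size_bet("match-001", 10.0, [0.5, 1.2], 1.3)
    print(record)
```

Comparing the two stake columns against settled results over the shadow period gives a live estimate of sizing lift before anything is activated.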