March 22, 2026|Research|ACCEPTED

The Bayesian Kitchen Sink: 5 Techniques to Fix Over/Under

5 Bayesian techniques tested on 14,283 OOS matches. T5 (Model Averaging) wins: Brier 0.24882 vs MAP 0.25106, ROI -4.5% vs -5.2% (+0.7pp). Rebalances Over/Under split. T2 (Lambda3 uncertainty) second best. T3 (uncertain overdispersion) worst. Deploy T5 for OU25 at 0.25x sizing.

Best Brier: 0.24882 (T5 Model Avg; MAP: 0.25106)
ROI Gain: +0.7pp (-4.5% vs -5.2% MAP)
Matches: 14,283 (19 leagues OOS)
Verdict: DEPLOY T5 (Model Averaging for OU25)

We've been dismissing Bayesian methods all day. Posterior Kelly? Neutral. Hierarchical priors? Negative. MAP vs posterior for AH? Identical. We were about to declare Phase 5 dead.

Then we noticed something we almost missed: posterior predictive is statistically significantly better than MAP for Over/Under predictions (p=0.037). We'd been measuring this the whole time and ignoring it because OU25 has -6% historical ROI. But that ROI was measured with MAP predictions. Better predictions could mean different ROI.

That was the thread. We pulled it.

The Question

Can Bayesian inference fix our worst market? OU25 returns -5.2% ROI on 6,960 OOS bets. The model over-predicts goals by 8.7%. Every fix we've tried (deflation, NegBin, ZIP) has been marginal at best. But we'd never tested whether Bayesian methods, specifically averaging predictions over parameter uncertainty, could improve the underlying probability estimates.

What We Found

We ran 5 Bayesian techniques head-to-head on 14,283 OOS matches across 19 leagues:

Method	Brier	N Bets	ROI	P&L ($20/bet)	Over/Under Split
MAP (baseline)	0.25106	6,960	-5.2%	-$7,272	1698O / 5262U
T1: Posterior Mean	0.24963	6,474	-5.2%	-$6,707	2066O / 4408U
T2: Lambda3 Uncertainty	0.24956	6,486	-4.6%	-$5,979	2059O / 4427U
T3: NegBin Uncertain r	0.25111	6,640	-6.2%	-$8,200	751O / 5889U
T5: Model Averaging	0.24882	6,399	-4.5%	-$5,802	2217O / 4182U
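For reference, the Brier column is the mean squared error between the predicted P(Over 2.5) and the 0/1 outcome. A minimal sketch follows; the probabilities and outcomes are made-up illustrative values, not our data, and `brier_score` is a hypothetical helper name.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Illustrative values only: predicted P(Over 2.5) and whether Over actually hit.
example_probs = [0.5, 0.5, 0.5, 0.5]
example_outcomes = [1, 0, 1, 0]
coin_flip_brier = brier_score(example_probs, example_outcomes)
```

A constant 50% forecast always scores exactly 0.25, which is the natural yardstick for reading the Brier values in the table.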

T5 (Model Averaging) wins on both Brier AND ROI. It weights Poisson and NegBin per-match based on which model better fits the actual score, then blends their OU25 predictions. Because the two predictions are blended rather than one being picked, this is principled Bayesian model averaging applied to individual matches.
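The blending step can be sketched roughly as below. This is an illustration under stated assumptions, not our production code: total goals are modelled directly, plug-in likelihoods at fixed `mu` and `r` stand in for the marginal likelihoods, the two models get equal prior weight, and `observed_totals` is a hypothetical list of already-observed goal totals used to score each model. All function names are made up for the example.

```python
import math

def poisson_logpmf(k, mu):
    return -mu + k * math.log(mu) - math.lgamma(k + 1)

def negbin_logpmf(k, mu, r):
    # NegBin parameterised by mean mu and dispersion r (variance = mu + mu^2 / r).
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def p_over_25(logpmf, *params):
    # P(total goals >= 3) = 1 - P(0) - P(1) - P(2)
    return 1.0 - sum(math.exp(logpmf(k, *params)) for k in range(3))

def model_weights(observed_totals, mu, r):
    # Posterior model probabilities under equal prior model weights (assumption).
    ll_pois = sum(poisson_logpmf(k, mu) for k in observed_totals)
    ll_nb = sum(negbin_logpmf(k, mu, r) for k in observed_totals)
    top = max(ll_pois, ll_nb)  # subtract the max for numerical stability
    w_pois, w_nb = math.exp(ll_pois - top), math.exp(ll_nb - top)
    return w_pois / (w_pois + w_nb), w_nb / (w_pois + w_nb)

def blended_p_over_25(observed_totals, mu, r):
    # Weighted blend of the two models' Over 2.5 probabilities.
    w_pois, w_nb = model_weights(observed_totals, mu, r)
    return (w_pois * p_over_25(poisson_logpmf, mu)
            + w_nb * p_over_25(negbin_logpmf, mu, r))
```

With no observed totals the weights fall back to 50/50; the more dispersed the observed totals, the more weight shifts to NegBin.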

The Nuance

The rebalancing effect is key. MAP is heavily Under-biased: 5,262 Under bets vs 1,698 Over bets (3:1 ratio). Every Bayesian method shifts this toward a more balanced split. T5 gets to 4,182U / 2,217O (1.9:1). This rebalancing works because the model's Under overconfidence is the core problem — it thinks Under hits 55.4% when it actually hits 46.7%.

T3 (uncertain overdispersion) is the only method that makes things worse. It pushes even harder toward Under (5,889U / 751O) because drawing r from a wide distribution produces many high-variance grids that inflate Under probability.

Lambda3 uncertainty (T2) helps. Adding uncertainty on the bivariate Poisson correlation parameter improves both Brier and ROI. This makes sense — correlation directly affects total goals, and being uncertain about it naturally hedges extreme total predictions.
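To make the T2 idea concrete, here is a sketch assuming the standard bivariate Poisson construction (home = W1 + W3, away = W2 + W3, with independent Poisson components), so that total goals = W1 + W2 + 2·W3. Whether our model uses exactly this parameterisation is an assumption of the example, and `lam3_draws` stands for hypothetical posterior draws of the correlation parameter.

```python
import math

def poisson_pmf(k, lam):
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def p_over_25_bivariate(lam1, lam2, lam3):
    # Total <= 2 happens when W3 = 0 and W1 + W2 <= 2, or W3 = 1 and W1 + W2 = 0.
    s = lam1 + lam2  # W1 + W2 ~ Poisson(lam1 + lam2)
    p_s = [poisson_pmf(k, s) for k in range(3)]
    p_under = poisson_pmf(0, lam3) * sum(p_s) + poisson_pmf(1, lam3) * p_s[0]
    return 1.0 - p_under

def posterior_predictive_p_over(lam1, lam2, lam3_draws):
    # Average P(Over 2.5) over draws of lam3 instead of plugging in one value.
    probs = [p_over_25_bivariate(lam1, lam2, l3) for l3 in lam3_draws]
    return sum(probs) / len(probs)
```

Calling `posterior_predictive_p_over(lam1, lam2, draws)` next to `p_over_25_bivariate(lam1, lam2, map_value)` shows how far the averaged probability moves from the plug-in one for a given match.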

Per-market specificity matters. In Phase 5C, we showed posterior ≈ MAP for AH (Δ=0.00012). But for OU25, the Brier improvement is 0.00224 — nearly 20x larger. Bayesian methods help where the nonlinearity of the prediction function (score grid → P(Over 2.5)) amplifies parameter uncertainty. The Over 2.5 threshold is a cliff in the score grid — small changes in expected goals push P(Over) a lot. AH is smoother.
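The cliff can be illustrated with a small sensitivity check. The sketch assumes total goals follow a plain Poisson distribution and uses a hypothetical expected total of 2.7; neither is taken from our model.

```python
import math

def p_over_25_poisson(mu):
    # P(total goals >= 3) when total goals ~ Poisson(mu).
    return 1.0 - sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(3))

def sensitivity(mu, step=0.1):
    # Change in P(Over 2.5) for a `step` increase in expected total goals.
    return p_over_25_poisson(mu + step) - p_over_25_poisson(mu)

mu_example = 2.7  # hypothetical expected total goals
delta = sensitivity(mu_example)
```

Under this assumption the slope of P(Over 2.5) with respect to expected goals equals P(total = 2), so the prediction is most sensitive exactly where two-goal matches are likely.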

What Didn't Work

T3 (NegBin uncertain r): -6.2% ROI (worst). We hypothesized that drawing overdispersion from a prior would capture per-match heterogeneity. Instead, the wide Gamma(4, 0.5) prior on r produced too many extreme draws (r=2 → very fat tails → inflate Under). The prior needs tightening, or r should be learned per-league rather than drawn per-match.
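A quick way to see how wide this prior is: sample from it and look at the low-r tail. The sketch assumes Gamma(4, 0.5) means shape 4 and rate 0.5 (prior mean 8); if it were shape/scale the mean would be 2 instead. The threshold of 3 and the helper names are illustrative choices.

```python
import random

def sample_r(n_draws, shape=4.0, rate=0.5, seed=0):
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws)]

def negbin_variance(mu, r):
    # Variance of a NegBin with mean mu and dispersion r; smaller r means fatter tails.
    return mu + mu * mu / r

draws = sample_r(20_000)
mean_r = sum(draws) / len(draws)
share_low_r = sum(r <= 3.0 for r in draws) / len(draws)  # fraction of fat-tailed draws
```

Comparing `negbin_variance(mu, 2.0)` with `negbin_variance(mu, 8.0)` for a typical `mu` shows how much extra spread a single low draw injects into the score grid.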

T1 (simple posterior mean): same ROI as MAP. Despite better Brier (0.24963 vs 0.25106), the posterior mean doesn't change bet selection enough to affect ROI. It changes WHICH bets pass the 5% edge threshold (different split, different N), but the return per dollar staked stays at -5.2%.
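Why a better probability can change which bets fire without changing ROI is easier to see with the selection rule written out. The sketch below assumes edge is defined as expected value per unit stake (probability × decimal odds − 1); the exact edge definition we use is not spelled out here, and `pick_ou25_bet` and the odds are hypothetical.

```python
def pick_ou25_bet(p_over, odds_over, odds_under, min_edge=0.05):
    """Return 'Over', 'Under', or None if neither side clears the edge threshold."""
    edge_over = p_over * odds_over - 1.0
    edge_under = (1.0 - p_over) * odds_under - 1.0
    side, edge = max((("Over", edge_over), ("Under", edge_under)), key=lambda t: t[1])
    return side if edge >= min_edge else None

# Same hypothetical odds, three different model probabilities for Over 2.5.
picks = [pick_ou25_bet(p, 1.95, 1.95) for p in (0.44, 0.48, 0.55)]
```

A shift of a few points in `p_over` moves a match from an Under bet to no bet to an Over bet, which is how the split and N can change while net ROI stays put.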

What This Means

OU25 is still not profitable (-4.5% best case). But:

We proved Bayesian methods improve OU25 predictions. Brier 0.24882 vs 0.25106 is a meaningful improvement on 14,283 matches.
The improvement narrows the ROI gap by +0.7pp. From -5.2% to -4.5%: T5 loses $5,802 on 6,399 bets vs $7,272 on 6,960 bets for MAP at $20/bet, a $1,470 smaller loss.
Model averaging (T5) is deployable. It's a clean, principled technique with no tunable parameters. Per-match Poisson/NegBin weighting based on marginal likelihood.
This is additive with other OU25 fixes. NegBin r=8, totals deflation, and now model averaging can stack.
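The ROI figures can be re-derived from the bet counts and P&L in the results table. A small check, assuming ROI is P&L divided by total stake at a flat $20 per bet:

```python
STAKE = 20  # dollars per bet, as in the results table

# (number of bets, P&L in dollars) copied from the results table
results = {
    "MAP": (6960, -7272),
    "T1": (6474, -6707),
    "T2": (6486, -5979),
    "T3": (6640, -8200),
    "T5": (6399, -5802),
}

def roi_pct(n_bets, pnl, stake=STAKE):
    # ROI as a percentage of total amount staked.
    return 100.0 * pnl / (n_bets * stake)

roi = {name: round(roi_pct(n, pnl), 1) for name, (n, pnl) in results.items()}
saved_vs_map = results["T5"][1] - results["MAP"][1]  # dollars of loss avoided by T5
```

Note that T5 and MAP place different numbers of bets, so the dollar difference mixes better selection with lower volume; the ROI percentages are the like-for-like comparison.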

Deployment recommendation: Apply T5 (model averaging) to OU25 predictions at the current 0.25x sizing. This is a strict improvement over MAP for the OU25 market. Keep AH/1X2 on MAP (Bayesian doesn't help there).

What's Next

Two more techniques from the plan haven't been tested:

T7: Market-informed priors — use devigged Pinnacle OU25 odds as the prior. This is Benter Boost formalized as Bayesian inference.
T6: Time-varying state-space — let team ratings drift through the season. Most complex technique.
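For T7, the first step would be turning bookmaker odds into probabilities. A minimal sketch using proportional normalisation, which is one common devig method and an assumption here (we have not committed to a method); `devig_two_way` is a hypothetical helper:

```python
def devig_two_way(odds_over, odds_under):
    """Strip the bookmaker margin from two-way decimal odds by proportional scaling."""
    inv_over, inv_under = 1.0 / odds_over, 1.0 / odds_under
    overround = inv_over + inv_under  # greater than 1 when the book has a margin
    return inv_over / overround, inv_under / overround

# Hypothetical odds, not real Pinnacle prices.
p_over_market, p_under_market = devig_two_way(1.90, 2.00)
```

The resulting pair sums to 1 and could serve as the prior mean for P(Over 2.5) before the model's evidence is folded in.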

The deeper question: can Bayesian model averaging + calibration correction close the remaining -4.5% gap? The model has +7.3% CLV on OU25 — the edge exists. The problem is converting it to profit. If we can get OU25 from -4.5% to 0%, that's 6,399 bets × $20 × 4.5% = $5,759 recovered.

Beyond OU25: the model averaging technique could apply to AH too. We use Poisson for AH, but some matches might benefit from Dixon-Coles correction. Per-match model selection for AH is untested.

ACCEPTED|Signal: bayesian-ou25-model-averaging|2026-03-22