Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo


The Model's Best Bets Are Its Worst: Why Edge Shrinkage Failed

We tested edge shrinkage and minimum edge thresholds as filters for overconfident bets. Every threshold made ROI worse, monotonically: minEdge=10% costs 3.20pp against the baseline. The model's largest edges are its most overconfident. Because the result ran in the wrong direction, the reversed hypothesis is now registered as max-edge-cap.

Direction: WRONG (higher edge = worse ROI)
minEdge=10%: -6.49% (vs baseline -3.29%)
AH Bets: 8,568 (10 dev leagues)
Discovery: max-edge-cap (reversed hypothesis)

We tested whether filtering for higher-confidence bets would improve ROI. The result was the exact opposite: the model's largest edges are its most overconfident predictions. Every edge threshold we tested made ROI worse.

The Question

The model shows +6.2% closing line value (CLV) but -3.3% ROI on Asian handicap (AH) bets. Prior exploratory work suggested that multiplying edges by 0.4 (shrinkage) could improve ROI from 2.5% to 4.0% by discounting overconfident predictions. We built the infrastructure (--edge-shrink and --min-edge flags) and ran a rigorous 5-config sweep on the stratified 10-league dev set.

The hypothesis: higher-edge bets should be more profitable because the model is more confident. Filtering out low-edge bets should concentrate capital on winners.
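
The filtering step behind this hypothesis can be sketched as follows. This is a minimal illustration rather than the project's code: the `Bet` shape, `filterByMinEdge`, and `roi` are hypothetical names, edges are written as fractions (0.05 = 5%), and the three sample bets are made up purely to exercise the functions.

```typescript
// Hypothetical bet record; the field names are assumptions for illustration.
interface Bet {
  edge: number;   // model edge as a fraction, e.g. 0.07 = 7%
  stake: number;  // amount staked
  profit: number; // net profit (negative for a loss)
}

// Keep only bets whose model edge meets the minimum threshold.
function filterByMinEdge(bets: Bet[], minEdge: number): Bet[] {
  return bets.filter((b) => b.edge >= minEdge);
}

// ROI = total net profit / total stake; NaN when nothing was staked.
function roi(bets: Bet[]): number {
  const staked = bets.reduce((sum, b) => sum + b.stake, 0);
  const profit = bets.reduce((sum, b) => sum + b.profit, 0);
  return staked === 0 ? NaN : profit / staked;
}

// Made-up sample data, not results from the sweep.
const sampleBets: Bet[] = [
  { edge: 0.02, stake: 1, profit: 0.9 },
  { edge: 0.06, stake: 1, profit: -1 },
  { edge: 0.12, stake: 1, profit: -1 },
];

const kept = filterByMinEdge(sampleBets, 0.05);
console.log(kept.length, roi(sampleBets), roi(kept));
```

The hypothesis predicts that `roi(kept)` exceeds `roi(sampleBets)` on real data; the sweep below tests exactly that comparison.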

What We Found

Every edge threshold makes ROI worse. Monotonically.

| Min Edge | AH ROI | N | vs Baseline |
|---|---|---|---|
| 0% (baseline) | -3.29% | 8,568 | -- |
| 5% | -4.18% | 4,554 | -0.89pp |
| 7% | -4.74% | 3,233 | -1.45pp |
| 10% | -6.49% | 1,698 | -3.20pp |
| shrink 0.4 + 5% | -9.32% | 840 | -6.03pp |

CLV rises mechanically (you're selecting the highest-edge subset), but ROI falls because those bets lose more. The calibration gap -- the distance between what the model thinks and reality -- widens as edge increases.
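
The "vs Baseline" column and the monotonic claim can be checked directly from the reported figures. A small sketch: the variable and function names are illustrative, and the numbers are copied from the sweep results.

```typescript
// Rows copied from the sweep results (AH ROI in percent, n = number of bets).
const sweep = [
  { config: "0% (baseline)", roiPct: -3.29, n: 8568 },
  { config: "5%", roiPct: -4.18, n: 4554 },
  { config: "7%", roiPct: -4.74, n: 3233 },
  { config: "10%", roiPct: -6.49, n: 1698 },
  { config: "shrink 0.4 + 5%", roiPct: -9.32, n: 840 },
];

const BASELINE_ROI_PCT = -3.29;

// Difference vs baseline in percentage points, rounded to 2 decimals.
function diffVsBaselinePp(roiPct: number): number {
  return Math.round((roiPct - BASELINE_ROI_PCT) * 100) / 100;
}

// True when every value is strictly smaller than the one before it.
function isStrictlyDecreasing(xs: number[]): boolean {
  let prev = Infinity;
  for (const x of xs) {
    if (!(x < prev)) return false;
    prev = x;
  }
  return true;
}

const diffsPp = sweep.map((r) => diffVsBaselinePp(r.roiPct));
const roiFalls = isStrictlyDecreasing(sweep.map((r) => r.roiPct));
const countFalls = isStrictlyDecreasing(sweep.map((r) => r.n));
console.log(diffsPp, roiFalls, countFalls);
```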

The Nuance

The Calibration Gap Widens With Confidence

| Subset | CLV | ROI | CalGap (CLV - ROI) |
|---|---|---|---|
| All bets | +6.2% | -3.3% | 9.5pp |
| Edge >= 5% | +9.5% | -4.2% | 13.7pp |
| Edge >= 10% | +13.3% | -6.5% | 19.8pp |

The model gets MORE wrong as it gets MORE confident. Bets with a model edge of 10% or more return -6.5% ROI -- the market is efficiently pricing away exactly the predictions the model is most confident about.
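
The calibration gap is simple arithmetic, CLV minus ROI, and can be reproduced from the three reported subsets. A short sketch with illustrative names; the inputs are the rounded CLV and ROI figures reported for each subset.

```typescript
// CLV and ROI per subset, in percent, as reported above.
const subsets = [
  { subset: "All bets", clvPct: 6.2, roiPct: -3.3 },
  { subset: "Edge >= 5%", clvPct: 9.5, roiPct: -4.2 },
  { subset: "Edge >= 10%", clvPct: 13.3, roiPct: -6.5 },
];

// Calibration gap = CLV - ROI, in percentage points, rounded to 1 decimal.
function calGapPp(clvPct: number, roiPct: number): number {
  return Math.round((clvPct - roiPct) * 10) / 10;
}

const gaps = subsets.map((s) => calGapPp(s.clvPct, s.roiPct));
console.log(gaps); // [ 9.5, 13.7, 19.8 ]
```

The gap roughly doubles between the full set and the 10%+ subset: CLV rises by 7.1pp while ROI falls by 3.2pp.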

Why The Prior Result Was An Artifact

The exploratory script calibrate-edge-shrinkage.ts showed a shrinkage factor of 0.4 improving ROI from 2.5% to 4.0%. It used a temporal train/validate/test split on all leagues combined. By contrast, our rigorous framework uses a stratified league-level dev/holdout split with bootstrap paired differences and Holm-Bonferroni correction.

The discrepancy has three likely sources:

  1. Different baseline (prior work may have already had edge filtering)
  2. Selection bias (optimizing shrinkage factor on evaluation data)
  3. League confounding (temporal split doesn't control for league-specific effects)

This is why the rigorous framework exists. The exploratory result was overfit.
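
For readers unfamiliar with the Holm-Bonferroni correction mentioned above, here is a generic textbook sketch of the step-down procedure. It is not the project's implementation: the function name, the default `alpha` of 0.05, and the four p-values are assumptions chosen for illustration.

```typescript
// Holm-Bonferroni step-down procedure.
// Returns, for each p-value in input order, whether its null hypothesis is
// rejected while controlling the family-wise error rate at `alpha`.
function holmBonferroni(pValues: number[], alpha = 0.05): boolean[] {
  const m = pValues.length;
  // Sort ascending while remembering each p-value's original position.
  const ordered = pValues
    .map((p, index) => ({ p, index }))
    .sort((a, b) => a.p - b.p);

  const rejected = new Array<boolean>(m).fill(false);
  let rank = 0; // number of hypotheses rejected so far
  for (const { p, index } of ordered) {
    // The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
    if (p > alpha / (m - rank)) break; // first failure: stop, reject nothing further
    rejected[index] = true;
    rank += 1;
  }
  return rejected;
}

// Made-up p-values for four hypothetical comparisons.
const examplePValues = [0.01, 0.04, 0.03, 0.005];
console.log(holmBonferroni(examplePValues)); // [ true, false, false, true ]
```

In this example 0.04 is below 0.05 yet is not rejected, because the procedure stops at the first p-value that misses its threshold (0.03 against 0.025). That stopping rule is what keeps a multi-config sweep from promoting a lucky configuration.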

What Didn't Work

We also tested edge-level shrinkage (multiply CLV by alpha before bet selection). With a threshold of 0, shrinkage is a pure no-op -- it scales values but doesn't change which bets are selected. This was a design error in the first implementation attempt that we caught immediately. The fix was combining shrinkage with a minimum edge threshold.

Even after the fix, combined shrinkage + threshold (e.g., shrink=0.4 with minEdge=3%) returned -4.70% ROI -- still worse than the -3.29% baseline.
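
The no-op is easy to see in a sketch. The code below assumes selection keeps a bet when the shrunken value strictly exceeds the threshold; the source does not say whether the comparison is strict, and `selectWithShrink` and the sample edges are hypothetical.

```typescript
// Keep a bet when alpha * edge > minEdge.
// The strict ">" comparison is an assumption made for this sketch.
function selectWithShrink(edges: number[], alpha: number, minEdge: number): number[] {
  return edges.filter((e) => alpha * e > minEdge);
}

// Hypothetical raw edges as fractions (0.08 = 8%).
const rawEdges = [-0.02, 0.01, 0.04, 0.08, 0.11, 0.2];

// Threshold 0: any alpha > 0 preserves the sign of each edge,
// so the selected set is identical with or without shrinkage.
const unshrunk = selectWithShrink(rawEdges, 1.0, 0);
const shrunk = selectWithShrink(rawEdges, 0.4, 0);

// Positive threshold: shrinkage acts like a higher raw cutoff of minEdge / alpha.
const combined = selectWithShrink(rawEdges, 0.4, 0.05);         // 0.4 * e > 0.05
const equivalent = selectWithShrink(rawEdges, 1.0, 0.05 / 0.4); // e > about 0.125

console.log(unshrunk.length === shrunk.length, combined, equivalent);
```

If the flags compose this way, shrink=0.4 with minEdge=5% behaves like a raw cutoff of about 12.5%, which would be consistent with that configuration having fewer bets (840) than the 10% threshold (1,698).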

What This Means

  1. The model's calibration error scales with confidence. Small edges (~1-3%) are slightly overconfident. Large edges (~10%+) are massively overconfident. The Poisson grid assigns too much probability to extreme outcomes.
  2. Minimum edge thresholds in production are HARMFUL. If production uses minEdge=7%, it's actively filtering toward the worst bets. The baseline with minEdge=0% is better.
  3. The reversed hypothesis is promising. If high-edge bets are the worst, then CAPPING maximum edge (removing bets where edge > X%) should improve ROI by eliminating the overconfident tail. This is registered as max-edge-cap.
  4. The prior shrinkage result was selection bias. This validates the decision to build rigorous dev/holdout infrastructure before trusting any parameter optimization.

What's Next

  1. Test `max-edge-cap` -- the reversed hypothesis. Cap edge at 15%, 12%, and 10% and measure the change in ROI against the uncapped baseline.
  2. Quarter-line routing -- quarter-lines are +4.3% ROI vs whole-lines at -7.8%. This structural feature is orthogonal to edge calibration.
  3. Investigate why the Poisson grid overestimates extreme edges -- this is likely the 1-goal margin underprediction identified in the line mispricing taxonomy.
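
The first item, the `max-edge-cap` test, amounts to a sweep over upper bounds instead of lower bounds. A hypothetical sketch of the selection step only: `applyMaxEdgeCap` and the sample edges are made up, and no ROI figures are implied because that experiment has not been run.

```typescript
// Drop bets whose edge exceeds the cap (removing bets where edge > cap).
function applyMaxEdgeCap(edges: number[], maxEdge: number): number[] {
  return edges.filter((e) => e <= maxEdge);
}

// Caps named in the plan above, as fractions.
const candidateCaps = [0.15, 0.12, 0.1];

// Hypothetical raw edges, for illustration only.
const exampleEdges = [0.02, 0.06, 0.11, 0.14, 0.22];

const keptPerCap = candidateCaps.map((cap) => ({
  cap,
  kept: applyMaxEdgeCap(exampleEdges, cap).length,
}));
console.log(keptPerCap);
```

Each cap yields a nested subset of the baseline bets, so the same paired-difference comparison used for the min-edge sweep should apply unchanged.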
WRONG-DIRECTION | Signal: edge-shrinkage-sweep | 2026-03-19