March 19, 2026|Research|WRONG-DIRECTION

The Model's Best Bets Are Its Worst: Why Edge Shrinkage Failed

We tested edge shrinkage and minimum edge thresholds as filters for overconfident bets. Every threshold made ROI worse -- monotonically. minEdge=10% costs 3.20pp against baseline (-6.49% vs -3.29%). The model's largest edges are its most overconfident. Wrong-direction discovery: the reversed hypothesis is registered as max-edge-cap.

Direction: WRONG (higher edge = worse ROI)
minEdge=10%: -6.49% (vs baseline -3.29%)
AH Bets: 8,568 (10 dev leagues)
Discovery: max-edge-cap (reversed hypothesis)

We tested whether filtering for higher-confidence bets would improve ROI. The result was the exact opposite: the model's largest edges are its most overconfident predictions. Every edge threshold we tested made ROI worse.

The Question

The model has +6.2% closing line value (CLV) but -3.3% Asian handicap (AH) ROI. Prior exploratory work suggested that multiplying edges by 0.4 (shrinkage) could improve ROI from 2.5% to 4.0% by discounting overconfident predictions. We built the infrastructure (--edge-shrink and --min-edge flags) and ran a rigorous 5-config sweep on the stratified 10-league dev set.

The hypothesis: higher-edge bets should be more profitable because the model is more confident. Filtering out low-edge bets should concentrate capital on winners.

What We Found

Every edge threshold makes ROI worse. Monotonically.

Min Edge	AH ROI	N	vs Baseline
0% (baseline)	-3.29%	8,568	--
5%	-4.18%	4,554	-0.89pp
7%	-4.74%	3,233	-1.45pp
10%	-6.49%	1,698	-3.20pp
shrink 0.4 + 5%	-9.32%	840	-6.03pp

CLV rises mechanically (you're selecting the highest-edge subset), but ROI falls because those bets lose more. The calibration gap -- the distance between what the model thinks and reality -- widens as edge increases.
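A minimal sketch of the kind of sweep summarized in the table, assuming each settled bet carries a model edge, a stake, and a realized profit. The `Bet` shape and the function names are hypothetical and are not taken from the project's code; the baseline is taken to be the minEdge=0 subset.

```typescript
// Hypothetical bet record; field names are assumptions, not the project's schema.
interface Bet {
  edge: number;   // model edge as a fraction, e.g. 0.07 for 7%
  stake: number;  // units staked
  profit: number; // realized profit in units (negative for a loss)
}

// Keep only bets whose model edge meets the minimum threshold.
function filterByMinEdge(bets: Bet[], minEdge: number): Bet[] {
  return bets.filter((b) => b.edge >= minEdge);
}

// ROI = total profit / total stake; NaN when nothing was staked.
function roi(bets: Bet[]): number {
  const staked = bets.reduce((sum, b) => sum + b.stake, 0);
  const profit = bets.reduce((sum, b) => sum + b.profit, 0);
  return staked === 0 ? NaN : profit / staked;
}

// For each threshold, report sample size, ROI, and the difference
// from the minEdge=0 baseline in percentage points.
function sweepMinEdge(bets: Bet[], thresholds: number[]) {
  const baseline = roi(filterByMinEdge(bets, 0));
  return thresholds.map((minEdge) => {
    const subset = filterByMinEdge(bets, minEdge);
    const r = roi(subset);
    return { minEdge, n: subset.length, roi: r, deltaPp: (r - baseline) * 100 };
  });
}
```

Calling `sweepMinEdge(bets, [0, 0.05, 0.07, 0.10])` on real settled bets would produce rows in the same layout as the table: a threshold, N, ROI, and the gap to baseline.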

The Nuance

The Calibration Gap Widens With Confidence

Subset	CLV	ROI	CalGap (CLV - ROI)
All bets	+6.2%	-3.3%	9.5pp
Edge >= 5%	+9.5%	-4.2%	13.7pp
Edge >= 10%	+13.3%	-6.5%	19.8pp

The model gets MORE wrong as it gets MORE confident. Bets with a model edge of 10% or more deliver -6.5% ROI -- the market is efficiently pricing away exactly the predictions the model is most confident about.
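The CalGap column is plain arithmetic: CLV minus realized ROI, in percentage points. The short sketch below recomputes it from the CLV and ROI values in the table; the function and variable names are illustrative only.

```typescript
// Calibration gap in percentage points: CLV minus realized ROI.
function calGapPp(clvPct: number, roiPct: number): number {
  return clvPct - roiPct;
}

// Rows copied from the table above (values in percent).
const calRows = [
  { subset: "All bets", clvPct: 6.2, roiPct: -3.3 },
  { subset: "Edge >= 5%", clvPct: 9.5, roiPct: -4.2 },
  { subset: "Edge >= 10%", clvPct: 13.3, roiPct: -6.5 },
];

for (const r of calRows) {
  console.log(`${r.subset}: ${calGapPp(r.clvPct, r.roiPct).toFixed(1)}pp`);
}
```

Because ROI is negative in every subset, subtracting it adds its magnitude to CLV, which is why the gap grows faster than CLV alone.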

Why The Prior Result Was An Artifact

The exploratory calibrate-edge-shrinkage.ts showed shrinkage x0.4 improving ROI from 2.5% to 4.0%. This used a temporal train/validate/test split on all leagues combined. Our rigorous framework uses stratified league-level dev/holdout with bootstrap paired differences and Holm-Bonferroni correction.

The discrepancy most likely comes from three sources:

Different baseline (prior work may have already had edge filtering)
Selection bias (optimizing shrinkage factor on evaluation data)
League confounding (temporal split doesn't control for league-specific effects)

This is why the rigorous framework exists. The exploratory result was overfit.
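For readers unfamiliar with the Holm-Bonferroni correction mentioned above, here is a minimal sketch of the standard step-down procedure. It is a generic textbook version, not the project's implementation, and the function name and the default `alpha` are assumptions.

```typescript
// Holm-Bonferroni step-down: sort p-values ascending and compare the k-th
// smallest (0-indexed) against alpha / (m - k). Stop at the first failure;
// every hypothesis from that point on is not rejected.
function holmBonferroni(pValues: number[], alpha = 0.05): boolean[] {
  const m = pValues.length;
  const order = pValues
    .map((p, i) => ({ p, i }))
    .sort((a, b) => a.p - b.p);

  const rejected: boolean[] = new Array<boolean>(m).fill(false);
  let k = 0;
  for (const { p, i } of order) {
    if (p > alpha / (m - k)) break; // first non-rejection ends the procedure
    rejected[i] = true;
    k++;
  }
  return rejected; // same order as the input p-values
}
```

With several configs compared against one baseline, each config contributes one p-value (for example from its bootstrap paired difference), and only configs that survive the step-down would count as significant.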

What Didn't Work

We also tested edge-level shrinkage (multiply each edge by alpha before bet selection). With a threshold of 0, shrinkage is a pure no-op: for any alpha > 0 it rescales the values but doesn't change which bets are selected. This was a design error in the first implementation attempt, which we caught immediately. The fix was to combine shrinkage with a minimum edge threshold.

Even after the fix, combined shrinkage + threshold (e.g., shrink=0.4 with minEdge=3%) returned -4.70% ROI -- still worse than the -3.29% baseline.
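The no-op is easy to check directly. The sketch below assumes selection is a simple comparison of the shrunk edge against the threshold (`alpha * edge >= minEdge`); the real selection logic may differ, and `selectBets` is a hypothetical name. Under that assumption, shrinkage plus a threshold is equivalent to a raw threshold of `minEdge / alpha`, so shrink=0.4 with minEdge=5% would behave like a raw 12.5% cutoff.

```typescript
// Return the indices of bets whose shrunk edge meets the threshold.
// Assumes alpha > 0 and a plain ">=" comparison (both are assumptions).
function selectBets(edges: number[], alpha: number, minEdge: number): number[] {
  const selected: number[] = [];
  edges.forEach((edge, i) => {
    if (edge * alpha >= minEdge) selected.push(i);
  });
  return selected;
}

// Illustrative edges only, not real model output.
const sampleEdges = [0.01, 0.04, 0.08, 0.13, 0.2];

// Threshold 0: shrinkage changes nothing about which bets are chosen.
const noShrink = selectBets(sampleEdges, 1.0, 0);
const shrunkNoThreshold = selectBets(sampleEdges, 0.4, 0);

// Shrinkage plus a 5% threshold picks the same bets as a raw 12.5% threshold.
const shrunkWithThreshold = selectBets(sampleEdges, 0.4, 0.05);
const rawEquivalent = selectBets(sampleEdges, 1.0, 0.125);
```

If this reading is right, the combined configs are stricter minimum-edge filters in disguise, which would be consistent with their smaller sample sizes.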

What This Means

The model's calibration error scales with confidence. Small edges (~1-3%) are slightly overconfident. Large edges (~10%+) are massively overconfident. The Poisson grid assigns too much probability to extreme outcomes.

Minimum edge thresholds in production are HARMFUL. If production uses minEdge=7%, it's actively filtering toward the worst bets: on the dev set that threshold costs 1.45pp against the minEdge=0% baseline.

The reversed hypothesis is promising. If high-edge bets are the worst, then CAPPING maximum edge (removing bets where edge > X%) should improve ROI by eliminating the overconfident tail. This is registered as max-edge-cap.
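A minimal sketch of what a max-edge cap could look like as a filter, mirroring the minimum-edge filter but with the comparison reversed. The `SettledBet` shape and the function names are hypothetical, and nothing here predicts how the real test will turn out.

```typescript
// Hypothetical bet record; field names are assumptions, not the project's schema.
interface SettledBet {
  edge: number;   // model edge as a fraction, e.g. 0.12 for 12%
  stake: number;  // units staked
  profit: number; // realized profit in units (negative for a loss)
}

// Drop bets whose model edge exceeds the cap (the overconfident tail).
function capMaxEdge(bets: SettledBet[], maxEdge: number): SettledBet[] {
  return bets.filter((b) => b.edge <= maxEdge);
}

// ROI = total profit / total stake; NaN when nothing was staked.
function roiOf(bets: SettledBet[]): number {
  const staked = bets.reduce((sum, b) => sum + b.stake, 0);
  const profit = bets.reduce((sum, b) => sum + b.profit, 0);
  return staked === 0 ? NaN : profit / staked;
}

// Report sample size, ROI, and delta vs the uncapped baseline for each cap.
function sweepMaxEdge(bets: SettledBet[], caps: number[]) {
  const baseline = roiOf(bets);
  return caps.map((maxEdge) => {
    const subset = capMaxEdge(bets, maxEdge);
    const r = roiOf(subset);
    return { maxEdge, n: subset.length, roi: r, deltaPp: (r - baseline) * 100 };
  });
}
```

As with the minimum-edge sweep, any improvement found this way would still need to pass the dev/holdout framework before being trusted.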

The prior shrinkage result was selection bias. This validates the decision to build rigorous dev/holdout infrastructure before trusting any parameter optimization.

What's Next

Test `max-edge-cap` -- the reversed hypothesis. Cap edge at 15%, 12%, 10% and measure ROI improvement.
Quarter-line routing -- quarter-lines are +4.3% ROI vs whole-lines at -7.8%. This structural feature is orthogonal to edge calibration.
Investigate why the Poisson grid overestimates extreme edges -- this is likely the 1-goal margin underprediction identified in the line mispricing taxonomy.

WRONG-DIRECTION|Signal: edge-shrinkage-sweep|2026-03-19