The Model's Best Bets Are Its Worst: Why Edge Shrinkage Failed
We tested edge shrinkage and minimum edge thresholds to filter overconfident bets. Every threshold made ROI worse, monotonically: minEdge=10% costs 3.20pp vs baseline. The model's largest edges are its most overconfident. The original hypothesis pointed in the wrong direction, so its reverse is now registered as max-edge-cap.
We tested whether filtering for higher-confidence bets would improve ROI. The result was the exact opposite: the model's largest edges are its most overconfident predictions. Every edge threshold we tested made ROI worse.
The Question
The model has +6.2% CLV but -3.3% AH ROI. Prior exploratory work suggested multiplying edges by 0.4 (shrinkage) could improve ROI from 2.5% to 4.0% by discounting overconfident predictions. We built the infrastructure (--edge-shrink and --min-edge flags) and ran a rigorous 5-config sweep on the stratified 10-league dev set.
The hypothesis: higher-edge bets should be more profitable because the model is more confident. Filtering out low-edge bets should concentrate capital on winners.
What We Found
Every edge threshold makes ROI worse. Monotonically.
| Min Edge | AH ROI | N | vs Baseline |
|---|---|---|---|
| 0% (baseline) | -3.29% | 8,568 | -- |
| 5% | -4.18% | 4,554 | -0.89pp |
| 7% | -4.74% | 3,233 | -1.45pp |
| 10% | -6.49% | 1,698 | -3.20pp |
| shrink 0.4 + 5% | -9.32% | 840 | -6.03pp |
CLV rises mechanically (you're selecting the highest-edge subset), but ROI falls because those bets lose more. The calibration gap -- the distance between what the model thinks and reality -- widens as edge increases.
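Each row of the sweep amounts to a filter followed by two averages. A minimal sketch, assuming a hypothetical `Bet` record and flat one-unit stakes; the field names and the sample numbers are made up for illustration and are not the project's schema or data:

```typescript
// Hypothetical bet record; field names are assumptions for illustration.
interface Bet {
  edge: number;   // model edge as a fraction (0.07 = 7%)
  clv: number;    // closing line value as a fraction
  profit: number; // net profit per 1 unit staked (-1 = full loss)
}

interface SubsetStats {
  n: number;
  meanClv: number;
  roi: number;
}

// Keep bets whose edge clears the threshold, then average CLV and profit.
function statsAtMinEdge(bets: Bet[], minEdge: number): SubsetStats {
  const kept = bets.filter((b) => b.edge >= minEdge);
  const n = kept.length;
  if (n === 0) return { n: 0, meanClv: NaN, roi: NaN };
  const meanClv = kept.reduce((sum, b) => sum + b.clv, 0) / n;
  const roi = kept.reduce((sum, b) => sum + b.profit, 0) / n; // flat stakes
  return { n, meanClv, roi };
}

// Made-up sample in which the high-edge bets carry high CLV but lose.
const sampleBets: Bet[] = [
  { edge: 0.02, clv: 0.01, profit: 0.5 },
  { edge: 0.04, clv: 0.03, profit: -0.5 },
  { edge: 0.08, clv: 0.09, profit: -1 },
  { edge: 0.12, clv: 0.15, profit: -1 },
];

const baseline = statsAtMinEdge(sampleBets, 0);
const filtered = statsAtMinEdge(sampleBets, 0.05);
console.log(baseline, filtered);
```

In this toy sample the filtered subset has higher mean CLV and lower ROI than the baseline, which is the same shape as the table above.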
The Nuance
The Calibration Gap Widens With Confidence
| Subset | CLV | ROI | CalGap (CLV - ROI) |
|---|---|---|---|
| All bets | +6.2% | -3.3% | 9.5pp |
| Edge >= 5% | +9.5% | -4.2% | 13.7pp |
| Edge >= 10% | +13.3% | -6.5% | 19.8pp |
The model gets MORE wrong as it gets MORE confident. Bets with a model edge of 10% or more return -6.5% ROI -- the market is efficiently pricing away exactly the predictions the model is most confident about.
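The CalGap column is plain arithmetic on the two columns beside it. As a quick check, this sketch recomputes it from the table's own CLV and ROI figures; the `SubsetRow` type and `calGap` helper are names introduced here for illustration:

```typescript
// Rows copied from the table above; values are percentages.
interface SubsetRow {
  label: string;
  clvPct: number;
  roiPct: number;
}

const rows: SubsetRow[] = [
  { label: "All bets", clvPct: 6.2, roiPct: -3.3 },
  { label: "Edge >= 5%", clvPct: 9.5, roiPct: -4.2 },
  { label: "Edge >= 10%", clvPct: 13.3, roiPct: -6.5 },
];

// Calibration gap in percentage points: what the model expects minus what it earns.
function calGap(row: SubsetRow): number {
  return row.clvPct - row.roiPct;
}

// Round to one decimal to avoid floating-point noise in the printed values.
const gaps: number[] = rows.map((r) => Number(calGap(r).toFixed(1)));

// Check that each gap is strictly larger than the one before it.
let widens = true;
let prev = -Infinity;
for (const g of gaps) {
  if (g <= prev) widens = false;
  prev = g;
}

console.log(gaps, widens); // [ 9.5, 13.7, 19.8 ] true
```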
Why The Prior Result Was An Artifact
The exploratory calibrate-edge-shrinkage.ts showed shrinkage x0.4 improving ROI from 2.5% to 4.0%. This used a temporal train/validate/test split on all leagues combined. Our rigorous framework uses stratified league-level dev/holdout with bootstrap paired differences and Holm-Bonferroni correction.
The discrepancy comes from:
- Different baseline (prior work may have already had edge filtering)
- Selection bias (optimizing shrinkage factor on evaluation data)
- League confounding (temporal split doesn't control for league-specific effects)
This is why the rigorous framework exists. The exploratory result was overfit.
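For readers unfamiliar with the Holm-Bonferroni step mentioned above, here is a minimal sketch of the standard step-down procedure. It is not the project's code: the function name is hypothetical and the p-values are made up; in a sweep like this one they would come from the paired bootstrap comparison of each config against baseline.

```typescript
// Standard Holm-Bonferroni step-down correction.
// Returns, for each hypothesis in the original order, whether it is rejected.
function holmBonferroni(pValues: number[], alpha: number = 0.05): boolean[] {
  const m = pValues.length;
  // Sort p-values ascending while remembering their original positions.
  const order = pValues
    .map((p, index) => ({ p, index }))
    .sort((a, b) => a.p - b.p);

  const reject: boolean[] = new Array<boolean>(m).fill(false);
  let stillRejecting = true;
  order.forEach((item, k) => {
    // The k-th smallest p-value (0-based) is compared against alpha / (m - k).
    // Once one comparison fails, all larger p-values are retained as well.
    if (stillRejecting && item.p <= alpha / (m - k)) {
      reject[item.index] = true;
    } else {
      stillRejecting = false;
    }
  });
  return reject;
}

// Made-up p-values for four configs, for illustration only.
const examplePValues = [0.01, 0.04, 0.03, 0.005];
const decisions = holmBonferroni(examplePValues, 0.05);
console.log(decisions); // [ true, false, false, true ]
```

The thresholds here are 0.0125, 0.0167, 0.025, and 0.05 in ascending order of p-value. The third smallest p-value (0.03) misses its threshold of 0.025, so it and everything larger are retained, even though 0.04 would pass the final threshold on its own.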
What Didn't Work
We also tested edge-level shrinkage (multiply CLV by alpha before bet selection). With a threshold of 0, shrinkage is a pure no-op: multiplying by a positive alpha scales every value but never changes its sign, so exactly the same bets are selected. This was a design error in the first implementation attempt that we caught immediately. The fix was combining shrinkage with a minimum edge threshold.
Even after fixing the implementation, combined shrinkage + threshold (e.g., shrink=0.4 with minEdge=3%) performed at -4.70% -- still worse than baseline.
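The no-op is easy to see in a few lines. This is a sketch, assuming selection depends only on whether the shrunk edge clears the threshold; `selectBets` is a hypothetical helper and the edges are made-up values, not the project's code or data:

```typescript
// Return the indices of bets whose shrunk edge is at least minEdge.
function selectBets(edges: number[], alpha: number, minEdge: number): number[] {
  const selected: number[] = [];
  edges.forEach((edge, index) => {
    if (edge * alpha >= minEdge) selected.push(index);
  });
  return selected;
}

// Made-up edges as fractions, including one negative edge.
const exampleEdges = [-0.02, 0.01, 0.04, 0.08, 0.15];

// With a zero threshold, shrinkage cannot change the selection.
const unshrunkAtZero = selectBets(exampleEdges, 1.0, 0);
const shrunkAtZero = selectBets(exampleEdges, 0.4, 0);

// With a positive threshold, shrinkage acts like a higher raw threshold:
// alpha * edge >= minEdge is the same condition as edge >= minEdge / alpha.
const shrunkAtThree = selectBets(exampleEdges, 0.4, 0.03);
const rawEquivalent = selectBets(exampleEdges, 1.0, 0.03 / 0.4);

console.log(unshrunkAtZero, shrunkAtZero, shrunkAtThree, rawEquivalent);
```

Under that assumption, shrink=0.4 with a 5% threshold behaves like a raw threshold of about 12.5%, which is consistent with that row having the smallest N in the sweep table.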
What This Means
- The model's calibration error scales with confidence. Small edges (~1-3%) are slightly overconfident. Large edges (~10%+) are massively overconfident. The Poisson grid assigns too much probability to extreme outcomes.
- Minimum edge thresholds in production are HARMFUL. If production uses minEdge=7%, it's actively filtering toward the worst bets. The baseline with minEdge=0% is better.
- The reversed hypothesis is promising. If high-edge bets are the worst, then CAPPING maximum edge (removing bets where edge > X%) should improve ROI by eliminating the overconfident tail. This is registered as `max-edge-cap`.
- The prior shrinkage result was selection bias. This validates the decision to build rigorous dev/holdout infrastructure before trusting any parameter optimization.
What's Next
- Test `max-edge-cap` -- the reversed hypothesis. Cap edge at 15%, 12%, 10% and measure ROI improvement.
- Quarter-line routing -- quarter-lines are +4.3% ROI vs whole-lines at -7.8%. This structural feature is orthogonal to edge calibration.
- Investigate why the Poisson grid overestimates extreme edges -- this is likely the 1-goal margin underprediction identified in the line mispricing taxonomy.
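The `max-edge-cap` sweep in the first item is the minimum-edge sweep with the inequality reversed. A rough sketch of what it might look like, assuming flat one-unit stakes; the `CappedBet` type, `roiAtMaxEdge` helper, and sample bets are all hypothetical, and the output says nothing about what the real sweep will show:

```typescript
// Hypothetical bet record; field names are assumptions for illustration.
interface CappedBet {
  edge: number;   // model edge as a fraction (0.12 = 12%)
  profit: number; // net profit per 1 unit staked
}

// ROI over bets whose edge does not exceed the cap (flat stakes).
function roiAtMaxEdge(bets: CappedBet[], maxEdge: number): { n: number; roi: number } {
  const kept = bets.filter((b) => b.edge <= maxEdge);
  if (kept.length === 0) return { n: 0, roi: NaN };
  const total = kept.reduce((sum, b) => sum + b.profit, 0);
  return { n: kept.length, roi: total / kept.length };
}

// Made-up sample constructed so that the highest-edge bets lose.
const toyBets: CappedBet[] = [
  { edge: 0.02, profit: 0.5 },
  { edge: 0.06, profit: 0.5 },
  { edge: 0.11, profit: -1 },
  { edge: 0.14, profit: -1 },
  { edge: 0.2, profit: -1 },
];

// The caps named in the plan: 15%, 12%, 10%.
const capSweep = [0.15, 0.12, 0.1].map((cap) => ({ cap, ...roiAtMaxEdge(toyBets, cap) }));
console.log(capSweep);
```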