The Model Works, The Execution Doesn't: Why Capture Signals Failed and What That Tells Us
We tested 4 capture signals (vig-aware, line movement confirmation, Pinnacle vs market gap, AH line shift) against 6,986 matches with complete open/close/average odds. All 4 rejected. The CLV→ROI gap isn't about execution quality — it's about calibration. High-vig bets actually have HIGHER CLV. 94% of our bets are already on the sharp side. The path to profitability is Track 2 (model improvements), not Track 1 (better execution).
Our model sees +11% CLV across 26 leagues. Our portfolio returns -3.28% ROI. That's a gap of roughly 14 percentage points between the edge we see and the return we keep, and going in we assumed it was an execution gap rather than the model's fault.
Ted Knutson quantified this in his own results: on a +7.7% neutral ROI season (182 bets), vig alone consumed £1,700 of £2,800 in expected profit. Sixty percent of his edge, eaten by margin. The remaining 40% is what he actually kept.
We've been hunting for new signals (Track 2: Discover) when the bigger opportunity was staring at us: capture infrastructure that was already built but never wired in. The match cache contains opening odds, closing odds, and market averages for 6,986 matches, plus opening and closing AH lines for 2,388 of them. The code to analyze vig distribution and line movement already exists in lib/signals/. We just never connected it to the backtest.
This post covers what we found when we finally did.
The Capture Gap
Here's how we lose money on a model that sees real edge:
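| Layer | Estimate | Meaning |
|---|---|---|
| Model CLV | +11.0% | What we see |
| Vig cost | ~4-5% | What the bookie takes |
| Calibration | ~3-4% | Model overconfidence on some bets |
| Timing/quality | ~2-3% | We bet the wrong lines at the wrong prices |
| Net ROI | -3.28% | What we keep |
The three middle layers are rough estimates; together they account for about 9-12pp of the ~14.3pp gap.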
Track 2 (Discover) attacks the calibration layer — better signals, regime awareness, model improvements. We've been doing this for weeks. Track 1 (Capture) attacks the vig and timing layers — picking bets with better execution quality. We haven't touched this.
The match cache has everything we need:
| Data Field | What It Tells Us | Coverage |
|---|---|---|
| `pinnacleHome` vs `pinnacleCloseHome` | How the sharp line moved | 6,986 matches |
| `avgHome` vs `pinnacleHome` | Sharp vs soft book disagreement | 6,986 matches |
| `pinnacleHome/Draw/Away` combined | Vig overround per match | 6,986 matches |
| `ahLine` vs `ahCloseLine` | Whether the handicap itself shifted | 2,388 matches (34.2%) |
We also have infrastructure already coded but never tested:
- `lib/signals/vig-distribution.ts`: computes per-side vig share and identifies the sharp side
- `lib/signals/line-movement.ts`: detects steam moves, reverse line movement, and sharp-model agreement
- `lib/signals/line-tracker.ts`: tracks movement speed and direction
All of this was built months ago. None of it has been connected to the evaluation pipeline.
Four Capture Signals
C1: Vig-Aware Bet Selection
Hypothesis: Bets on matches with lower Pinnacle overround convert CLV to ROI more efficiently. High-vig matches eat our edge.
Mechanism: Pinnacle's overround varies by match. A match with 2.5% total overround costs less to bet than one with 5%. If we preferentially take low-vig bets, we keep more of our CLV.
Computation: From Pinnacle closing 1X2 odds, compute: overround = (1/home + 1/draw + 1/away) - 1. For AH: overround = (1/ahHome + 1/ahAway) - 1. Split bets into low-vig (< median) and high-vig (>= median).
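A minimal TypeScript sketch of the C1 computation, assuming decimal odds. The `ClosingOdds` shape and the helper names are hypothetical illustrations, not the API of `lib/signals/vig-distribution.ts`.

```typescript
// Hypothetical shape: Pinnacle closing 1X2 decimal odds for one match.
interface ClosingOdds {
  home: number;
  draw: number;
  away: number;
}

// 1X2 overround: sum of raw implied probabilities minus 1.
function overround1x2(o: ClosingOdds): number {
  return 1 / o.home + 1 / o.draw + 1 / o.away - 1;
}

// Two-way AH overround.
function overroundAh(ahHome: number, ahAway: number): number {
  return 1 / ahHome + 1 / ahAway - 1;
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Split bets into low-vig (< median) and high-vig (>= median) buckets.
function splitByVig<T>(bets: T[], vigOf: (b: T) => number): { low: T[]; high: T[] } {
  const m = median(bets.map(vigOf));
  return {
    low: bets.filter((b) => vigOf(b) < m),
    high: bets.filter((b) => vigOf(b) >= m),
  };
}

// A 1.95 / 1.95 AH market carries about 2.56% overround.
console.log((overroundAh(1.95, 1.95) * 100).toFixed(2)); // prints 2.56
```

Splitting at the median rather than a fixed cutoff keeps the two buckets roughly equal in size, which is why the median variant keeps about half the baseline.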
C2: Line Movement Confirmation
Hypothesis: When our model's edge agrees with the direction Pinnacle's line moved (opening → closing), the bet is higher quality. Sharp money is confirming our view. When they disagree, we may be on the wrong side.
Mechanism: Pinnacle's line moves because sharp bettors bet into it. If the line moved TOWARD the outcome our model favors, sharp money and our model agree. If the line moved AWAY, sharp money disagrees.
Computation: openProb = 1/pinnacleHome, closeProb = 1/pinnacleCloseHome, movement = closeProb - openProb. If we back Home and movement > 0, sharps agree (the Home price shortened, so money came in on Home); if movement < 0, sharps disagree. Away bets are scored the same way from the Away prices.
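The C2 rule reduces to a sign check on implied-probability movement for the side we bet. In this sketch, `classifyMovement` and the two keep-predicates are hypothetical names; treating an unchanged line as "neutral" (kept by line-movement-no-against, dropped by line-movement-confirms) is our reading of the variant names, not something the post spells out.

```typescript
type Agreement = "confirms" | "against" | "neutral";

// Inputs: Pinnacle opening and closing decimal odds on the side we bet.
function classifyMovement(openOdds: number, closeOdds: number): Agreement {
  const openProb = 1 / openOdds; // raw implied probability, vig included
  const closeProb = 1 / closeOdds;
  const movement = closeProb - openProb; // > 0: our side's price shortened
  if (movement > 0) return "confirms";
  if (movement < 0) return "against";
  return "neutral";
}

// Variant filters, as we read their names.
const keepConfirms = (a: Agreement): boolean => a === "confirms";
const keepNoAgainst = (a: Agreement): boolean => a !== "against";

// Backing Home at 2.10 and it closes at 1.95: money came in on Home.
console.log(classifyMovement(2.1, 1.95)); // prints confirms
```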
C3: Pinnacle vs Market Average Gap
Hypothesis: When Pinnacle offers better odds than the market average on the outcome we're betting, we're getting value from the sharpest book. When Pinnacle is tighter than average, we're paying a premium.
Mechanism: Pinnacle's odds reflect sharp information. The market average includes soft books whose prices are shaped by risk management. If Pinnacle is MORE generous than the average on our selection, the soft books are offering shorter odds on it than the sharp price implies, and we're on the sharp side.
Computation: pinnaclePrice = pinnacleCloseHome, avgPrice = avgCloseHome. gap = pinnaclePrice - avgPrice. Positive gap = Pinnacle more generous = we're on the sharp side.
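A minimal sketch of C3 on decimal closing prices. The function and type names are hypothetical; the two keep-predicates are our reading of the variant names pinnacle-avg-sharp-side and pinnacle-avg-not-against.

```typescript
type GapTag = "sharp-side" | "level" | "soft-side";

// Positive gap: Pinnacle pays more than the market average on our selection.
function pinnacleAvgGap(pinnaclePrice: number, avgPrice: number): number {
  return pinnaclePrice - avgPrice;
}

function tagGap(gap: number): GapTag {
  if (gap > 0) return "sharp-side";
  if (gap < 0) return "soft-side";
  return "level";
}

// sharp-side keeps only positive gaps; not-against also keeps level prices.
const keepSharpSide = (t: GapTag): boolean => t === "sharp-side";
const keepNotAgainst = (t: GapTag): boolean => t !== "soft-side";

console.log(tagGap(pinnacleAvgGap(2.05, 1.98))); // prints sharp-side
```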
C4: AH Line Shift Direction
Hypothesis: When the AH line itself shifts between opening and closing in a direction that helps our bet, the market is agreeing with our model's view of the match.
Mechanism: AH lines shift when the balance of sharp money changes the book's view of the fair handicap. A shift from -1.0 to -0.75 means the market thinks the home team is weaker than initially priced. If we're backing the away side, this shift helps us.
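Computation: lineShift = ahCloseLine - ahLine, reading lines from the home side as in the -1.0 to -0.75 example. A negative shift means the market now rates the home team stronger; a positive shift means weaker. Tag the bet favorable if the shift moved toward the side we backed, unfavorable if it moved away, and stable if the line didn't change.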
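A sketch of the C4 tagging, assuming (as the -1.0 to -0.75 example implies) that `ahLine` and `ahCloseLine` are quoted from the home side, so a more negative line means the home team gives more goals. `classifyAhShift` is a hypothetical name.

```typescript
type AhSide = "home" | "away";
type ShiftTag = "favorable" | "unfavorable" | "stable";

function classifyAhShift(ahLine: number, ahCloseLine: number, side: AhSide): ShiftTag {
  const lineShift = ahCloseLine - ahLine;
  if (lineShift === 0) return "stable";
  // Negative shift: home now asked to give more goals, so the market rates home stronger.
  const homeStrengthened = lineShift < 0;
  const favorable = side === "home" ? homeStrengthened : !homeStrengthened;
  return favorable ? "favorable" : "unfavorable";
}

// The post's example: -1.0 to -0.75 helps an away bet.
console.log(classifyAhShift(-1.0, -0.75, "away")); // prints favorable
```

Quarter-goal lines are exact in binary floating point, so the `=== 0` check for "stable" is safe here.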
Test Protocol
Same 8-phase protocol as all signal tests:
- Pre-register each hypothesis in signal-registry.json
- True standalone (minEdge=0, full AH universe)
- Marginal contribution (deployed baseline → add signal)
- IS/OOS split (6 dev leagues vs 13 OOS)
- Resampling significance (permutation test, 5K resamples)
- Regime stratification (season phase × side)
- Per-category breakdown for any categorical dimension
- Combined stack with previously accepted signals
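The significance step can take several forms; the post doesn't specify the test statistic. One plausible version, sketched here under assumptions: flat 1-unit stakes, and a null in which the signal picks bets at random, so we draw random subsets of the same size and count how often they match or beat the kept set's ROI. All names are hypothetical, and the seeded PRNG (mulberry32) is only there to make runs reproducible.

```typescript
// ROI as mean profit per 1-unit stake (flat stakes assumed).
function roi(profits: number[]): number {
  return profits.reduce((s, p) => s + p, 0) / profits.length;
}

// Small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One-sided p-value: how often does a random subset of the same size do at least as well?
function permutationPValue(profits: number[], kept: boolean[], resamples = 5000, seed = 42): number {
  const k = kept.filter(Boolean).length;
  if (k === 0) throw new Error("signal keeps no bets");
  const observed = roi(profits.filter((_, i) => kept[i]));
  const rand = mulberry32(seed);
  const idx = profits.map((_, i) => i);
  let atLeastAsGood = 0;
  for (let r = 0; r < resamples; r++) {
    // Partial Fisher-Yates: the first k slots become a uniform random subset.
    for (let i = 0; i < k; i++) {
      const j = i + Math.floor(rand() * (idx.length - i));
      [idx[i], idx[j]] = [idx[j], idx[i]];
    }
    let sum = 0;
    for (let i = 0; i < k; i++) sum += profits[idx[i]];
    if (sum / k >= observed) atLeastAsGood++;
  }
  // +1 correction keeps the estimate away from exactly zero.
  return (atLeastAsGood + 1) / (resamples + 1);
}
```

Because the baseline ROI is fixed, ranking random subsets by their own ROI is equivalent to ranking them by marginal ROI over the baseline.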
Baseline: AH-only, odds ≤ 2.0, edge ≥ 7%. N=3,927 bets. ROI=-3.28%.
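The baseline is a plain filter, and each capture signal is an extra filter applied on top of it. A minimal sketch; `CandidateBet`, its fields, and edge stored as a fraction are hypothetical, not the backtest's real types.

```typescript
interface CandidateBet {
  market: "AH" | "1X2" | "OU";
  odds: number; // decimal odds taken
  edge: number; // model edge as a fraction, e.g. 0.07 = 7%
}

// Baseline from the post: AH-only, odds <= 2.0, edge >= 7%.
function inBaseline(b: CandidateBet): boolean {
  return b.market === "AH" && b.odds <= 2.0 && b.edge >= 0.07;
}

// A capture signal only narrows the baseline pool; it never adds bets.
function applyCaptureSignal(pool: CandidateBet[], keep: (b: CandidateBet) => boolean): CandidateBet[] {
  return pool.filter(inBaseline).filter(keep);
}
```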
These signals differ from discovery signals in one key way: they don't change WHAT we bet on, they change HOW WELL we bet. A capture signal that improves ROI by +2pp on the same bets is worth more than a discovery signal that finds +2pp edge on a smaller pool — because capture compounds across every single bet we make.
The Results
Every capture signal was rejected. None met the +2pp marginal ROI threshold (marginal ROI = ROI of the bets a signal keeps minus the -3.28% baseline). Here's what happened and, more importantly, why.
| Signal | N Kept | Marginal ROI | p-value | Verdict |
|---|---|---|---|---|
| vig-aware (< median overround) | 1,957 | +0.20pp | 0.007 | ❌ REJECT |
| vig-aware-strict (< 3%) | 2,776 | -0.75pp | 0.001 | ❌ REJECT |
| line-movement-confirms | 1,009 | +1.72pp | 0.0002 | ❌ REJECT (close) |
| line-movement-no-against | 1,548 | +0.75pp | 0.0002 | ❌ REJECT |
| pinnacle-avg-sharp-side | 3,703 | +0.09pp | 0.919 | ❌ REJECT |
| pinnacle-avg-not-against | 3,889 | -0.19pp | 1.000 | ❌ REJECT |
| ah-shift-favorable | 3,791 | -0.18pp | 0.244 | ❌ REJECT |
| ah-shift-not-against | 3,791 | -0.18pp | 0.244 | ❌ REJECT |
The Closest Miss: Line Movement Confirmation (+1.72pp)
The most interesting result was line-movement-confirms at +1.72pp marginal ROI, agonizingly close to the +2pp threshold. The 1,009 bets where Pinnacle's line moved in the same direction as our model returned -1.56% ROI, while the other 2,918 bets (sharps disagreed or the line didn't move) returned -3.87%. Model and sharp money agreeing IS a real signal.
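A quick arithmetic check of those numbers, assuming flat stakes so the baseline ROI is the bet-weighted average of the kept and dropped groups:

```typescript
const baselineRoiPct = -3.28; // N = 3,927
const keptGroup = { n: 1009, roiPct: -1.56 };
const droppedGroup = { n: 2918, roiPct: -3.87 };

// Marginal ROI: kept-set ROI minus baseline ROI.
const marginalPp = keptGroup.roiPct - baselineRoiPct;
// The two groups should recombine to the baseline.
const recombinedPct =
  (keptGroup.n * keptGroup.roiPct + droppedGroup.n * droppedGroup.roiPct) / (keptGroup.n + droppedGroup.n);

console.log(marginalPp.toFixed(2), recombinedPct.toFixed(2)); // prints: 1.72 -3.28
```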
But it didn't survive the IS/OOS split when stacked. IS leagues showed +6.03pp improvement; OOS showed +0.02pp. The signal works on the dev leagues (EPL, La Liga, etc.), where Pinnacle lines are efficient and movement is plausibly informative, but not on the lower OOS leagues, where movement is more likely noise from risk management.
Why Vig Filtering Fails
The counterintuitive finding: high-vig matches have HIGHER CLV (+6.63% vs +6.06% for low-vig). The model's edge is actually LARGER on expensive matches. A plausible explanation: high vig means the market is uncertain (wider spreads), which is exactly where a model can disagree most with the market. Filtering on vig removes the most uncertain matches, which are also the ones where the model has the most to say.
The +0.20pp marginal ROI improvement is real but trivial. Vig costs us, but it costs us uniformly — there's no subset of bets where the vig-to-edge ratio is dramatically better.
Why Pinnacle vs Average Gap Is Useless
94% of our baseline bets are already on the "sharp side" (Pinnacle more generous than average). Our model, which uses devigged Pinnacle odds as input, naturally generates bets that align with Pinnacle's pricing. The signal has no discriminating power — it fires on nearly every bet.
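Why a filter that fires on nearly every bet can't reach +2pp: under flat stakes, baseline × N = kept ROI × N_kept + dropped ROI × N_dropped. Solving for the dropped ROI needed to lift the kept set by a target margin gives the sketch below (the function name is hypothetical). With pinnacle-avg-sharp-side's counts, 3,703 kept of 3,927, the 224 dropped bets would need an ROI around -36%.

```typescript
// ROI (percent) the dropped bets would need for the kept set to beat the
// baseline by targetPp, assuming flat stakes.
function requiredDroppedRoi(baselineRoi: number, nTotal: number, nKept: number, targetPp: number): number {
  const nDropped = nTotal - nKept;
  if (nDropped <= 0) throw new Error("filter drops no bets");
  return baselineRoi - (targetPp * nKept) / nDropped;
}

console.log(requiredDroppedRoi(-3.28, 3927, 3703, 2).toFixed(1)); // prints -36.3
```

The smaller the dropped fraction, the more extreme the dropped bets must be, which is why a 94% fire rate leaves almost no room to add ROI.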
Why AH Line Shifts Don't Help
92% of shifted AH lines shifted in our bet's favor. Again, no discrimination. The model already picks sides that the market is moving toward. The remaining 8% of unfavorable shifts are too few to meaningfully filter.
What This Taught Us
The CLV→ROI gap isn't about odds quality
We hypothesized that execution cost (vig, timing, line quality) explains the gap between +11% CLV and -3.28% ROI. The data says otherwise:
- Vig is uniform — low-vig and high-vig bets have approximately the same ROI. Vig doesn't selectively eat our edge.
- Line movement doesn't discriminate well enough — the closest signal (+1.72pp) fails OOS. Sharp confirmation helps on liquid leagues but not broadly.
- We're already on the sharp side — 94% of bets. The model inherently aligns with Pinnacle.
So what IS the gap?
The gap is calibration — the model overestimates its edge on certain match types. This is Track 2 territory (model improvements), not Track 1 (execution). Specifically:
- Draw probability miscalibration — Poisson overweights draws, but AH avoids this
- Home team overconfidence — home AH bets at -6.85% ROI (known problem)
- League-specific miscalibration — Segunda/La Liga/Ligue 2 are structurally negative
- Late-season overconfidence — model doesn't adjust for motivation/crystallization
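One way to locate these problems is to compare the edge the model claimed with the ROI it realized, per category (league, side, season phase). This is a hypothetical sketch, not code from the repo: `SettledBet`, its fields, and flat 1-unit stakes are assumptions. A large positive gap marks where the model overestimates its edge.

```typescript
interface SettledBet {
  category: string; // e.g. league, "home"/"away", season phase
  predictedEdge: number; // model edge as a fraction
  profit: number; // profit per 1-unit stake (loss = -1)
}

interface CalibrationRow {
  category: string;
  n: number;
  meanEdge: number;
  roi: number;
  gap: number; // meanEdge - roi; positive = overconfident
}

function calibrationByCategory(bets: SettledBet[]): CalibrationRow[] {
  const groups = new Map<string, SettledBet[]>();
  for (const b of bets) {
    const g = groups.get(b.category);
    if (g) g.push(b);
    else groups.set(b.category, [b]);
  }
  const rows: CalibrationRow[] = [];
  for (const [category, g] of groups) {
    const meanEdge = g.reduce((s, b) => s + b.predictedEdge, 0) / g.length;
    const roi = g.reduce((s, b) => s + b.profit, 0) / g.length;
    rows.push({ category, n: g.length, meanEdge, roi, gap: meanEdge - roi });
  }
  // Most overconfident categories first.
  return rows.sort((a, b) => b.gap - a.gap);
}
```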
The discovery signals from Ted Canon 2 (league filter + home AH rescue) attack exactly these problems. The capture signals don't add on top because the execution layer isn't where the problem is.
The remaining path to profitability
The combined discovery stack already gets us to +0.97% ROI on 2,055 bets. The next improvements need to come from:
- Better lambdas — the O/U loss term addition (+90u in prior test) shows model architecture changes have the biggest impact
- Tier-specific drift — lower-league ratings need more flexibility (Ted Canon insight, not yet tested)
- Manager transition window — wrong-direction discovery suggests coaching upgrades are underpriced
- Score-state xG adjustment — raw xG doesn't account for game state effects
All of these are Track 2 (Discover) improvements that change the model's calibration, not the execution quality of individual bets.
Running the Tests
- Run all capture signals: `npx tsx scripts/test-capture-signals.ts`
- Restrict the run with `--signal`: `npx tsx scripts/test-capture-signals.ts --signal line-movement`
Results are saved to `data/backtest/capture-signal-results.json`.