Testing Sufferball Routing: Why Our Strongest Signal Failed the Gate
Our original strongest finding — Under 2.5 vs sufferball teams at +10.83% CLV, +7.63% ROI — was retested through the corrected 10-gate pipeline. 5/10 gates passed. Marginal ROI = -0.2pp (signal hurts the stack). Original N=262 was a pre-filtered artifact. Walk-forward: 2024 -4.2%, 2025 -8.6%. Shelved.
This was supposed to be our strongest signal. Under 2.5 against sufferball teams: +10.83% CLV, +7.63% ROI, N=262. A 6 percentage point delta over sides betting. Ted Knutson's own words: "Forest's entire purpose in football is to leech the fun out of games."
It failed the 10-gate approval: only 5 of 10 gates passed, and one of the failures was Gate 4 (marginal ROI), the gate that decides deployment. Adding the signal to the existing stack makes things 0.2 percentage points WORSE.
The Question
When one team plays sufferball — suppresses shots (< 10 conceded per game) and keeps xGA under 1.0 — should we route bets to the Under 2.5 market instead of Asian Handicaps? The original backtest (2026-03-17) said yes, dramatically so.
What We Found
| Metric | Original Test (2026-03-17) | Canonical Pipeline (2026-03-19) |
|---|---|---|
| Method | Custom analysis, pre-filtered bets | `approve-signal.ts` 10-gate |
| N | 262 | 72,666 (standalone) / 5,842 (with signal) |
| CLV | +10.83% | +5.7% (standalone) |
| ROI | +7.63% | -5.4% (standalone) |
| Marginal ROI | +5.98pp (claimed) | **-0.2pp** (actual) |
| p-value | not computed | 0.575 |
The original result was a 262-bet sample from a pre-filtered pool (AH bets with edge ≥ 7%, odds ≤ 2.0, already passing Ted filters). The canonical pipeline tests against 29,977 matches with minEdge=0 and all filters off. The signal goes from hero to zero.
The Nuance
Walk-forward tells the story
| Year | N | ROI |
|---|---|---|
| 2022 | 980 | -0.2% |
| 2023 | 1,793 | -0.5% |
| 2024 | 1,985 | -4.2% |
| 2025 | 1,084 | -8.6% |
The signal was approximately neutral in 2022-2023 and actively harmful in 2024-2025. Whatever pattern existed in historical data is gone.
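A per-year breakdown like the table above can be sketched as a simple group-by over settled bets. This is a minimal illustration, not the code inside `approve-signal.ts`: the `SettledBet` shape and its field names are assumptions, and the real walk-forward gate may split its windows differently.

```typescript
// Hypothetical bet record; field names are assumptions, not the pipeline's schema.
interface SettledBet {
  settledAt: string; // ISO date, e.g. "2024-03-09"
  stake: number;     // units staked
  profit: number;    // net result in units (negative for a loss)
}

interface YearSummary {
  year: number;
  n: number;
  roiPct: number;
}

// Group settled bets by calendar year and compute ROI = total profit / total stake.
function roiByYear(bets: SettledBet[]): YearSummary[] {
  const buckets = new Map<number, { n: number; stake: number; profit: number }>();
  for (const bet of bets) {
    const year = Number(bet.settledAt.slice(0, 4));
    const bucket = buckets.get(year) ?? { n: 0, stake: 0, profit: 0 };
    bucket.n += 1;
    bucket.stake += bet.stake;
    bucket.profit += bet.profit;
    buckets.set(year, bucket);
  }
  return Array.from(buckets.entries())
    .sort((a, b) => a[0] - b[0]) // oldest year first
    .map(([year, b]) => ({
      year,
      n: b.n,
      roiPct: b.stake === 0 ? 0 : (100 * b.profit) / b.stake,
    }));
}
```

Reading the rows in chronological order is what exposes the decay: a single pooled ROI would hide the fact that the losses are concentrated in the most recent two years.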
Odds quality tier
| Tier | N | ROI | CLV |
|---|---|---|---|
| Sharp (Big 6 + UCL) | 1,852 | -1.1% | +11.3% |
| Medium | 1,701 | -5.9% | +11.1% |
| Soft | 2,289 | -2.8% | +11.2% |
CLV is stable across tiers (about +11%), yet ROI is negative in every tier and worst in medium-tier leagues (-5.9%). Sharp leagues, where the original sufferball analysis was focused, lose the least (-1.1%), but no tier is profitable.
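As a sanity check, the year and tier tables should describe the same 5,842 with-signal bets. The sketch below re-adds the published rows and computes an N-weighted ROI for each split. The weighting only equals a true pooled ROI if stakes are flat, which is an assumption here; the row values are copied from the tables above.

```typescript
interface Row {
  label: string;
  n: number;
  roiPct: number;
}

// Rows copied from the walk-forward table.
const byYear: Row[] = [
  { label: "2022", n: 980, roiPct: -0.2 },
  { label: "2023", n: 1793, roiPct: -0.5 },
  { label: "2024", n: 1985, roiPct: -4.2 },
  { label: "2025", n: 1084, roiPct: -8.6 },
];

// Rows copied from the odds quality tier table.
const byTier: Row[] = [
  { label: "Sharp", n: 1852, roiPct: -1.1 },
  { label: "Medium", n: 1701, roiPct: -5.9 },
  { label: "Soft", n: 2289, roiPct: -2.8 },
];

// N-weighted average ROI; assumes flat stakes so that bet count stands in for stake.
function pooled(rows: Row[]): { n: number; roiPct: number } {
  const n = rows.reduce((sum, r) => sum + r.n, 0);
  const weighted = rows.reduce((sum, r) => sum + r.n * r.roiPct, 0);
  return { n, roiPct: weighted / n };
}

const yearTotal = pooled(byYear);
const tierTotal = pooled(byTier);
console.log(yearTotal.n, tierTotal.n); // both 5842
console.log(yearTotal.roiPct.toFixed(1), tierTotal.roiPct.toFixed(1)); // both -3.2
```

Both splits sum to 5,842 bets and land at roughly -3.2%, with the small difference attributable to rounding in the published rows. So the two tables are slicing the same pool.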
The proxy problem
The original style classifier used FootyStats data with xG (shots conceded < 10 AND xGA < 1.0). The canonical pipeline doesn't have per-match xG from FootyStats — it uses shots-against from the standard match cache as a proxy. This means the sufferball classification is cruder: any team conceding < 10 shots per game qualifies, even if their xGA is high (allowing few but dangerous chances).
This proxy difference could explain some of the gap. But even with a perfect classifier, the original N=262 sample was roughly a quarter of the 1,000-bet minimum, so it was never going to give us a reliable verdict.
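The difference between the two classifiers is easy to see side by side. The sketch below is illustrative only: the `TeamDefenceStats` shape and function names are hypothetical, and treating both thresholds as per-game averages is an assumption.

```typescript
// Hypothetical per-team averages; field names are illustrative.
interface TeamDefenceStats {
  shotsConcededPerGame: number;
  xgaPerGame?: number; // absent when the data source carries no xG
}

const MAX_SHOTS_CONCEDED = 10;
const MAX_XGA = 1.0;

// Original rule: fewer than 10 shots conceded AND xGA under 1.0.
// Returns undefined when xG is unavailable, because the rule cannot be evaluated.
function isSufferballStrict(s: TeamDefenceStats): boolean | undefined {
  if (s.xgaPerGame === undefined) return undefined;
  return s.shotsConcededPerGame < MAX_SHOTS_CONCEDED && s.xgaPerGame < MAX_XGA;
}

// Proxy rule: shots conceded only, which is all the standard match cache supports.
function isSufferballProxy(s: TeamDefenceStats): boolean {
  return s.shotsConcededPerGame < MAX_SHOTS_CONCEDED;
}

// A made-up team that allows few but dangerous chances.
const fewButDangerous: TeamDefenceStats = { shotsConcededPerGame: 8.5, xgaPerGame: 1.4 };

console.log(isSufferballProxy(fewButDangerous));  // true: the proxy lets it through
console.log(isSufferballStrict(fewButDangerous)); // false: the original rule rejects it
```

Every team the strict rule accepts also passes the proxy, so the proxy can only widen the sufferball set, never narrow it.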
What Didn't Work
The signal fails at three levels:
- Gate 4 (Marginal ROI < 0): Adding sufferball routing to the stack makes things 0.2pp worse. The bets it removes (1X2/AH on sufferball matches) are not worse than average — they're approximately the same. Removing them just shrinks the pool without improving quality.
- Gate 10 (Walk-forward): Even if the marginal were positive, the 2024-2025 collapse would kill it. The signal degrades sharply over time — the exact opposite of what you want from something you're about to deploy.
- Gate 8 (Suspicious N): Standalone N=72,666 is within 10% of 9 other signals. This means the sufferball filter doesn't meaningfully change the bet universe in standalone mode — most matches don't have a sufferball team, so the signal passes nearly everything through. It's not selecting a distinctive subset.
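For readers unfamiliar with the two numeric gates, here is a rough sketch of the arithmetic behind them. It is not the implementation in `approve-signal.ts`: the type and function names are hypothetical, measuring "within 10%" relative to this signal's own N is an assumption, and the example figures in the comments are invented to show the shape of the calculation.

```typescript
interface StackResult {
  totalStake: number;
  totalProfit: number;
}

function roiPct(r: StackResult): number {
  return (100 * r.totalProfit) / r.totalStake;
}

// Gate 4-style number: ROI of the stack with the signal minus ROI without it,
// in percentage points. A negative value means the signal hurts the stack.
function marginalRoiPp(baseline: StackResult, withSignal: StackResult): number {
  return roiPct(withSignal) - roiPct(baseline);
}

// Gate 8-style number: how many other signals have a standalone N within
// `tolerance` (as a fraction of this signal's N) of this signal's N.
function countSimilarN(n: number, otherNs: number[], tolerance = 0.1): number {
  return otherNs.filter((other) => Math.abs(other - n) / n <= tolerance).length;
}

// Invented example: a baseline at 5.0% ROI and a filtered stack at 4.8% ROI
// give a marginal of about -0.2pp, even though both stacks are profitable.
const exampleMarginal = marginalRoiPp(
  { totalStake: 1000, totalProfit: 50 },
  { totalStake: 800, totalProfit: 38.4 },
);
console.log(exampleMarginal.toFixed(1));
```

The marginal figure is what matters for deployment because it isolates the signal's own contribution; a filter that barely changes the bet universe, as Gate 8 flags, has little room to move that number in either direction.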
What This Means
The style-matchup routing concept is shelved. The original finding was a pre-correction artifact: small sample, pre-filtered pool, no walk-forward validation. The corrected pipeline shows no incremental value.
This doesn't mean team style is irrelevant to betting. It means:
- The sufferball → Under routing doesn't survive as a signal-layer filter
- Whatever style edge exists is most likely already captured by the MI model's lambda estimates (which use Pinnacle odds that already price in defensive styles)
- The original N=262 was never a reliable basis for deployment
What's Next
No retest is planned unless FootyStats xG data is wired into the canonical evaluation pipeline. The shots-against proxy may be too crude, but even with better classification, the walk-forward degradation suggests the underlying edge, if it ever existed, has been arbitraged away.
Reproduce this result: `npx tsx scripts/approve-signal.ts --signal=style-matchup-bet-routing`