April 12, 2026|Research

BTTS Workstream: What We Built, What We Learned, What's Next

Two weeks building a complete BTTS pipeline — shot-level xG model, dual-grid calibration, five per-match adjustment layers, 16K-bet backtest. The model finds +4.47% entry ROI against soft books but agrees with Pinnacle. The blocker isn't the model — it's execution venue.

Two weeks of intensive work on Both Teams To Score. We built a complete BTTS pipeline from scratch — shot-level xG model, per-league correlation calibration, five per-match adjustment layers, multi-source disagreement signals, and a full backtest infrastructure covering 16,000+ historical BTTS bets across 26 leagues. Here's what happened.

What We Built

The Foundation: Dual-Grid Lambda3 Calibration

Standard bivariate Poisson assumes near-independence between home and away scoring (lambda3 ~ 0.02). That's fine for 1X2 and Asian Handicap, but it systematically underestimates BTTS Yes by ~6pp. Goals in football are positively correlated — when one team scores, the other is more likely to open up and score too.

We calibrated a separate lambda3 per league optimized for BTTS Brier score. The architecture uses two score grids: the solver's standard grid for 1X2/AH, and a BTTS-specific grid with higher correlation (typically 0.10-0.25 depending on league).
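
To make the dual-grid idea concrete, here is a minimal sketch of a bivariate Poisson score grid and the BTTS Yes probability read off it. It assumes the common shared-component form (home = X1 + X3, away = X2 + X3) and holds the marginal goal expectations fixed while lambda3 varies; the production solver may be parameterised differently. The function names and the example goal expectations are illustrative, not taken from our codebase.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_grid(mu_home: float, mu_away: float, lambda3: float, max_goals: int = 12):
    """Bivariate Poisson grid with marginal means mu_home and mu_away.

    Home = X1 + X3 and Away = X2 + X3, where X3 ~ Poisson(lambda3) is the
    shared component that makes the two scores positively correlated.
    """
    lam1 = mu_home - lambda3  # home-only component (assumed parameterisation)
    lam2 = mu_away - lambda3  # away-only component
    if lam1 <= 0 or lam2 <= 0:
        raise ValueError("lambda3 must be smaller than both goal expectations")
    grid = [[0.0] * (max_goals + 1) for _ in range(max_goals + 1)]
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Sum over how many of the goals came from the shared component.
            grid[h][a] = sum(
                poisson_pmf(h - k, lam1) * poisson_pmf(a - k, lam2) * poisson_pmf(k, lambda3)
                for k in range(min(h, a) + 1)
            )
    return grid


def btts_yes(grid) -> float:
    """Probability that both teams score at least once."""
    return sum(p for h, row in enumerate(grid) for a, p in enumerate(row) if h >= 1 and a >= 1)


# Two grids from the same goal expectations (illustrative values).
solver_grid = score_grid(1.5, 1.2, lambda3=0.02)  # used for 1X2/AH
btts_grid = score_grid(1.5, 1.2, lambda3=0.20)    # used for BTTS only
print(round(btts_yes(solver_grid), 4), round(btts_yes(btts_grid), 4))
```

With marginals held fixed, a higher lambda3 moves probability mass into 0-0 and into the both-score cells at the expense of the one-team-blanks cells, which is why the BTTS-specific grid prices BTTS Yes higher.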

Result: +2.78pp out-of-sample (OOS) ROI improvement on BTTS. The single biggest win in the workstream. Deployed to production.

The xG Model: 632K Shots, 25+ Leagues

We trained an XGBoost model on 632,000 shots from three independent sources (StatsBomb, Understat, FotMob) with 69 features. Walk-forward validation: Brier 0.064-0.076, AUC 0.76-0.82.
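
For readers who want the evaluation metrics spelled out, the sketch below shows walk-forward folds plus plain-Python Brier score and AUC. Splitting by season is an assumption (the write-up does not say how the folds were cut), and the helper names are hypothetical.

```python
def walk_forward_folds(seasons):
    """Yield (train_seasons, test_season): train only on seasons before the test one."""
    for i in range(1, len(seasons)):
        yield seasons[:i], seasons[i]


def brier_score(probs, outcomes):
    """Mean squared error between predicted goal probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def auc(probs, outcomes):
    """Probability that a random goal is ranked above a random non-goal (ties count half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy shots: predicted xG and whether the shot was scored.
xg = [0.10, 0.40, 0.35, 0.80]
goal = [0, 0, 1, 1]
print(brier_score(xg, goal), auc(xg, goal))
```

Lower Brier is better; AUC measures ranking only, so a model can have a good AUC and still be badly calibrated, which is why both are reported.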

The key discovery: our precise model (0.67 correlation with goals) is worse at detecting regression to the mean than FotMob's noisy aggregate (0.35 correlation). This seems paradoxical until you realize the variance filter needs the gap between process and outcome. A precise model closes that gap by design: it can't detect what it explains away.

This led to the multi-layer architecture: use the precise model for decomposition (what CAN be explained), but use the noisy model for regression detection (what CAN'T).

Five Per-Match Adjustment Layers

Each layer captures information the market doesn't price:

Layer 1 — Shot Quality Entropy. Shannon entropy of per-shot xG distribution over rolling 6 matches. High entropy (many medium-quality shots from both teams) means more "at bats" — BTTS more likely. Low entropy (a few high-quality chances) means feast-or-famine — one team blanks. The market sees "2.0 xG" and prices identically regardless of how those 2.0 xG were distributed across shots.
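
A minimal sketch of the entropy calculation, assuming each shot's xG is normalised into a share of total xG before applying Shannon's formula. That normalisation is one reasonable reading of the layer, not a confirmed detail, and the function name is hypothetical.

```python
import math


def shot_quality_entropy(shot_xg):
    """Shannon entropy (in nats) of the per-shot share of total xG."""
    total = sum(shot_xg)
    if total <= 0:
        return 0.0
    shares = [x / total for x in shot_xg if x > 0]
    return -sum(s * math.log(s) for s in shares)


# Both profiles add up to 2.0 xG, but they are distributed very differently.
many_medium = [0.1] * 20  # twenty 0.1 xG shots
few_big = [0.5] * 4       # four 0.5 xG chances
print(shot_quality_entropy(many_medium), shot_quality_entropy(few_big))
```

The twenty-shot profile scores ln(20) and the four-chance profile ln(4), so the same headline xG produces clearly different entropy values.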

Layer 2 — Finishing Luck Regression. When both teams have been scoring above their xG (hot finishing streaks), BTTS Yes is overpriced — both will regress. When both have been underperforming, BTTS Yes is underpriced. Uses attacking overperformance (goals - xGFor) per team, normalized per match, capped at +/-4pp adjustment.
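
The capped adjustment can be sketched as follows. The scale factor `pp_per_goal` and the rule of applying no adjustment when the two teams point in opposite directions are assumptions for illustration; only the goals-minus-xGFor input, the per-match normalisation, and the +/-4pp cap come from the description above.

```python
def finishing_luck_adjustment_pp(home, away, pp_per_goal=2.0, cap_pp=4.0):
    """Adjustment to BTTS Yes probability, in percentage points.

    home and away are (goals, xg_for, matches) tuples over the rolling window.
    pp_per_goal is a hypothetical scale: pp of adjustment per goal of combined
    per-match overperformance.
    """
    def overperformance(goals, xg_for, matches):
        return (goals - xg_for) / matches

    h = overperformance(*home)
    a = overperformance(*away)
    if h * a <= 0:
        return 0.0  # teams disagree in direction (or one is neutral): no signal
    raw = -pp_per_goal * (h + a)  # both hot -> negative, both cold -> positive
    return max(-cap_pp, min(cap_pp, raw))


# Both teams running hot over six matches: shade BTTS Yes down.
print(finishing_luck_adjustment_pp((10, 7.0, 6), (9, 6.6, 6)))
```

The cap keeps one extreme finishing streak from dominating the final probability.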

Layer 3 — GK Post-Shot Expected Goals. A backup goalkeeper changes the BTTS equation — the opponent's probability of scoring goes up. We detect GK changes from lineup data 1 hour before kickoff. The dual-grid bttsCorrelation now flows through GK-adjusted re-predictions so the BTTS probability correctly reflects the weaker goalkeeper.

Layer 4 — Garbage Time Discount. Teams leading 3-0 generate inflated xGA as opponents chase. The market sees "2.5 xGA" and prices BTTS accordingly, but 1.0 of that xGA came after the game was dead. We discount xG generated during extreme score states using minute-by-minute reconstruction from goal timings (available for all 26 leagues via FootyStats).
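
As an illustration of the score-state reconstruction, the sketch below replays goal timings to find the margin at the moment of each shot and down-weights xG produced once the game is dead. The three-goal margin and the 0.5 weight are placeholder values, and the `Shot` record is a hypothetical structure rather than our actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Shot:
    minute: int
    team: str  # "home" or "away"
    xg: float


def discounted_xg(shots, goals, dead_margin=3, weight=0.5):
    """Total xG per team, down-weighting shots taken at extreme score states.

    goals is a list of (minute, team) tuples. A goal counts toward the score
    state only for shots in later minutes (same-minute order is unknown).
    """
    totals = defaultdict(float)
    for shot in shots:
        home = sum(1 for m, t in goals if t == "home" and m < shot.minute)
        away = sum(1 for m, t in goals if t == "away" and m < shot.minute)
        factor = weight if abs(home - away) >= dead_margin else 1.0
        totals[shot.team] += factor * shot.xg
    return dict(totals)


goals = [(10, "home"), (25, "home"), (40, "home")]
shots = [Shot(30, "away", 0.3), Shot(60, "away", 0.5), Shot(70, "home", 0.2)]
print(discounted_xg(shots, goals))
```

Here the away side's 0.3 xG shot at 2-0 counts in full, while both shots taken at 3-0 are halved.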

Layer 5 — Multi-Source Disagreement. Five signals derived from comparing our v3 model, FotMob match-level, and Understat. When all three sources agree a team is overperforming, regression confidence is highest. When sources disagree, prediction uncertainty is high — size down. The most promising sub-signal: layered threshold variance (+0.8% marginal ROI), which requires 2+ independent sources to flag regression before betting.
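
The layered-threshold idea (only act when 2+ independent sources flag regression) can be sketched like this. The gap threshold of 0.3 goals per match is a placeholder, and the function name is hypothetical; the source keys follow the names used in our xG files.

```python
def layered_regression_signal(gaps_by_source, threshold=0.3, min_sources=2):
    """Return +1 (overperforming), -1 (underperforming) or 0 (no consensus).

    gaps_by_source maps an xG source name to the team's rolling
    goals-minus-xG gap per match according to that source.
    """
    over = sum(1 for gap in gaps_by_source.values() if gap >= threshold)
    under = sum(1 for gap in gaps_by_source.values() if gap <= -threshold)
    if over >= min_sources and under == 0:
        return 1
    if under >= min_sources and over == 0:
        return -1
    return 0  # too few flags, or sources contradict each other


gaps = {"v3_model": 0.35, "fotmob_match": 0.50, "match_xg": 0.10}
print(layered_regression_signal(gaps))  # two of three sources flag overperformance
```

Requiring agreement trades bet volume for confidence: a single noisy source can no longer trigger a regression bet on its own.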

Multi-Source xG Data Infrastructure

321 league-season files containing per-match xG from three independent sources (fotmob_match, match_xg/Understat, v3_model). Loaded into the backtest pipeline with per-team rolling gap computation. This is the first time we've had true multi-source per-match xG comparison at scale — most systems have one xG source.
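
A rough sketch of the per-team rolling gap computation, assuming each match record carries the team's goals and one xG value per source. The record layout, the six-match window, and the function name are illustrative assumptions. The gap is computed from matches before the current one so the backtest never sees the result it is about to bet on.

```python
from collections import defaultdict, deque


def rolling_gaps(matches, window=6):
    """Pre-match rolling mean of (goals - xG) per team and per xG source.

    matches must be in chronological order; each is a dict with keys
    "team", "goals" and "xg" (a dict of source name -> xG).
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for match in matches:
        for source, xg in match["xg"].items():
            past = history[(match["team"], source)]
            gap = sum(past) / len(past) if past else None  # None until history exists
            rows.append({"team": match["team"], "source": source, "pre_match_gap": gap})
            past.append(match["goals"] - xg)  # update only after recording
    return rows


matches = [
    {"team": "A", "goals": 2, "xg": {"fotmob_match": 1.0, "v3_model": 1.5}},
    {"team": "A", "goals": 0, "xg": {"fotmob_match": 1.0, "v3_model": 0.5}},
    {"team": "A", "goals": 1, "xg": {"fotmob_match": 1.2, "v3_model": 1.1}},
]
for row in rolling_gaps(matches):
    print(row)
```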

BTTS Backtest Pipeline

Full BTTS evaluation against BetExplorer historical odds: 12,298 matches with BTTS data across 12 leagues (all major European). Per-layer impact testing with temporal holdout. The backtest confirmed +9.93% CLV and +4.47% entry-adjusted ROI across 16,494 bets.

What We Learned

The Model Works — Against Soft Books

The uncomfortable truth we circled around for two weeks before confronting it directly:

Comparison	Edge
Model vs FootyStats/bet365 odds	+4.47% entry ROI
Model vs Pinnacle odds	~1% (below 3% threshold)

The model finds real mispricings in BTTS markets. But those mispricings exist at soft books (bet365, Stake, 1xBet), not at Pinnacle. The live pipeline compares against Pinnacle — the sharpest BTTS book. Two well-calibrated models (ours and Pinnacle's) converge. That's why zero BTTS bets have been placed despite the entire infrastructure being production-ready.
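
A small worked example shows why the same model probability clears the threshold at one book and not at another. It assumes edge is measured as expected return per unit stake (model probability times decimal odds, minus one); the odds are made-up numbers chosen only to mirror the roughly 1% versus 4-5% gap, and the function names are hypothetical.

```python
def edge(model_prob, decimal_odds):
    """Expected return per unit stake if the model probability is correct."""
    return model_prob * decimal_odds - 1.0


def should_bet(model_prob, decimal_odds, threshold=0.03):
    """Bet only when the edge reaches the 3% threshold."""
    return edge(model_prob, decimal_odds) >= threshold


model_prob = 0.55    # model's BTTS Yes probability (illustrative)
pinnacle_odds = 1.84  # illustrative sharp price
soft_odds = 1.90      # illustrative soft-book price

print(edge(model_prob, pinnacle_odds), should_bet(model_prob, pinnacle_odds))
print(edge(model_prob, soft_odds), should_bet(model_prob, soft_odds))
```

In this example the sharp price gives an edge of about 1.2% and no bet, while the soft price gives about 4.5% and a bet, without any change to the model.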

Signal Adjustments Can't Bridge the Sharp/Soft Gap

We tested five per-match adjustment layers. The maximum adjustment is +/-4pp per layer. Even stacked, they can't consistently push a 1% model-vs-Pinnacle disagreement past the 3% bet threshold. The adjustments are real and improve bet quality — but they solve a different problem than the one blocking BTTS bets.

Variance Filter is Harmful for BTTS

The standard variance regression filter (which gates all AH/1X2 bets) actively removes the most profitable BTTS bets. Bets where variance opposes the model had +7.02% entry ROI vs +2.23% for bets where variance confirms. The assumptions about regression direction that work for sides betting don't transfer to totals/BTTS.

Noise > Precision for Regression Detection

Confirmed three times with different methodologies: FotMob's noisy match-level xG (0.35 correlation) produces better regression signals (93.8% rate) than our 69-feature model (0.67 correlation, 90.7% rate). The gap between process and outcome IS the signal. A model that explains the gap can't also detect it.

Entry Timing Matters More Than Signal Selection

Entry ROI of +4.47% against closing ROI of -5.24% tells us the market moves toward our model's price by close. The edge exists early and shrinks. Betting 6-12 hours before kickoff captures more value than any signal refinement at kickoff time.
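
The relationship between entry price, closing price, and ROI can be illustrated with a toy calculation. It assumes flat one-unit stakes and a simple CLV definition (entry odds divided by closing odds, minus one, with no margin removal); the odds and results are invented for the example.

```python
def clv(entry_odds, closing_odds):
    """Closing line value: how much better the entry price was than the close."""
    return entry_odds / closing_odds - 1.0


def flat_stake_roi(odds, won):
    """ROI for one-unit stakes: profit is (odds - 1) on a win and -1 on a loss."""
    profit = sum((o - 1.0) if w else -1.0 for o, w in zip(odds, won))
    return profit / len(odds)


won = [True, False, True, False]
entry = [2.10, 2.10, 2.10, 2.10]    # prices taken hours before kickoff
closing = [1.90, 1.90, 1.90, 1.90]  # prices at kickoff for the same selections

print(clv(entry[0], closing[0]))
print(flat_stake_roi(entry, won), flat_stake_roi(closing, won))
```

The same four results are profitable at the entry prices and losing at the closing prices, which is the pattern the backtest shows at scale.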

What's Next

The One Thing That Matters: Soft Book Execution

The model is ready. The pipeline is ready. The adjustments are ready. The blocker is execution venue. The edge exists at soft books. We need accounts at Stake, 1xBet, or bet365 to execute BTTS bets where the mispricing actually lives.

The architecture going forward:

Model validation: Pinnacle (CLV tracking — is the model good? Yes, +11.7%)
Execution target: Soft book odds (where the +4.47% entry ROI exists)
Signal layers: Apply per-match adjustments to identify which soft-book-mispriced matches have the strongest edge

Technical Debt

overperformance-decomposition signal produces 0 bets (filter too restrictive, needs debugging)
inter-model-disagreement threshold too strict (0.5 per match), so the signal never activates
Shot quality entropy needs per-team lookup wiring (currently league-level only in BTTS pipeline)
BTTS backfill incomplete: 12/44 leagues done, 32 remaining

The Architecture Holds

Despite zero BTTS bets placed, the infrastructure investment pays forward:

Multi-source xG comparison works for AH/1X2 signals too (layered threshold shows +0.8% marginal ROI)
Shot-level data pipeline serves all future per-match analysis
Garbage time discount applies to Over/Under markets (not just BTTS)
The dual-grid correlation framework extends to corners, cards, or any derived market