How the Model Actually Works: A Plain-English Guide
A plain-English walkthrough of the entire system: the MI Bivariate Poisson model (the engine), production filters (the transmission), and experimental signals (the steering). What's working (+7% CLV), what isn't (-2.2% ROI), and why the gap exists.
You don't need a statistics degree to understand what we're building. This post explains the entire system — what it does, how it finds betting edges, and why it's still not profitable (yet).
The Big Picture: Three Layers
Think of the system like a car:
- Layer 1: The Engine — A mathematical model that predicts football match outcomes
- Layer 2: The Transmission — Production filters that decide which predictions are good enough to bet on
- Layer 3: The Steering — Experimental signals that fine-tune which bets to take and which to skip
The engine does 90% of the work. The transmission prevents bad bets from getting through. The steering nudges the system toward slightly better decisions.
Layer 1: The Engine (MI Bivariate Poisson Model)
The model's job is simple: figure out how good each team's attack and defense are, then predict what will happen when two teams play.
Here's how it works, step by step:
Step 1: Read the Market
The model starts with Pinnacle odds, widely regarded as the sharpest (most accurate) betting market in the world. When Pinnacle prices Arsenal at 1.65 to beat Crystal Palace, that price already encodes form, injuries, home advantage, motivation, weather and everything else. Thousands of professional bettors have already priced all of that in.
The model asks: "If Arsenal are priced at 1.65, what does that tell me about their attack strength and Crystal Palace's defensive strength?"
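Before the odds can tell the model anything, they have to be turned into probabilities. This post doesn't say how the bookmaker's margin is removed, so the sketch below assumes the simplest method (proportional normalization). The draw and Crystal Palace prices are made up for illustration.

```python
def implied_probabilities(decimal_odds):
    """Turn one market's decimal odds into probabilities that sum to 1.

    Inverse odds add up to slightly more than 1 because of the bookmaker's
    margin (the overround). This sketch removes it proportionally; other
    de-margining methods exist.
    """
    raw = [1.0 / odds for odds in decimal_odds]
    overround = sum(raw)
    return [p / overround for p in raw]


# Hypothetical 1X2 prices: Arsenal win 1.65, draw 4.0, Crystal Palace win 5.5.
probs = implied_probabilities([1.65, 4.0, 5.5])
print([round(p, 3) for p in probs])  # [0.584, 0.241, 0.175]
```

With these made-up prices Arsenal come out at about 58%, the same ballpark as the market figure used in Step 4.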
Step 2: Build Team Ratings
Using odds from hundreds of matches across a season, the model figures out a single number for each team's attack strength and each team's defense strength. Arsenal might have attack=2.1 and defense=0.7 (strong attack, solid defense). Crystal Palace might be attack=0.9 and defense=1.3 (weak attack, leaky defense).
These aren't arbitrary numbers — they're the values that best explain all the Pinnacle odds across all matches in that league.
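One common way to turn ratings like these into expected goals is a multiplicative model: a team's attack rating times the opponent's defense rating, with a home-advantage multiplier for the home side. The post doesn't specify its exact formula, so this form is an assumption, the 1.1 home-advantage value is hypothetical, and the fitting step that chooses the ratings from Pinnacle odds is not shown.

```python
def expected_goals(home_attack, home_defense, away_attack, away_defense,
                   home_advantage=1.1):
    """Expected goals for each side under an assumed multiplicative rating model.

    Higher attack means more goals scored; higher defense means more goals
    conceded (so a low defense number is a good defense).
    home_advantage is a hypothetical multiplier applied to the home side only.
    """
    lam_home = home_attack * away_defense * home_advantage
    lam_away = away_attack * home_defense
    return lam_home, lam_away


# Illustrative ratings from the text: Arsenal (home) vs Crystal Palace (away).
lam_home, lam_away = expected_goals(2.1, 0.7, 0.9, 1.3)
print(round(lam_home, 3), round(lam_away, 3))  # 3.003 0.63
```

Because the example ratings aren't scaled to any league's average scoring rate, the resulting goal expectations are only illustrative.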
Step 3: The Score Grid
This is the heart of the model. It takes the two teams' attack and defense ratings, combines them with home advantage and a correlation factor, and produces a grid of probabilities for every possible scoreline.
Imagine a spreadsheet where the rows are home goals (0, 1, 2, 3, 4, 5) and the columns are away goals (0, 1, 2, 3, 4, 5). Each cell contains the probability of that exact score. An illustrative excerpt, showing 0 to 3 goals per side:

| | Away 0 | Away 1 | Away 2 | Away 3 |
|---|---|---|---|---|
| Home 0 | 4.2% | 8.1% | 5.9% | 2.1% |
| Home 1 | 10.3% | 14.2% | 8.7% | 2.8% |
| Home 2 | 11.5% | 12.8% | 6.9% | 2.0% |
| Home 3 | 7.8% | 7.5% | 3.6% | 0.9% |
From this single grid, you can derive the probability of ANYTHING:
- Home win? Add up all cells where home goals > away goals
- Over 2.5 goals? Add up all cells where total > 2
- Asian Handicap -0.5? Add up cells where home wins by 1+ goals
- Both teams to score? Add up cells where home goals > 0 and away goals > 0
This is why it's called "Bivariate Poisson" — it uses the Poisson distribution (a formula that models how often random events happen) for two variables at once (home goals and away goals), with a correlation parameter that captures the fact that goals in football aren't fully independent.
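Step 3's grid and the summing rules can be sketched together. The post doesn't give its exact parameterization, so this uses one standard form: the trivariate-reduction bivariate Poisson that Karlis and Ntzoufras applied to football, where home goals = X1 + X3 and away goals = X2 + X3 for independent Poisson variables, with the shared X3 supplying the correlation. The parameter names, the 10-goal cutoff and the market keys are hypothetical.

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(home = x, away = y) when home = X1 + X3 and away = X2 + X3.

    X1, X2, X3 are independent Poisson(lam1), Poisson(lam2), Poisson(lam3), so
    expected home goals = lam1 + lam3, expected away goals = lam2 + lam3 and
    Cov(home, away) = lam3. This form only allows non-negative correlation.
    Requires lam1 > 0 and lam2 > 0.
    """
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** x / math.factorial(x)
            * lam2 ** y / math.factorial(y))
    ratio = lam3 / (lam1 * lam2)
    correction = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio ** k
        for k in range(min(x, y) + 1)
    )
    return base * correction


def score_grid(lam1, lam2, lam3, max_goals=10):
    """grid[h][a] = probability of the exact score h-a (truncated at max_goals)."""
    return [[bivariate_poisson_pmf(h, a, lam1, lam2, lam3)
             for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]


def market_probabilities(grid):
    """Sum grid cells into a few markets, following the rules listed above."""
    out = {"home": 0.0, "draw": 0.0, "away": 0.0, "over_2_5": 0.0, "btts": 0.0}
    for h, row in enumerate(grid):
        for a, p in enumerate(row):
            if h > a:
                out["home"] += p
            elif h == a:
                out["draw"] += p
            else:
                out["away"] += p
            if h + a > 2:
                out["over_2_5"] += p
            if h > 0 and a > 0:
                out["btts"] += p
    # Home Asian Handicap -0.5 wins exactly when the home side wins.
    out["home_ah_minus_0_5"] = out["home"]
    return out


grid = score_grid(1.4, 0.9, 0.1)
markets = market_probabilities(grid)
print(round(markets["home"] + markets["draw"] + markets["away"], 4))  # 1.0
```

Setting lam3 = 0 collapses this to two independent Poisson distributions, which makes it easy to see how much the correlation term changes each market.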
Step 4: Find Edge
Now the model compares its probabilities to what the market says.
The model thinks Arsenal have a 62% chance of winning. Pinnacle's closing odds imply 58%. That's a 4% edge (4 percentage points): the model sees value that the market doesn't.
But here's the catch: just because the model THINKS there's a 4% edge doesn't mean there IS a 4% edge. The model could be wrong. And in fact, we've discovered that the model is consistently overconfident on its biggest edges — when it thinks it has a 15% edge, the real edge might only be 5%.
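The arithmetic behind Step 4 is small. The example above quotes edge as a difference in probabilities (62% vs 58%); since "edge" is also used in a relative sense in betting, the sketch shows both, plus the expected value at a given price. Which definition the production filters use is an assumption here.

```python
def edge_points(model_prob, market_prob):
    """Edge as a difference in probability (0.04 = 4 percentage points)."""
    return model_prob - market_prob


def edge_relative(model_prob, market_prob):
    """Edge relative to the market's probability (0.069 = 6.9% more likely)."""
    return model_prob / market_prob - 1.0


def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked, *if* model_prob were the true probability."""
    return model_prob * decimal_odds - 1.0


print(round(edge_points(0.62, 0.58), 3))     # 0.04
print(round(edge_relative(0.62, 0.58), 3))   # 0.069
print(round(expected_value(0.62, 1.65), 3))  # 0.023
```

The last function is where overconfidence bites: if the true probability is lower than model_prob, the real expected value is lower too, and can turn negative.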
Layer 2: The Transmission (Production Filters)
Not every edge is worth betting. The filters are quality control:
- Minimum edge: 7% — Don't bet unless the model sees at least 7% more probability than the market. Small edges get eaten by commission and variance.
- Maximum odds: 2.0. Don't bet on outcomes priced above 2.0, i.e. anything the market rates below roughly a 50% chance. The model is less reliable for less likely outcomes.
- No draws — The model is systematically overconfident on draws. Skip them entirely.
- Variance regression — Only bet when at least one team's recent results diverge from their underlying quality (measured by expected goals). These are "regression candidate" matches where the model has the biggest informational advantage.
- Congestion filter — Skip matches where a team played 3+ times in 8 days. Fatigue makes outcomes less predictable.
- Defiance filter — If a team has defied the model 10+ consecutive times, something structural has changed that the model hasn't caught. Skip them.
After these filters, about 2,700 bets survive per 3-year backtest window (out of ~30,000 possible). These are the model's "best ideas."
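Written as code, the production filters are one yes/no check per candidate bet. This is a sketch: the field names are hypothetical, edge is assumed to be in probability points, and the regression and defiance signals are assumed to be computed upstream.

```python
from dataclasses import dataclass

# Thresholds taken from the filter list above.
MIN_EDGE = 0.07          # assumed to be in probability points
MAX_ODDS = 2.0
CONGESTION_GAMES = 3     # 3+ games in 8 days means skip
DEFIANCE_LIMIT = 10      # 10+ consecutive defiances means skip


@dataclass
class Candidate:
    """A possible bet. All field names are hypothetical."""
    edge: float                  # model probability minus market probability
    odds: float                  # decimal odds on offer
    outcome: str                 # "home", "draw" or "away"
    regression_candidate: bool   # a team's results diverge from its xG
    max_games_in_8_days: int     # busier of the two teams' schedules
    defiance_streak: int         # longest current run of defying the model


def passes_production_filters(c: Candidate) -> bool:
    """True only if the bet clears every production filter."""
    return (
        c.edge >= MIN_EDGE
        and c.odds <= MAX_ODDS
        and c.outcome != "draw"
        and c.regression_candidate
        and c.max_games_in_8_days < CONGESTION_GAMES
        and c.defiance_streak < DEFIANCE_LIMIT
    )
```

Because every rule has to pass, the filters compound quickly, which is how a large pool of candidates shrinks to a small set of bets.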
Layer 3: The Steering (Experimental Signals)
Signals are hypotheses we test. Examples:
- "Excluding La Liga, Segunda, and Ligue 2 improves ROI" (tc2-league-filter)
- "Only betting home Asian Handicap against bottom-quarter opponents" (tc2-home-ah-rescue)
- "Capping maximum edge at 15% removes overconfident bets" (max-edge-cap)
Each signal goes through a 10-gate approval process before it can be deployed:
- Must be pre-registered (no fishing for patterns after seeing results)
- Must work in isolation (not just piggybacking on other filters)
- At least 1,000 bets (small samples are unreliable)
- Must improve ROI when added to the existing stack
- Statistically significant (bootstrap test, p < 0.10)
- Works in-sample AND out-of-sample (no overfitting)
- Works across all season phases (early, mid, late)
- Doesn't suspiciously overlap with other signals
- Meaningful improvement (> 0.5 percentage points)
- Holds up in walk-forward testing (works in every year, not just one)
In practice, most signals fail. We've tested 40+ and only a handful have passed. The gates are deliberately harsh — it's better to miss a real signal than to deploy a false one.
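A few of the gates are purely numeric and easy to sketch. This post doesn't say exactly what the bootstrap test is run on; the sketch assumes it asks whether the signal's mean profit per bet is above zero, using a simple percentile bootstrap. The function names, the 1,000-resample default and the ROI inputs are hypothetical.

```python
import random


def bootstrap_p_value(profits, n_resamples=1000, seed=0):
    """One-sided percentile-bootstrap p-value for 'mean profit per bet > 0'.

    Resamples the per-bet profits with replacement and returns the share of
    resampled means that are <= 0.
    """
    rng = random.Random(seed)
    n = len(profits)
    at_or_below_zero = 0
    for _ in range(n_resamples):
        resample = rng.choices(profits, k=n)
        if sum(resample) / n <= 0:
            at_or_below_zero += 1
    return at_or_below_zero / n_resamples


def numeric_gates(profits, roi_without_signal, roi_with_signal):
    """Check the sample-size, improvement and significance gates (ROI as fractions)."""
    improvement = roi_with_signal - roi_without_signal
    return {
        "at_least_1000_bets": len(profits) >= 1000,
        "improves_stack": improvement > 0,
        "meaningful_improvement": improvement > 0.005,  # 0.5 percentage points
        "significant": bootstrap_p_value(profits) < 0.10,
    }
```

The remaining gates (pre-registration, isolation, season phases, overlap, walk-forward) depend on how the backtest is split rather than on a single formula.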
The Numbers: What's Working and What Isn't
What's working:
- CLV (Closing Line Value): +7.0%. On average, the odds we take beat the closing line (the final price before kickoff) by 7%, which means the market tends to move in our direction after we bet. This is genuine edge: we're seeing something the market hasn't fully priced yet.
What isn't working:
- ROI: -2.2% — Despite finding genuine edge, we're still losing money. For every 100 units staked, we lose 2.2 units.
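Both headline numbers are simple ratios. There are several CLV conventions (some compare de-margined probabilities rather than raw prices); this sketch assumes the price-ratio version, and the bet-tuple format is hypothetical.

```python
def clv(taken_odds, closing_odds):
    """Closing line value of one bet: how much better our price was than the close."""
    return taken_odds / closing_odds - 1.0


def roi(bets):
    """Return on investment for (stake, decimal_odds, won) tuples: profit / staked."""
    staked = sum(stake for stake, _, _ in bets)
    profit = sum(stake * (odds - 1.0) if won else -stake
                 for stake, odds, won in bets)
    return profit / staked


print(round(clv(2.14, 2.00), 3))  # 0.07: odds 7% better than the close
print(round(roi([(1.0, 1.9, True), (1.0, 1.8, False), (1.0, 2.0, False)]), 3))  # -0.367
```

CLV is measured at bet time against the close, while ROI depends on results, so a bet can beat the closing line and still lose. That is part of why a positive CLV can sit alongside a negative ROI.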
Why the gap?
The model is overconfident on its biggest edges. When the model says "15% edge," reality is more like "5% edge." The small edges (1-5%) are fairly accurate. The large edges (10%+) are systematically inflated.
This happens because the Poisson score grid doesn't perfectly match how real football scores work. Real matches produce more 1-0 and 2-1 results than the model predicts, and fewer 3-0 and 4-1 blowouts. When the model predicts a blowout is more likely than the market says, it generates a big edge — but the blowout probability is wrong.
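Overconfidence of this kind shows up when settled bets are grouped by the edge the model claimed and each group's claimed edge is compared with the edge actually realized (win rate minus market probability). The bucket boundaries and the input format are hypothetical; with real data each bucket needs hundreds of bets before the comparison means much.

```python
from collections import defaultdict


def calibration_by_edge(bets, bounds=(0.0, 0.05, 0.10, 1.0)):
    """Group settled bets by claimed edge; compare claimed vs realized edge.

    bets: iterable of (model_prob, market_prob, won) tuples.
    Returns {(lo, hi): {"n": ..., "claimed_edge": ..., "realized_edge": ...}}.
    Bets whose edge falls outside every bucket are ignored.
    """
    # Per bucket: count, sum of model probs, sum of market probs, wins.
    totals = defaultdict(lambda: [0, 0.0, 0.0, 0])
    for model_p, market_p, won in bets:
        edge = model_p - market_p
        for lo, hi in zip(bounds, bounds[1:]):
            if lo <= edge < hi:
                bucket = totals[(lo, hi)]
                bucket[0] += 1
                bucket[1] += model_p
                bucket[2] += market_p
                bucket[3] += int(won)
                break
    report = {}
    for key, (n, model_sum, market_sum, wins) in sorted(totals.items()):
        report[key] = {
            "n": n,
            "claimed_edge": (model_sum - market_sum) / n,
            "realized_edge": (wins - market_sum) / n,
        }
    return report
```

An overconfident model shows claimed edges that grow faster than realized edges as you move into the higher buckets.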
What We've Done So Far
The "Fix the Engine" session (today):
We tested three approaches to closing the CLV-to-ROI gap:
- Make the solver react faster to form changes — FAILED. The solver's reaction speed is already optimal.
- Shrink overconfident edges: WRONG DIRECTION. Filtering out high-edge bets makes things worse, because the high-edge group also contains the only profitable bets, mixed in with the overconfident ones.
- Remove noisy data from the solver — WORKED. The solver was fitting to match results and expected goals data, which added noise. When we told it to ONLY fit Pinnacle odds, calibration improved. ROI went from -3.0% to -2.2%.
What's next:
The score grid itself. The Bivariate Poisson distribution assumes goals follow a specific mathematical pattern. Real goals don't perfectly follow this pattern — they're "overdispersed" (more variance than Poisson predicts) and the correlation between home and away goals is more complex than a single parameter can capture.
Fixing the grid would fix the root cause of the overconfidence problem. Every filter and signal downstream would benefit because the predictions they're filtering would be more accurate.
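A quick first check for overdispersion is the variance-to-mean ratio of goal counts, which is about 1 for Poisson data. This is a generic diagnostic, not necessarily the test used here, and the goal lists below are made up.

```python
from statistics import mean, pvariance


def dispersion_index(goal_counts):
    """Variance divided by mean: about 1 for Poisson, above 1 means overdispersed."""
    return pvariance(goal_counts) / mean(goal_counts)


print(round(dispersion_index([0, 1, 2, 1, 0, 2]), 3))  # 0.667 (underdispersed)
print(round(dispersion_index([0, 0, 0, 4, 4]), 3))     # 2.4 (overdispersed)
```

Pooling goals across matches with different expected goals inflates this ratio by itself, because a mix of Poisson distributions with different means is overdispersed. A fairer test compares each match's goals with that match's own predicted distribution.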
The Honest State of Things
We have a system that:
- Genuinely finds edge in football betting markets (+7% CLV across 19 leagues, 15,000+ bets)
- Has the most rigorous testing infrastructure we could build (10-gate approval, walk-forward validation, stratified dev/holdout splits)
- Is still not profitable (-2.2% ROI)
- Is 27% less unprofitable than it was this morning
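The 27% figure is consistent with the ROI change from the solver fix (-3.0% to -2.2%):

```latex
\frac{3.0\% - 2.2\%}{3.0\%} = \frac{0.8}{3.0} \approx 0.267 \approx 27\%
```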
The gap between "right about outcomes" and "making money" is a calibration problem. We know exactly what's causing it (score grid overconfidence on extreme outcomes) and we have a clear path to fix it (improve the distributional model).
We're building the plane while flying it, documenting every bolt we tighten and every one that strips. The model is genuinely good at its core job. The engineering challenge is converting that into sustainable profit.