The Variance Filter Was Using Fake Data: How We Found a Bug in Ted's Core Signal

The variance filter compared goals to a constant 1.35 instead of real xG. Switching to real xG improves dev (+0.99pp) but barely moves holdout (+0.03pp), and disabling the filter altogether is a coin flip. Ted's thesis needs a richer implementation: not a binary filter, but a multi-factor xG regression score. The data exists (269 match-xG files) but isn't being used properly.

Ted Knutson's core thesis: bet when a team's actual results diverge from their expected goals (xG). Teams outperforming xG will regress; teams underperforming will bounce back. The variance filter is supposed to identify these regression candidates.

We discovered it was comparing actual goals to a hardcoded constant of 1.35, not to real per-match xG. The filter was essentially asking "did this team score more or fewer than league average?" — a question Ted would never ask.

The Question

Thirty signals failed the overnight 10-gate suite, and every quick parametric lever had been exhausted. Charles asked: "Is there anything about the 30 failed tests we did wrong that could be hiding alpha? These are real sources of alpha that Ted Knutson has profitably bet on."

He was right to push back. We went looking for implementation bugs in the testing framework itself.

What We Found

Bug: Constant 1.35 Instead of Real xG

In lib/backtest/data-loader.ts line 342:

const avgRate = 1.35;
hh.matches.push({ expectedGF: avgRate, actualGF: m.homeGoals ... });

For training-period matches, expectedGF and expectedGA were both set to 1.35 — the league average goals per team per match. The variance filter then checked if |actualGoals - 1.35| >= 3.0, which is just "did the team score/concede 4+ or 0 in recent matches."

That's not xG regression. That's goal counting.
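
To make the difference concrete, here is a minimal TypeScript sketch. Only the 1.35 constant and the 3.0 threshold come from the investigation. The `MatchRecord` shape, the helper names, and the choice to sum divergence over a window of recent matches are assumptions for illustration, not the actual code in data-loader.ts, and the sample numbers are invented.

```typescript
// Hypothetical shape; the real types in data-loader.ts may differ.
interface MatchRecord {
  actualGF: number;   // goals the team actually scored
  expectedGF: number; // expected goals: real xG, or 1.35 in the buggy path
}

const LEAGUE_AVG = 1.35; // constant named in the post
const THRESHOLD = 3.0;   // threshold named in the post

// Assumption: divergence is accumulated over a window of recent matches.
function cumulativeDivergence(matches: MatchRecord[]): number {
  return matches.reduce((sum, m) => sum + (m.actualGF - m.expectedGF), 0);
}

function isRegressionCandidate(matches: MatchRecord[]): boolean {
  return Math.abs(cumulativeDivergence(matches)) >= THRESHOLD;
}

// Invented example: a strong attack that scores about what it creates.
const withRealXG: MatchRecord[] = [
  { actualGF: 3, expectedGF: 2.8 },
  { actualGF: 3, expectedGF: 3.1 },
  { actualGF: 2, expectedGF: 2.2 },
  { actualGF: 3, expectedGF: 2.7 },
];

// The same matches as the buggy path would see them.
const withConstant: MatchRecord[] = withRealXG.map((m) => ({
  ...m,
  expectedGF: LEAGUE_AVG,
}));

console.log(isRegressionCandidate(withConstant)); // flagged: the team simply scores a lot
console.log(isRegressionCandidate(withRealXG));   // not flagged: goals match chance quality
```

Under these assumptions, the constant version flags any high-scoring team as a regression candidate, even one whose goals are fully explained by the chances it creates.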

The Real xG Version Exists But Wasn't Used

buildTeamHistoriesXG() (line 395) properly uses match-level xG from our 269 data files. It falls back to model lambdas when xG is missing. But the 10-gate approval and all signal tests used buildTeamHistories() (the constant version).
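
The fallback behaviour described above can be sketched roughly as follows. This is not the body of buildTeamHistoriesXG(); `resolveExpectedGoals`, `xgCoverage`, and the `ExpectedGoals` shape are hypothetical names used only to illustrate the idea of preferring match-level xG and recording when a fallback was used.

```typescript
type ExpectedSource = "xg" | "model-lambda";

interface ExpectedGoals {
  value: number;
  source: ExpectedSource; // where the expectation came from
}

// Prefer real match xG; fall back to the model's lambda when xG is missing.
function resolveExpectedGoals(
  matchXG: number | undefined,
  modelLambda: number,
): ExpectedGoals {
  if (matchXG !== undefined && Number.isFinite(matchXG)) {
    return { value: matchXG, source: "xg" };
  }
  return { value: modelLambda, source: "model-lambda" };
}

// Share of matches that actually used real xG (0 for an empty list).
function xgCoverage(resolved: ExpectedGoals[]): number {
  if (resolved.length === 0) return 0;
  const fromXG = resolved.filter((r) => r.source === "xg").length;
  return fromXG / resolved.length;
}

// Invented inputs: two matches with xG, two without.
const resolved = [
  resolveExpectedGoals(1.8, 1.2),
  resolveExpectedGoals(undefined, 1.4),
  resolveExpectedGoals(0.6, 1.1),
  resolveExpectedGoals(undefined, 1.3),
];
console.log(xgCoverage(resolved)); // fraction of matches backed by real xG
```

Tracking the source per match makes it easy to report coverage per league, which matters when judging whether a "real xG" run is really using xG.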

Real xG Improves Dev But Fails Holdout

Set | Standard (1.35) | Real xG | Delta
Dev (10 leagues) | -4.72% | -3.73% | +0.99pp
Holdout (9 leagues) | -0.35% | -0.32% | +0.03pp

The holdout improvement is nearly zero. xG data coverage is 20-34% in some holdout leagues (national-league, greek-super), causing fallback to 1.35 anyway.

Disabling the Filter Entirely

Set | With Filter | Without | Delta
Dev | -4.72% | -4.83% | -0.12pp
Holdout | -0.35% | +0.20% | +0.55pp

Not directionally consistent. Per-league analysis: 10 leagues better without the filter, 9 better with. Pure coin flip.
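
A per-league tally like this one can be computed with a few lines of TypeScript. The sketch below is an assumption about how such a comparison might be structured; the `LeagueComparison` shape and `tallyByLeague` are hypothetical, and the sample rows are invented placeholders rather than our results.

```typescript
interface LeagueComparison {
  league: string;
  roiWithFilter: number;    // ROI in percent with the filter on
  roiWithoutFilter: number; // ROI in percent with the filter off
}

interface Tally {
  betterWith: number;
  betterWithout: number;
  ties: number;
}

// Count how many leagues do better with the filter, without it, or tie.
function tallyByLeague(rows: LeagueComparison[]): Tally {
  const tally: Tally = { betterWith: 0, betterWithout: 0, ties: 0 };
  for (const row of rows) {
    if (row.roiWithoutFilter > row.roiWithFilter) tally.betterWithout += 1;
    else if (row.roiWithoutFilter < row.roiWithFilter) tally.betterWith += 1;
    else tally.ties += 1;
  }
  return tally;
}

// Invented placeholder rows, not real league results.
const sampleRows: LeagueComparison[] = [
  { league: "league-a", roiWithFilter: -1.0, roiWithoutFilter: 0.5 },
  { league: "league-b", roiWithFilter: 2.0, roiWithoutFilter: -0.5 },
  { league: "league-c", roiWithFilter: -3.0, roiWithoutFilter: -3.0 },
];
const sampleTally = tallyByLeague(sampleRows);
console.log(sampleTally);
```

A split close to half and half, as we saw, is what one would expect if the filter had no consistent effect.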

The bets the filter REMOVES have -0.31% ROI (nearly breakeven). The bets it KEEPS have -2.76% ROI. The filter is discarding good bets.
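
The kept-versus-removed comparison is simple to reproduce. The following is a minimal sketch, assuming each settled bet records its stake, net profit, and whether the filter would keep it; the `SettledBet` shape and function names are hypothetical, and the sample bets are invented.

```typescript
interface SettledBet {
  stake: number;
  profit: number;        // net profit; negative for a losing bet
  passesFilter: boolean; // true if the variance filter would keep this bet
}

// ROI in percent: total net profit divided by total stake.
function roiPercent(bets: SettledBet[]): number {
  const staked = bets.reduce((sum, b) => sum + b.stake, 0);
  if (staked === 0) return 0;
  const profit = bets.reduce((sum, b) => sum + b.profit, 0);
  return (profit / staked) * 100;
}

// Compare the bets the filter keeps against the bets it removes.
function splitByFilter(bets: SettledBet[]): { kept: number; removed: number } {
  return {
    kept: roiPercent(bets.filter((b) => b.passesFilter)),
    removed: roiPercent(bets.filter((b) => !b.passesFilter)),
  };
}

// Invented placeholder bets, not real results.
const sampleBets: SettledBet[] = [
  { stake: 100, profit: -100, passesFilter: true },
  { stake: 100, profit: 90, passesFilter: true },
  { stake: 100, profit: 110, passesFilter: false },
  { stake: 100, profit: -100, passesFilter: false },
];
const sampleSplit = splitByFilter(sampleBets);
console.log(sampleSplit);
```

If the removed bucket consistently shows a better ROI than the kept bucket, the filter is selecting in the wrong direction.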

The Nuance

Why Ted's Thesis Doesn't Translate to Our Filter

Ted's xG regression thesis works because:

  1. He uses real per-match xG (from StatsBomb/Opta), not constants
  2. He considers home vs away xG separately (venue splits)
  3. He looks at sustained divergence, not a simple threshold
  4. He factors in opponent quality — outperforming xG against strong teams means something different than against weak ones
  5. He combines xG with other indicators (shot quality, pressing stats, coaching style)

Our filter reduces all of this to: |goals - 1.35| >= 3.0. That's not Ted's thesis. It's a crude approximation that happens to be net-neutral.
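
For contrast, here is one way a richer score could start to look. This is a rough sketch under stated assumptions, not Ted's method and not code we have built: it covers real xG, venue splits, a placeholder opponent weight, and a simple persistence measure for sustained divergence, and it leaves out the fifth factor (other indicators) entirely. Every name, the weighting scheme, and the sample data are hypothetical.

```typescript
type Venue = "home" | "away";

interface XGMatch {
  venue: Venue;
  goalsFor: number;
  xgFor: number;          // real per-match xG
  opponentWeight: number; // hypothetical weight; 1.0 = league-average opponent
}

interface VenueScore {
  matches: number;
  meanResidual: number; // average weighted (goals - xG)
  persistence: number;  // share of matches diverging in the same direction as the mean
  score: number;        // meanResidual scaled by persistence
}

function scoreVenue(matches: XGMatch[]): VenueScore {
  if (matches.length === 0) {
    return { matches: 0, meanResidual: 0, persistence: 0, score: 0 };
  }
  // Assumption: scale each residual by the opponent weight.
  const residuals = matches.map((m) => (m.goalsFor - m.xgFor) * m.opponentWeight);
  const meanResidual = residuals.reduce((s, r) => s + r, 0) / residuals.length;
  const direction = Math.sign(meanResidual);
  const sameSign = residuals.filter(
    (r) => direction !== 0 && Math.sign(r) === direction,
  ).length;
  const persistence = sameSign / residuals.length;
  return {
    matches: matches.length,
    meanResidual,
    persistence,
    score: meanResidual * persistence,
  };
}

// Score home and away form separately instead of pooling them.
function regressionScore(history: XGMatch[]): Record<Venue, VenueScore> {
  return {
    home: scoreVenue(history.filter((m) => m.venue === "home")),
    away: scoreVenue(history.filter((m) => m.venue === "away")),
  };
}

// Invented example history.
const sampleHistory: XGMatch[] = [
  { venue: "home", goalsFor: 3, xgFor: 2, opponentWeight: 1 },
  { venue: "home", goalsFor: 2, xgFor: 1, opponentWeight: 1 },
  { venue: "away", goalsFor: 0, xgFor: 1.5, opponentWeight: 1 },
  { venue: "away", goalsFor: 2, xgFor: 1.5, opponentWeight: 1 },
];
const sampleScore = regressionScore(sampleHistory);
console.log(sampleScore);
```

The point of the sketch is the output type: a continuous score per venue rather than a yes/no flag, which a model can weight instead of using it to discard bets.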

The Data Exists

We have 269 match-level xG files covering 75-100% of most leagues. We have GK PSxG data. We have per-match shot quality metrics. The infrastructure to implement Ted's actual methodology exists — the variance filter just doesn't use it.

What This Means

  1. The testing framework DID hide potential alpha, but the alpha is in the data pipeline, not the signal logic. Using real xG data produces different (sometimes better) regression candidate selection, but it doesn't survive holdout validation with our current simple threshold.
  2. The variance filter is not harmful enough to remove: 10/19 leagues are better without it, 9/19 better with it. Net effect ~0. Leave it as-is.
  3. Ted's thesis needs a richer implementation: not a binary filter, but a multi-factor regression score that uses real xG, venue splits, opponent strength, and sustained divergence. This is Batch 2/3 work.

What's Next

The variance filter bug was the last stone to turn in the quick-win category. To properly implement Ted's xG regression thesis, we need to build a new signal from scratch using the match-level xG data we already have. That's hours of work, not minutes — but the data is ready.

INVESTIGATION | Signal: variance-regression | 2026-03-20