Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

Process · DEPLOYED

The 4-Minute Signal Test: How We Explore Fast and Deploy Slow

The complete signal testing workflow, from hypothesis to approval gate in about 4 minutes. Register, explore, analyze, gate. Designed for parallel terminals. 10 automated approval gates, including per-league matchday interleave OOS, walk-forward validation, and practical significance checks. Nothing reaches production without passing them all.

Time per signal: ~4 min (register → explore → gate)
Approval gates: 10 (all must pass)
Parallel safe: Yes (multi-terminal)
Pending signals: 21 (ready to test)

This is how we test betting signals now. Not the theory — the actual workflow, terminal by terminal.

We redesigned the process after a meta-analysis revealed that our previous testing infrastructure had a bug causing 40 "accepted" signals to be validated on a pre-filtered pool. The corrected process is built around one principle: explore fast, deploy slow.


The 4-Minute Loop

Every signal goes through the same loop. The whole thing takes about 4 minutes if you have a hypothesis ready.

Minute 0-1: Register

/signal teams with 7+ days rest outperform AH expectations

The /signal command is the single entry point for the entire pipeline. It checks the registry, determines where the signal is in the pipeline, and runs the appropriate next step. On first invocation it asks for hypothesis, mechanism, metric, and threshold, then registers before any testing happens.
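
A minimal sketch of that dispatch, assuming a registry keyed by signal ID. The `RegistryEntry` shape, the status values, and `nextStep` are hypothetical illustrations; only the four registration fields come from the description above.

```typescript
// Hypothetical pipeline status stored per signal.
type Status = "pending" | "explored" | "accepted" | "rejected";

// Hypothetical registry record: the four fields asked for on first invocation.
interface RegistryEntry {
  id: string;
  hypothesis: string;
  mechanism: string;
  metric: string;
  threshold: number;
  status: Status;
}

type Step = "register" | "explore" | "gate" | "done";

// Decide what to run next for `id`, using only the registry.
function nextStep(registry: Map<string, RegistryEntry>, id: string): Step {
  const entry = registry.get(id);
  if (entry === undefined) return "register"; // unknown signal: register before any testing
  switch (entry.status) {
    case "pending":
      return "explore";
    case "explored":
      return "gate";
    case "accepted":
    case "rejected":
      return "done";
  }
}
```

Because registration is the first branch, there is no path to a test result for a signal the registry has never seen.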

Why register first? Because if you test 20 things and pick the winner, your p-value is lying. The registry tracks the denominator — how many things you tried. Our acceptance rate is 37% (40 out of 107). Without the registry, you'd only see the 40 wins.
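
The arithmetic behind "your p-value is lying": with 20 independent tests of signals that actually do nothing, each at a 5% significance level, the chance that at least one looks significant is 1 - 0.95^20, roughly 64%. A short sketch (the function names are illustrative):

```typescript
// Probability that at least one of `tests` independent tests of true-null
// signals clears significance level `alpha` purely by chance.
function familywiseErrorRate(tests: number, alpha: number): number {
  return 1 - Math.pow(1 - alpha, tests);
}

// Share of registered hypotheses that were accepted.
function acceptanceRate(accepted: number, registered: number): number {
  return accepted / registered;
}

console.log(familywiseErrorRate(20, 0.05).toFixed(3)); // 0.642
console.log(acceptanceRate(40, 107).toFixed(2)); // 0.37
```

The registry's job is to make `tests` the real count of hypotheses tried, not just the winners.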

Minute 1-3: Explore

The evaluation runs through the canonical pipeline:

npx tsx scripts/test-signal.ts --signal=rest-days-7plus --by-league --by-season

This loads 29,977 precomputed matches across 26 leagues, applies the signal, and shows:

  • Standalone: Signal alone, all other filters off, minEdge=0 (true isolation)
  • Marginal: Does adding this signal to the existing stack improve ROI? (leave-one-out test)
  • By league: Does it work in EPL and Serie B, or just EPL?
  • By season: Stable across 2022-2025, or one-season fluke?

The /signal command summarizes all of this into a pass/fail assessment before you decide whether to proceed.
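
The by-league and by-season views amount to grouping bets and recomputing ROI per group. A minimal sketch, assuming a hypothetical `Bet` record with league, season, stake, and profit; the real precomputed match format isn't shown here.

```typescript
// Hypothetical per-bet record.
interface Bet {
  league: string;
  season: string;
  stake: number;
  profit: number; // net profit for this bet (negative on a loss)
}

// ROI in percent: total profit / total stake.
function roi(bets: Bet[]): number {
  const stake = bets.reduce((s, b) => s + b.stake, 0);
  const profit = bets.reduce((s, b) => s + b.profit, 0);
  return stake === 0 ? 0 : (100 * profit) / stake;
}

// Group bets by any key (league, season, ...) and report N and ROI per group.
function roiBy(bets: Bet[], key: (b: Bet) => string): Map<string, { n: number; roi: number }> {
  const groups = new Map<string, Bet[]>();
  for (const b of bets) {
    const k = key(b);
    const g = groups.get(k);
    if (g) g.push(b);
    else groups.set(k, [b]);
  }
  const out = new Map<string, { n: number; roi: number }>();
  for (const [k, g] of groups) out.set(k, { n: g.length, roi: roi(g) });
  return out;
}
```

`roiBy(bets, (b) => b.league)` gives the league view; swapping the key for `(b) => b.season` gives the season view, which is where a one-season fluke shows up.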

Minute 3-4: Gate

If it looks promising:

npx tsx scripts/approve-signal.ts --signal=rest-days-7plus

Ten automated gates. All must pass:

Gate | What it checks | Why
1. Pre-registered | Hypothesis exists in registry | Prevents post-hoc rationalization
2. True standalone | minEdge=0, all filters off | Honest isolation (the bug we fixed)
3. N ≥ 1,000 | Enough bets to trust | Below this, ROI is noise
4. Marginal ROI > 0 | Helps the stack | The deployment decision
5. Bootstrap p < 0.10 | Statistically significant | Could this be luck?
6. Matchday interleave OOS | Odd/even matchdays per league, within 3pp | No late-season bias
7. No regime flip | Consistent across conditions | Doesn't blow up in certain regimes
8. No suspicious N | Not riding a shared pre-filter | The exact bug we caught
9. Practical significance | Marginal > +0.5pp | Worth the complexity
10. Walk-forward | Positive in 2/3 season folds | Stable over time
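
Gate 5's exact test isn't spelled out above. One common way to get a bootstrap p-value for "ROI > 0" is to resample per-bet returns with replacement and count how often the resampled mean is zero or below. The sketch assumes flat stakes (so per-bet return = profit / stake) and uses a small seeded PRNG (mulberry32) so runs are reproducible; both are illustrative choices, not the script's confirmed method.

```typescript
// Small seeded PRNG (mulberry32): returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One-sided bootstrap p-value for "mean per-bet return > 0": the share of
// resamples (drawn with replacement) whose total return is <= 0.
function bootstrapPValue(returns: number[], iterations = 10_000, seed = 42): number {
  const rand = mulberry32(seed);
  const n = returns.length;
  let notPositive = 0;
  for (let i = 0; i < iterations; i++) {
    let sum = 0;
    for (let j = 0; j < n; j++) sum += returns[Math.floor(rand() * n)];
    if (sum <= 0) notPositive++;
  }
  return notPositive / iterations;
}
```

Under this reading Gate 5 passes when `bootstrapPValue(returns) < 0.10`.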

Gate passes → signal accepted, registry updated. Next:

  1. Add per-signal edge delta computation to scripts/backfill-shadow-signals.ts
  2. Add a SIGNAL_DEFS entry in app/gauntlet/page.tsx — it automatically appears in all column dropdowns
  3. Run npx tsx scripts/backfill-shadow-signals.ts to backfill historical impact
  4. Monitor on /gauntlet for 2 weeks before flipping to live

Gate fails → you see exactly which gate and why. Fix it, variant it, or shelf it.
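
The pass/fail report can be sketched as a list of named predicates, collecting every failure with its reason. The `SignalResults` fields and reason strings are hypothetical; the thresholds are taken from the gate table. Two simplifications are assumptions: Gate 6 is reduced to a single odd/even ROI pair (the real gate works per league) and read as |odd - even| ≤ 3pp, and Gate 10 is read as at least two positive season folds.

```typescript
// Hypothetical summary of one signal's test results.
interface SignalResults {
  preRegistered: boolean;
  trueStandalone: boolean; // standalone run used minEdge=0, all filters off
  n: number;               // number of bets
  marginalRoiPp: number;   // leave-one-out marginal ROI, percentage points
  bootstrapP: number;
  oddRoi: number;          // ROI (%) on odd matchdays (simplified: one pair, not per league)
  evenRoi: number;         // ROI (%) on even matchdays
  regimeFlip: boolean;
  suspiciousN: boolean;
  foldRois: number[];      // walk-forward ROI per season fold
}

interface Gate {
  name: string;
  check: (r: SignalResults) => boolean;
  why: string;
}

// Thresholds from the gate table; the predicates themselves are a sketch.
const GATES: Gate[] = [
  { name: "1. Pre-registered", check: (r) => r.preRegistered, why: "hypothesis missing from registry" },
  { name: "2. True standalone", check: (r) => r.trueStandalone, why: "standalone run inherited filters" },
  { name: "3. N >= 1,000", check: (r) => r.n >= 1000, why: "too few bets" },
  { name: "4. Marginal ROI > 0", check: (r) => r.marginalRoiPp > 0, why: "does not help the stack" },
  { name: "5. Bootstrap p < 0.10", check: (r) => r.bootstrapP < 0.1, why: "could be luck" },
  { name: "6. Matchday interleave OOS", check: (r) => Math.abs(r.oddRoi - r.evenRoi) <= 3, why: "odd/even ROI differ by more than 3pp" },
  { name: "7. No regime flip", check: (r) => !r.regimeFlip, why: "sign flips across regimes" },
  { name: "8. No suspicious N", check: (r) => !r.suspiciousN, why: "N matches a shared pre-filter" },
  { name: "9. Practical significance", check: (r) => r.marginalRoiPp > 0.5, why: "marginal gain <= 0.5pp" },
  { name: "10. Walk-forward", check: (r) => r.foldRois.filter((x) => x > 0).length >= 2, why: "positive in fewer than 2 folds" },
];

// Run every gate; the signal passes only if no gate fails.
function runGates(r: SignalResults): { passed: boolean; failures: string[] } {
  const failures = GATES.filter((g) => !g.check(r)).map((g) => `${g.name}: ${g.why}`);
  return { passed: failures.length === 0, failures };
}
```

Running all gates rather than stopping at the first failure means one run lists everything that needs fixing.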


The Layered Stack

This is the mental model behind the testing. The system has three layers:

Layer 3: Signals         ← what you're testing
Layer 2: Core filters    ← minEdge ≥ 7%, odds ≤ 2.0
Layer 1: MI BP Model     ← produces +11% CLV

Layer 1 is the foundation. The MI Bivariate Poisson model beats Pinnacle closing lines by 11% on average (+11% CLV). The effect is real and holds across all 26 leagues.

Layer 2 is where ROI improves. The edge threshold (only bet when the model's edge is at least 7%) and the odds cap (≤ 2.0) turn -6% unfiltered ROI into ~-1.4%. These aren't signals; they're the core filtration.
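
A minimal sketch of the Layer 2 filtration. It assumes a common definition of edge, model probability × decimal odds - 1 (the expected return per unit staked); edge isn't defined explicitly above, and `Candidate` and `passesCoreFilters` are hypothetical names.

```typescript
// Hypothetical candidate bet: model probability and the decimal odds on offer.
interface Candidate {
  modelProb: number;
  odds: number;
}

// Assumed edge definition: expected return per unit stake at the offered odds.
function edge(c: Candidate): number {
  return c.modelProb * c.odds - 1;
}

// Layer 2 core filtration: minEdge >= 7% and odds <= 2.0.
function passesCoreFilters(c: Candidate, minEdge = 0.07, maxOdds = 2.0): boolean {
  return edge(c) >= minEdge && c.odds <= maxOdds;
}
```

For example, a 56% model probability at odds of 1.95 gives an assumed edge of 0.56 × 1.95 - 1 = 9.2%, which clears both filters; a 40% probability at 2.80 has a 12% edge but fails the odds cap.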

Layer 3 is where signals live. Each one layers on top. The test isn't "does this work alone?" — it's "does adding this to the existing stack improve marginal ROI?" That's Gate 4.

The previous testing infrastructure tested signals as "Signal X + 7% edge threshold" and attributed the improvement to the signal. We fixed this. Now standalone tests use minEdge=0 (true isolation) and the deployment decision is marginal contribution (leave-one-out).
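
Leave-one-out marginal ROI compares the full stack with the same stack minus one signal. The sketch models each signal as a bet filter over flat-stake bets, which is an assumption (the deployment steps mention per-signal edge deltas, so real signals may adjust edges rather than filter); `SettledBet`, `Signal`, and `marginalRoiPp` are hypothetical names.

```typescript
// Hypothetical settled bet: flat 1-unit stake plus a feature a signal reads.
interface SettledBet {
  profit: number;   // net profit per unit staked (+0.9 win at 1.90, -1 loss)
  restDays: number; // example feature for a rest-days signal
}

// A signal modeled as a filter over bets (assumption).
interface Signal {
  id: string;
  keep: (b: SettledBet) => boolean;
}

// Flat-stake ROI in percent.
function flatRoiPct(bets: SettledBet[]): number {
  if (bets.length === 0) return 0;
  return (100 * bets.reduce((s, b) => s + b.profit, 0)) / bets.length;
}

// Keep only bets that every signal in the stack accepts.
function applyStack(bets: SettledBet[], stack: Signal[]): SettledBet[] {
  return bets.filter((b) => stack.every((s) => s.keep(b)));
}

// Leave-one-out marginal contribution of signal `id`, in percentage points:
// ROI with the full stack minus ROI with that one signal removed.
function marginalRoiPp(bets: SettledBet[], stack: Signal[], id: string): number {
  const withoutIt = stack.filter((s) => s.id !== id);
  return flatRoiPct(applyStack(bets, stack)) - flatRoiPct(applyStack(bets, withoutIt));
}
```

A positive return value is what Gate 4 asks for; Gate 9 additionally requires it to exceed +0.5pp.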


Parallel Exploration

The process is designed for multiple terminal windows running simultaneously:

Terminal 1: /signal bookmaker consensus predicts AH direction
Terminal 2: /signal vig asymmetry toward our side hurts ROI
Terminal 3: /signal midweek matches have different AH margins

Each terminal runs independently — loadAllData() is read-only, signal tests don't share state, and the registry uses unique IDs so there are no write conflicts.

The quality guarantee is the approval gate at the end, not a bureaucratic process at the beginning. Explore aggressively. The gate catches the problems.


What Changed from v1

Before (broken) | After (corrected)
Standalone test inherited minEdge=0.07 | minEdge=0, all filters off
40 signals validated on same pre-filtered pool | Each signal tested in true isolation
standaloneN ≈ 1,092 for 4 different signals | standaloneN = 29K-86K (full universe)
No OOS requirement | Must hold on 20 held-out leagues
Deployment based on standalone ROI | Deployment based on marginal ROI
No automated gate | 10-gate approval script
CLV and ROI conflated | CLV for model eval, ROI for deployment
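
The v1 tell was four different signals all landing at standaloneN ≈ 1,092, which is what a shared pre-filter produces. A minimal sketch of a Gate 8 style check that flags clusters of near-identical N; the 1% tolerance, the three-signal cluster size, and all names are hypothetical, since the exact rule isn't given above.

```typescript
// Hypothetical standalone result per signal.
interface StandaloneResult {
  id: string;
  standaloneN: number;
}

// Flag signals whose standalone N is within `relTol` (relative) of at least
// `minPeers` other signals' N: a hint that they share a hidden pre-filter.
function suspiciousN(results: StandaloneResult[], relTol = 0.01, minPeers = 2): string[] {
  return results
    .filter((r) => {
      const peers = results.filter(
        (o) => o.id !== r.id && Math.abs(o.standaloneN - r.standaloneN) <= relTol * r.standaloneN,
      );
      return peers.length >= minPeers;
    })
    .map((r) => r.id);
}
```

A signal tested on the full universe should land far from any such cluster.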

The Wrong-Direction Protocol

When a signal shows the opposite of your hypothesis, that's not a failure — it might be the most valuable finding.

Three of our best deployed discoveries came from wrong-direction results:

  • Congestion filter was REMOVING our best bets (+6.2% ROI)
  • AH +0.25 line was rejected despite +9.2% ROI
  • New managers were expected to hurt teams but actually helped

The /signal-test command automatically detects wrong-direction results and offers to register the reversed hypothesis. The reversed hypothesis goes through the same register → explore → gate loop.
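
Detecting a wrong-direction result only needs the predicted sign stored at registration. A minimal sketch; the `expectedSign` field, the `-reversed` ID suffix, and the function name are hypothetical, not the command's actual behavior.

```typescript
// Hypothetical registry fields for a hypothesis.
interface Hypothesis {
  id: string;
  text: string;
  expectedSign: 1 | -1; // +1 if the hypothesis predicts a positive effect on its metric
}

// If the observed effect has the opposite sign to the prediction, build the
// reversed hypothesis to register; otherwise (match or zero effect) return null.
function reversedHypothesis(h: Hypothesis, observedEffectPp: number): Hypothesis | null {
  const observedSign = Math.sign(observedEffectPp);
  if (observedSign === 0 || observedSign === h.expectedSign) return null;
  return {
    id: `${h.id}-reversed`,
    text: `Reversed: ${h.text}`,
    expectedSign: h.expectedSign === 1 ? -1 : 1,
  };
}
```

The returned hypothesis is registered like any other and starts the loop from the top.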


Current Stack Performance

After the meta-analysis and corrections (29,977 matches, 26 leagues):

Component | Marginal ROI | Status
MI Bivariate Poisson | n/a | Foundation (+11% CLV)
minEdge ≥ 7% | ~+4.6pp | Core
odds-cap ≤ 2.0 | +4.2pp | Core (strongest filter)
variance-regression | -0.4pp | Deployed but neutral
congestion-filter | -0.3pp | Harmful (removal confirmed)
defiance-filter | -0.2pp | Neutral

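In-sample (IS) ROI: -1.4%. Out-of-sample (OOS) ROI: -3.6%. The model works: it beats the closing line by 11% on average, so the pricing edge exists. The gap between that edge and negative ROI is execution cost (vig, odds quality, entry timing). That's a separate workstream.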
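
CLV and ROI answer different questions: CLV scores the price you got against the close, ROI scores what the bets actually paid. A minimal sketch using one common CLV definition (odds taken / closing odds - 1, per bet, averaged); whether the closing odds are de-vigged first isn't specified above, and the names are illustrative.

```typescript
// Hypothetical settled bet at decimal odds, flat 1-unit stake.
interface PricedBet {
  oddsTaken: number;
  closingOdds: number; // closing line, e.g. Pinnacle
  won: boolean;
}

// Mean closing-line value in percent: how much better the taken price was than the close.
function clvPct(bets: PricedBet[]): number {
  const sum = bets.reduce((s, b) => s + (b.oddsTaken / b.closingOdds - 1), 0);
  return (100 * sum) / bets.length;
}

// Flat-stake ROI in percent: realized profit per unit staked.
function roiPct(bets: PricedBet[]): number {
  const profit = bets.reduce((s, b) => s + (b.won ? b.oddsTaken - 1 : -1), 0);
  return (100 * profit) / bets.length;
}
```

Two bets taken above the closing price, one lost at 2.0 and one won at 1.9, give positive CLV alongside -5% ROI. With two bets that gap is noise; at the scale of the full backtest, the persistent gap is the execution cost described above.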


How to Get Started

# Test a new hypothesis
/signal-test [your hypothesis here]

# Retest an existing signal with corrected infrastructure
/signal-test [signal-id]

# Run the approval gate on a promising signal
npx tsx scripts/approve-signal.ts --signal=[signal-id]

# See all pending signals
jq '.signals[] | select(.status=="pending") | .id' data/signal-registry.json

The signal registry has 21 pending hypotheses waiting to be tested. The approval gate is ready. The parallel backtest runs 26 leagues in under an hour. Go find alpha.