The 4-Minute Signal Test: How We Explore Fast and Deploy Slow

March 19, 2026 | Process | DEPLOYED

The complete signal testing workflow, from hypothesis to approval gate in about 4 minutes. Register, explore, analyze, gate. Designed for parallel terminals. Ten automated approval gates, including per-league matchday interleave OOS, walk-forward validation, and practical significance checks. Nothing reaches production without passing all of them.

Time per Signal: ~4 min
Approval Gates: 10 (all must pass)
Parallel Safe: Yes (multi-terminal)
Pending Signals: 21 (ready to test)

This is how we test betting signals now. Not the theory — the actual workflow, terminal by terminal.

We redesigned the process after a meta-analysis revealed that our previous testing infrastructure had a bug causing 40 "accepted" signals to be validated on a pre-filtered pool. The corrected process is built around one principle: explore fast, deploy slow.

The 4-Minute Loop

Every signal goes through the same loop. The whole thing takes about 4 minutes if you have a hypothesis ready.

Minute 0-1: Register

/signal teams with 7+ days rest outperform AH expectations

The /signal command is the single entry point for the entire pipeline. It checks the registry, determines where the signal is in the pipeline, and runs the appropriate next step. On first invocation it asks for hypothesis, mechanism, metric, and threshold, then registers before any testing happens.

Why register first? Because if you test 20 things and pick the winner, your p-value is lying. The registry tracks the denominator — how many things you tried. Our acceptance rate is 37% (40 out of 107). Without the registry, you'd only see the 40 wins.
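To make the denominator point concrete, here is a minimal sketch. The `RegistryEntry` shape is hypothetical: only the `id` field and the `"pending"` status value appear in this post, and the `"accepted"` and `"rejected"` statuses, the `hypothesis` field, and both function names are assumptions for illustration.

```typescript
// Hypothetical registry entry. Only `id` and status "pending" appear in this
// post; the other field and status values are assumptions.
type SignalStatus = "pending" | "accepted" | "rejected";

interface RegistryEntry {
  id: string;
  hypothesis: string;
  status: SignalStatus;
}

// Share of tested (non-pending) signals that were accepted.
function acceptanceRate(entries: RegistryEntry[]): number {
  const tested = entries.filter((e) => e.status !== "pending");
  if (tested.length === 0) return 0;
  const accepted = tested.filter((e) => e.status === "accepted").length;
  return accepted / tested.length;
}

// Chance that at least one of k independent tests of a worthless signal
// clears a significance threshold alpha purely by luck.
function chanceOfFalseWinner(alpha: number, k: number): number {
  return 1 - Math.pow(1 - alpha, k);
}
```

With alpha = 0.10 and 20 unregistered tries, `chanceOfFalseWinner(0.10, 20)` is roughly 0.88, which is why the count of attempts has to be recorded before testing starts.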

Minute 1-3: Explore

The evaluation runs through the canonical pipeline:

npx tsx scripts/test-signal.ts --signal=rest-days-7plus --by-league --by-season

This loads 29,977 precomputed matches across 26 leagues, applies the signal, and shows:

Standalone: Signal alone, all other filters off, minEdge=0 (true isolation)
Marginal: Does adding this signal to the existing stack improve ROI? (leave-one-out test)
By league: Does it work in EPL and Serie B, or just EPL?
By season: Stable across 2022-2025, or one-season fluke?

The /signal command summarizes all of this into a pass/fail assessment before you decide whether to proceed.
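The difference between the standalone and marginal views can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/test-signal.ts: the `Bet` shape, the per-filter `passes` flags, and the function names are all hypothetical, and ROI is simplified to mean profit per 1-unit stake.

```typescript
// Hypothetical bet record: net profit on a 1-unit stake, plus a flag per
// filter or signal id saying whether the bet passes it.
interface Bet {
  profit: number;
  passes: Record<string, boolean>;
}

// Mean profit per unit staked; 0 for an empty sample.
function roi(bets: Bet[]): number {
  if (bets.length === 0) return 0;
  return bets.reduce((sum, b) => sum + b.profit, 0) / bets.length;
}

// Standalone: only the candidate signal is applied, nothing else.
function standaloneRoi(bets: Bet[], candidate: string): number {
  return roi(bets.filter((b) => b.passes[candidate] === true));
}

// Marginal: ROI of (stack + candidate) minus ROI of the stack alone.
function marginalRoi(bets: Bet[], stack: string[], candidate: string): number {
  const base = bets.filter((b) => stack.every((f) => b.passes[f] === true));
  const withCandidate = base.filter((b) => b.passes[candidate] === true);
  return roi(withCandidate) - roi(base);
}
```

A signal can look good standalone and still add nothing once the existing stack has already selected the same bets, which is why the two numbers are reported separately.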

Minute 3-4: Gate

If it looks promising:

npx tsx scripts/approve-signal.ts --signal=rest-days-7plus

Ten automated gates. All must pass:

Gate	What it checks	Why
1. Pre-registered	Hypothesis exists in registry	Prevents post-hoc rationalization
2. True standalone	minEdge=0, all filters off	Honest isolation (the bug we fixed)
3. N ≥ 1,000	Enough bets to trust	Below this, ROI is noise
4. Marginal ROI > 0	Helps the stack	The deployment decision
5. Bootstrap p < 0.10	Statistically significant	Could this be luck?
6. Matchday interleave OOS	Odd/even matchdays per league, within 3pp	No late-season bias
7. No regime flip	Consistent across conditions	Doesn't blow up in certain regimes
8. No suspicious N	Not riding a shared pre-filter	The exact bug we caught
9. Practical significance	Marginal > +0.5pp	Worth the complexity
10. Walk-forward	Positive in 2/3 season folds	Stable over time
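Gates 6 and 10 can be sketched roughly as below. This is one possible reading of the table, not the logic in scripts/approve-signal.ts: the `SettledBet` shape and function names are hypothetical, "within 3pp" is assumed to mean the absolute ROI gap between odd and even matchdays in each league, and the walk-forward check assumes the per-fold ROIs have already been computed.

```typescript
// Hypothetical settled-bet record.
interface SettledBet {
  league: string;
  matchday: number;
  profit: number; // net units on a 1-unit stake
}

const meanProfit = (bets: SettledBet[]): number =>
  bets.length === 0 ? 0 : bets.reduce((s, b) => s + b.profit, 0) / bets.length;

// Gate 6 sketch: per league, compare ROI on odd vs even matchdays.
function interleaveGaps(bets: SettledBet[]): Map<string, number> {
  const byLeague = new Map<string, SettledBet[]>();
  bets.forEach((b) => {
    const list = byLeague.get(b.league) ?? [];
    list.push(b);
    byLeague.set(b.league, list);
  });
  const gaps = new Map<string, number>();
  byLeague.forEach((list, league) => {
    const odd = list.filter((b) => b.matchday % 2 === 1);
    const even = list.filter((b) => b.matchday % 2 === 0);
    gaps.set(league, Math.abs(meanProfit(odd) - meanProfit(even)));
  });
  return gaps;
}

// Passes when every league's odd/even gap is at most maxGap (0.03 = 3pp).
function passesInterleave(bets: SettledBet[], maxGap = 0.03): boolean {
  return Array.from(interleaveGaps(bets).values()).every((g) => g <= maxGap);
}

// Gate 10 sketch: at least minPositive folds must have positive ROI.
function passesWalkForward(foldRois: number[], minPositive = 2): boolean {
  return foldRois.filter((r) => r > 0).length >= minPositive;
}
```

Splitting on matchday parity means both halves cover the whole season, so a signal that only works early or late in the year cannot hide in one half.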

Gate passes → signal accepted, registry updated. Next:

Add per-signal edge delta computation to scripts/backfill-shadow-signals.ts
Add a SIGNAL_DEFS entry in app/gauntlet/page.tsx — it automatically appears in all column dropdowns
Run npx tsx scripts/backfill-shadow-signals.ts to backfill historical impact
Monitor on /gauntlet for 2 weeks before flipping to live

Gate fails → you see exactly which gate and why. Fix it, try a variant, or shelve it.
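The "which gate and why" report can be pictured as a list of per-gate results. The sketch below covers only the five gates with a simple numeric or boolean threshold in the table; the `SignalStats` and `GateResult` shapes and the function names are hypothetical, and ROI values are assumed to be fractions (0.005 = +0.5pp).

```typescript
// Hypothetical summary statistics for one signal.
interface SignalStats {
  preRegistered: boolean;
  n: number;           // number of bets
  marginalRoi: number; // fraction: 0.005 means +0.5pp
  bootstrapP: number;
}

interface GateResult {
  gate: string;
  pass: boolean;
  detail: string;
}

// Evaluates a subset of the gates, using the thresholds from the table above.
function runGates(s: SignalStats): GateResult[] {
  const pp = (s.marginalRoi * 100).toFixed(2);
  return [
    { gate: "1. Pre-registered", pass: s.preRegistered, detail: `registered=${s.preRegistered}` },
    { gate: "3. N >= 1,000", pass: s.n >= 1000, detail: `N=${s.n}` },
    { gate: "4. Marginal ROI > 0", pass: s.marginalRoi > 0, detail: `marginal=${pp}pp` },
    { gate: "5. Bootstrap p < 0.10", pass: s.bootstrapP < 0.1, detail: `p=${s.bootstrapP}` },
    { gate: "9. Practical significance", pass: s.marginalRoi > 0.005, detail: `marginal=${pp}pp vs +0.50pp` },
  ];
}

// The gates that block approval, with the reason attached.
function failedGates(results: GateResult[]): GateResult[] {
  return results.filter((r) => !r.pass);
}
```

Returning every result rather than stopping at the first failure lets you see in one run whether a signal is close to passing or failing on several fronts.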

The Layered Stack

This is the mental model behind the testing. The system has three layers:

Layer 3: Signals         ← what you're testing
Layer 2: Core filters    ← minEdge ≥ 7%, odds ≤ 2.0
Layer 1: MI BP Model     ← produces +11% CLV

Layer 1 is the foundation. The MI Bivariate Poisson model prices match outcomes better than Pinnacle closing lines by 11% on average. This is real and universal across 26 leagues.

Layer 2 is where ROI improves. The edge threshold (only bet when model edge ≥ 7%) and odds cap (≤ 2.0) turn a -6% unfiltered ROI into roughly -1.4%. These aren't signals; they're the core filtration.

Layer 3 is where signals live. Each one layers on top. The test isn't "does this work alone?" — it's "does adding this to the existing stack improve marginal ROI?" That's Gate 4.

The previous testing infrastructure tested signals as "Signal X + 7% edge threshold" and attributed the improvement to the signal. We fixed this. Now standalone tests use minEdge=0 (true isolation) and the deployment decision is marginal contribution (leave-one-out).
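One way an inherited threshold like this can creep in is through config defaults. The sketch below is purely illustrative and is not the actual v1 or current code: `BacktestConfig`, `PRODUCTION_DEFAULTS`, and both builder functions are hypothetical names.

```typescript
// Hypothetical backtest configuration.
interface BacktestConfig {
  minEdge: number;
  maxOdds: number;
  signals: string[];
}

// Layer 2 core filters as described above.
const PRODUCTION_DEFAULTS: BacktestConfig = { minEdge: 0.07, maxOdds: 2.0, signals: [] };

// Buggy pattern: spreading the defaults silently carries minEdge=0.07 and the
// odds cap into what is supposed to be an isolated test.
function standaloneConfigInherited(signal: string): BacktestConfig {
  return { ...PRODUCTION_DEFAULTS, signals: [signal] };
}

// Corrected pattern: every filter is switched off explicitly.
function standaloneConfigIsolated(signal: string): BacktestConfig {
  return { minEdge: 0, maxOdds: Infinity, signals: [signal] };
}
```

Spelling out each field in the isolated builder means a new default added later cannot leak into standalone tests unnoticed.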

Parallel Exploration

The process is designed for multiple terminal windows running simultaneously:

Terminal 1: /signal bookmaker consensus predicts AH direction
Terminal 2: /signal vig asymmetry toward our side hurts ROI
Terminal 3: /signal midweek matches have different AH margins

Each terminal runs independently — loadAllData() is read-only, signal tests don't share state, and the registry uses unique IDs so there are no write conflicts.

The quality guarantee is the approval gate at the end, not a bureaucratic process at the beginning. Explore aggressively. The gate catches the problems.

What Changed from v1

Before (broken)	After (corrected)
Standalone test inherited minEdge=0.07	minEdge=0, all filters off
40 signals validated on same pre-filtered pool	Each signal tested in true isolation
standaloneN ≈ 1,092 for 4 different signals	standaloneN = 29K-86K (full universe)
No OOS requirement	Must hold on 20 held-out leagues
Deployment based on standalone ROI	Deployment based on marginal ROI
No automated gate	10-gate approval script
CLV and ROI conflated	CLV for model eval, ROI for deployment
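The "same standaloneN for several signals" symptom in the table is the kind of thing a check like Gate 8 can flag mechanically. A minimal sketch, assuming a map from signal id to standalone sample size; the function name, the 1% relative tolerance, and the grouping rule are illustrative assumptions rather than the gate's actual logic.

```typescript
// Groups signals whose standalone sample sizes are nearly identical, which
// suggests they were all evaluated on the same pre-filtered pool.
function suspiciousNGroups(
  standaloneN: Record<string, number>,
  relTolerance = 0.01,
): string[][] {
  const sorted = Object.entries(standaloneN).sort((a, b) => a[1] - b[1]);
  const groups: string[][] = [];
  let current: string[] = [];
  let anchorN = 0; // smallest N in the current group
  for (const [id, n] of sorted) {
    if (current.length > 0 && n - anchorN <= anchorN * relTolerance) {
      current.push(id);
    } else {
      if (current.length > 1) groups.push(current);
      current = [id];
      anchorN = n;
    }
  }
  if (current.length > 1) groups.push(current);
  return groups;
}
```

Genuinely different signals should select different subsets of matches, so a cluster of near-identical N values is a cheap early warning that something upstream is filtering first.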

The Wrong-Direction Protocol

When a signal shows the opposite of your hypothesis, that's not a failure — it might be the most valuable finding.

Three of our best deployed discoveries came from wrong-direction results:

Congestion filter was REMOVING our best bets (+6.2% ROI)
AH +0.25 line was rejected despite +9.2% ROI
New managers were expected to hurt teams but actually helped

The /signal command automatically detects wrong-direction results and offers to register the reversed hypothesis. The reversed hypothesis goes through the same register → explore → gate loop.
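Wrong-direction detection can be sketched as a sign check plus a minimum effect size. Everything here is an assumption for illustration: the `Hypothesis` shape, the function names, the `-reversed` id suffix, and the 0.5pp default magnitude are not taken from the actual command.

```typescript
type Direction = "positive" | "negative";

// Hypothetical registered hypothesis with its expected effect direction.
interface Hypothesis {
  id: string;
  statement: string;
  expected: Direction;
}

// True when the observed ROI delta has the opposite sign to the hypothesis
// and is large enough (as a fraction, 0.005 = 0.5pp) to be worth a retest.
function isWrongDirection(h: Hypothesis, observedDelta: number, minMagnitude = 0.005): boolean {
  const observed: Direction = observedDelta >= 0 ? "positive" : "negative";
  return observed !== h.expected && Math.abs(observedDelta) >= minMagnitude;
}

// Builds the reversed hypothesis, which is then registered as a new entry
// and sent through the same register, explore, gate loop.
function reverseHypothesis(h: Hypothesis): Hypothesis {
  return {
    id: `${h.id}-reversed`,
    statement: `REVERSED: ${h.statement}`,
    expected: h.expected === "positive" ? "negative" : "positive",
  };
}
```

Registering the reversal as its own entry keeps the denominator honest: the flipped hypothesis counts as one more attempt rather than a free second look.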

Current Stack Performance

After the meta-analysis and corrections (29,977 matches, 26 leagues):

Component	Marginal ROI	Status
MI Bivariate Poisson	—	Foundation (+11% CLV)
minEdge ≥ 7%	~+4.6pp	Core
odds-cap ≤ 2.0	+4.2pp	Core (strongest filter)
variance-regression	-0.4pp	Deployed but neutral
congestion-filter	-0.3pp	Harmful (removal confirmed)
defiance-filter	-0.2pp	Neutral

In-sample (IS) ROI: -1.4%. Out-of-sample (OOS) ROI: -3.6%. The model works. The edge exists. The gap is execution cost: vig, odds quality, entry timing. That's a separate workstream.

How to Get Started

# Test a new hypothesis
/signal [your hypothesis here]

# Retest an existing signal with corrected infrastructure
/signal [signal-id]

# Run the approval gate on a promising signal
npx tsx scripts/approve-signal.ts --signal=[signal-id]

# See all pending signals
jq '.signals[] | select(.status=="pending") | .id' data/signal-registry.json

The signal registry has 21 pending hypotheses waiting to be tested. The approval gate is ready. The parallel backtest runs 26 leagues in under an hour. Go find alpha.