The Gate Killed Our Darlings: How Two 'Validated' Signals Failed the Formal Process

March 19, 2026 | Post-Mortem | REJECTED

We found two signals worth ~700u, validated them with Monte Carlo and walk-forward, deployed them, then ran the 10-gate process. Both failed. Then we discovered the gate had a bug (it wasn't toggling signals). We fixed it and re-ran: congestion +0.3pp (p=0.36), AH lines -0.1pp (p=0.54). Still rejected. Right answer, wrong path to get there.

Congestion: +0.3pp marginal ROI, p=0.36 (6/10 gates)

AH Lines: -0.1pp marginal ROI, p=0.54 (4/10 gates)

Gate Bug: Fixed (was reporting 0.0pp for all unmapped signals)

CLV-ROI Gap: 14pp (+11% CLV → -3% ROI)

Two weeks ago we mined 39 rejected experiments and found what looked like ~700 units of hidden profit. We ran Monte Carlo bootstrap (p=0.009), walk-forward validation (3/3 replicate), and even audited the settlement code. Everything checked out. We deployed two changes: disabled the congestion filter and added AH line-specific confidence adjustments.

Then we ran them through our new 10-gate approval process.

Both failed. We reverted everything.

Then we discovered that the gate itself had a bug: it wasn't actually toggling the signals on and off, just comparing the base portfolio to itself. We fixed it and re-ran. Real results: congestion removal adds +0.3pp marginal ROI (p=0.36, not significant). AH line exclusion changes marginal ROI by -0.1pp (p=0.54), so it slightly hurts. Both still rejected, but now with honest numbers.

This is the story of how a rigorous process killed two signals that looked bulletproof, then how we found the process itself was broken, fixed it, and still got the same answer.

The Hypothesis

Signal 1: Congestion Filter Removal

Our congestion filter removes matches where either team plays 3+ times in 8 days. A deep stratification study found that congested bets actually had the *best* ROI (+6.2%) and best calibration (+2.0pp CalGap) of any rest-day bucket. The filter appeared to be removing our most profitable bets.

Signal 2: AH +0.25 Line Confidence Boost

A per-line breakdown of all 15,200 AH bets revealed that the +0.25 line had +9.2% ROI on 6,360 bets — 42% of all AH bets generating 135% of total profit. Meanwhile, the +0.75 line was -16.7% ROI. We proposed boosting confidence on +0.25 bets and penalizing +0.75 bets.

Both findings were statistically significant, consistent across seasons and leagues, and passed independent validation.

The Tests That Passed

We didn't deploy blindly. Both signals went through serious validation before the first commit:

Monte Carlo Bootstrap (10,000 iterations)

Signal	Delta	p-value (block)	95% CI
Congestion removal	+0.6pp ROI	0.009	[+0.01pp, +1.2pp]
+0.25 line advantage	+10.9pp ROI	<0.0001	[+7.9pp, +14.0pp]
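
For readers unfamiliar with the technique, here is a minimal sketch of a block bootstrap on an ROI delta. It is illustrative only: the `Bet` shape, the choice of block key, the seeded generator, and the one-sided p-value definition are all assumptions, not the code behind the table above.

```typescript
// Illustrative bet shape; field names are assumptions, not the production schema.
interface Bet {
  block: string;    // resampling unit (e.g. a match day) so correlated bets stay together
  stake: number;    // units staked
  profit: number;   // units won (+) or lost (-)
  inGroup: boolean; // true if the bet belongs to the group under test
}

// Seeded linear congruential generator so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function roi(bets: Bet[]): number {
  let stake = 0;
  let profit = 0;
  for (const b of bets) {
    stake += b.stake;
    profit += b.profit;
  }
  return stake === 0 ? 0 : profit / stake;
}

// ROI of the group under test minus ROI of all other bets.
function roiDelta(bets: Bet[]): number {
  return roi(bets.filter((b) => b.inGroup)) - roi(bets.filter((b) => !b.inGroup));
}

function blockBootstrap(bets: Bet[], iterations: number, seed: number) {
  // Group bets by block key.
  const byBlock: Record<string, Bet[]> = {};
  for (const b of bets) {
    (byBlock[b.block] = byBlock[b.block] || []).push(b);
  }
  const blocks = Object.keys(byBlock).map((k) => byBlock[k] || []);
  const rng = makeRng(seed);
  const deltas: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Resample whole blocks with replacement, keeping the number of blocks fixed.
    let sample: Bet[] = [];
    for (let j = 0; j < blocks.length; j++) {
      sample = sample.concat(blocks[Math.floor(rng() * blocks.length)] || []);
    }
    deltas.push(roiDelta(sample));
  }
  deltas.sort((x, y) => x - y);
  const quantile = (q: number): number =>
    deltas[Math.min(deltas.length - 1, Math.floor(q * deltas.length))] ?? NaN;
  return {
    observed: roiDelta(bets),
    // One-sided bootstrap p-value: share of resamples with a non-positive delta.
    pValue: deltas.filter((d) => d <= 0).length / deltas.length,
    ci95: [quantile(0.025), quantile(0.975)] as [number, number],
  };
}
```

Resampling whole blocks rather than single bets keeps same-day correlation inside each resample, which is the reason to report a block p-value at all. Note that this measures a group-versus-rest difference, not a marginal contribution to a filter stack.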

Walk-Forward Hold-Out (train on early seasons, validate on later)

Signal	Config A	Config B	Config C
Congestion	Replicates	Replicates	Replicates
+0.25 line	Replicates	Replicates	Replicates
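
A walk-forward check of this kind can be sketched as follows. This is a hedged illustration, not the validation script itself: the `SeasonBet` shape and the rule that a fold "replicates" when both the training and held-out ROI deltas are positive are assumptions.

```typescript
// Illustrative shape; field names are assumptions.
interface SeasonBet {
  season: number;   // e.g. 2024
  stake: number;
  profit: number;
  inGroup: boolean; // bet belongs to the signal's group
}

interface Fold {
  testSeason: number;
  trainDelta: number; // ROI delta on all earlier seasons
  testDelta: number;  // ROI delta on the held-out season
  replicates: boolean;
}

function seasonRoi(bets: SeasonBet[]): number {
  let stake = 0;
  let profit = 0;
  for (const b of bets) {
    stake += b.stake;
    profit += b.profit;
  }
  return stake === 0 ? 0 : profit / stake;
}

// ROI of the signal's group minus ROI of all other bets.
function seasonDelta(bets: SeasonBet[]): number {
  return seasonRoi(bets.filter((b) => b.inGroup)) - seasonRoi(bets.filter((b) => !b.inGroup));
}

function walkForward(bets: SeasonBet[], minTrainSeasons: number): Fold[] {
  // Distinct seasons in ascending order.
  const seasons = bets
    .map((b) => b.season)
    .filter((s, i, all) => all.indexOf(s) === i)
    .sort((x, y) => x - y);
  const folds: Fold[] = [];
  for (const testSeason of seasons.slice(minTrainSeasons)) {
    // Train strictly on earlier seasons so no future data leaks into the fold.
    const trainDelta = seasonDelta(bets.filter((b) => b.season < testSeason));
    const testDelta = seasonDelta(bets.filter((b) => b.season === testSeason));
    // Assumed rule: the effect must be positive both in-sample and out-of-sample.
    folds.push({ testSeason, trainDelta, testDelta, replicates: trainDelta > 0 && testDelta > 0 });
  }
  return folds;
}
```

Counting `folds.filter((f) => f.replicates).length` against the number of folds gives a score in the same "k of n" form as the tables in this post.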

Probability of Ruin

The proposed portfolio nearly doubled the Sharpe ratio (2.48 to 4.69) and dropped ruin probability from 1.05% to 0.05%.

These results looked airtight. We deployed.

The Test That Failed

Then we ran approve-signal.ts — the 10-gate formal approval process that was built *after* our initial deployment. This is the new canonical pipeline that evaluates signals as marginal contributions to the full stack, not in isolation.

Congestion filter removal: 5/10 gates passed

Gate	Result	Detail
1. Pre-registered	PASS
2. True standalone	PASS	N=87,210, ROI=-5.9%, CLV=+5.6%
3. Minimum N	PASS	87,210 >> 1,000
4. Marginal ROI	FAIL	+0.0pp (base -3.0% with or without)
5. Bootstrap marginal	FAIL	p=0.50
6. OOS interleave	PASS	gap 2.8pp
7. Regime stratification	PASS
8. Suspicious N	FAIL	Similar to 2 other signals
9. Practical significance	FAIL	+0.0pp < +0.5pp threshold
10. Walk-forward	FAIL	2/4 folds positive (2024, 2025 negative)

AH +0.25 line: identical results. 5/10 gates passed, same failures.

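The critical gate is Gate 4: Marginal ROI. It asks: "Does adding this signal to the existing filter stack improve ROI?" On this first run the answer for both signals was no. Zero. The base portfolio produces -3.0% ROI with the full stack, and adding or removing either signal didn't move that number. (The exact 0.0pp was an artifact of the gate bug described at the top: the gate was comparing the base portfolio to itself. The corrected run gave +0.3pp and -0.1pp, neither significant.)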
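
A marginal-ROI check, with a guard against the "comparing the base to itself" failure mode, could look like the sketch below. It is not the contents of approve-signal.ts: the `Bet` fields, the `Filter` type, and the idea of modelling the signal as one extra filter on top of the stack are assumptions made for illustration.

```typescript
// Illustrative bet shape; field names are assumptions.
interface Bet {
  id: string;
  stake: number;
  profit: number;
  edge: number;     // model edge at bet time
  restDays: number; // days since the team's previous match
}

type Filter = (b: Bet) => boolean; // returns true to keep the bet

function roi(bets: Bet[]): number {
  let stake = 0;
  let profit = 0;
  for (const b of bets) {
    stake += b.stake;
    profit += b.profit;
  }
  return stake === 0 ? 0 : profit / stake;
}

// A bet survives only if every filter in the stack keeps it.
function applyStack(bets: Bet[], stack: Filter[]): Bet[] {
  return bets.filter((b) => stack.every((keep) => keep(b)));
}

interface MarginalResult {
  baseRoi: number;
  withSignalRoi: number;
  marginalPp: number;  // percentage-point change from switching the signal on
  changedBets: number; // bets removed by the signal
}

function marginalRoi(bets: Bet[], baseStack: Filter[], signal: Filter): MarginalResult {
  const off = applyStack(bets, baseStack);
  const on = applyStack(bets, baseStack.concat([signal]));
  // "on" is a subset of "off", so equal lengths mean the two sets are identical.
  const changedBets = off.length - on.length;
  if (changedBets === 0) {
    throw new Error("signal did not change the bet set; a 0.0pp result would be meaningless");
  }
  const baseRoi = roi(off);
  const withSignalRoi = roi(on);
  return { baseRoi, withSignalRoi, marginalPp: (withSignalRoi - baseRoi) * 100, changedBets };
}
```

The guard is the design point: a marginal ROI of exactly 0.0pp together with zero changed bets says the toggle never happened, which is different from a signal that was tested and found worthless.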

Why Did the Earlier Tests Pass But the Gate Fail?

Three differences between our earlier analysis and the canonical pipeline:

1. Standalone vs. Marginal

Our earlier Monte Carlo and walk-forward tests measured the signal in isolation — comparing congested-only bets to non-congested bets, or +0.25 bets to other lines. These showed real differences.

But the approval gate measures marginal contribution to the stack. When you add the congestion signal on top of the existing minEdge filter, variance filter, pass-rate filter, and all the other deployed signals, the incremental value is zero. The stack already captures whatever the congestion signal was finding.

This is the "last mile" problem. A signal can be statistically real in isolation and still add nothing to a system that already works differently.

2. 15K bets vs. 87K bets

Our earlier analysis used 15,200 AH-only bets from the original backtest. The approval gate uses 87,210 bets across all markets from the 26-league canonical dataset. The larger, more diverse dataset diluted the effects that looked strong in the narrower sample.

3. ROI vs. CLV

Our earlier work focused heavily on ROI differences. But the approval gate revealed something more fundamental: CLV is +11% across the board regardless of which signals are active. The model genuinely beats the closing line. The problem is converting that CLV to ROI — and neither of these signals helps with that conversion.

What We Actually Learned

The Model Works. The Conversion Doesn't.

This is the biggest takeaway. Across 87,210 bets:

Metric	Standalone	With Filters
CLV	+5.6%	+11.0%
ROI	-5.9%	-3.0%
Gap	11.6pp	14.0pp

The model finds genuine edge. The filters concentrate it. But there's a 14 percentage point gap between "the model is right" (CLV) and "we make money" (ROI). Every signal we've tested — 107 hypotheses across two weeks — adds zero to the ROI side. The bottleneck is not signal selection. It's CLV-to-ROI conversion.
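
To make the two metrics concrete, here is a small sketch that computes stake-weighted CLV, ROI, and the gap between them. The field names are hypothetical, and the CLV definition (taken odds divided by closing odds, minus one) is one common convention assumed here; this post does not state which definition the pipeline uses. Settlement is simplified to win or lose, ignoring the half-win and half-loss outcomes of quarter-line AH bets.

```typescript
// Illustrative shape; field names are assumptions.
interface SettledBet {
  takenOdds: number;   // decimal odds at placement
  closingOdds: number; // decimal odds at market close
  stake: number;
  won: boolean;        // simplified: no pushes or half results
}

// Assumed CLV convention: how much better the taken price is than the close.
function clvOf(b: SettledBet): number {
  return b.takenOdds / b.closingOdds - 1;
}

function clvRoiGap(bets: SettledBet[]) {
  let staked = 0;
  let profit = 0;
  let clvWeighted = 0;
  for (const b of bets) {
    staked += b.stake;
    profit += b.won ? b.stake * (b.takenOdds - 1) : -b.stake;
    clvWeighted += b.stake * clvOf(b);
  }
  if (staked === 0) throw new Error("no stake to evaluate");
  const clvPct = (clvWeighted / staked) * 100;
  const roiPct = (profit / staked) * 100;
  // Gap in percentage points between "beat the close" and "made money".
  return { clvPct, roiPct, gapPp: clvPct - roiPct };
}
```

Applied to the filtered column of the table above, the same subtraction gives 11.0 - (-3.0) = 14.0pp.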

Sharp Markets Convert Better

The approval gate's informational tier analysis showed:

Odds Quality	N	ROI	CLV
Sharp	2,103	-1.2%	+11.3%
Medium	1,942	-5.6%	+11.1%
Soft	2,561	-2.4%	+11.1%

CLV is essentially identical (+11.1% to +11.3%). But sharp-market leagues convert about 1.2pp better than soft-market leagues (-1.2% vs. -2.4% ROI) and 4.4pp better than medium ones. The edge is the same; the extraction rate differs. This is a capital allocation question, not a modeling question.
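
One way to probe the capital allocation question is to re-weight stakes by tier and recompute the blended ROI. The sketch below uses the N and ROI figures from the tier table; the stake multipliers are hypothetical, and it assumes each tier's ROI stays the same when stake size changes.

```typescript
type Tier = "sharp" | "medium" | "soft";

interface TierStats {
  n: number;      // number of bets in the tier
  roiPct: number; // tier ROI in percent
}

// Figures from the tier table above.
const tiers: Record<Tier, TierStats> = {
  sharp: { n: 2103, roiPct: -1.2 },
  medium: { n: 1942, roiPct: -5.6 },
  soft: { n: 2561, roiPct: -2.4 },
};

// Blended ROI (%) if every bet in a tier is staked at that tier's multiplier.
// Assumes per-tier ROI does not depend on stake size.
function blendedRoiPct(stats: Record<Tier, TierStats>, mult: Record<Tier, number>): number {
  const names: Tier[] = ["sharp", "medium", "soft"];
  let staked = 0;
  let profit = 0;
  for (const t of names) {
    const tierStake = stats[t].n * mult[t];
    staked += tierStake;
    profit += (tierStake * stats[t].roiPct) / 100;
  }
  return staked === 0 ? 0 : (profit / staked) * 100;
}

// Flat staking versus a hypothetical routing that halves medium and soft stakes.
const flat = blendedRoiPct(tiers, { sharp: 1, medium: 1, soft: 1 });
const routed = blendedRoiPct(tiers, { sharp: 1, medium: 0.5, soft: 0.5 });
```

Because the blended figure is a stake-weighted average of the tier ROIs, no choice of non-negative multipliers can lift it above the best tier's ROI (-1.2% here). Under these assumptions, routing can narrow the gap but cannot by itself make the portfolio profitable while every tier is negative.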

The Process Worked Exactly As Designed

We deployed two signals based on compelling standalone evidence. Then the approval gate caught them. This is the system working. Better to catch false positives at the gate than in the live P&L.

The old process would have left these deployed permanently. The new process caught them in hours.

What We Reverted

Both changes, fully rolled back:

Congestion filter re-enabled in ted-filters.ts. Matches with 3+ games in 8 days are filtered again.
AH line confidence adjustments removed from decision-table.ts. No line-specific confidence modifiers.
AH line parsing removed from picks-engine.ts.

Both signals updated to "rejected" in the signal registry with the gate failure details.

New Opportunities Registered

The analysis wasn't wasted. Two new hypotheses emerged from the gate output:

1. Odds Quality Routing (`odds-quality-routing`)

Sharp leagues convert CLV to ROI about 1.2pp better than soft leagues (-1.2% vs. -2.4% ROI), despite near-identical CLV. If we route full stake to sharp leagues and reduce stake on soft leagues, we might close part of the 14pp gap. This is a capital allocation signal, not an edge signal.

2. CLV-ROI Gap Structural Investigation (`clv-roi-gap-structural-investigation`)

The 14pp gap between +11% CLV and -3% ROI is the central question. Every signal we've tested adds zero marginal ROI. The bottleneck is conversion, not selection. Candidate root causes:

Systematic odds staleness between model solve and market close
Vig structure absorbing edge asymmetrically by line/market
Model overconfidence at specific probability ranges
Correlated same-day losses amplifying drawdowns

If we can close even 5pp of that 14pp gap, then with CLV unchanged ROI moves from -3% to roughly +2% and the portfolio becomes profitable. This is now the highest-priority investigation.

The Meta-Lesson

We went through four phases in 48 hours:

Discovery — mined rejections, found compelling patterns
Validation — Monte Carlo, walk-forward, settlement audit — all passed
Deployment — shipped two changes with evidence
Formal gate — 10-gate process killed both signals

The temptation is to feel like we wasted two days. We didn't. We learned that:

Standalone significance is not deployment significance. A signal can be real and still add nothing to the stack.
The approval gate is the only test that matters for deployment. Everything else is exploratory.
The real problem isn't finding edges. The model finds +11% CLV. The real problem is converting that edge to profit. That's where the next breakthrough will come from.

The process is painful. It killed two signals we were excited about. But it also prevented us from running a production system with changes that provably add zero value. That's exactly what it's for.

REJECTED | Signal: congestion-quality-boost | 2026-03-19