Four More Signals Just Failed Revalidation
Four signals failed under the corrected protocol: defensive-overperf, gameweek timing, fixture congestion AH shrinkage, and DGF motivation filter. All select 87,210/87,210 matches at minEdge=0. Zero marginal ROI. The existing filter stack already captures all four phenomena.
Four more signals just failed revalidation under our corrected testing protocol. All had been deployed or queued based on pre-correction backtests. Under honest isolation (minEdge=0), all four select every match and contribute zero marginal value.
The Four Casualties
1. defensive-overperf-regression-trigger
Original claim: When a team's cumulative defensive overperformance exceeds 10 goals (xGA - goals conceded), regression is near-certain. Bet aggressively against them. Ted: "I honestly can't remember a defensive overperformance like this since Leicester won the Premier League."
Original test (2026-03-17): N=232, CLV=+11.2%, +1.3pp delta at threshold=10. Monotonic CLV increase across thresholds. Accepted as sizing signal.
Corrected test (2026-03-19): Standalone at minEdge=0: N=87,210 (selects ALL matches). Marginal ROI: +0.0%. The base run and the without-signal run produce identical results: 6,606 bets, -3.0% ROI, -195.9u. The signal changes nothing.
Why it failed: The original N=232 came from the intersection of this signal with the pre-existing 7% edge threshold + odds cap + Ted filters. Those 232 bets were the same 232 bets that would have been selected without this signal. The "monotonic CLV increase" was real but already captured by the variance-regression filter.
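For reference, the trigger condition from the original claim (cumulative xGA minus goals conceded above 10) can be sketched roughly as follows. The `MatchDefence` type, its field names, and both function names are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical per-match defensive record; field names are assumptions.
interface MatchDefence {
  xga: number;           // expected goals against
  goalsConceded: number; // actual goals conceded
}

// Cumulative defensive overperformance: sum of (xGA - goals conceded).
// Positive values mean the team conceded fewer goals than expected.
function defensiveOverperformance(matches: MatchDefence[]): number {
  return matches.reduce((sum, m) => sum + (m.xga - m.goalsConceded), 0);
}

// The trigger fires when overperformance exceeds the threshold
// (10 goals in the original claim).
function regressionTrigger(matches: MatchDefence[], threshold: number = 10): boolean {
  return defensiveOverperformance(matches) > threshold;
}
```

A condition like this describes a team's state. Whether it adds value as a bet selector is a separate question, and the marginal test is what answers it.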
2. timing-gameweek-clv
Original claim: CLV varies by gameweek — mid-season (GW 11-29) shows +15.1% ROI while late season (GW 30+) shows -1.3%. Betting more aggressively mid-season improves ROI.
Original test (2026-03-14): N=241, ROI=+15.1%, confirmed massive late-season collapse.
Corrected test (2026-03-19): Standalone at minEdge=0: N=87,210 (selects ALL matches). Marginal ROI: +0.0%. Identical to defensive-overperf — the signal is non-selective and redundant.
Why it failed: The late-season penalty is already handled by the regime system's season-phase classification. The signal was deployed as a GW filter, but the regime system applies confidence penalties to late-season bets that achieve the same effect. Adding this signal on top changes nothing.
The Pattern
The first two signals share the same failure mode, and the two that follow repeat it:
- Pre-filter artifact: The old runStandaloneSignal() kept the 7% edge threshold active, so "standalone" actually meant "this signal + edge filter + odds cap." N=232 looked selective; N=87,210 reveals it isn't.
- Redundancy: Both signals describe real phenomena (defensive overperformance regresses, late-season ROI collapses) but the existing filter stack already captures them through different mechanisms (variance filter, regime system).
- N inflation: At minEdge=0, both signals select essentially all matches because their conditions are broadly true — most teams have some defensive variance, and most gameweeks fall in some phase. Without the edge threshold doing the actual filtering, the signals are noise.
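The N inflation point suggests a cheap sanity check before any ROI analysis: measure what fraction of matches a signal selects when nothing else is filtering. The sketch below models a signal as a simple predicate, which is an assumption about its shape, and the 0.99 cutoff is purely illustrative.

```typescript
// A signal is modelled here as a predicate over matches (an assumption).
type Signal<M> = (match: M) => boolean;

// Fraction of matches the signal selects when run with no other filters.
function selectionRate<M>(matches: M[], signal: Signal<M>): number {
  if (matches.length === 0) return 0;
  const selected = matches.filter(signal).length;
  return selected / matches.length;
}

// Flags signals that select (almost) everything; the cutoff is illustrative.
function isNonSelective<M>(matches: M[], signal: Signal<M>, cutoff: number = 0.99): boolean {
  return selectionRate(matches, signal) >= cutoff;
}
```

A signal that selects 87,210 of 87,210 matches has a selection rate of 1.0 and would be flagged immediately.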
3. fixture-congestion-ah-margin-shrinkage
Original claim: Fixture congestion (3+ games in 8 days) reduces AH margin coverage for elite teams without reducing 1X2 win probability. Teams in European competition win at the same rate but by smaller margins due to fatigue and rotation.
Corrected test (2026-03-19): Standalone at minEdge=0: N=87,210 (selects ALL matches). Marginal ROI: +0.0%. Identical base and without-signal results.
Why it failed: The existing congestion-filter in the Ted filter stack already handles fixture congestion. The AH-specific margin shrinkage theory isn't wrong — elite teams probably do cover smaller margins when congested — but the existing filter already removes those matches from the bet pool. Adding a second congestion signal on top changes nothing.
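As an illustration of the congestion condition itself (3+ games in 8 days), here is a minimal sketch. It assumes the window ends on the target fixture, includes that fixture, and excludes a match played exactly 8 days earlier. The real congestion-filter may define the window differently, and the function name is hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True if the team plays at least `minGames` fixtures in the `windowDays`-day
// window ending at the target fixture (target included).
function isCongested(
  fixtureDates: Date[],
  target: Date,
  minGames: number = 3,
  windowDays: number = 8
): boolean {
  const end = target.getTime();
  const start = end - windowDays * DAY_MS;
  // Count fixtures in (start, end]: the target match plus earlier ones in the window.
  const games = fixtureDates.filter((d) => {
    const t = d.getTime();
    return t > start && t <= end;
  }).length;
  return games >= minGames;
}
```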
4. dgf-motivation-filter
Original claim: In the final 8 GWs, bets against teams that have mathematically secured their position (safe from relegation, can't reach playoffs) are systematically underpriced. This targets the mechanism behind late-season collapse: opponent motivation asymmetry, not just gameweek number.
Corrected test (2026-03-19): Standalone at minEdge=0: N=87,210 (selects ALL matches). Marginal ROI: +0.0%.
Why it failed: The motivation system — relegation-direction-filter and one-coasting-skip — already handles DGF teams through the regime decision table. The regime system classifies motivation context (high-stakes, one-coasting, both-coasting, relegation-involved) and adjusts confidence accordingly. Adding an explicit DGF filter on top changes nothing.
What This Means
Signals that were accepted or queued pre-correction and have now been retested:
- defensive-overperf-regression-trigger → REJECTED (zero marginal)
- timing-gameweek-clv → REJECTED (zero marginal)
- fixture-congestion-ah-margin-shrinkage → REJECTED (zero marginal)
- dgf-motivation-filter → REJECTED (zero marginal)
- promoted-team-penalty → REJECTED (p=0.46, insignificant)
- More to come as we work through the backlog
The corrected protocol is doing its job. The 10-gate approval process exists precisely for this — catching signals that look good in isolation but add nothing to the deployed stack. Every rejection makes the system more honest about where value actually comes from.
The Corrected Protocol
For reference, the key change that exposed these:
Before (buggy): runStandaloneSignal() kept minEdge=7% active. Standalone N was small (100-300), making every signal look selective and effective.
After (fixed): runStandaloneSignal() sets minEdge=0, maxOdds=99, skipEarly=0. True isolation. If a signal selects 87,000 of 87,000 matches, it's not a signal — it's a description of reality.
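One way to keep this bug from creeping back is a guard that refuses to label a run "standalone" while any pre-filter is active. The sketch below is an assumption about how the options might be shaped. Only the names minEdge, maxOdds and skipEarly come from the protocol above; the interface, the constant and the guard function are hypothetical.

```typescript
// Hypothetical option shape; only the three field names appear in the protocol.
interface StandaloneOptions {
  minEdge: number;   // minimum edge required to bet (0.07 would be 7%)
  maxOdds: number;   // odds cap
  skipEarly: number; // exact meaning assumed; 0 disables it
}

// True isolation as described: no edge threshold, no effective odds cap, no skip.
const ISOLATED: StandaloneOptions = { minEdge: 0, maxOdds: 99, skipEarly: 0 };

// Throws if any pre-filter is still active in a supposedly standalone run.
function assertIsolated(opts: StandaloneOptions): void {
  if (opts.minEdge !== 0 || opts.maxOdds < 99 || opts.skipEarly !== 0) {
    throw new Error("standalone run has active pre-filters");
  }
}
```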
The marginal test (base ROI minus without-signal ROI) is the deployment decision. If removing the signal doesn't change anything, the signal isn't doing anything.
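The marginal test reduces to a few lines. This is a minimal sketch, assuming each bet records a stake and a profit in units. The `Bet` type, the function names and the epsilon tolerance are illustrative, not the project's actual implementation.

```typescript
// Hypothetical bet record: stake and net profit, both in units.
interface Bet {
  stake: number;
  profit: number;
}

// ROI = total profit / total staked (0 when nothing was staked).
function roi(bets: Bet[]): number {
  const staked = bets.reduce((s, b) => s + b.stake, 0);
  if (staked === 0) return 0;
  const profit = bets.reduce((s, b) => s + b.profit, 0);
  return profit / staked;
}

// Marginal ROI = base ROI minus ROI of the same stack with the signal removed.
function marginalRoi(baseBets: Bet[], withoutSignalBets: Bet[]): number {
  return roi(baseBets) - roi(withoutSignalBets);
}

// A signal "does something" only if removing it moves ROI by more than epsilon.
function hasMarginalEffect(baseBets: Bet[], withoutSignalBets: Bet[], epsilon: number = 1e-9): boolean {
  return Math.abs(marginalRoi(baseBets, withoutSignalBets)) > epsilon;
}
```

Applied to the defensive-overperf retest, both runs contain the same 6,606 bets at -195.9u. Assuming flat 1u stakes, that is about -3.0% ROI on each side, so the marginal ROI is zero and the check returns false.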