Phase 3: The Signals Still Aren't Enough
Re-ran the top 2 signals through the 10-gate approval on the new market-only baseline. Both failed again (6/10 gates each). The base improved (-3.3% to -2.6%), but the signal marginals were unchanged (+0.9pp, +0.4pp). Walk-forward: 1/4 and 0/4 folds positive. Individual Layer 3 signals can't close the remaining -2.2% gap.
We improved the engine. Base AH ROI went from -3.3% to -2.2%. Then we re-ran the top two signals through the 10-gate approval process. Both failed again. The market-only config raised the floor, but the signals didn't get stronger.
The Question
The "Fix the Engine" playbook had three phases:
- Parameter sweep (RFB/decay) -- DONE, baseline wins
- Loss weight sweep -- DONE, market-only deployed (+0.79pp)
- Re-test top signals on improved base -- THIS POST
The hypothesis: with base ROI at -2.2% instead of -3.3%, signals adding +0.9pp might now be enough to pass walk-forward validation (which requires positive marginal in 2/3 of season folds).
What We Found
Both signals failed again. 6/10 gates each.
tc2-league-filter (exclude segunda, la-liga, ligue-2)
| Gate | Result | Detail |
|---|---|---|
| 1. Pre-registered | PASS | |
| 2. True standalone | PASS | N=78,125, ROI=-5.4% |
| 3. Minimum N | PASS | 78,125 >= 1,000 |
| 4. Marginal ROI > 0 | PASS | +0.9pp |
| 5. Bootstrap p < 0.10 | FAIL | p=0.19 |
| 6. IS/OOS gap < 3pp | FAIL | 3.9pp gap |
| 7. Regime stratification | PASS | No opposite-sign regimes |
| 8. Suspicious N | FAIL | 14 signals similar N |
| 9. Practical significance | PASS | +0.9pp > 0.5pp |
| 10. Walk-forward | FAIL | 1/4 folds positive |
tc2-home-ah-rescue (home AH only vs bottom-quarter opponents)
| Gate | Result | Detail |
|---|---|---|
| 4. Marginal ROI > 0 | PASS | +0.4pp |
| 5. Bootstrap p < 0.10 | FAIL | p=0.32 |
| 9. Practical significance | FAIL | +0.4pp < 0.5pp |
| 10. Walk-forward | FAIL | 0/4 folds positive |
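Gate 5 relies on a bootstrap p-value. The post does not say how that bootstrap is built, so the following is only a minimal sketch of one common construction: resample a per-bet marginal series with replacement and report the share of resamples whose total is not positive. The function names, the per-bet `diffs` input, and the seeded `mulberry32` generator are assumptions for illustration, not the project's code.

```typescript
// Small seeded PRNG (mulberry32) so the sketch is reproducible.
// Returns a function producing floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One-sided bootstrap p-value for "the mean marginal is positive".
// diffs: hypothetical per-bet marginal contributions (units per bet).
// Returns the fraction of resamples whose sum is <= 0.
function bootstrapPValue(
  diffs: number[],
  iterations: number,
  rng: () => number,
): number {
  if (diffs.length === 0 || iterations <= 0) {
    throw new Error("need at least one bet and one iteration");
  }
  let nonPositive = 0;
  for (let i = 0; i < iterations; i++) {
    let sum = 0;
    // Draw diffs.length bets with replacement.
    for (let j = 0; j < diffs.length; j++) {
      sum += diffs[Math.floor(rng() * diffs.length)];
    }
    if (sum <= 0) nonPositive++;
  }
  return nonPositive / iterations;
}

// Gate 5 as stated in the tables: pass when p < 0.10.
function passesBootstrapGate(p: number, alpha = 0.1): boolean {
  return p < alpha;
}
```

Under this construction, a series with a mix of small gains and losses yields a p-value well above 0.10, which is the pattern the two tables show (p=0.19 and p=0.32).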
The Nuance
The Base Got Better, The Signals Didn't
| Metric | Old Baseline | Market-Only |
|---|---|---|
| Base AH ROI | -3.28% | -2.60% |
| tc2-league-filter marginal | +0.91pp | +0.90pp |
| tc2-home-ah-rescue marginal | +0.43pp | +0.40pp |
The marginal contributions are essentially unchanged. The signals filter the same bets regardless of the underlying solver config. The improvement came from the base, not the signals.
Walk-Forward Is the Binding Constraint
Walk-forward requires a positive marginal in at least 2/3 of season folds, which with four folds means at least three. With the market-only base, the folds look like:
| Year | Base ROI | With tc2-league-filter | Marginal |
|---|---|---|---|
| 2022 | -2.6% | -2.5% | -0.1pp FAIL |
| 2023 | -0.5% | -0.5% | -0.0pp OK |
| 2024 | -5.7% | -5.7% | +0.0pp FAIL |
| 2025 | -1.8% | -1.8% | -0.0pp FAIL |
The marginal rounds to 0.1pp or less in every fold, and its sign flips from season to season. There's no consistent directional effect across time periods.
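The walk-forward rule can be sketched as a small check, assuming the threshold means "at least 2/3 of folds with a strictly positive marginal". The `Fold` shape, the function names, and the sample numbers are illustrative and are not taken from the project's code or results.

```typescript
// Hypothetical shape for one season fold.
interface Fold {
  year: number;
  baseRoi: number;   // base ROI in percent, e.g. -2.0
  signalRoi: number; // ROI with the signal applied, in percent
}

// Marginal contribution of the signal in percentage points.
function marginalPp(fold: Fold): number {
  return fold.signalRoi - fold.baseRoi;
}

// Passes when at least minShare of folds have a strictly positive marginal.
function passesWalkForward(folds: Fold[], minShare = 2 / 3): boolean {
  if (folds.length === 0) return false;
  const positive = folds.filter((f) => marginalPp(f) > 0).length;
  return positive / folds.length >= minShare;
}

// Illustrative folds only: two of four positive, so the gate fails.
const failingFolds: Fold[] = [
  { year: 1, baseRoi: -2.0, signalRoi: -1.5 },
  { year: 2, baseRoi: -1.0, signalRoi: -1.2 },
  { year: 3, baseRoi: -3.0, signalRoi: -2.5 },
  { year: 4, baseRoi: -0.5, signalRoi: -0.9 },
];

// Illustrative folds only: three of four positive, so the gate passes.
const passingFolds: Fold[] = [
  { year: 1, baseRoi: -2.0, signalRoi: -1.5 },
  { year: 2, baseRoi: -1.0, signalRoi: -0.5 },
  { year: 3, baseRoi: -3.0, signalRoi: -2.5 },
  { year: 4, baseRoi: -0.5, signalRoi: -0.9 },
];

console.log(passesWalkForward(failingFolds)); // false
console.log(passesWalkForward(passingFolds)); // true
```

Because the test counts signs rather than magnitudes, a signal whose per-fold marginal hovers around zero fails it even when the pooled marginal is positive.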
Infrastructure Fix: Snapshot Loading Bug
During this phase, we discovered that loadSnapshots() in data-loader.ts was loading ALL cached solver snapshots regardless of config hash. With dozens of sweep configs cached, the loader was picking a random mix of old (outcome=0.3/xg=0.2) and new (market-only) snapshots.
Fixed by adding a configHashFilter parameter and computing per-league hashes based on the production config (accounting for per-league ouWeight tiers).
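A minimal sketch of the idea behind the fix, not the actual data-loader.ts code. Only `configHashFilter` and `ouWeight` come from the post; the `Snapshot` and `SolverConfig` shapes, the helper names, the FNV-1a hash (a stand-in for whatever hash the real cache uses), and all weight values are assumptions for illustration.

```typescript
// Hypothetical config and snapshot shapes.
interface SolverConfig {
  outcomeWeight: number;
  xgWeight: number;
  ouWeight: number;
}

interface Snapshot {
  league: string;
  configHash: string;
}

// FNV-1a over a string, returned as 8 hex characters (stand-in hash).
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Hash the config fields in a fixed order so equal configs hash equally.
function hashConfig(config: SolverConfig): string {
  return fnv1a(
    JSON.stringify([config.outcomeWeight, config.xgWeight, config.ouWeight]),
  );
}

// Build league -> expected hash, applying each league's ouWeight tier.
function productionHashes(
  base: SolverConfig,
  ouWeightByLeague: Record<string, number>,
): Record<string, string> {
  const hashes: Record<string, string> = {};
  for (const league of Object.keys(ouWeightByLeague)) {
    hashes[league] = hashConfig({ ...base, ouWeight: ouWeightByLeague[league] });
  }
  return hashes;
}

// Without a filter every cached snapshot is returned (the buggy behaviour);
// with one, only snapshots matching the league's production hash survive.
function filterSnapshots(
  snapshots: Snapshot[],
  configHashFilter?: Record<string, string>,
): Snapshot[] {
  if (!configHashFilter) return snapshots;
  return snapshots.filter((s) => configHashFilter[s.league] === s.configHash);
}

// Illustrative values only.
const marketOnly: SolverConfig = { outcomeWeight: 0, xgWeight: 0, ouWeight: 0.1 };
const oldConfig: SolverConfig = { outcomeWeight: 0.3, xgWeight: 0.2, ouWeight: 0.1 };
const tiers: Record<string, number> = { "la-liga": 0.1, "ligue-2": 0.2 };

const cached: Snapshot[] = [
  { league: "la-liga", configHash: hashConfig(marketOnly) },
  { league: "la-liga", configHash: hashConfig(oldConfig) }, // stale sweep config
  { league: "ligue-2", configHash: hashConfig({ ...marketOnly, ouWeight: 0.2 }) },
  { league: "ligue-2", configHash: hashConfig(marketOnly) }, // wrong ouWeight tier
];

const kept = filterSnapshots(cached, productionHashes(marketOnly, tiers));
console.log(kept.length); // 2
```

Keying the filter by league matters here: a single global hash would reject valid snapshots for any league whose ouWeight tier differs from the default.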
What This Means
The "Fix the Engine" playbook is complete:
- Phase 1 (RFB/decay): baseline optimal, no change
- Phase 2 (loss weights): market-only deployed, +0.79pp, +118u
- Phase 3 (re-test signals): both top signals still fail
The remaining -2.2% AH ROI gap cannot be closed by individual signal filters. The path forward is structural:
- Quarter-line routing (+4.3% vs whole-line -7.8%, already validated in regime search)
- Edge-weighted sizing (+1.5pp OOS)
- Max-edge cap (reversed hypothesis from today's wrong-direction finding)
- Or accepting that -2.2% is the model's structural floor and focusing on live execution quality
What's Next
The playbook's three phases are done. Time for a new playbook focused on the structural opportunities identified during this session's regime search.