The Portfolio Was Profitable All Along: How Our Testing Framework Hid Real Alpha
Stacking tc2-league-filter and gk-psxg on the base filters produces +2.69% ROI (all 3 years positive, p=0.085). Leave-one-league-out: robust across all 16 leagues. The 10-gate process rejected both signals individually, but the stacked portfolio is profitable. We identified 6 flaws in the testing framework. The result needs production parity verification before deployment.
We stacked two individually rejected signals and got +2.69% ROI across 3 years, with every year positive. The 10-gate process was asking the wrong question.
What We Found
| Config | AH ROI | N (bets) | P&L (units) |
|---|---|---|---|
| Base (all Ted filters) | -2.22% | 3,140 | -69.6u |
| **+ league filter + GK filter** | **+2.69%** | **1,976** | **+53.2u** |
| Delta | +4.91pp | -1,164 | +122.8u |
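The table's arithmetic can be re-derived from N and P&L alone. The sketch below assumes flat 1-unit stakes (so ROI is simply P&L divided by bet count) and assumes the stacked set is a strict subset of the base set; neither assumption is stated in the table, and the function and variable names are ours for illustration. Under those assumptions it also backs out the ROI of the bets the two filters removed.

```python
def roi_pct(pnl_units: float, n_bets: int) -> float:
    """ROI in percent, assuming every bet is a flat 1-unit stake."""
    return 100.0 * pnl_units / n_bets

# Figures copied from the table above.
base = {"n": 3140, "pnl": -69.6}
stacked = {"n": 1976, "pnl": 53.2}

base_roi = roi_pct(base["pnl"], base["n"])
stacked_roi = roi_pct(stacked["pnl"], stacked["n"])

delta_pp = stacked_roi - base_roi          # change in ROI, percentage points
delta_n = stacked["n"] - base["n"]         # change in bet count
delta_pnl = stacked["pnl"] - base["pnl"]   # change in P&L, units

# If the stacked bets are a subset of the base bets, the removed bets
# account for the whole difference in P&L.
removed_n = base["n"] - stacked["n"]
removed_pnl = base["pnl"] - stacked["pnl"]
removed_roi = roi_pct(removed_pnl, removed_n)

print(f"base {base_roi:+.2f}%  stacked {stacked_roi:+.2f}%  delta {delta_pp:+.2f}pp")
print(f"removed bets: n={removed_n}, P&L={removed_pnl:+.1f}u, ROI={removed_roi:+.2f}%")
```

Read this way, the filters do not add winners so much as drop a block of bets that lost heavily, which is worth checking directly against the bet-level data.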
Walk-forward (temporal, all OOS):
- 2022: +2.48% ROI (marginal +4.11pp)
- 2023: +4.96% ROI (marginal +5.24pp)
- 2024: +0.62% ROI (marginal +5.30pp)
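The marginal improvement rises every year (+4.11pp, +5.24pp, +5.30pp). An overfit filter usually decays out of sample, and this is the opposite pattern. Absolute ROI in 2024 is still thin at +0.62%.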
Why It Was Hidden
The 10-gate approval process tested each signal individually:
- tc2-league-filter alone: +0.9pp marginal, p=0.19 (FAIL Gate 5)
- gk-psxg alone: +0.8pp marginal, p=0.36 (FAIL Gate 5)
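Neither passes bootstrap significance at roughly 3,000 AH bets (N=3,140 on the base filters). Together, though, they produce +4.91pp and a stack that is profitable in every year. The p=0.085 reported for the stack is suggestive, but it is still short of the conventional 0.05 threshold.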
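For reference, a bootstrap significance check on per-bet P&L can be sketched as follows. This is one common convention (resample bets with replacement and report the share of resampled means at or below zero); we are not claiming it matches Gate 5's exact procedure. The function name, the resample count, and the data are hypothetical, and the P&L list is synthetic, not our backtest.

```python
import random

def bootstrap_p_value(pnl_per_bet, n_resamples=2000, seed=0):
    """One-sided bootstrap p-value for 'mean P&L per bet is not positive'.

    Resamples the bets with replacement and returns the fraction of
    resampled means that are <= 0.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(pnl_per_bet)
    at_or_below_zero = 0
    for _ in range(n_resamples):
        sample_mean = sum(rng.choices(pnl_per_bet, k=n)) / n
        if sample_mean <= 0:
            at_or_below_zero += 1
    return at_or_below_zero / n_resamples

# Synthetic, illustrative P&L per 1-unit bet (NOT our backtest data):
# 520 wins at +0.95u and 480 losses at -1u, i.e. +1.4% ROI on 1,000 bets.
synthetic_pnl = [0.95] * 520 + [-1.0] * 480
p_value = bootstrap_p_value(synthetic_pnl)
roi = 100 * sum(synthetic_pnl) / len(synthetic_pnl)
print(f"ROI {roi:+.2f}%, bootstrap p = {p_value:.3f}")
```

The same function applies unchanged to a single signal's marginal bets or to the combined stack; only the input list changes.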
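The Fundamental Law of Active Management (Grinold-Kahn): IR = IC × √BR, where IR is the information ratio, IC the information coefficient (skill per bet), and BR the breadth (number of independent bets). Several weak signals, each contributing a little IC, can stack into a strong portfolio as long as they are largely independent of each other. Our gates required individual significance when portfolio significance is what matters.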
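To make the stacking argument concrete: for two signals with correlation rho between them, the IC of their optimally weighted combination is sqrt(ic' R^-1 ic), where R is the 2x2 signal correlation matrix. The sketch below evaluates that closed form. The IC values, rho values, and breadth are hypothetical, and our filters are bet selectors, not return forecasts, so this is an analogy for intuition and not a model of the stack.

```python
import math

def combined_ic(ic_a: float, ic_b: float, rho: float) -> float:
    """IC of the optimal linear combination of two signals.

    Closed form of sqrt(ic' R^-1 ic) with R = [[1, rho], [rho, 1]].
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    quad = (ic_a**2 - 2.0 * rho * ic_a * ic_b + ic_b**2) / (1.0 - rho**2)
    return math.sqrt(quad)

def information_ratio(ic: float, breadth: float) -> float:
    """Fundamental Law: IR = IC * sqrt(BR)."""
    return ic * math.sqrt(breadth)

# Hypothetical inputs, for illustration only.
ic_single = 0.03
breadth = 1000

for rho in (0.0, 0.5, 0.9):
    ic_stack = combined_ic(ic_single, ic_single, rho)
    print(f"rho={rho:.1f}: IC {ic_single:.3f} -> {ic_stack:.4f}, "
          f"IR {information_ratio(ic_single, breadth):.2f} -> "
          f"{information_ratio(ic_stack, breadth):.2f}")
```

With equal ICs the gain factor is sqrt(2 / (1 + rho)): about 1.41x when the signals are uncorrelated, shrinking toward 1x as they become redundant. That is why the orthogonality of the two filters matters as much as their individual strength.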
What Needs Hardening
This result is promising but not deployment-ready:
- Need bootstrap on the COMBINED stack (not just individual signals)
- Need to test robustness to league removal (is one league driving it?)
- Need to check if the signal combination is genuinely orthogonal
- Need production parity check (does production actually implement this correctly?)
- Need to verify the 16 backtest-production disconnects don't inflate the result
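The league-removal check in the list above can be sketched as a leave-one-league-out pass over bet-level results. The data layout (a list of `(league, pnl_units)` pairs), the flat 1-unit stake, and the function name are assumptions for illustration; the toy bets are made up and are not from our backtest.

```python
from collections import defaultdict

def leave_one_league_out(bets):
    """Return {league: ROI % of the stack with that league removed}.

    bets: iterable of (league, pnl_units) pairs, one per flat 1-unit bet.
    """
    total_pnl = 0.0
    total_n = 0
    pnl_by_league = defaultdict(float)
    n_by_league = defaultdict(int)
    for league, pnl in bets:
        total_pnl += pnl
        total_n += 1
        pnl_by_league[league] += pnl
        n_by_league[league] += 1

    result = {}
    for league in pnl_by_league:
        n_rest = total_n - n_by_league[league]
        if n_rest == 0:
            continue  # only one league present, nothing left to evaluate
        result[league] = 100.0 * (total_pnl - pnl_by_league[league]) / n_rest
    return result

# Toy data (made up): league "B" carries all of the profit.
toy_bets = [("A", 1.0), ("A", -1.0), ("B", 0.9), ("B", 0.9), ("C", -1.0)]
loo = leave_one_league_out(toy_bets)
for league, roi in sorted(loo.items()):
    print(f"without {league}: ROI {roi:+.2f}%")
```

A stack is league-robust in this sense when every entry stays positive. In the toy data, removing league "B" flips the ROI negative, which is exactly the dependence this check is meant to expose.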