Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

Research | INVESTIGATION

The Portfolio Was Profitable All Along: How Our Testing Framework Hid Real Alpha

Stacking tc2-league-filter and gk-psxg on top of the base filters produces +2.69% ROI (all 3 years positive, p=0.085). Leave-one-league-out: robust across all 16 leagues. The 10-gate process rejected both signals individually, but the stacked portfolio is profitable. 6 flaws identified in the testing framework. Needs production parity verification before deployment.

We stacked two individually rejected signals and got +2.69% ROI across 3 years, with every year positive. The 10-gate process was asking the wrong question.

What We Found

| Config | AH ROI | N | P&L |
|---|---|---|---|
| Base (all Ted filters) | -2.22% | 3,140 | -69.6u |
| **+ league filter + GK filter** | **+2.69%** | **1,976** | **+53.2u** |
| Delta | +4.91pp | -1,164 | +122.8u |
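
The ROI figures in the table can be reproduced from the bet counts and P&L, assuming flat 1-unit stakes (an assumption; the post does not state the staking scheme, but the numbers are consistent with it). A minimal sketch, with hypothetical variable names:

```python
# Figures copied from the table above. Flat 1-unit staking is an ASSUMPTION,
# so ROI is simply P&L in units divided by the number of bets.
rows = {
    "base": {"n_bets": 3140, "pnl_units": -69.6},
    "stacked": {"n_bets": 1976, "pnl_units": 53.2},
}


def roi_pct(pnl_units: float, n_bets: int) -> float:
    """ROI in percent under flat 1-unit stakes."""
    return 100.0 * pnl_units / n_bets


base_roi = roi_pct(**rows["base"])
stacked_roi = roi_pct(**rows["stacked"])
delta_pp = stacked_roi - base_roi  # difference in percentage points

print(f"base    {base_roi:+.2f}%")
print(f"stacked {stacked_roi:+.2f}%")
print(f"delta   {delta_pp:+.2f}pp")
```

Note that the stacked config bets on fewer matches (1,976 vs 3,140), so the delta reflects both the removed bets and the ROI of those that remain.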

Walk-forward (temporal, all OOS):

  • 2022: +2.48% ROI (marginal +4.11pp)
  • 2023: +4.96% ROI (marginal +5.24pp)
  • 2024: +0.62% ROI (marginal +5.30pp)

The marginal improvement INCREASES each year (+4.11pp, +5.24pp, +5.30pp). That is the opposite of the decay you would expect from overfitting.
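
If "marginal" means stacked ROI minus base ROI within the same year (an assumption about the definition), the yearly base ROI can be backed out from the walk-forward numbers. A small sketch using only the figures listed above:

```python
# (stacked ROI %, marginal pp) per out-of-sample year, copied from the list above.
walk_forward = {
    2022: (2.48, 4.11),
    2023: (4.96, 5.24),
    2024: (0.62, 5.30),
}

# ASSUMPTION: marginal = stacked ROI - base ROI for the same year.
implied_base = {
    year: round(stacked - marginal, 2)
    for year, (stacked, marginal) in walk_forward.items()
}

# Check that the marginal improvement is strictly increasing year over year.
marginals = [marginal for _, marginal in walk_forward.values()]
marginal_increasing = all(a < b for a, b in zip(marginals, marginals[1:]))

for year, base in implied_base.items():
    print(f"{year}: implied base ROI {base:+.2f}%")
print("marginal strictly increasing:", marginal_increasing)
```

Under that assumption, the low 2024 stacked ROI would come from a weak base year rather than from a shrinking marginal.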

Why It Was Hidden

The 10-gate approval process tested each signal individually:

  • tc2-league-filter alone: +0.9pp marginal, p=0.19 (FAIL Gate 5)
  • gk-psxg alone: +0.8pp marginal, p=0.36 (FAIL Gate 5)

Neither passes bootstrap significance at n=3,000 AH bets. But TOGETHER they produce +4.91pp, turning a losing base (-2.22% ROI) into a profitable stack (+2.69% ROI, p=0.085).

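The Fundamental Law of Active Management (Grinold-Kahn) states IR = IC × sqrt(BR): the information ratio (IR) equals the information coefficient (IC, skill per bet) times the square root of breadth (BR, the number of independent bets). Multiple weak signals, stacked, can produce a strong portfolio. Our gates required individual significance when portfolio significance is what matters.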
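
To make the square-root scaling concrete, here is a minimal sketch of the formula. The IC and breadth values are purely illustrative, not estimates for these signals:

```python
import math


def information_ratio(ic: float, breadth: float) -> float:
    """Fundamental Law of Active Management: IR = IC * sqrt(BR)."""
    return ic * math.sqrt(breadth)


# Hypothetical numbers: a weak per-bet edge at two levels of breadth.
weak_ic = 0.02
ir_narrow = information_ratio(weak_ic, 100)
ir_broad = information_ratio(weak_ic, 400)

print(f"IR at BR=100: {ir_narrow:.2f}")
print(f"IR at BR=400: {ir_broad:.2f}")
print(f"ratio: {ir_broad / ir_narrow:.1f}")  # 4x breadth gives 2x IR
```

The law counts only independent bets toward breadth, which is why the orthogonality of the two signals matters for this argument.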

What Needs Hardening

This result is promising but not deployment-ready:

  1. Need bootstrap on the COMBINED stack (not just individual signals)
  2. Need to test robustness to league removal (is one league driving it?)
  3. Need to check if the signal combination is genuinely orthogonal
  4. Need production parity check (does production actually implement this correctly?)
  5. Need to verify the 16 backtest-production disconnects don't inflate the result
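
Items 1 and 2 could be sketched roughly as below. This is not the Gate 5 implementation, which the post does not describe: the bootstrap is a simple percentile-style check (share of resamples with non-positive ROI), and the bet data, league names, win rate, and payout are all synthetic placeholders.

```python
import random
from collections import defaultdict


def roi_pct(pnls):
    """ROI in percent, assuming flat 1-unit stakes."""
    return 100.0 * sum(pnls) / len(pnls)


def bootstrap_p_value(pnls, n_resamples=1000, seed=0):
    """Share of bootstrap resamples (with replacement) whose ROI is <= 0."""
    rng = random.Random(seed)
    n = len(pnls)
    non_positive = sum(
        1 for _ in range(n_resamples) if sum(rng.choices(pnls, k=n)) <= 0
    )
    return non_positive / n_resamples


def leave_one_league_out(bets):
    """bets: list of (league, pnl). Returns ROI % with each league removed."""
    total_pnl = sum(pnl for _, pnl in bets)
    per_league = defaultdict(lambda: [0.0, 0])  # league -> [pnl sum, bet count]
    for league, pnl in bets:
        per_league[league][0] += pnl
        per_league[league][1] += 1
    return {
        league: 100.0 * (total_pnl - pnl_sum) / (len(bets) - count)
        for league, (pnl_sum, count) in per_league.items()
    }


# SYNTHETIC stand-in data: 1,976 bets over 16 hypothetical leagues, with an
# invented 53% win rate and +0.95u / -1.0u outcomes. Replace with real bets.
data_rng = random.Random(42)
leagues = [f"league_{i:02d}" for i in range(16)]
bets = [
    (data_rng.choice(leagues), 0.95 if data_rng.random() < 0.53 else -1.0)
    for _ in range(1976)
]

pnls = [pnl for _, pnl in bets]
p_value = bootstrap_p_value(pnls)
loo = leave_one_league_out(bets)
worst_league = min(loo, key=loo.get)  # removal that hurts ROI the most

print(f"stack ROI {roi_pct(pnls):+.2f}%, bootstrap p = {p_value:.3f}")
print(f"lowest leave-one-out ROI: {loo[worst_league]:+.2f}% (without {worst_league})")
```

If the lowest leave-one-out ROI stays positive on the real data, no single league is carrying the result.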

Status: PROMISING, UNDER INVESTIGATION

INVESTIGATION | Signal: portfolio-stack | 2026-03-20