March 20, 2026|System Update|DEPLOYED

The Shadow Model: Proving Improvements Before Deploying Them

Shadow model v1 launched alongside production. Contains market-only solver + DC rho correction (validated +1.25pp on backtest). Portfolio stack shows +2.92% ROI (p=0.064, all years positive). Shadow must prove itself on 100+ live bets before graduating to production.

We built three validated improvements but haven't deployed any to production. Instead, we created a shadow model that runs alongside production on the same matches. When it proves itself on live data, it graduates.

Why a Shadow Model

This session exposed a pattern: we kept finding improvements in the backtest that didn't match what production actually does. The variance filter used fake data (constant 1.35). We missed 269 FootyStats files. We assumed regime skip couldn't work when the data was right there. We changed backtest defaults without touching the live solver.

The lesson: don't trust the backtest to represent production. Run both and compare.

What the Shadow Contains

Two rigorously validated solver improvements that production doesn't have:

Change	Backtest Evidence	Live Status
market-only (outcomeWeight=0, xgWeight=0)	Dev +1.05pp (p=0.097), holdout +0.45pp	NOT in production solver
Dixon-Coles rho=+0.05	Dev +1.60pp (p=0.025), holdout +0.80pp	NOT in production solver

Production still uses outcomeWeight=0.3, xgWeight=0.2, and no DC rho. The +29.7% ROI from paper trading (58 bets) was earned on that old model, not on the shadow.
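For readers unfamiliar with the Dixon-Coles correction, here is a minimal sketch of the textbook form: independent Poisson scoreline probabilities are multiplied by a factor tau that only touches the four low-scoring cells (0-0, 0-1, 1-0, 1-1). The function names, the example goal rates, and the truncation at 10 goals are illustrative assumptions, not our solver's code. Sign conventions for rho also differ between implementations, so this sketch does not claim to match the convention behind rho=+0.05 above.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson(rate) model."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(home_goals: int, away_goals: int,
           home_rate: float, away_rate: float, rho: float) -> float:
    """Textbook Dixon-Coles factor; only 0-0, 0-1, 1-0 and 1-1 are adjusted."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(home_rate: float, away_rate: float, rho: float,
                 max_goals: int = 10) -> list[list[float]]:
    """matrix[h][a] = P(home scores h, away scores a), renormalised
    to absorb the small mass lost by truncating at max_goals."""
    matrix = [
        [
            poisson_pmf(h, home_rate)
            * poisson_pmf(a, away_rate)
            * dc_tau(h, a, home_rate, away_rate, rho)
            for a in range(max_goals + 1)
        ]
        for h in range(max_goals + 1)
    ]
    total = sum(sum(row) for row in matrix)
    return [[p / total for p in row] for row in matrix]


# Hypothetical goal rates, chosen only for illustration.
baseline = score_matrix(1.4, 1.1, rho=0.0)
adjusted = score_matrix(1.4, 1.1, rho=0.05)
draw_shift = sum(adjusted[i][i] - baseline[i][i] for i in range(11))
```

Under this convention a positive rho moves probability away from 0-0 and 1-1 and toward 1-0 and 0-1, so draw_shift is negative. With the opposite sign convention the effect reverses.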

The Portfolio Stack Discovery

When we stacked tc2-league-filter + gk-psxg-opponent-filter on the improved base:

+2.92% ROI (1,954 AH bets, +57.1u P&L)
3/3 years positive (marginals +4.24 to +5.21pp)
23/23 leagues survive leave-one-out
Bootstrap p=0.064 (significant at 10%, not 5%)

The 10-gate process hid this because it tests signals individually. The Fundamental Law of Active Management says to stack weak signals. We had weak signals all along; the test was wrong.
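As a sanity check on the headline number, +57.1u over 1,954 bets is 57.1 / 1954 ≈ 2.92% if every bet is a flat 1u stake. The bootstrap p-value can be read as the share of resampled bet histories that fail to make money. The sketch below is one common way to compute that, assuming flat stakes and independent bets. The function name and resampling scheme are illustrative assumptions and may differ from the backtest's actual procedure.

```python
import random


def bootstrap_pvalue(profits: list[float], n_resamples: int = 10_000,
                     seed: int = 0) -> float:
    """One-sided bootstrap p-value for 'the strategy is profitable'.

    profits: per-bet P&L in units, assuming a flat 1u stake per bet.
    Returns the fraction of resampled histories with total P&L <= 0.
    """
    if not profits:
        raise ValueError("need at least one settled bet")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(profits)
    non_positive = 0
    for _ in range(n_resamples):
        # Draw n bets with replacement and sum their P&L.
        total = sum(profits[rng.randrange(n)] for _ in range(n))
        if total <= 0:
            non_positive += 1
    return non_positive / n_resamples
```

Resampling individual bets treats them as independent. If bets cluster by matchday or league, resampling whole blocks would be the more conservative choice.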

What Happens Next

The shadow model generates picks on the same matches as production. Both log bets. Both settle against real outcomes. After 100+ settled bets:

Shadow CLV > Production CLV → directionally better
Shadow ROI ≥ Production ROI → makes at least as much money
No league-level regression → doesn't break anything

When all three pass, shadow replaces production.
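The three gates plus the 100-bet minimum can be sketched as a single check over the two bet logs. Everything named here (SettledBet, graduation_checks, league_tolerance) is hypothetical, and "no league-level regression" is interpreted as "shadow ROI is not below production ROI in any league where both placed bets", which is an assumption rather than a settled definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SettledBet:
    league: str
    stake: float   # units staked
    profit: float  # units won (negative if the bet lost)
    clv: float     # closing line value for this bet, as a fraction


def roi(bets: list[SettledBet]) -> float:
    staked = sum(b.stake for b in bets)
    return sum(b.profit for b in bets) / staked if staked else 0.0


def mean_clv(bets: list[SettledBet]) -> float:
    return sum(b.clv for b in bets) / len(bets) if bets else 0.0


def by_league(bets: list[SettledBet]) -> dict[str, list[SettledBet]]:
    groups: dict[str, list[SettledBet]] = defaultdict(list)
    for b in bets:
        groups[b.league].append(b)
    return groups


def graduation_checks(production: list[SettledBet], shadow: list[SettledBet],
                      min_bets: int = 100,
                      league_tolerance: float = 0.0) -> dict[str, bool]:
    """Evaluate each gate separately so a failure is easy to diagnose."""
    prod_leagues, shadow_leagues = by_league(production), by_league(shadow)
    shared = set(prod_leagues) & set(shadow_leagues)
    checks = {
        "enough_bets": len(shadow) >= min_bets,
        "clv_better": mean_clv(shadow) > mean_clv(production),
        "roi_not_worse": roi(shadow) >= roi(production),
        # Assumed definition: compare ROI league by league on shared leagues.
        "no_league_regression": all(
            roi(shadow_leagues[lg]) >= roi(prod_leagues[lg]) - league_tolerance
            for lg in shared
        ),
    }
    checks["graduates"] = all(checks.values())
    return checks
```

With only 100 bets spread across many leagues, per-league ROI will be noisy, so a non-zero league_tolerance or a minimum per-league sample may be needed before this gate is meaningful.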

The Honest State

Metric	Production (live)	Shadow (backtest)
Solver	outcome=0.3, xg=0.2	outcome=0, xg=0
DC rho	None	+0.05
Regime skip	Yes (live HFA)	Yes (Fotmob cached)
GK adjust	Feature-flagged	Enabled
AH ROI (backtest)	approx. -2.98%	-1.89% (base), +2.92% (with stack)

The shadow is better on backtest. Whether it's better on live data is what the comparison will prove.