Changelog

Signal verdicts, model updates, backtest results, and research findings.

April 2026

model-update · Apr 23 · ACCEPTED · auto

xG Weight A/B Test: +3.99pp OVERS CLV at xgWeight=0.2

Walk-forward backtest across 26 leagues confirms xgWeight=0.2 beats control on every market. OVERS +3.99pp, UNDERS +3.31pp, SIDES +1.18pp. Effect is monotonic — no plateau. Deploying to production.

Signal Test · Apr 17 · PARTIALLY-ACCEPTED

The Squad-Strength Signal Extends to Bundesliga 2 and League One (But Not 7 Other Leagues)

Walk-forward extension test: bundesliga-2 and league-one pass (direction-correct IS and OOS, magnitudes comparable to validated Big-5 leagues). Seven other leagues fail — serie-b, ligue-2, league-two, eredivisie, belgian-pro, portuguese-liga, scottish-prem. The signal extends along English + German pyramid tiers but not cross-country.

Signal Test · Apr 17 · REJECTED

The Variance Filter Doesn't Care What Window You Use

Lowering VARIANCE_LOOKBACK from 10 to 5/6/7/8 produced identical CLV (+10.0-10.1%) and fewer bets, not more. The 3.0-goal threshold is coupled to window length — shrinking the window tightens the filter, doesn't relax it. Second rejection in the variance-tuning space after attack-defense-asymmetry. The 22 early-season bets this was meant to unlock need a two-parameter fix (window + scaled threshold), not this one.
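The coupling between window and threshold can be illustrated with a minimal sketch. It assumes the filter flags a team when the absolute sum of (goals minus xG) over the last VARIANCE_LOOKBACK matches reaches 3.0. That reading, the helper names, and the linear threshold scaling are all assumptions made for illustration, not the production filter.

```python
BASE_WINDOW = 10       # VARIANCE_LOOKBACK value before the test
BASE_THRESHOLD = 3.0   # goals; the fixed threshold named in the entry


def cumulative_deviation(goals, xg, window):
    """Sum of (goals - xG) over the most recent `window` matches."""
    recent = list(zip(goals, xg))[-window:]
    return sum(g - x for g, x in recent)


def is_flagged(goals, xg, window, scale_threshold=False):
    """Flag a team whose cumulative deviation reaches the threshold.

    With scale_threshold=True the threshold shrinks linearly with the
    window (an assumed scaling rule, chosen only for illustration).
    """
    threshold = BASE_THRESHOLD
    if scale_threshold:
        threshold = BASE_THRESHOLD * window / BASE_WINDOW
    return abs(cumulative_deviation(goals, xg, window)) >= threshold


# A team overperforming by about 0.35 goals per match for 10 matches.
goals = [2] * 10
xg = [1.65] * 10

flag_10 = is_flagged(goals, xg, window=10)                             # about 3.5 vs 3.0
flag_5_fixed = is_flagged(goals, xg, window=5)                         # about 1.75 vs 3.0
flag_5_scaled = is_flagged(goals, xg, window=5, scale_threshold=True)  # about 1.75 vs 1.5
```

Under these assumptions the same team is flagged at window 10, drops out at window 5 with the fixed threshold, and is flagged again once the threshold scales with the window, which matches the "fewer bets, not more" result.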

Research · Apr 16 · ACCEPTED

Raising the Edge Floor from 7% to 12%: Why It Works Now (and Didn't Before)

Higher minEdge monotonically improves entryROI AND CLV across 13 walk-forward folds on the factorial base combo. minEdge=0.12 gives +4.91pp entryROI and +4.24pp CLV over production. Previous tests (March 26) found no effect — the difference is the factorial combo changes which bets appear at each edge level.

Signal Test · Apr 15 · REJECTED

Testing Shot-Level xG as a Variance Filter Input: Why One Season Wasn't Enough

Sofascore v3 shot-level xG scored 368K shots across 24 leagues, but the A/B test against baseline showed zero marginal impact (+0.0% CLV, +0.1% entry-adj ROI). Root cause: only 1 of 12 backtest seasons affected. Infrastructure stays; retest after 2+ seasons.

Post-Mortem · Apr 14

The Cloud Lab Guardian (and the seven reasons we needed one)

A week-long cloud-lab outage traced to one 5-line shell script bug (LAST=0 in the idle-shutdown check, reproduced live twice). Full arc: the false SSH-key diagnosis, the real bug, six stacked infrastructure issues (including fire-and-poll shipped as 5 serial hotfixes to live traffic), the rebuild in three phases, a version-controlled priority-task roadmap with a submit-time validator and a 60-second auto-advance ticker — then a red-team battery against the guardian that caught a critical capitalization bypass (target="Cloud-Lab" skipped the validator entirely). 22/22 tests pass after the bypass fix.

Research · Apr 14

From Binary Filters to Stake Sizing: Finding the Right Shape for Enrichment Data

Multi-source xG enrichment data failed as binary filters (4 approaches, all negative). Reframing as continuous stake sizing produced +0.5-0.6% sizing lift, improved Sharpe, and passed 3/3 walk-forward folds. The data was always informative — we were just using it wrong.

Research · Apr 14

The Signal Pipeline Was Killing Real Alpha

Rewrote the 10-gate signal-approval pipeline into 6 data-driven gates. Dropped hardcoded N≥1000, +0.5pp practical-significance floor, p<0.10 bootstrap, 3pp interleave tolerance, and 1% suspicious-N dedup. Dry-run confirmed 0 live signals regress and 9 previously-rejected signals would be unblocked. Ran end-to-end on two of them: inter-model-disagreement failed Gate 2 (CI width too wide — a sharper rejection than the old pipeline managed), contextXg passed 6/6 (previously killed on the old +0.5pp floor). The reformed pipeline is live; pod-shop math fix and bake-off come next.

Research · Apr 14 · auto

Where Aging Footballers Go to Decline

Tracked 249 players who migrated from Big 5 leagues at age 30+ to lower divisions. 85% declined, average xG/90 dropped 38%. Finishers survive the drop; creators don't. Registering squad-age-creative-concentration as a pod-shop signal.

Research · Apr 13 · REJECTED

The Market IS the Prior: Why Elo and Marcel Can't Beat Pinnacle

Tested three Bayesian priors for early-season predictions. Elo warm-start made predictions WORSE (+0.004 Brier). Marcel early-prior confirmed at 0.0pp marginal. Player finishing xG calibration shows -0.00375 Brier but needs walk-forward. Key insight: the solver already fits to Pinnacle market odds, which IS the best prior. Path A (solver priors) is closed. Path C (shot-level xG) remains open.

System · Apr 13 · FIX

Stop Patching Symptoms: The Team Name Mapping Anti-Pattern

Eight settlement failures in 5 weeks, all from the same root cause: a hardcoded 100-entry team map sitting next to a verified 588-entry map that was never loaded. Fixed by loading the verified map at runtime and auto-generating it to 1,257 entries across 40+ leagues.

Research · Apr 13

Full Factorial Signal Decomposition: 2,048 Combinations Tested

Ran every possible on/off combination of 11 signals (2^11 = 2,048 configs). Best portfolio: regime + crossBtts + leagueExcl + finishingLuck at +10.4% entry ROI. Four signals confirmed dead or harmful. Biggest surprise: leagueExcl and contextXg destructively interfere — pick one, not both.
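As a rough sketch of how such a full factorial grid can be enumerated: the signal names below are hypothetical placeholders (only some of the 11 signals are named in this entry), and the backtest that would score each config is left out.

```python
from itertools import product

# Hypothetical placeholder names; the real list of 11 signals is not given here.
SIGNALS = [f"signal_{i}" for i in range(1, 12)]


def all_configs(signals):
    """Yield every on/off assignment as a dict of {signal_name: enabled}."""
    for flags in product([False, True], repeat=len(signals)):
        yield dict(zip(signals, flags))


configs = list(all_configs(SIGNALS))
n_configs = len(configs)  # 2 ** 11 combinations
```

Each config dict would then be passed to whatever backtest scores a portfolio, so pairwise interactions such as leagueExcl versus contextXg can be read from configs that differ in exactly those two flags.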

Signal Test · Apr 12 · REJECTED · auto

Signal marcel-early-prior: Rejected

Auto-registered by test-signal.ts. Did not pass approval gates.

infrastructure · Apr 12 · FIX · auto

Deploy Pipeline Fix: When Your Status Light Lies

Every GH Actions deploy for sports-dashboard had been marked failed for weeks — 15+ consecutive red runs, false Discord ROLLBACK alerts, all while production ran fine. Root cause: the verify loop filtered by coolify.applicationId=UUID but that label is an integer ID, so every poll returned empty. The rollback was a silent no-op on top of the broken polling.

Research · Apr 12 · auto

Do Old Players Actually Decline? What Our Data Says

Scaled from 789 to 9,050 player-seasons (12 years, Wikidata birth dates). The aging signal is real — 11% decline from peak to 35 — but still doesn't improve predictions. Marcel's recency weighting already handles decline implicitly. Hardcoded Caley curves are definitively the worst option.

Research · Apr 12

When xG Models Disagree, We Should Pay Attention

Pointed four independent xG models at 3,282 matches. Multi-source consensus produces a 91.2% regression rate. The strongest signal (91.5%) came from decomposing overperformance into shot quality vs finishing luck: when both are high, regression is near-certain.

Research · Apr 12

BTTS Workstream: What We Built, What We Learned, What's Next

Two weeks building a complete BTTS pipeline — shot-level xG model, dual-grid calibration, five per-match adjustment layers, 16K-bet backtest. The model finds +4.47% entry ROI against soft books but agrees with Pinnacle. The blocker isn't the model — it's execution venue.

infrastructure · Apr 12 · FIX

Solver Cache: Incremental Season Loading

Solver was loading all 16 seasons (6,000 matches, 35GB) into memory. Now loads incrementally per solve date — peak memory drops from 35GB to 500MB.

infrastructure · Apr 12 · FIX

Cloud Lab: Fire-and-Poll Architecture

Rebuilt cloud-lab job dispatch from 12-hour SSH pipes to 1-second fire-and-poll. Jobs survive SSH drops, server restarts, and network hiccups. Added master orchestrator, watchdogs, and solver OOM fix.

research · Apr 12 · INVESTIGATION

Farm-Out Sprint: 18 Specs, 7 Signals Tested, 1 Promoted to Shadow

Five-day sprint across 3 workstreams. inter-model-disagreement scores 9/10 gates (+1.1% marginal) — promoted to shadow. /pod dashboard rebuilt with real alpha metrics. v3 regression test confirms noisy xG wins. Full 2048-combo factorial synced.

Signal Test · Apr 11 · REJECTED · auto

Signal overperformance-decomposition: Rejected

Decomposing overperformance into shot_quality (shotXG - matchXG) vs finishing_luck (actual - shotXG) predicts regression magnitude. Teams with a high finishing_luck component regress faster than teams where overperformance is explained by shot quality. Result: no-signal.
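The decomposition stated in this hypothesis can be written out directly. This is a minimal sketch using the two definitions given above; the function name and the example numbers are illustrative assumptions.

```python
def decompose_overperformance(actual_goals, shot_xg, match_xg):
    """Split (actual - matchXG) into the two components named in the entry.

    shot_quality   = shotXG - matchXG  (chances were better than match-level xG implies)
    finishing_luck = actual - shotXG   (goals scored beyond what the shots deserved)
    """
    shot_quality = shot_xg - match_xg
    finishing_luck = actual_goals - shot_xg
    return shot_quality, finishing_luck


# Illustrative numbers only: 20 goals from 16.5 shot-level xG and 15.0 match-level xG.
shot_quality, finishing_luck = decompose_overperformance(20, 16.5, 15.0)
total_overperformance = shot_quality + finishing_luck
```

By construction the two components sum to actual goals minus match-level xG, so the split reallocates the overperformance without changing its total.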

Signal Test · Apr 11 · REJECTED · auto

Signal inter-model-disagreement: Rejected

When the v3 69-feature model and FotMob match-level xG disagree significantly about a team's rolling performance, it indicates unusual match characteristics that create prediction uncertainty. High |disagreement| matches are harder to predict, so skip or size down. Result: not-significant.

Signal Test · Apr 11 · REJECTED · auto

Signal marcel-early-prior: Rejected

Auto-registered by test-signal.ts. Did not pass approval gates.

Signal Test · Apr 11 · REJECTED · auto

Signal layered-threshold-variance: Rejected

Instead of one binary threshold, use per-source thresholds (fotmob 3.0, matchXg 2.5, v3 2.0). When 2+ sources independently flag regression, confidence is higher. Level 3 (all flag) should have the highest per-bet ROI. Result: not-significant.
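A minimal sketch of the layered flag, using the three per-source thresholds quoted above. It assumes each source supplies a goals-minus-xG deviation over the lookback window and that a source flags when the absolute deviation reaches its threshold; the function name and input shape are hypothetical.

```python
# Per-source thresholds as quoted in the entry.
THRESHOLDS = {"fotmob": 3.0, "matchXg": 2.5, "v3": 2.0}


def flag_level(deviation_by_source):
    """Return how many sources (0 to 3) independently flag regression.

    `deviation_by_source` maps a source name to its goals-minus-xG deviation.
    A missing source counts as not flagging (an assumption).
    """
    return sum(
        1
        for source, threshold in THRESHOLDS.items()
        if abs(deviation_by_source.get(source, 0.0)) >= threshold
    )


level_two = flag_level({"fotmob": 3.2, "matchXg": 2.6, "v3": 1.0})
level_three = flag_level({"fotmob": -3.5, "matchXg": -2.9, "v3": -2.4})
level_zero = flag_level({"fotmob": 1.0})
```

Under the hypothesis, bets would be grouped by this level and per-bet ROI compared across levels 1, 2, and 3.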

Signal Test · Apr 11 · REJECTED · auto

Signal finishingLuck: Rejected

Auto-registered by test-signal.ts. Result: no-signal.

Signal Test · Apr 11 · REJECTED · auto

Signal contextXg: Rejected

Context-adjusted xG (venue, GK, squad, regime corrections) improves lambda estimation, producing more accurate edges and better bet selection. Did not pass approval gates.

System · Apr 11 · auto

Auto-Publish System Live

Blog posts are now auto-generated when signals are approved/rejected, models re-solve, or backtest results change.

Signal Test · Apr 11 · REJECTED

Testing Multi-Source xG Disagreement: 5 Signals, 0 Survivors

Batch-tested 5 multi-source xG disagreement signals through 10-gate approval. inter-model-disagreement (9/10 gates, +1.1pp marginal) is the most promising but fails bootstrap (p=0.24). layered-threshold (+0.3pp) too small. overperformance-decomposition broken by v3 data coverage. footystats-all and finishingLuck show zero marginal. v3 as variance filter replacement: identical results to MI lambdas.

Signal Test · Apr 7 · REJECTED

Better xG, Worse Bets: Why Shot-Level Accuracy Doesn't Help

A/B tested FotMob shot-level xG (82.1% regression) vs FootyStats match-level (75.6%) for non-Big-5 variance filter. Shot xG removed 524 profitable bets (entry-adj ROI +9.9%) and added 219 bad ones (ROI -9.5%). Net: -0.2pp ROI, -17u P&L. Better regression accuracy doesn't equal better bet selection.

Model Architecture · Apr 7 · ACCEPTED

Training an xG Model on 632K Shots: Does More Data Mean Better Regression Detection?

Retrained XGBoost xG model on 632K combined shots (3 sources, 22 leagues). V3 hits 89.4% regression rate (target >82.1%), beating v1 (Big 5 only) by +1.3pp. FotMob raw xG still wins at 90.8% — noisier xG is better for regression detection. Advanced metrics (npxG, xGA, PPDA, xG Chain) computed for all leagues.

Signal Test · Apr 7 · REJECTED

Testing Set-Piece Mismatch: The Filter That Was Really Measuring Data Coverage

Tested whether selecting matches with high backed-team SP xG AND high opponent SP xGA would sharpen AH edge. Passed 8/10 gates with 12/12 walk-forward folds positive, but threshold sweep (0.30→0.60) revealed the filter is a data-availability proxy — selectivity comes from whether shot data exists, not from mismatch magnitude. Marginal +0.8pp below +1pp spec, p=0.19.

Signal Test · Apr 7 · REJECTED

Shot-Level xG for Variance Regression: More Data, Same Nothing

Re-tested shot-level xG variance signal with 3.5x more data (3,489 matches, 3-4 seasons). Marginal ROI unchanged at +0.1% (bootstrap p=0.47). Shot-level and match-level xG produce equivalent regression candidates at 10-match window granularity. Hypothesis falsified: data volume was not the bottleneck.

Signal Test · Apr 7 · REJECTED

Testing Finishing Persistence: More Data, Same Dead End

Re-tested the finishing persistence signal after backfilling FotMob shots to 19 leagues (5,144 matches, 127K shots). Player count cleaned to 2,040 (dedup), split-half r improved to 0.411. Affected bets doubled (899 vs 427) but marginal ROI halved (+0.2pp, below +0.5pp gate). The finishing effect is real but a binary filter can't capture it — needs continuous lambda adjustment.

Signal Test · Apr 2

BTTS Signals Don't Help: The Problem Is Market Vig, Not Bet Selection

Tested variance filtering, GK changes, and HFA regime on 14,387 BTTS bets (+9.93% CLV, -0.67% ROI). None improve ROI. The edge is real but BTTS market vig eats it. Fix is execution (league selection, Pinnacle/exchange), not better signals.

System Update · Apr 2

When Three Data Sources Die in One Session: Building Pipeline Resilience

FotMob API died (404), CDN blocked (403), Sofascore blocked from server. Fixed GK PSxG via CDN underscore format, built Sofascore warm standby (270K shots in Supabase), added Discord alerting. Every critical data need now has 2+ sources except GK PSxG.

Research · Apr 1 · SHADOW

The xG Calibration That Didn't Matter (And the Switch That Did)

Multi-feature calibrated FootyStats xG (corr 0.35→0.51) has zero impact on variance regression betting outcomes. But switching variance filter from model lambdas to match-level xG gives +0.3-0.4% ROI lift. The real win: FS non-Big-5 shows +37.7u marginal — FootyStats xG creates value through finishing luck in soft markets, not through better regression detection.

Signal Test · Apr 1 · REJECTED

Cross-Market Surprise: The Strongest Signal That Couldn't Filter

BTTS and O/U specialist markets diverge from our Poisson model by +8.4pp and +12.3pp (both p<0.0001, 41K OOS). But when wired as binary filters on AH/O/U bets, marginal ROI is +0.1% (BTTS) and +0.2% (O/U) — both fail bootstrap significance. The signal predicts match texture, not who wins. Pivoting to direct BTTS value betting where the hit-rate edge translates directly to profit.

Model Architecture · Apr 1 · REJECTED

Testing is_home as an XGBoost Feature: Why the Model Already Knew

Added is_home to XGBoost feature set to learn venue × shot-type interactions. Walk-forward on 312K shots: Model A Brier 0.0787 (baseline 0.0785, delta +0.0002). Model B unchanged. The post-hoc venue calibration already captures the venue signal. XGBoost finds no additional home/away interaction effects at the individual shot level.

Research · Apr 1

Penalties Are Random: 486 Team-Seasons Prove Mean Reversion

Penalty frequency is heavily mean-reverting (corr=0.145). High-penalty teams drop 51% in the second half of the season. No table-position effect — penalties are luck-driven for all teams. Always use npxG.

Research · Apr 1

Attack = Defense for Regression, Except Set Pieces (71.9%)

Attack and defense overperformance regress at identical rates (68.3% vs 68.2%). The thesis that defense is 'luckier' is wrong. But set-piece defense regresses at 71.9% — the strongest single regression signal found. Now on gauntlet shadow with 12/12 walk-forward.

Signal Test · Apr 1

Testing 4 Market Mispricing Signals: Pinnacle Is Faster Than You Think

Tested manager bounce, promoted team arc, international breaks, and congestion×depth. Only promoted team arc confirmed (p=0.0001) but league-specific. Manager bounce rejected — Pinnacle reprices in 1-2 matches. International breaks: zero effect.

System Update · Apr 1

320K Shots Across 20 Leagues: The FotMob Page Scraping Breakthrough

FotMob pages embed shot x,y data in __NEXT_DATA__ for all leagues. Scraped 320K shots, 20 leagues, 3-4 seasons. Shot-level xG achieves 82.1% regression rate (matching Understat Big 5). Server cron running daily. The data quality gap for non-Big-5 is closed.

Research · Apr 1

Beating Understat's xG With a Simple Venue Correction

Understat doesn't adjust shot-level xG for home/away. A multiplicative correction (home ×0.934, away ×0.940) beats them on all 4 walk-forward folds. Edge is growing: home overprediction is getting worse as HFA declines post-COVID. Set pieces are worst: home set-piece xG is overpredicted by 12.8%.
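The correction described here is a single multiplication per shot. A minimal sketch using the two factors quoted above; the function name and the example shot value are illustrative assumptions.

```python
# Multiplicative venue factors as quoted in the entry.
HOME_FACTOR = 0.934
AWAY_FACTOR = 0.940


def venue_corrected_xg(raw_xg, is_home):
    """Scale a shot's raw xG by the venue factor for the shooting team."""
    factor = HOME_FACTOR if is_home else AWAY_FACTOR
    return raw_xg * factor


home_shot = venue_corrected_xg(0.50, is_home=True)   # 0.50 * 0.934
away_shot = venue_corrected_xg(0.50, is_home=False)  # 0.50 * 0.940
```

Both factors are below 1, so the correction shrinks raw xG for both venues, slightly more for the home side.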

Research · Apr 1

FootyStats xG Is 3x Worse Than Shots on Target: The Data Quality Crisis We Fixed

FootyStats xG has corr=0.35 with goals — beaten by SoT×0.32 (corr=0.56) in ALL 20 non-Big-5 leagues. But for regression detection, noise is a feature: FootyStats xG (62.7% regression) beats our precise aggregate model (58%). The variance filter needs process, not outcome.

March 2026

Infrastructure · Mar 28 · DEPLOYED

The Infrastructure Overhaul: Tests, Correct Metrics, and a Better Solver

48-hour infrastructure overhaul after discovering entry-adjusted ROI was +5.1%. Built 125-test suite from scratch (settlement, Poisson math, devigging, sizing, bootstrap). Wired entry-adjusted ROI through all 10 approval gates. Graduated shadow solver (market-only + Dixon-Coles rho). Re-evaluated 7 signals, approved tc2-league-filter (+1.2% marginal). Kelly sizing research: quarter Kelly optimal, full Kelly catastrophic. Zero tests to full CI coverage in two days.
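For the Kelly sizing result, a minimal sketch of fractional Kelly for a simple win/lose bet at decimal odds. It uses the standard Kelly formula and ignores Asian-handicap pushes and half-wins; the function name and defaults are assumptions, not the production sizing code.

```python
def kelly_fraction(p_win, decimal_odds, multiplier=0.25):
    """Fraction of bankroll to stake under fractional Kelly.

    Full Kelly for a binary bet is f = (b * p - q) / b, with b the net
    odds (decimal odds minus 1) and q = 1 - p. The result is floored at
    zero and scaled by `multiplier` (0.25 = quarter Kelly).
    """
    b = decimal_odds - 1.0
    if b <= 0:
        raise ValueError("decimal_odds must be greater than 1")
    full_kelly = (b * p_win - (1.0 - p_win)) / b
    return max(0.0, full_kelly) * multiplier


quarter = kelly_fraction(0.55, 2.0)                # full Kelly 0.10, scaled to 0.025
full = kelly_fraction(0.55, 2.0, multiplier=1.0)   # 0.10 of bankroll
no_edge = kelly_fraction(0.40, 2.0)                # negative edge, so no stake
```

Quarter Kelly stakes a quarter of the full-Kelly fraction, which cuts variance sharply when the win probability is itself an estimate.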

Research · Mar 27 · ACCEPTED

The Backtest Was Wrong: How We Found +5.1% ROI Hiding in Plain Sight

The backtest used closing odds, but we bet before close. CLV +5.3% means entry is ~8pp better than closing. AH entry-adjusted ROI: +5.1% (p=0.000, 7,190 OOS bets, 3/3 seasons stable). Two edge sources: entry timing (+8pp) and soft-book premium (+3pp). Calibration tax (-4.1pp) is the biggest lever — validates the solver research roadmap.

Research · Mar 26 · REJECTED

Testing Edge Floors: Why Raising the Minimum Doesn't Help

Bucketed 138K bets into 5 edge brackets and swept 25 floor thresholds. Higher edge does NOT predict higher ROI — the 3-5% bracket (-4.5%) outperforms 12%+ (-8.7%). DSR=0.000. The CLV→ROI gap is structural (calibration + market structure), not driven by edge size. AH near breakeven at all edges; 1X2/OU25 lose everywhere.

Research · Mar 26 · REJECTED

Same-Match Stacking: The Risk That Wasn't

Analyzed within-match outcome correlation across 30K matches. Same-match bets are ANTI-correlated (-0.153) — they hedge, not concentrate. 56% of pairs split (one wins, one loses). All 3 capping policies worsen ROI. The 40-60u match swings in live trading are a stake-sizing artifact, not structural correlation risk.

Signal Test · Mar 25 · ACCEPTED

Finding the Right Edge Threshold for Soft Markets

Swept 7 edge thresholds across 14 expansion leagues using BetExplorer AH odds. 7% + exclude -0.25 line maximizes OOS profit: +166u vs +136u current (22% more). Deployed for all Tier 2/3 leagues.

Signal Test · Mar 23 · ACCEPTED

New Markets: BTTS and OU25 Side-Select — Adding 750 Bets/Year

Tested BTTS and OU25 per-league × per-side × per-season. 38 combinations tested, 3 stable: Segunda BTTS Yes (+7.5%), Bundesliga 2 OU25 Under (+2.0%), Belgian Pro OU25 Under (+8.8%). Year-over-year stability check caught EPL BTTS Yes and Championship BTTS No as unstable (1/3 years). ~360 additional bets/year deployed.

Research · Mar 22 · ACCEPTED

The Bayesian Kitchen Sink: 5 Techniques to Fix Over/Under

5 Bayesian techniques tested on 14,283 OOS matches. T5 (Model Averaging) wins: Brier 0.24882 vs MAP 0.25106, ROI -4.5% vs -5.2% (+0.7pp). Rebalances Over/Under split. T2 (Lambda3 uncertainty) second best. T3 (uncertain overdispersion) worst. Deploy T5 for OU25 at 0.25x sizing.

Research · Mar 20 · INVESTIGATION

Decomposing Every Source of Edge: What's Actually Making Money

Systematic leave-one-out analysis of every production feature. maxEdge=15% cap is +0.99pp (biggest improvement available). minEdge=7% is optimal. Variance filter slightly hurts. Defiance filter genuinely helps. CLV bug found and fixed (1X2 proxy inflated AH CLV by ~11pp). ROI (+29.69%) is real.

System Update · Mar 20 · DEPLOYED

The Shadow Model: Proving Improvements Before Deploying Them

Shadow model v1 launched alongside production. Contains market-only solver + DC rho correction (validated +1.25pp on backtest). Portfolio stack shows +2.92% ROI (p=0.064, all years positive). Shadow must prove itself on 100+ live bets before graduating to production.

Research · Mar 20 · INVESTIGATION

The Portfolio Was Profitable All Along: How Our Testing Framework Hid Real Alpha

Stacking tc2-league-filter + gk-psxg on base filters produces +2.69% ROI (all 3 years positive, p=0.085). Leave-one-league-out: robust across all 16 leagues. The 10-gate rejected these individually but the portfolio is profitable. 6 flaws identified in testing framework. Needs production parity verification before deployment.

Research · Mar 20 · INVESTIGATION

The Variance Filter Was Using Fake Data: How We Found a Bug in Ted's Core Signal

The variance filter compared goals to a constant 1.35 instead of real xG. Real xG improves dev (+0.99pp) but not holdout (+0.03pp). Disabling the filter is a coin flip. Ted's thesis needs a richer implementation — not a binary filter, but a multi-factor xG regression score. The data exists (269 match-xG files) but isn't being used properly.

Model Architecture · Mar 19 · ACCEPTED

The Grid Was Wrong About Draws: How Dixon-Coles Fixed Our Biggest Blind Spot

Applied Dixon-Coles tau correction (rho=+0.05) to the Bivariate Poisson score grid. Dev +1.60pp (p=0.025), holdout +0.80pp, production -2.98% to -1.72%. The grid overestimated 0-0/1-1 for AH markets. Combined with market-only: +1.25pp total, 189u saved.
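A minimal sketch of the tau correction on a score grid. For brevity it starts from independent Poisson marginals, not the Bivariate Poisson grid the entry refers to, and it uses the standard Dixon-Coles parameterization; whether production uses the same sign convention is an assumption. The lambda values in the example are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment factor for the four low-scoring cells."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_grid(lam, mu, rho=0.0, max_goals=10):
    """Grid of P(home = x, away = y), tau-adjusted and renormalized."""
    grid = [
        [
            poisson_pmf(x, lam) * poisson_pmf(y, mu) * dc_tau(x, y, lam, mu, rho)
            for y in range(max_goals + 1)
        ]
        for x in range(max_goals + 1)
    ]
    total = sum(sum(row) for row in grid)
    return [[p / total for p in row] for row in grid]


base = score_grid(1.4, 1.1)                # no correction
adjusted = score_grid(1.4, 1.1, rho=0.05)  # rho value quoted in the entry
```

Under this convention a positive rho lowers the 0-0 and 1-1 cells and raises 0-1 and 1-0, which is consistent with the finding that the uncorrected grid overestimated 0-0/1-1.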

System Update · Mar 19 · DEPLOYED

How the Model Actually Works: A Plain-English Guide

A plain-English walkthrough of the entire system: the MI Bivariate Poisson model (the engine), production filters (the transmission), and experimental signals (the steering). What's working (+7% CLV), what isn't (-2.2% ROI), and why the gap exists.

Research · Mar 19 · INVESTIGATION

From +29.7% ROI to -2.2%: The Journey from Illusion to Understanding

Paper trading showed +29.7% ROI at 58 bets. Rigorous backtesting showed -2.98% at 15,165 bets. Today we moved it to -2.19% by removing match results from the solver. The CLV was always real (+7%). The ROI was a lucky streak. Three sweeps, one winner, three bugs fixed, and a model that's 27% less negative.

Signal Test · Mar 19 · REJECTED

Phase 3: The Signals Still Aren't Enough

Re-ran top 2 signals through 10-gate approval on the new market-only baseline. Both failed again (6/10 gates). The base got better (-3.3% to -2.6%) but signal marginals unchanged (+0.9pp, +0.4pp). Walk-forward 1/4 and 0/4 folds. Individual Layer 3 signals can't close the remaining -2.2% gap.

Model Architecture · Mar 19 · ACCEPTED

Trust the Market, Not the Scoreboard: How Removing Match Data Made the Model Better

We removed outcome prediction and xG fitting from the solver loss function. Validated on holdout (+0.45pp, 3/3 years) and production (26 leagues, +0.79pp, +118u saved, 12/19 improve). The solver produces better calibrated probabilities when fitting ONLY to Pinnacle odds. First structural model change to improve AH ROI.

Research · Mar 19 · WRONG-DIRECTION

The Model's Best Bets Are Its Worst: Why Edge Shrinkage Failed

We tested edge shrinkage and minimum edge thresholds to filter overconfident bets. Every threshold made ROI worse, monotonically. minEdge=10% costs -3.20pp vs baseline. The model's largest edges are its most overconfident. Wrong-direction discovery: max-edge-cap registered as reversed hypothesis.

Research · Mar 19 · REJECTED

The Solver Was Already Right: Why Tuning Form Weights Made Things Worse

We tested 5 configurations of recentFormBoost (1.5-3.0) and decayRate (0.005-0.015) to track within-season collapses faster. Every config was worse or flat vs baseline. RFB increase costs -1.10pp, decay alone is noise. Also found a bug: --decay-rate was never passed to data-prep. The solver already correctly weights form data.

Strategy · Mar 19 · INVESTIGATION

Fix the Engine, Then Paint the Car: Why We're Pausing Signal Testing

9/9 signals rejected through the 10-gate process. The gates aren't broken — they're correctly telling us the -3% base ROI can't be fixed by Layer 3 filters. CalGap (r=-0.922) points to the solver reacting too slowly to mid-season team collapses. We're sweeping recentFormBoost × decayRate to fix the engine before painting the car.

Signal Test · Mar 19 · REJECTED

Testing Manager Change Collapse: The Solver Handles Chaos Better Than Expected

Skip bets within 10 matches of a mid-season manager change? Marginal ROI = -0.1%, so the filter hurts. 877 changes loaded via FotMob (529 mid-season). The solver reads Pinnacle odds which already price manager changes. Combined with the xG window test, manager changes are definitively priced. Case closed.

Signal Test · Mar 19 · REJECTED

Testing GK PSxG+/-: The Signal That Markets Already Priced In

Explored whether opponent goalkeeper quality (PSxG+/-) predicts AH outcomes. Expanded GK data from 4 to 22 leagues. Exploration found a +15.8pp ROI spread, but formal 10-gate approval rejected: marginal ROI +0.4pp (p=0.36), walk-forward decayed from +1.4% in 2023 to -9.0% in 2025. Markets appear to have adapted.

Signal Test · Mar 19 · REJECTED

Testing Manager Change Windows: Why the Rolling Lookback Already Handles It

We tested whether truncating xG histories at mid-season manager changes improves variance regression. Built the infrastructure, loaded 877 changes across 101 teams, ran the 10-gate approval. Result: zero marginal ROI. The 10-match rolling lookback already ages out old-manager data naturally. The system was self-correcting all along.

Signal Test · Mar 19 · REJECTED

Testing Line Movement: Why Capture Signals Can't Work in a Closing-Line Model

Filtering bets where Pinnacle moved ≥3pp toward our selection removes only 127/6,606 bets (1.9%) with zero marginal ROI. The model uses closing odds — line movement is already in the CLV. Shelved without full gate. This plus 7 other capture signal failures confirm: the CLV→ROI gap is calibration, not execution quality.

Signal Test · Mar 19 · REJECTED

Testing Sufferball Routing: Why Our Strongest Signal Failed the Gate

Our original strongest finding — Under 2.5 vs sufferball teams at +10.83% CLV, +7.63% ROI — was retested through the corrected 10-gate pipeline. 5/10 gates passed. Marginal ROI = -0.2pp (signal hurts the stack). Original N=262 was a pre-filtered artifact. Walk-forward: 2024 -4.2%, 2025 -8.6%. Shelved.

Post-Mortem · Mar 19 · REJECTED

Four More Signals Just Failed Revalidation

Four signals failed under the corrected protocol: defensive-overperf, gameweek timing, fixture congestion AH shrinkage, and DGF motivation filter. All select 87,210/87,210 matches at minEdge=0. Zero marginal ROI. The existing filter stack already captures all four phenomena.

Post-Mortem · Mar 19 · REJECTED

The Gate Killed Our Ted Knutson Signals: Both Failed the 10-Gate Process

Two signals from 36 Ted Knutson transcripts looked like portfolio-savers in ad-hoc testing (+2.09pp and +2.24pp marginal ROI). Both failed the canonical 10-gate approval. League filter: p=0.22, IS/OOS sign flip, walk-forward fails 2024-2025. Home AH rescue: p=0.36, marginal ROI only +0.4pp, walk-forward collapses to -9.2% in 2025. The formal process caught what custom analysis missed: both signals overfit to historical data.

Research · Mar 19 · REJECTED

Promoted Teams Are Overrated — But Not by the Model

We tested whether filtering bets backing promoted teams improves ROI. 339 promoted teams, 26 leagues, 5 seasons. The effect exists (+0.1pp) but is statistically insignificant (p=0.46) and temporally unstable. The MI solver reads Pinnacle odds, and Pinnacle already prices promotion correctly. Signal #42 tested, signal #42 rejected.

Infrastructure · Mar 19 · DEPLOYED

Did We Beat the Closing Line? Now We Know.

Every bet now gets a post-settlement execution verdict: did we get a better price than closing? The bridge between 'model found edge' and 'we captured that edge.' Three new fields, zero new data sources — just connecting dots that were already there.
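A minimal sketch of such a verdict, assuming CLV is measured as entry odds over closing odds minus one. The field names are hypothetical (the entry does not name its three new fields), and any devigging of the closing price is left out.

```python
def closing_line_value(entry_odds, closing_odds):
    """CLV as the relative price advantage of the entry over the close.

    Positive means the bet was struck at a better (higher) decimal price
    than the closing line.
    """
    return entry_odds / closing_odds - 1.0


def execution_verdict(entry_odds, closing_odds):
    """Post-settlement record answering 'did we beat the closing line?'."""
    clv = closing_line_value(entry_odds, closing_odds)
    return {
        "closing_odds": closing_odds,  # hypothetical field name
        "clv": clv,                    # hypothetical field name
        "beat_close": clv > 0,         # hypothetical field name
    }


good_entry = execution_verdict(entry_odds=2.10, closing_odds=2.00)
bad_entry = execution_verdict(entry_odds=1.90, closing_odds=2.00)
```

The computation needs only the entry price recorded at bet time and the closing price already collected for backtesting, which fits the "zero new data sources" description.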

Process · Mar 19 · DEPLOYED

After the Bet Settles: Our Post-Paper-Trading Analysis Process

What happens after the model makes picks and real money is on the line. Daily settlement, 3-layer health checks, loss classification (5 categories), regime change detection, kill switches, and the feedback loop that turns every loss into a new alpha hypothesis.

Process · Mar 19 · DEPLOYED

Every Loss Is a Hypothesis: How We Turn Failures Into New Edge

Post-deployment monitoring that classifies every loss (variance, model error, stale input, regime shift, execution leak), detects when edges erode vs when you're just unlucky, and generates new alpha hypotheses from failure patterns. Kill switches, re-enable gates, and the virtuous cycle that makes the system smarter every time it loses.

Process · Mar 19 · DEPLOYED

The 4-Minute Signal Test: How We Explore Fast and Deploy Slow

The complete signal testing workflow — from hypothesis to deployment in 4 minutes. Register, explore, analyze, gate. Designed for parallel terminals. 10 automated approval gates including per-league matchday interleave OOS, walk-forward validation, and practical significance checks. Nothing reaches production without passing.

Research · Mar 19 · REJECTED

The Model Works, The Execution Doesn't: Why Capture Signals Failed and What That Tells Us

We tested 4 capture signals (vig-aware, line movement confirmation, Pinnacle vs market gap, AH line shift) against 6,986 matches with complete open/close/average odds. All 4 rejected. The CLV→ROI gap isn't about execution quality — it's about calibration. High-vig bets actually have HIGHER CLV. 94% of our bets are already on the sharp side. The path to profitability is Track 2 (model improvements), not Track 1 (better execution).

Research · Mar 19 · REJECTED

Mining 36 Ted Knutson Transcripts: 2 Signals That Flip the Portfolio From -3.28% to +0.97%

We read 36 Ted Knutson transcripts, extracted every betting edge, and ran 8 new signals through the full testing protocol. Two survived: a league portfolio filter (+2.09pp marginal ROI, p=0.033) and a home AH conditional rescue (+2.24pp, needs live validation). Combined, they turn a -128.7u portfolio into +19.9u. Six signals failed. One wrong-direction discovery (new managers outperform) opens a new investigation.

Process · Mar 19 · DEPLOYED

We Were Testing the Tests Wrong: The Corrected Signal Protocol

Our testing infrastructure had a bug that made every signal look better than it was. runStandaloneSignal() never removed the 7% edge threshold — so all 40 'accepted' signals were validated on a pre-filtered pool. The fix was 2 lines of code. The damage was 10 days of false confidence. Here's the corrected protocol with 7 new hard rules.

Model Architecture · Mar 19 · ACCEPTED

Teaching the Model to Count: How a Totals Signal Made Our Sides Bets 24% More Profitable

We added a single loss term to the solver — asking it to match the O/U 2.5 market — and AH profits jumped +90u (+24%). The O/U bets themselves are still unprofitable. The improvement came from better lambda estimates. The paradox: we failed to fix totals, but the attempt made sides dramatically better.

Research · Mar 19 · INVESTIGATION

The Model Works, The Money Doesn't: Our 26-League Meta-Analysis

We ran the full evaluation pipeline across 26 leagues — 29,977 matches, 12 signal configurations, proper IS/OOS split. CLV is +11% everywhere (model is genuinely good). ROI is negative everywhere OOS (execution eats the edge). The odds cap is the only filter that matters (+4.2pp marginal). And our first attempt at this analysis was completely wrong — here's what we learned from that too.

Post-Mortem · Mar 19 · REJECTED

The Gate Killed Our Darlings: How Two 'Validated' Signals Failed the Formal Process

We found two signals worth ~700u, validated them with Monte Carlo and walk-forward, deployed them, then ran the 10-gate process. Both failed — then we discovered the gate had a bug (wasn't toggling signals). Fixed it, re-ran: congestion +0.3pp (p=0.36), AH lines -0.1pp (p=0.54). Still rejected. Right answer, wrong path to get there.

Post-Mortem · Mar 19 · REJECTED

Your Rejected Experiments Are a Gold Mine: How We Found ~700u Hiding in Plain Sight

We mined 39 rejected experiments and found two signals worth ~700u. One was a filter actively removing our best bets. The other was dismissed because the hypothesis was backwards. Both survived Monte Carlo bootstrap and walk-forward validation — but later failed the 10-gate approval process. See follow-up: 'The Gate Killed Our Darlings.'

System Update · Mar 18 · ACCEPTED

From 19 to 26: Expanding to Seven New Leagues

We deployed the biggest league expansion since launch — seven new leagues across three continents, validated through walk-forward backtesting on 14,000+ matches.