March 19, 2026|Research|REJECTED

The Solver Was Already Right: Why Tuning Form Weights Made Things Worse

We tested 5 configurations of recentFormBoost (RFB, 1.5-3.0) and decayRate (0.005-0.015) to see whether the model could track within-season collapses faster. Every config was worse than baseline or statistically flat against it. Raising RFB to 2.5 costs 1.10pp of AH ROI; raising decay alone to 0.010 gains +0.06pp, which is noise. We also found a bug: --decay-rate was never passed to data prep. The solver already weights form data correctly.

Configs Tested: 5 (2D parameter sweep)
Best Delta: +0.06pp, noise (p=0.84)
AH Bets: 8,568 across 10 dev leagues
Verdict: Baseline Wins (rfb=1.5, d=0.005)

We tested 5 parameter configurations to see if the MI Bivariate Poisson model could track within-season team collapses better. It couldn't. The current parameters are near-optimal, and along the way we found a bug that had been silently breaking every prior decay rate test.

The Question

The model has +6.2% CLV (closing line value: it identifies edge correctly) but -3.3% AH (Asian handicap) ROI (it can't convert that edge into profit). One theory: recentFormBoost=1.5 and decayRate=0.005 are too conservative. Teams like Rangers, Barcelona, and Sevilla collapse mid-season, but the solver keeps weighting their early-season performances nearly as heavily as recent ones. If we boost recent-form weighting, decay old data faster, or both, the solver should catch collapses sooner.

This theory came from a genuine finding: the calibration gap (CalGap) correlates with ROI at r=-0.922 across leagues, and within-season collapses are the primary driver.

What We Found

Every configuration was worse than baseline or flat. Within the tested grid, the current parameters are the local optimum.

Config	AH ROI	vs Baseline
Baseline (rfb=1.5, d=0.005)	-3.29%	--
rfb=2.5	-4.39%	-1.10pp
rfb=2.5, d=0.010	-4.02%	-0.73pp
rfb=3.0, d=0.010	-3.96%	-0.67pp
rfb=1.5, d=0.010	-3.23%	+0.06pp
rfb=1.5, d=0.015	-4.39%	-1.10pp

The gradient is clear:

RFB increase (1.5 to 2.5): -1.10pp -- boosting recent form weight is strongly negative
Moderate decay alone (0.010): +0.06pp -- faster forgetting is noise, not signal
Aggressive decay alone (0.015): -1.10pp -- forgetting too much history destroys the model
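The "vs Baseline" column is plain subtraction in percentage points, and a short sketch makes it easy to recheck. The ROI figures below are copied from the table above; the type and function names are illustrative, not taken from analyze-sweep.ts.

```typescript
// ROI values copied from the results table; all names here are illustrative.
interface SweepResult {
  config: string;
  ahRoiPct: number; // AH ROI in percent
}

const baseline: SweepResult = { config: "rfb=1.5, d=0.005", ahRoiPct: -3.29 };

const variants: SweepResult[] = [
  { config: "rfb=2.5", ahRoiPct: -4.39 },
  { config: "rfb=2.5, d=0.010", ahRoiPct: -4.02 },
  { config: "rfb=3.0, d=0.010", ahRoiPct: -3.96 },
  { config: "rfb=1.5, d=0.010", ahRoiPct: -3.23 },
  { config: "rfb=1.5, d=0.015", ahRoiPct: -4.39 },
];

// Delta in percentage points, rounded to 2 decimals to suppress float noise.
function deltaPp(variant: SweepResult, base: SweepResult): number {
  return Math.round((variant.ahRoiPct - base.ahRoiPct) * 100) / 100;
}

const deltas = variants.map((v) => ({
  config: v.config,
  deltaPp: deltaPp(v, baseline),
}));

// The config with the largest (least negative) delta.
const best = deltas.reduce((a, b) => (b.deltaPp > a.deltaPp ? b : a));

for (const d of deltas) {
  console.log(`${d.config}: ${d.deltaPp.toFixed(2)}pp`);
}
console.log(`best: ${best.config}`);
```

A positive delta on its own says nothing about significance; that is what the bootstrap and sign test are for.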

The Nuance

Why Does More Form Weighting Hurt?

The intuition "weight recent form more" assumes the market isn't already doing this. But Pinnacle odds -- which are our training target -- already incorporate recent form. When we boost recentFormBoost, we're double-counting a signal the market already prices. The solver over-adjusts to recent results, creates false confidence in form-based ratings, and generates more bets that the market has already correctly priced.

Per-League Results Tell the Story

For the best-performing config (decay=0.010 alone):

League	Baseline	d=0.010	Delta
serie-b	-7.95%	-3.06%	+4.89pp
serie-a	-4.90%	-3.48%	+1.42pp
epl	+1.26%	+2.84%	+1.57pp
belgian-pro	-3.09%	-6.78%	-3.70pp
turkish-super	-2.82%	-5.50%	-2.68pp
eredivisie	+1.89%	+1.28%	-0.61pp

Across all 10 dev leagues (the table shows six of them), 5 improve and 5 worsen. There is no systematic pattern -- the change redistributes edge rather than creating it.

The Sign Test

For each config, we counted the league x year cells (30 total) in which it beat baseline:

rfb=2.5: 11/30 (37%) -- worse in nearly 2/3 of cells
rfb=2.5, d=0.010: 12/30 (40%)
rfb=3.0, d=0.010: 13/30 (43%)

None cleared the 50% threshold, let alone the 60% elimination bar.
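The cell counts above can be turned into a win rate and an exact two-sided binomial p-value against a 50/50 null. The sketch below is one way to do that, not the code in analyze-sweep.ts. It assumes the 60% bar means "win rate of at least 0.6"; the function names are hypothetical.

```typescript
// Probability of exactly k wins in n fair coin flips: C(n, k) * 0.5^n.
function binomialPmfHalf(n: number, k: number): number {
  let c = 1;
  // Each intermediate value is itself a binomial coefficient, so it stays an integer.
  for (let i = 1; i <= k; i++) c = (c * (n - k + i)) / i;
  return c * Math.pow(0.5, n);
}

interface SignTestResult {
  winRate: number;
  pValue: number; // exact two-sided p-value under H0: P(win) = 0.5
  survives: boolean; // assumed rule: win rate >= eliminationBar
}

function signTest(
  wins: number,
  cells: number,
  eliminationBar = 0.6, // assumption: the post's "60% elimination bar"
): SignTestResult {
  if (cells <= 0 || wins < 0 || wins > cells) throw new Error("invalid counts");
  const winRate = wins / cells;
  // Sum the probability of every outcome at least as far from cells/2 as observed.
  const observedDistance = Math.abs(wins - cells / 2);
  let p = 0;
  for (let k = 0; k <= cells; k++) {
    if (Math.abs(k - cells / 2) >= observedDistance) p += binomialPmfHalf(cells, k);
  }
  return { winRate, pValue: Math.min(1, p), survives: winRate >= eliminationBar };
}

// Counts from the post: 11, 12 and 13 winning cells out of 30.
for (const wins of [11, 12, 13]) {
  const r = signTest(wins, 30);
  console.log(wins, r.winRate.toFixed(2), r.pValue.toFixed(3), r.survives);
}
```

The sign test ignores the size of each cell's difference, which is why it is paired with the bootstrap on ROI differences.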

What Didn't Work (and What We Actually Fixed)

The Bug

While running the sweep, we discovered that --decay-rate had been a no-op since it was first implemented. The CLI flag was parsed into the config object and used for cache key generation, but was never actually passed to prepareMarketMatches() -- the function that applies time-decay weighting to training matches.

The smoking gun: rfb-2.5-d10 (before the fix) produced byte-identical results to rfb-2.5 (same RFB, default decay). The config hash was different, so the solver cache was correctly invalidated, but the underlying data had identical decay weights.

Fix: Two lines of code in backtest-v2.ts and backtest-worker.ts, adding decayRate: baseConfig.decayRate to the prepareMarketMatches() options.

This means every prior experiment that used --decay-rate actually ran with the default 0.005. Any conclusions about decay rate drawn from those experiments reflect RFB effects only.
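A minimal sketch of the failure mode and a regression guard for it follows. Everything here is a stand-in: the real prepareMarketMatches() signature is not shown in this post, and the exponential weighting exp(-decayRate * daysAgo) is an assumption made only so the example runs.

```typescript
// Hypothetical stand-ins; not the real types from backtest-v2.ts.
interface Match {
  daysAgo: number;
}
interface PrepOptions {
  decayRate?: number;
}
interface BacktestConfig {
  recentFormBoost: number;
  decayRate: number;
}

const DEFAULT_DECAY_RATE = 0.005;

// Assumed weighting scheme: exponential time decay per match.
function prepareMatchesSketch(matches: Match[], opts: PrepOptions = {}): number[] {
  const rate = opts.decayRate ?? DEFAULT_DECAY_RATE;
  return matches.map((m) => Math.exp(-rate * m.daysAgo));
}

// Buggy call shape: the config carries decayRate, but it is never forwarded.
function weightsBeforeFix(matches: Match[], _cfg: BacktestConfig): number[] {
  return prepareMatchesSketch(matches, {});
}

// Fixed call shape: forward decayRate into the prep options.
function weightsAfterFix(matches: Match[], cfg: BacktestConfig): number[] {
  return prepareMatchesSketch(matches, { decayRate: cfg.decayRate });
}

// Regression guard: two different decay rates must yield different weights.
function decayRateHasEffect(
  prep: (matches: Match[], cfg: BacktestConfig) => number[],
  matches: Match[],
): boolean {
  const a = prep(matches, { recentFormBoost: 2.5, decayRate: 0.005 });
  const b = prep(matches, { recentFormBoost: 2.5, decayRate: 0.01 });
  return a.some((w, i) => w !== b[i]);
}

const sample: Match[] = [{ daysAgo: 10 }, { daysAgo: 100 }, { daysAgo: 300 }];
console.log("before fix:", decayRateHasEffect(weightsBeforeFix, sample));
console.log("after fix:", decayRateHasEffect(weightsAfterFix, sample));
```

A check like this, run once per sweep parameter, would have flagged the no-op before any results were interpreted.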

The Methodology

We used a syndicate-style approach:

Stratified dev/holdout split: 10 dev leagues (8,568 AH bets) / 9 holdout leagues (6,597 AH bets)
Sequential elimination: 3 strategic configs in Phase 1, gradient-directed refinement in Phase 2
Bootstrap paired difference: block bootstrap on matchday-level ROI differences (10K resamples)
Sign test: per league x year cell comparison vs baseline

The holdout set was never needed -- no config survived elimination to reach validation.
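The bootstrap step can be sketched as a moving-block resample over the per-matchday ROI differences (config minus baseline). The block length, the percentile interval, and the two-sided p-value convention below are all assumptions, since the post only specifies matchday-level blocks and 10K resamples. A small seeded PRNG keeps runs reproducible.

```typescript
// Small seeded PRNG (mulberry32) returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

interface BootstrapResult {
  meanDiff: number;
  ciLow: number; // 2.5th percentile of resampled means
  ciHigh: number; // 97.5th percentile of resampled means
  pValue: number; // two-sided, from the share of resampled means on each side of 0
}

// diffs: per-matchday ROI difference, in chronological order.
function blockBootstrapPairedDiff(
  diffs: number[],
  blockLen: number, // assumed tuning parameter; not given in the post
  resamples: number,
  seed: number,
): BootstrapResult {
  const n = diffs.length;
  if (n === 0 || blockLen < 1 || blockLen > n || resamples < 1) {
    throw new Error("invalid input");
  }
  const rand = mulberry32(seed);
  const startCount = n - blockLen + 1; // number of valid block starts
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    const sample: number[] = [];
    // Concatenate randomly chosen contiguous blocks until n points are drawn.
    while (sample.length < n) {
      const start = Math.floor(rand() * startCount);
      for (let j = 0; j < blockLen && sample.length < n; j++) {
        sample.push(diffs[start + j]);
      }
    }
    means.push(mean(sample));
  }
  means.sort((x, y) => x - y);
  const quantile = (p: number) =>
    means[Math.min(resamples - 1, Math.floor(p * resamples))];
  const shareAtOrBelowZero = means.filter((m) => m <= 0).length / resamples;
  const shareAtOrAboveZero = means.filter((m) => m >= 0).length / resamples;
  return {
    meanDiff: mean(diffs),
    ciLow: quantile(0.025),
    ciHigh: quantile(0.975),
    pValue: Math.min(1, 2 * Math.min(shareAtOrBelowZero, shareAtOrAboveZero)),
  };
}
```

Resampling contiguous blocks rather than single matchdays preserves short-range dependence between neighbouring matchdays, which a plain bootstrap would break.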

What This Means

The model's form tracking is already correct. The solver, trained on Pinnacle closing odds, already captures form changes at the right rate. Attempting to amplify or accelerate this process double-counts what the market already knows.

The CLV-to-ROI gap is not a form-tracking problem. The -3.3% ROI doesn't come from stale ratings on collapsed teams. It comes from somewhere else -- likely structural calibration issues, line-specific biases, or market microstructure.

Infrastructure for future sweeps exists. sweep-rfb-decay.sh and analyze-sweep.ts are reusable for any parameter optimization with proper dev/holdout split, elimination rules, and statistical validation.

What's Next

With RFB/decay ruled out, the remaining paths to closing the CLV-to-ROI gap are:

Quarter-line regime: the exhaustive regime search found quarter-lines at +4.3% vs whole-lines at -7.8% (12pp spread). Line-type routing may be more productive than parameter tuning.
Edge-weighted sizing: weighting bets by edge magnitude showed +1.5pp ROI improvement OOS in shadow mode.
Calibration shrinkage: shrinking model probabilities toward market consensus (x0.4) reduced CalGap from 6.1pp to 1.5pp with ROI improvement.
Team-specific volatility: instead of a global form boost, model per-team rating variance to flag unstable teams.
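As an illustration of the calibration-shrinkage idea, the sketch below pulls a model probability toward the market's. It assumes one reading of "x0.4": the model keeps 40% of its deviation from the market probability. The post does not spell out the exact formula, so treat both the formula and the function name as hypothetical.

```typescript
// Assumed interpretation of "x0.4": keep a fraction k of the model's deviation
// from the market, i.e. shrunk = market + k * (model - market).
function shrinkTowardMarket(pModel: number, pMarket: number, k = 0.4): number {
  if (k < 0 || k > 1) throw new Error("k must be in [0, 1]");
  for (const p of [pModel, pMarket]) {
    if (p < 0 || p > 1) throw new Error("probabilities must be in [0, 1]");
  }
  // A convex combination of two probabilities stays inside [0, 1].
  return pMarket + k * (pModel - pMarket);
}

// Example: the model says 0.60 and the market says 0.50, so the shrunk value is about 0.54.
const shrunk = shrinkTowardMarket(0.6, 0.5);
console.log(shrunk.toFixed(2));
```

Under this reading, k=1 leaves the model untouched and k=0 defers entirely to the market, so k is the single knob to sweep.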

The engine is correctly calibrated for form. Time to look at the chassis.

REJECTED|Signal: rfb-decay-sweep|2026-03-19