Testing Set-Piece Mismatch: The Filter That Was Really Measuring Data Coverage
Tested whether selecting matches with high backed-team SP xG AND high opponent SP xGA would sharpen AH edge. Passed 8/10 gates with 12/12 walk-forward folds positive, but threshold sweep (0.30→0.60) revealed the filter is a data-availability proxy — selectivity comes from whether shot data exists, not from mismatch magnitude. Marginal +0.8pp below +1pp spec, p=0.19.
Set pieces are the most persistent attacking skill in football. Caley's research shows R²=0.39 year-over-year — meaning teams that are good at set pieces stay good, and markets that blend set-piece production with open-play numbers should systematically misprice dead-ball specialists. We tested whether selecting matches where the backed team attacks well from set pieces AND the opponent defends poorly against them would sharpen our edge.
The short answer: the signal direction is correct, but the implementation is a data-availability proxy, not a quality filter.
The Question
If Team A generates 0.5 SP xG/match from corners and free kicks, and Team B concedes 0.6 SP xGA/match, that's a persistent mismatch the market may not fully price. Our hypothesis: filtering for these high-mismatch matches would improve marginal ROI by 1pp or more.
We built setPieceMismatchFilter as a dual-threshold REQUIRE filter on AH bets — backed team's rolling 10-match SP xG/match must exceed attackMin, opponent's rolling 10-match SP xGA/match must exceed defenseMin.
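As a minimal sketch of the dual-threshold logic described above (not the actual setPieceMismatchFilter code, which is not shown in this post), the REQUIRE check might look like the following. The type names, field names, and the `passesSetPieceMismatch` helper are hypothetical; only `attackMin` and `defenseMin` come from the text.

```typescript
// Hypothetical shape: the post does not show the real types.
interface SetPieceForm {
  spXgPerMatch: number;  // backed team's rolling 10-match SP xG per match
  spXgaPerMatch: number; // opponent's rolling 10-match SP xGA per match
}

interface MismatchThresholds {
  attackMin: number;
  defenseMin: number;
}

// REQUIRE semantics: the bet survives only if BOTH values exceed their threshold.
function passesSetPieceMismatch(form: SetPieceForm, t: MismatchThresholds): boolean {
  return form.spXgPerMatch > t.attackMin && form.spXgaPerMatch > t.defenseMin;
}

const thresholds: MismatchThresholds = { attackMin: 0.3, defenseMin: 0.3 };

// The Team A / Team B example from the text: 0.5 SP xG against 0.6 SP xGA.
console.log(passesSetPieceMismatch({ spXgPerMatch: 0.5, spXgaPerMatch: 0.6 }, thresholds)); // true
```

Because both conditions are joined with a logical AND, a value of 0 on either side fails the check at any positive threshold.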
What We Found
The signal passed 8 of 10 approval gates, which sounds impressive until you look at which two it failed:
- Bootstrap significance: p=0.19 (need p<0.10). The marginal effect is +0.8pp entry-adjusted ROI, but the confidence interval spans [-1.0%, +2.6%], which includes zero.
- Suspicious N: Near-identical population to setpiece-xga-regression (185,846 vs 185,664), despite supposedly measuring different things.
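For readers unfamiliar with the bootstrap gate, here is one way such a check could be sketched. This is an illustration under stated assumptions, not the gate's actual implementation: it assumes the marginal effect is the mean return of kept bets minus the mean return of all bets, uses a percentile interval, and takes the p-value as the share of resamples with no improvement. The `BetResult` type and all function names are hypothetical.

```typescript
// Hypothetical record: per-bet return (profit per unit staked) and whether the filter kept the bet.
interface BetResult {
  ret: number;
  kept: boolean;
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Assumed definition of "marginal": mean return of kept bets minus mean return of all bets.
function marginal(bets: BetResult[]): number {
  const kept = bets.filter((b) => b.kept).map((b) => b.ret);
  if (kept.length === 0) return NaN; // this resample contained no kept bets
  return mean(kept) - mean(bets.map((b) => b.ret));
}

function bootstrapMarginal(
  bets: BetResult[],
  iterations = 2000,
  rng: () => number = Math.random,
): { lo: number; hi: number; p: number } {
  const stats: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Resample bets with replacement, same size as the original sample.
    const sample = bets.map(() => bets[Math.floor(rng() * bets.length)]);
    const m = marginal(sample);
    if (!Number.isNaN(m)) stats.push(m);
  }
  stats.sort((a, b) => a - b);
  const at = (q: number) => stats[Math.min(stats.length - 1, Math.floor(q * stats.length))];
  // One-sided p-value: share of resamples in which the filter did not help.
  const p = stats.filter((s) => s <= 0).length / stats.length;
  return { lo: at(0.025), hi: at(0.975), p };
}
```

Under this reading, a gate of p<0.10 fails whenever 10% or more of the resampled marginals land at or below zero, which fits an interval that straddles zero.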
The walk-forward was remarkable: 12/12 season folds positive, ranging from +1.5% to +15.2% entry-adjusted. This would normally be a strong signal. But the threshold sweep told the real story.
The Threshold Sweep That Broke It
We ran the signal at three threshold levels:
| Thresholds (attackMin/defenseMin) | Bets (with filter) | Bets (without filter) | Marginal ROI |
|---|---|---|---|
| 0.30/0.30 | 11,654 | 16,167 | +0.8% |
| 0.40/0.40 | 11,654 | 16,167 | +0.8% |
| 0.60/0.60 | 11,620 | 16,167 | +0.7% |
Going from 0.30 to 0.60, doubling both thresholds, removed only 34 of 11,654 bets (about 0.3%). The thresholds are doing almost nothing.
Why: When a team has no shot-level data (no Understat coverage, no FotMob backfill), getSetPieceXGPerMatch() returns 0. Zero is below any positive threshold, so the bet gets filtered out. For teams that DO have data, nearly all exceed 0.60 SP xG/match. In effect the filter is binary: it asks whether Understat or FotMob covers the league, not how large the mismatch is.
The near-identical N to setpiece-xga-regression confirms it: both signals gate on the same thing, shot data availability.
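The mechanism is easier to see if missing data is kept distinct from a genuine low value. The sketch below assumes a hypothetical input where a missing rolling value is `null` rather than 0, and counts for each threshold how many bets are dropped for lack of data versus dropped by the threshold itself. The values are toy numbers for illustration only, not the data from this test.

```typescript
// Hypothetical input: rolling SP xG per match, or null when no shot data exists.
type SpXg = number | null;

interface SweepRow {
  threshold: number;
  kept: number;
  droppedNoData: number; // removed because the team has no shot-level data
  droppedBelow: number;  // removed because a real value fell at or below the threshold
}

function sweep(values: SpXg[], thresholds: number[]): SweepRow[] {
  return thresholds.map((threshold) => {
    let kept = 0;
    let droppedNoData = 0;
    let droppedBelow = 0;
    for (const v of values) {
      if (v === null) droppedNoData++;
      else if (v > threshold) kept++;
      else droppedBelow++;
    }
    return { threshold, kept, droppedNoData, droppedBelow };
  });
}

// Toy values for illustration only.
const toy: SpXg[] = [0.9, 0.8, 0.75, 0.55, null, null, null];
const rows = sweep(toy, [0.3, 0.4, 0.6]);
console.log(rows);
```

In the toy run, `droppedNoData` stays at 3 for every threshold while `droppedBelow` moves only from 0 to 1. That is the same pattern as the real sweep: coverage does the selecting, and the threshold barely matters.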
What Didn't Work
We considered three paths after the initial 8/10 gate result:
- Tighten thresholds — tried 0.40 and 0.60. No effect, as shown above.
- Continuous mismatch score — rejected because if selectivity is "has data," a continuous score is just a noisy version of the same binary split.
- Shadow it — rejected because the mechanism is clearly data availability, not set-piece quality. Shadowing wouldn't teach us anything new.
What This Means
Not deployed. The signal is parked permanently.
The set-piece infrastructure built for this test (rolling SP xG/match, SP xGA/match, mismatch scores as enrichment flags) remains useful. The setpiece-xga-regression signal uses some of the same data. But the mismatch concept specifically requires data density we don't have — a 10-match rolling window contains roughly 3 total SP xG, which is insufficient for meaningful team-level differentiation.
What's Next
The walk-forward consistency (12/12 positive) is a hint that *something* correlates with SP data availability and better betting outcomes — likely that leagues with Understat coverage also have sharper Pinnacle lines, better odds quality, and more efficient closing prices. That's a selection effect, not a causal signal. Not actionable, but worth noting when interpreting other signals that depend on shot-level data.