Sports Dashboard

MI Bivariate Poisson + Dixon-Coles + Elo

Beating Understat's xG With a Simple Venue Correction

Understat doesn't adjust shot-level xG for home/away. A multiplicative correction (home ×0.934, away ×0.940) beats raw Understat xG on all 4 walk-forward folds. The edge is growing: home overprediction is getting worse as home-field advantage (HFA) declines post-COVID. Set pieces are worst: home set-piece xG is overpredicted by 12.8%.

The Question

Understat's xG model doesn't adjust for home vs away. A shot from 18 yards at home gets the same xG as the identical shot away. But home and away conversion rates differ systematically. Can we beat Understat by simply correcting for venue?

What We Found

Yes. A simple multiplicative correction beats Understat's raw xG on every walk-forward fold.

| Year | Understat Brier | Venue-Corrected Brier | Delta |
|---|---|---|---|
| 2021/22 | 0.07185 | 0.07168 | **+0.00017 (WIN)** |
| 2022/23 | 0.07188 | 0.07158 | **+0.00030 (WIN)** |
| 2023/24 | 0.07196 | 0.07154 | **+0.00043 (WIN)** |
| 2024/25 | 0.07452 | 0.07389 | **+0.00063 (WIN)** |

4/4 walk-forward wins. The edge is growing: the Brier improvement rises every season, from 0.00017 in 2021/22 to 0.00063 in 2024/25, as Understat's overprediction gets worse while home-field advantage declines post-COVID.
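A minimal sketch of how a walk-forward comparison like this can be set up. The post does not say how the factors were fitted, so the estimator here (observed goals divided by summed xG, per venue) is an assumption, as are the function names and the shot record fields `season`, `is_home`, `xg`, and `goal`.

```python
def brier(preds, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


def fit_factor(shots):
    """Assumed estimator: observed goals divided by summed xG."""
    return sum(s["goal"] for s in shots) / sum(s["xg"] for s in shots)


def walk_forward(shots, seasons):
    """For each season after the first, fit on all earlier seasons, score on it.

    Returns {season: raw Brier minus corrected Brier}; positive means the
    venue correction improved on the raw xG for that fold.
    """
    results = {}
    for i in range(1, len(seasons)):
        train = [s for s in shots if s["season"] in seasons[:i]]
        test = [s for s in shots if s["season"] == seasons[i]]
        # One multiplicative factor per venue, learned only from past seasons.
        factors = {
            venue: fit_factor([s for s in train if s["is_home"] == venue])
            for venue in (True, False)
        }
        raw = [s["xg"] for s in test]
        # Cap at 1.0 so a factor above 1 cannot produce an invalid probability.
        corrected = [min(1.0, s["xg"] * factors[s["is_home"]]) for s in test]
        goals = [s["goal"] for s in test]
        results[seasons[i]] = brier(raw, goals) - brier(corrected, goals)
    return results
```

The sign convention matches the Delta column: a positive value is a win for the corrected xG. Each fold only sees earlier seasons, so the factors are never fitted on the season they are scored on.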

Correction factors (learned from 2020-2022, validated on 2023-2024):

  • Home shots: multiply Understat xG by 0.934 (overpredicts by 6.6%)
  • Away shots: multiply by 0.940 (overpredicts by 6.0%)
  • Home set pieces: multiply by 0.872 (overpredicts by 12.8%!)
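Applying the factors above is a one-line multiplication per shot. The sketch below is illustrative only: the function name is hypothetical, and two behaviours are assumptions rather than anything the post states. First, the home set-piece factor replaces the general home factor instead of stacking with it. Second, away set pieces fall back to the general away factor, since no separate value is given.

```python
# Factors quoted in the post. How they combine is an assumption of this sketch.
HOME_FACTOR = 0.934
AWAY_FACTOR = 0.940
HOME_SET_PIECE_FACTOR = 0.872


def venue_corrected_xg(xg: float, is_home: bool, is_set_piece: bool = False) -> float:
    """Scale a raw Understat xG value by the matching venue factor."""
    if is_home and is_set_piece:
        # Assumed to replace, not multiply with, HOME_FACTOR.
        factor = HOME_SET_PIECE_FACTOR
    elif is_home:
        factor = HOME_FACTOR
    else:
        # No away set-piece factor is given, so all away shots use AWAY_FACTOR.
        factor = AWAY_FACTOR
    return xg * factor
```

For example, a home open-play shot with raw xG 0.10 becomes 0.0934, and the same shot from a home set piece becomes 0.0872.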

The Nuance

Per-league factors are unstable (SD 0.05-0.07 across years) except for the Premier League (SD 0.029). A single global venue correction is therefore safer than per-league factors.
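One way to run this kind of stability check is to fit a factor per league per year and compare the spread. The sketch below uses made-up inputs and hypothetical function names. The 0.03 cut-off and the use of the sample standard deviation are assumptions, since the post does not say which SD or threshold was used.

```python
from statistics import stdev


def factor_stability(yearly_factors):
    """Sample SD of each league's yearly correction factors."""
    return {league: stdev(values) for league, values in yearly_factors.items()}


def stable_leagues(yearly_factors, max_sd=0.03):
    """Leagues whose yearly factors vary little enough to trust a per-league fit.

    max_sd is an illustrative threshold, not a value from the post.
    """
    spread = factor_stability(yearly_factors)
    return sorted(league for league, sd in spread.items() if sd <= max_sd)


# Placeholder values for illustration only, not real league data.
example = {
    "league_a": [0.90, 0.94, 0.98],  # SD about 0.04, too noisy
    "league_b": [0.93, 0.94, 0.95],  # SD about 0.01, stable
}
print(stable_leagues(example))
```

Leagues that fail the check would fall back to the global factor.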

Venue calibration becomes redundant when `is_home` is in the XGBoost model. When we retrained our model with `is_home` as a feature, applying venue calibration on top made the Brier score worse: it rose by 0.00031. The model had already learned the venue effect, so stacking both double-corrects.

Venue calibration should ONLY be applied to raw Understat xG — not to our own model's predictions.
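The double-correction effect can be seen with a toy expected-Brier calculation. This is an illustration under idealised assumptions, not a reproduction of the result above: it assumes the home factor is exactly right and that a model with `is_home` is already perfectly calibrated. The variable names are hypothetical.

```python
def expected_brier(true_p: float, predicted_p: float) -> float:
    """Expected squared error of predicting predicted_p for a Bernoulli(true_p) outcome."""
    return true_p * (1 - predicted_p) ** 2 + (1 - true_p) * predicted_p ** 2


HOME_FACTOR = 0.934
raw_xg = 0.30
true_p = raw_xg * HOME_FACTOR             # toy assumption: the venue factor is exactly right

model_p = true_p                          # toy assumption: model with is_home is calibrated
double_corrected = model_p * HOME_FACTOR  # venue calibration stacked on top

raw_score = expected_brier(true_p, raw_xg)
once = expected_brier(true_p, model_p)
twice = expected_brier(true_p, double_corrected)

# Expected Brier is minimised when the prediction equals the true probability,
# so both the uncorrected and the double-corrected predictions score worse.
print(raw_score > once, twice > once)
```

The penalty for a miscalibrated prediction is the squared gap between predicted and true probability, so correcting an already-correct model costs about as much as not correcting raw xG at all.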

What This Means

The venue calibration is deployed as part of the context-calibration engine (`lib/xg-model/context-calibration.ts`) and the venue-calibration JSON (`data/xg-model/venue-calibration.json`). It's used for display/context on Understat xG values, not for the variance filter.

What's Next

The context-calibration engine has 7 adjustment types (venue, GK, squad, regime, CB absence, freeze-frame, set-piece). All are on gauntlet shadow. Venue is the only one with proven out-of-sample improvement over Understat.