
Statistics Lesson: Using Premier League Data to Teach Probability and Expected Goals

theanswers

2026-02-28


Turn Premier League and FPL data into a 3-session math lesson on probability, xG and predictive modeling — ready-to-run for 2026 classrooms.

Hook: Turn students' love of the Premier League into a math advantage

Teachers and tutors: tired of abstract probability problems that feel disconnected from real life? Students: bored of contrived word problems that don’t relate to your favourite teams or FPL picks? This lesson plan uses real Premier League and FPL statistics to teach probability, expected goals (xG), and basic predictive modeling — giving learners hands-on experience with modern sports analytics while meeting curriculum standards.

The big idea — why this matters in 2026

By 2026, sports analytics has moved from specialist journals into mainstream coaching, media and classroom practice. Teachers can leverage freely available and expanded datasets (public APIs, FBref, open xG feeds) released during late 2024–2025 to give students realistic data to explore. AI tools and simplified libraries now let middle and high school learners simulate matches and build simple models without heavy coding. Using Premier League and FPL data provides motivation, modern context and practical skills in probability, statistics and computational thinking.

"Using sports data in math class increases engagement and builds transferable data-literacy skills students need in a data-driven world." — classroom case studies, 2025–26

Learning objectives

Understand empirical probabilities from frequency data (goals per game, scoring rates).
Explain and compute expected goals (xG) for players and teams.
Model goal counts using the Poisson distribution and compute match outcome probabilities.
Run simple Monte Carlo simulations to predict match results and estimate uncertainty.
Interpret model results for FPL decisions and explain limitations (data quality, injuries, context).

Required materials and data sources

Class computers or tablets with Google Sheets or a Python environment (Google Colab recommended).
Datasets: small curated CSV of recent Premier League fixtures with columns: Date, HomeTeam, AwayTeam, HomeGoals, AwayGoals, Home_xG, Away_xG, HomeShots, AwayShots. (Teachers: prepare one CSV per pair of teams or let students pull live FPL stats.)
Optional: access to FPL site or a simple FPL exports plugin for lineups and ownership percentages for engagement.
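A calculator for Poisson exercises, or the equivalent Sheets formula =EXP(-lambda)*lambda^k/FACT(k) with lambda and k replaced by cell references (=POISSON.DIST(k, lambda, FALSE) returns the same value).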
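Before the sessions, it can help to summarise the fixture CSV into per-team averages. The Python sketch below assumes exactly the column names listed above. The three rows are made-up placeholders, not real results. In class you would read the real file with open() rather than io.StringIO. The averages are per match, which only approximates per-90 rates.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows using the column layout described above (not real results).
SAMPLE_CSV = """Date,HomeTeam,AwayTeam,HomeGoals,AwayGoals,Home_xG,Away_xG,HomeShots,AwayShots
2025-08-16,Team A,Team B,2,1,1.9,0.8,15,9
2025-08-23,Team B,Team C,0,0,1.1,0.7,12,8
2025-08-30,Team C,Team A,1,3,1.0,2.4,10,17
"""

def xg_per_match(rows):
    """Return {team: (average xG for, average xG against)} over all matches played."""
    xg_for = defaultdict(list)
    xg_against = defaultdict(list)
    for row in rows:
        home_xg = float(row["Home_xG"])
        away_xg = float(row["Away_xG"])
        # Each match counts once for the home side and once for the away side.
        xg_for[row["HomeTeam"]].append(home_xg)
        xg_against[row["HomeTeam"]].append(away_xg)
        xg_for[row["AwayTeam"]].append(away_xg)
        xg_against[row["AwayTeam"]].append(home_xg)
    return {
        team: (sum(xg_for[team]) / len(xg_for[team]),
               sum(xg_against[team]) / len(xg_against[team]))
        for team in xg_for
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for team, (att, dfn) in sorted(xg_per_match(rows).items()):
    print(f"{team}: {att:.2f} xG for, {dfn:.2f} xG against per match")
```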

Lesson plan overview (3 sessions, 45–60 minutes each)

Use three class sessions. Each is modular and can be extended or compressed based on the class level.

Session 1 — Probability fundamentals using FPL and scoring data

Time: 45–60 minutes. Goal: connect relative frequency with simple probabilities.

Warm-up (5–10 min): Ask students to predict the chance their favourite striker scores in the next match. Record guesses.
Data collection (10–15 min): Give each group a small dataset: five recent games for a chosen striker (minutes played, goals). Compute empirical probability: P(score in next match) ≈ (number of matches with at least one goal) / (total matches played).
Activity (15–20 min): Compare raw scoring probability to FPL point outcomes (e.g., probability of scoring 2+ points). Discuss sample size issues and confidence: a striker with 3 goals in 5 games looks hot but may be noisy.
Reflection (5–10 min): Students write a 3-sentence summary: how reliable is the striker’s scoring probability? What additional data would improve the estimate?
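The Session 1 calculation fits in a few lines of Python. The five match results below are hypothetical. They show that 3 goals in 5 games (0.6 goals per game) is not the same as a 60% chance of scoring, because two of the goals came in the same match.

```python
# Hypothetical goals for one striker in five recent matches (placeholder data).
goals = [2, 0, 1, 0, 0]

matches = len(goals)
scoring_matches = sum(1 for g in goals if g >= 1)

p_score = scoring_matches / matches    # empirical P(score in a match)
goals_per_game = sum(goals) / matches  # scoring rate: a different quantity

print(f"P(score) ≈ {scoring_matches}/{matches} = {p_score:.2f}")
print(f"Goals per game = {goals_per_game:.2f}")
```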

Session 2 — Introducing expected goals (xG) and converting xG to probabilities

Time: 45–60 minutes. Goal: compute and interpret xG, and model single-match scoring probabilities with Poisson.

Explain xG (10 min): xG assigns each shot a probability of becoming a goal, based on its location, assist type and other shot features. A team's xG for a match is the sum of its shots' xG values. Emphasize that xG measures chance quality, not the exact number of goals.
Class example (15 min): Provide a match with team xG values: Home_xG = 1.8, Away_xG = 1.2. Ask: what’s the probability the home team scores exactly 0, 1, 2, 3 goals? Introduce the Poisson pmf:

Poisson formula: P(K = k) = e^(-λ) · λ^k / k!, where λ is the team's expected goals (xG).

Worked calculation (10–15 min): For Home_xG = 1.8, P(0) = e^(-1.8) ≈ 0.165; P(1) = e^(-1.8) × 1.8 ≈ 0.298; P(2) = e^(-1.8) × 1.8²/2 ≈ 0.268; P(3) = e^(-1.8) × 1.8³/6 ≈ 0.161. (In Google Sheets, put k in cell A2 and use =EXP(-1.8)*1.8^A2/FACT(A2), or equivalently =POISSON.DIST(A2, 1.8, FALSE).)
Match outcome probabilities (10 min): Assuming the two teams' goal counts are independent, multiply P_home(i) by P_away(j) to fill the full probability matrix (home goals vs away goals). Sum the cells where home goals > away goals to get P(home win), the diagonal for P(draw), and the remaining cells for P(away win). Discuss the independence assumption and alternatives (correlated models).
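To connect the xG definition in this session to probabilities, the sketch below sums hypothetical shot-level xG values (invented to total 1.8) into a team xG. It then compares two estimates of P(at least one goal): one that treats each shot as an independent event, and the Poisson approximation with λ = team xG. The independence of shots is an assumption made for illustration.

```python
import math

# Hypothetical shot-level xG values for one team in one match (placeholders summing to 1.8).
shot_xg = [0.76, 0.35, 0.22, 0.15, 0.12, 0.08, 0.05, 0.04, 0.03]

team_xg = sum(shot_xg)  # team xG = sum of its shots' xG values

# Assuming shots are independent, P(no goal) is the product of each shot's P(miss).
p_no_goal = math.prod(1 - x for x in shot_xg)
p_at_least_one = 1 - p_no_goal

# Poisson approximation with lambda = team xG, for comparison.
p_at_least_one_poisson = 1 - math.exp(-team_xg)

print(f"team xG = {team_xg:.2f}")
print(f"P(at least one goal), shot by shot: {p_at_least_one:.3f}")
print(f"P(at least one goal), Poisson:      {p_at_least_one_poisson:.3f}")
```

Because 1 - x ≤ e^(-x) for every shot, the shot-by-shot estimate is always at least as high as the Poisson one. The gap is largest when a team's xG comes from one or two big chances, which makes a useful talking point about what the Poisson model simplifies.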

Session 3 — Predictive modeling and Monte Carlo simulation

Time: 60 minutes. Goal: run simulations, compare models and interpret results for FPL choices.

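Set up expected goals per team (10 min): Use current-season per-90 attacking and defensive xG rates to adjust the baseline. Example method: team_expected = (team_attack_xG_per90 + opponent_xG_conceded_per90) / 2 * (match_minutes / 90), i.e. the average of the team's xG created per 90 and the opponent's xG conceded per 90, scaled to the match length. Explain that this is a classroom heuristic, not an established forecasting model.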
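A minimal sketch of the heuristic above as a Python function. The per-90 rates are hypothetical, chosen so the output matches the Home_xG = 1.8 and Away_xG = 1.2 example from Session 2. The function and parameter names are illustrative, not from any library.

```python
def team_expected_goals(attack_xg_per90, opp_xg_conceded_per90, match_minutes=90):
    """Classroom heuristic: average the team's attacking xG rate with the
    opponent's xG-conceded rate, then scale by the match length."""
    return (attack_xg_per90 + opp_xg_conceded_per90) / 2 * (match_minutes / 90)

# Hypothetical per-90 rates (placeholders, not real team data).
home_xg = team_expected_goals(attack_xg_per90=2.0, opp_xg_conceded_per90=1.6)
away_xg = team_expected_goals(attack_xg_per90=1.1, opp_xg_conceded_per90=1.3)
print(f"home λ ≈ {home_xg:.2f}, away λ ≈ {away_xg:.2f}")
```

The formula contains no home-advantage term, which is one reason to present it as an approximation.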

Monte Carlo simulation (25–30 min): Guide students through a simple simulation in Google Colab (Python) or Google Sheets. Pseudocode (Python):

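import numpy as np
home_xg, away_xg = 1.8, 1.2  # expected goals for each team (the Session 2 example)
N = 10000  # number of simulated matches
rng = np.random.default_rng(seed=42)  # seeded so every run gives the same results
home_goals = rng.poisson(lam=home_xg, size=N)  # one simulated home score per match
away_goals = rng.poisson(lam=away_xg, size=N)  # drawn independently of the home score
home_win_prob = np.mean(home_goals > away_goals)
draw_prob = np.mean(home_goals == away_goals)
away_win_prob = 1 - home_win_prob - draw_prob
print(f"Home win {home_win_prob:.3f}, draw {draw_prob:.3f}, away win {away_win_prob:.3f}")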

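In Sheets, =POISSON.DIST(k, lambda, FALSE) returns the exact probability of k goals, which is what the matrix method needs; it does not simulate anything by itself. To simulate a small number of matches, generate RAND() values and convert each one to a goal count by inverse CDF sampling: take the smallest k whose cumulative probability =POISSON.DIST(k, lambda, TRUE) is at least the random number.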
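The inverse CDF idea can be shown in Python by mirroring what a spreadsheet does: walk up the cumulative Poisson probabilities until they reach a uniform random number u. This is a sketch. The function name is hypothetical, and the max_k cap only guards against floating-point round-off when u is extremely close to 1.

```python
import math
import random

def poisson_inverse_cdf(u, lam, max_k=100):
    """Return the smallest k with P(K <= k) >= u for K ~ Poisson(lam),
    like comparing RAND() with cumulative POISSON.DIST values row by row."""
    k = 0
    pmf = math.exp(-lam)  # P(K = 0)
    cdf = pmf
    while cdf < u and k < max_k:
        k += 1
        pmf *= lam / k  # P(K = k) from P(K = k - 1)
        cdf += pmf
    return k

random.seed(1)
N = 20  # small N, like a spreadsheet with 20 rows of RAND()
home = [poisson_inverse_cdf(random.random(), 1.8) for _ in range(N)]
away = [poisson_inverse_cdf(random.random(), 1.2) for _ in range(N)]
home_wins = sum(h > a for h, a in zip(home, away))
print(f"home wins in {home_wins} of {N} simulated matches")
```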

Interpretation (10–15 min): Compare the Monte Carlo probabilities to the Poisson matrix method and to betting/FPL implied odds. Discuss sources of error: lineup changes, injuries, short-term form (use BBC-style team news updates to incorporate injury effects), and the role of variance in small-sample events.

Concrete classroom-ready examples

Example A — From xG to match probability (step-by-step)

Data: Home_xG = 1.8, Away_xG = 1.2.
Compute P_home(k) for k = 0..5 using Poisson pmf.
Compute P_away(k) similarly.
Construct 6x6 matrix M where M[i,j] = P_home(i)*P_away(j).
P(Home win) = sum of M[i,j] where i > j; P(Draw) = sum where i = j; P(Away win) = sum where i < j.

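Result (approx.): P(Home win) ≈ 51%, P(Draw) ≈ 23%, P(Away win) ≈ 25%. Truncating at 5 goals leaves about 1% of the probability outside the 6x6 matrix, so its sums come out slightly lower (home win ≈ 50%). Use student calculations to show that favourites are not certainties and how xG adds nuance.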
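Example A can be checked with a short Python sketch that builds the score matrix under the same independence assumption and sums it by outcome. It uses only the standard library. Setting max_goals = 5 gives the 6x6 matrix, and a larger value shows how much probability the truncation leaves out.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(home_xg, away_xg, max_goals=5):
    """Sum the (max_goals+1) x (max_goals+1) matrix M[i][j] = P_home(i) * P_away(j),
    assuming the two teams' goal counts are independent."""
    home_win = draw = away_win = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, home_xg) * poisson_pmf(j, away_xg)
            if i > j:
                home_win += p
            elif i == j:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

for max_goals in (5, 10):
    h, d, a = outcome_probs(1.8, 1.2, max_goals)
    print(f"0..{max_goals} goals: home {h:.3f}, draw {d:.3f}, away {a:.3f}, total {h + d + a:.3f}")
```

With max_goals = 5 the three outcome probabilities sum to about 0.988 rather than 1, because scores above 5 goals are cut off; widening the matrix to 0 to 10 goals recovers almost all of the missing probability.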

Example B — FPL use case: who to captain?

Given two midfielders: Player A with an average xG per 90 of 0.25 on a team expected to produce 2 goals, Player B with xG per 90 of 0.18 on a team expected to produce 3 goals. Teaching point: expected fantasy points combine individual xG, assist-xG (xA), minutes and bonus likelihood. Show a simple scoring expectation model:

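Expected fantasy points ≈ minutes_fraction * (5*xG + 3*xA + expected_bonus + appearance_points), where minutes_fraction = expected minutes / 90, xG and xA are per-90 rates, a midfielder's goal is worth 5 FPL points (4 for a forward) and an assist is worth 3.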

Run sensitivity: if Player B plays 90 mins but Player A plays 60, who is the better captain? Encourage students to explain assumptions and show numeric comparisons.
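A sketch of the captaincy comparison using this simple formula. It uses 5 points per goal (FPL's value for a midfielder; forwards get 4) and 3 per assist. The xA per 90 (0.10) and expected bonus (0.3) for both players are hypothetical placeholders, since the example does not give them. Like the formula, it scales everything, including appearance points, by minutes_fraction, which is a simplification.

```python
def expected_fpl_points(minutes, xg_per90, xa_per90, expected_bonus,
                        goal_points=5, assist_points=3):
    """Classroom approximation:
    minutes_fraction * (goal_points*xG + assist_points*xA + bonus + appearance)."""
    minutes_fraction = minutes / 90
    appearance_points = 2 if minutes >= 60 else 1  # FPL: 2 points for 60+ minutes, else 1
    return minutes_fraction * (goal_points * xg_per90 + assist_points * xa_per90
                               + expected_bonus + appearance_points)

# xa_per90 and expected_bonus are hypothetical placeholders for both players.
player_a = expected_fpl_points(minutes=60, xg_per90=0.25, xa_per90=0.10, expected_bonus=0.3)
player_b = expected_fpl_points(minutes=90, xg_per90=0.18, xa_per90=0.10, expected_bonus=0.3)
print(f"Player A: {player_a:.2f} expected points (captained: {2 * player_a:.2f})")
print(f"Player B: {player_b:.2f} expected points (captained: {2 * player_b:.2f})")
```

Under these assumptions Player B comes out ahead (about 3.5 expected points versus about 2.6), and doubling both for the captaincy does not change the ranking. Students can vary A's minutes or the hypothetical xA values to see when the answer flips.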

Assessment tasks and sample answers

Compute the probability a team with xG = 2.1 scores at least 2 goals. (Answer: 1 - P(0) - P(1) = 1 - e^(-2.1) × (1 + 2.1) ≈ 0.62.)
Using a provided 10-game dataset, estimate P(a striker scores in a game) and a 95% confidence interval. (Answer: use binomial proportion CI.)
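Simulate a match 10000 times with home_xG=1.3 and away_xG=1.7. Which team wins more often? (Answer: the away team. The exact Poisson calculation gives roughly 47% away wins, 29% home wins and 24% draws; a 10,000-match simulation should land within about one percentage point of these.)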
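For the confidence-interval task, one option is the Wilson score interval, which behaves better than the textbook Wald interval when n is as small as 10. This is a choice swapped in here; the answer key only says "binomial proportion CI". The 4-out-of-10 record below is a hypothetical dataset.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

def wald_interval(successes, n, z=1.96):
    """Textbook normal-approximation interval p_hat ± z*sqrt(p_hat(1-p_hat)/n), clipped to [0, 1]."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical 10-game dataset: the striker scored in 4 of 10 matches.
print("Wald:  ", wald_interval(4, 10))
print("Wilson:", wilson_interval(4, 10))
```

For 4 scoring games out of 10, the Wald interval is roughly 0.10 to 0.70 and the Wilson interval roughly 0.17 to 0.69. With 0 out of 10, the Wald interval collapses to a single point at 0, while the Wilson interval still has width.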

Extensions for advanced students

Build a logistic regression to predict match outcomes using features: team xG, form (last 5 matches), injury-adjusted minutes lost, and home advantage. Use scikit-learn in Colab.
Introduce bivariate Poisson or Dixon-Coles corrections to account for low-scoring correlation between teams.
Use time-decayed xG (recent matches weighted more heavily) and test predictive performance on a reserved test set.
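For the Dixon-Coles extension, the sketch below multiplies the independent Poisson score matrix by the Dixon-Coles low-score factor τ(i, j), which changes only the 0-0, 0-1, 1-0 and 1-1 cells. The value ρ = -0.1 is illustrative, not fitted to any data. In practice ρ is estimated alongside the team strengths, and it must keep every τ non-negative.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def dixon_coles_tau(i, j, lam, mu, rho):
    """Dixon-Coles low-score adjustment (lam = home mean goals, mu = away mean goals)."""
    if i == 0 and j == 0:
        return 1 - lam * mu * rho
    if i == 0 and j == 1:
        return 1 + lam * rho
    if i == 1 and j == 0:
        return 1 + mu * rho
    if i == 1 and j == 1:
        return 1 - rho
    return 1.0

def score_matrix(lam, mu, rho=0.0, max_goals=10):
    """M[i][j] = tau(i, j) * P_home(i) * P_away(j); rho = 0 gives the independent model."""
    return [[dixon_coles_tau(i, j, lam, mu, rho) * poisson_pmf(i, lam) * poisson_pmf(j, mu)
             for j in range(max_goals + 1)]
            for i in range(max_goals + 1)]

def draw_probability(matrix):
    """Sum of the diagonal cells (equal scores)."""
    return sum(matrix[k][k] for k in range(len(matrix)))

independent = score_matrix(1.8, 1.2)
adjusted = score_matrix(1.8, 1.2, rho=-0.1)  # illustrative rho, not a fitted value
print(f"P(draw): independent {draw_probability(independent):.3f}, "
      f"Dixon-Coles {draw_probability(adjusted):.3f}")
```

With ρ < 0 the 0-0 and 1-1 cells grow and the 0-1 and 1-0 cells shrink, so draws become more likely while the total probability is unchanged.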

Classroom management, ethics and data literacy

Sports analytics can inspire gambling behaviour. Make clear that the lesson’s goal is statistical literacy, not betting advice. Discuss data provenance — where xG comes from, measurement error, and the difference between model predictions and certainty. Use BBC-style team news (injuries, suspensions) to teach how qualitative information updates quantitative models. Emphasize transparency: students should document assumptions and share code or spreadsheets for reproducibility.

Practical tips for teachers (quick wins)

Start with familiar teams and players to boost engagement (allow students to pick teams for mini-projects).
Use Google Colab notebooks pre-populated with data and clear cells for students to edit.
Provide scaffolded worksheets: arithmetic-first, then Poisson, then simulation.
Leverage live team news to practice updating predictions — e.g., if a key striker is out, reduce expected goals and rerun simulations.
Assess with project-based rubrics: clarity of assumptions (30%), correctness of calculations (40%), interpretation (30%).

2026 trends and predictions teachers should note

Recent years have shown three clear trends that affect classroom use of sports data:

Data accessibility: by late 2025 more community datasets and educational APIs made xG and event data easier to access for classrooms.
AI-assisted analytics: 2025–26 has seen mainstream educational tools embedding model-generation helpers (automated code templates, explainable model outputs) that reduce the friction of teaching predictive modeling.
Curriculum integration: more exam boards and curricula are accepting project-based assessments where sports datasets are allowed to demonstrate statistical competence.

Prediction for schools: in 2026, sports analytics will be a common real-world application used to teach statistics, probability and data science — especially where student motivation matters.

Common pitfalls and how to avoid them

Avoid overstating certainty: always show confidence intervals and simulation variability.
Watch for small-sample noise: early-season xG averages can be misleading, so teach students to check sample sizes.
Address data bias: some xG models differ by provider; be transparent which model (StatsBomb, FBref, Opta-derived) you're using.
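One way to show simulation variability, assuming NumPy as in the Session 3 code: report each simulated probability with its Monte Carlo standard error, sqrt(p(1 - p) / N), and watch the interval narrow as N grows. The function name and the seed are illustrative.

```python
import math
import numpy as np

def simulate_home_win(home_xg, away_xg, n, seed=None):
    """Simulate n matches with independent Poisson scores; return (P(home win), standard error)."""
    rng = np.random.default_rng(seed)
    home = rng.poisson(home_xg, size=n)
    away = rng.poisson(away_xg, size=n)
    p_hat = float(np.mean(home > away))
    # Monte Carlo standard error of a simulated proportion: sqrt(p(1 - p) / n)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

for n in (100, 1_000, 10_000):
    p_hat, se = simulate_home_win(1.8, 1.2, n, seed=0)
    print(f"N={n:>6}: P(home win) ≈ {p_hat:.3f} ± {1.96 * se:.3f} (approx. 95%)")
```

Because the standard error shrinks like 1/sqrt(N), going from 1,000 to 10,000 simulations only narrows the interval by a factor of about 3.2.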

Ready-to-run resources (teacher checklist)

Prepared CSV with 10–20 matches per team and xG columns.
Google Colab notebook with Monte Carlo cell and plotting (histograms of simulated scores).
Google Sheets template with Poisson formula cells for quick classroom computation.
Worksheet: three problems with increasing difficulty and model-interpretation prompts.

Actionable takeaway

Start small: pick a single Premier League fixture this weekend, extract the teams’ recent xG/90 and form, compute Poisson-based outcome probabilities in a 20-minute class. Use the result to spark discussion on model limits: injuries, weather and red cards — all of which provide real teachable moments about uncertainty and model updating.

Call to action

Want the full lesson pack (datasets, Google Colab notebook, worksheets and rubric)? Download the free teacher kit and get weekly updates on new Premier League datasets and classroom-ready activities. Use sports to teach stats — your students will thank you.

