Football

Assignment

For Fall 2025 the Football lab will be used as a potential Final Project.

Before submitting your report, be sure to review the Project Policy page.

On Canvas, be sure to submit both your source .ipynb file and a rendered .html version of the report.

  • Your report is due on Tuesday, December 16.

Background

American football is a team sport played between two teams of eleven players each on a rectangular field with an “end zone” at each end. The game is divided into four quarters, with the objective of scoring points, either by advancing the ball into the opponent’s end zone for a touchdown or by kicking it through the goalposts for a field goal.

Play is organized into discrete segments called “downs,” with each team given four attempts (downs) to advance the ball at least ten yards. If successful, they are awarded a new set of downs. If they fail, possession of the ball is given to the opposing team. Each down begins with a “snap” from the center, and the team with the ball (offense) attempts to advance while the other team (defense) tries to prevent them from doing so. The game emphasizes strategy, with each play being a carefully planned and executed attempt to gain yards or disrupt the opponent’s progress.

“Going for it on fourth down” involves a high-risk gamble where a team opts to attempt to advance the ball rather than punting or kicking a field goal. Success grants a new set of downs and a chance to score, but failure hands over possession to the opponent (a turnover) with favorable field position, potentially leading to a quick score. This decision can shift game momentum and is influenced by factors like field position and game context.

In recent years, more and more teams have become aggressive, choosing to take risks on fourth down.

Specifically, in a fourth-down situation, the offensive coordinator, a coach who specializes in real-time game decisions for the offense (often called play calling), has three choices:

  • Punt
    • Punting is, in some sense, a surrender: the offense purposefully gives possession to the opposing team. However, this can be strategically advantageous because a successful punt places the opposing team far from the end zone it is attacking, significantly reducing its chances of scoring.
  • Kick a field goal
    • Kicking a field goal is a low-risk, low-reward proposition. With favorable field position, kicking a field goal provides a high probability of scoring some points (three) but forfeits the chance to score a touchdown (at least six points).
  • Go for it
    • “Going for it” on fourth down is a risky decision that uses the last of the four downs in an attempt to move the ball the remaining yards needed to achieve a first down (retaining the ability to score a touchdown) or simply score a touchdown. Should the offense not advance the ball the required number of yards, they turn the ball over to the opposing team, who take over in a much better field position than if the offense had punted.

The game situation, including the field position, time remaining, and score, is considered when making decisions like this.

Ever wanted to watch an hour-long video about the sadness of punting in football? Probably not, but you should seriously consider it.

To punt is to give up, and in the 21st century, NFL teams have given up nearly 50,000 times. Most of those punts were reasonable decisions. But a few were so cowardly, and in such defiance of all reason, that they must not be forgotten. In this episode of Chart Party, it’s our mission to find them.

In this work of art, Jon Bois creates a bespoke statistic, the surrender index, in an attempt to find the saddest, most cowardly punt of the 21st century. If you are not a football fan, or even if you are, you might think this sounds ridiculous. But grab a snack, sit down, enjoy the smooth jazz, and let Jon use his unique brand of data journalism to tell a fascinating story.

If you are a football fan, Jon has some additional work that you’ll enjoy:

For much of football history, decisions such as these were largely made by gut instinct. In the modern game, football organizations employ data science teams that use data to inform in-game decisions.

Modern NFL teams are likely to have a sophisticated win probability model that evaluates and guides each in-game decision, expressed as its effect on the probability of winning the game. In this lab, you will develop a model that assesses the risk-reward of “going for it” on fourth down.

[Figure: diagram showing potential decisions (and their estimated probabilities) that affect an NFL team's probability of winning a game given the game situation.]

The above diagram, from an article on using so-called “next-gen” statistics to inform game decisions such as attempting to convert on fourth down, shows potential decisions (and importantly their estimated probabilities) that affect an NFL team’s probability of winning a game given the game situation. This illustrates that individual models, such as a model that predicts fourth-down conversion, are components of a larger system, and that system is what gives their outputs context.
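
To make the role of the conversion model concrete, here is a minimal sketch of how a larger system might weigh "going for it" against punting. The function name and every number below are hypothetical, chosen only for illustration; they do not come from the article or from any real win probability model.

```python
def win_prob_go_for_it(p_convert, wp_if_converted, wp_if_failed):
    """Expected win probability of going for it on fourth down.

    p_convert would come from the conversion model built in this lab.
    The other two inputs are assumed to come from a separate win
    probability model, which is outside the scope of this lab.
    """
    return p_convert * wp_if_converted + (1 - p_convert) * wp_if_failed


# Hypothetical numbers, for illustration only.
wp_go = win_prob_go_for_it(p_convert=0.60, wp_if_converted=0.55, wp_if_failed=0.40)
wp_punt = 0.45  # hypothetical win probability after punting

decision = "go for it" if wp_go > wp_punt else "punt"
print(f"go: {wp_go:.3f}, punt: {wp_punt:.3f} -> {decision}")
```

With these made-up inputs, going for it is worth 0.60 × 0.55 + 0.40 × 0.40 = 0.49, which beats the assumed 0.45 for punting. The point is only that the quality of the decision depends directly on how well the conversion probability is estimated.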

Additional information:

Scenario and Goal

Who are you?

  • You are a data scientist working for the front office of an NFL team.

What is your task?

  • You are tasked with creating a model that estimates the probability that an attempt to convert a fourth down is successful, given game-state information such as yards-to-go, yards-to-goal, and the type of play (run or pass) considered. This model will be used within a larger system that allows the offensive coordinator to evaluate play-calling decisions and how they affect the overall probability of winning the game.

Who are you writing for?

  • To summarize your work, you will write a report for your manager, who reports to the offensive coordinator. You can assume your manager is very familiar with football strategy, and is reasonably familiar with machine learning.

Data

To achieve the goal of this lab, we will need data about many previous fourth-down attempts in the NFL. The necessary data is provided in the following files:

Source

The data used in this lab was acquired using the nflreadpy package. This package sources data from the nflverse-data repository. The nflreadpy package has an R analog, nflreadr, which contains a searchable data dictionary as a part of its documentation for play-by-play data.

We are providing a modified version of this data for this lab.

Data Dictionary

Each observation in the data contains information about a fourth-down conversion attempt in the NFL.

The variable descriptions listed below are available in the Markdown file variable-descriptions.md for ease of inclusion in reports.

Variable Descriptions

Response

converted

  • [object] result of fourth-down conversion attempt. One of No or Yes.

Features

togo

  • [float64] distance in yards from either the first down marker or the end zone in goal down situations. Distance needed to successfully convert the fourth-down attempt.

yardline

  • [float64] distance in yards from the opponent’s end zone. Distance needed to score a touchdown.

play_type

  • [object] type of play. One of Pass or Run. Pass plays include sacks. Run plays include scrambles.

posteam

  • [object] the abbreviation for the team with possession of the ball.

defteam

  • [object] the abbreviation for the team on defense.

Data in Python

To load the data in Python, use:

import pandas as pd
football = pd.read_parquet(
    "https://lab.cs307.org/football/data/football.parquet",
)
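
After loading, it may be worth confirming that the data matches the data dictionary before modeling. The sketch below is one possible check; the helper name check_schema and the tiny example frame are hypothetical and are not part of the lab materials.

```python
import pandas as pd

# Columns listed in the data dictionary.
EXPECTED_COLUMNS = ["converted", "togo", "yardline", "play_type", "posteam", "defteam"]


def check_schema(df):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if "converted" in df.columns and not set(df["converted"].dropna()) <= {"No", "Yes"}:
        problems.append("unexpected values in converted")
    if "play_type" in df.columns and not set(df["play_type"].dropna()) <= {"Pass", "Run"}:
        problems.append("unexpected values in play_type")
    return problems


# Tiny made-up frame with the documented columns (not real data).
example = pd.DataFrame(
    {
        "converted": ["Yes", "No", "Yes"],
        "togo": [1.0, 4.0, 2.0],
        "yardline": [35.0, 52.0, 3.0],
        "play_type": ["Run", "Pass", "Run"],
        "posteam": ["AAA", "BBB", "AAA"],
        "defteam": ["BBB", "AAA", "CCC"],
    }
)
print(check_schema(example))
```

In practice you would call check_schema(football) on the loaded frame instead of the example.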

Tips and Tricks

You have access to a game_date variable, but you should not use this variable as a feature. Instead, consider it as a guide for data splitting. In particular, you should consider creating a test set based on the most recent year of data, and train on previous years.
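
A minimal sketch of such a split follows. It assumes that game_date can be parsed by pandas as a date and that "year" means calendar year; the function name and the example rows are hypothetical. Because an NFL season spans two calendar years, you may prefer to split by season instead.

```python
import pandas as pd


def split_by_latest_year(df, date_col="game_date"):
    """Hold out the most recent calendar year as the test set.

    The date column is used only to split and is dropped afterwards,
    so it cannot be used as a feature by accident.
    """
    years = pd.to_datetime(df[date_col]).dt.year
    latest = years.max()
    train = df[years < latest].drop(columns=[date_col])
    test = df[years == latest].drop(columns=[date_col])
    return train, test


# Made-up rows, for illustration only.
example = pd.DataFrame(
    {
        "game_date": ["2022-09-11", "2023-01-08", "2023-10-01", "2024-09-08", "2024-12-01"],
        "togo": [1.0, 3.0, 2.0, 5.0, 1.0],
        "converted": ["Yes", "No", "Yes", "No", "Yes"],
    }
)
train, test = split_by_latest_year(example)
print(len(train), len(test))
```

Splitting by time rather than at random mimics how the model would be used: it is trained on past seasons and then applied to games that have not happened yet.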

The two team variables can be tricky to encode. Consider a TargetEncoder.

If you find this lab interesting, consider participating in the NFL Big Data Bowl!
