Calculate Expected Value in Python

Calculate basketball expected values using Python.

Simpson's Paradox

The idea for this tutorial was inspired by a tweet from Sean Carroll discussing Simpson's paradox in relation to NBA shot percentages, which leads into expected value.

In the NBA and at most levels of basketball, a regular shot attempt is worth 2 points, or 3 points if the shooter is behind the 3 point line. These shots are categorized as field goals. Field goal percentage (FG%) is total makes divided by total attempts, with 2 point and 3 point attempts combined.

As of this writing, Steph Curry is shooting 50% 2 point FG and 37.6% 3 point FG while Russell Westbrook is 47.4% and 30.1% respectively. Without following the NBA, the assumption might be Curry is shooting a higher overall FG%. However, Westbrook has a higher FG% than Curry despite worse 2pt and 3pt percentages. How?

The answer is Simpson's paradox (a subject worth its own in-depth tutorial). Simpson's paradox is a statistical effect in which a trend that holds within each sub-group of a data set reverses when the sub-groups are combined, which can happen when the sub-groups are weighted unevenly. In this case, the sub-groups are 2pt FG attempts and 3pt FG attempts.

Percentages do not tell the whole story. The distribution of field goal attempts is the missing variable: 63% of Curry's field goal attempts are from 3pt compared to 22% for Westbrook. A 3pt shot is naturally more difficult, since it is taken farther from the basketball hoop, so 3pt percentages are lower. Because most of Curry's attempts are the harder shot, his overall FG% is pulled down toward his 3pt percentage.
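To make the reversal concrete, here is a minimal sketch that combines the sub-groups using the season totals quoted later in this tutorial (in the Expected Value section). The dictionary layout and the helper names overall_fg_pct and three_point_share are assumptions chosen for illustration, not part of the original example application.

```python
# Season totals quoted later in this tutorial, stored as (makes, attempts).
shots = {
    "Curry": {"2pt": (189, 378), "3pt": (242, 643)},
    "Westbrook": {"2pt": (330, 696), "3pt": (58, 193)},
}


def overall_fg_pct(player_shots):
    """Overall FG%: total makes divided by total attempts."""
    makes = sum(m for m, _ in player_shots.values())
    attempts = sum(a for _, a in player_shots.values())
    return makes / attempts


def three_point_share(player_shots):
    """Fraction of all field goal attempts taken from 3pt range."""
    total_attempts = sum(a for _, a in player_shots.values())
    return player_shots["3pt"][1] / total_attempts


for name, player_shots in shots.items():
    fg = overall_fg_pct(player_shots)
    share = three_point_share(player_shots)
    print(f"{name}: FG% {fg:.1%}, 3pt share {share:.0%}")

# Curry: FG% 42.2%, 3pt share 63%
# Westbrook: FG% 43.6%, 3pt share 22%
```

Curry leads in both sub-groups, yet his combined percentage is lower because far more of his attempts fall in the lower-percentage 3pt group.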

Nonetheless, the goal in basketball is not necessarily to shoot the highest possible FG%. The objective is to score more points.

Expected Value

A key statistical measure in basketball, as well as many other domains, is expected value. Expected value is essentially a weighted average. In the context of basketball, it answers the question: given a shot attempt value (2 points or 3 points) and a shot make percentage, how many points is the shot expected to produce on average?

If a player is successful on exactly 50% of their 2pt field goal attempts on average, their expected value is 1 point for a 2pt shot. The formula is pretty simple: expected value = shot attempt value * shot make percentage.
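The formula can be sketched as a small function. The name expected_value and the input check are assumptions added for illustration, and the 40% 3pt shooter is a hypothetical player, not data from this tutorial.

```python
def expected_value(shot_value, make_percentage):
    """Expected points for one attempt: shot value times make probability.

    make_percentage is a fraction between 0 and 1 (0.5 means 50%).
    """
    if not 0.0 <= make_percentage <= 1.0:
        raise ValueError("make_percentage must be between 0 and 1")
    return shot_value * make_percentage


# The 50% 2pt shooter from the text above.
print(f"{expected_value(2, 0.50):.2f}")  # 1.00

# A hypothetical 40% 3pt shooter is worth more per attempt.
print(f"{expected_value(3, 0.40):.2f}")  # 1.20
```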

For the purposes of this tutorial, we'll use the publicly available data from Basketball Reference for the 2021-2022 season to calculate expected value (approximately 70% of the season is complete, which offers a suitable sample size). While Russell Westbrook is shooting a higher FG% this season, Steph Curry generates a greater expected value from his overall shot attempts.

On a side note, models used by NBA teams to calculate expected value include more data than static percentages from past results. On top of historical results, real-time player tracking enables predictive modeling for a given shot based on a variety of conditions, like spot on the court, space from a defender, time on the shot clock, etc. The percentages generated from the predictive models can be used as the input to determine the expected value of a field goal attempt.

Thus far into the season Curry is shooting 189/378 2pt and 242/643 3pt. Curry's 2pt percentage is conveniently 50%, like the aforementioned example. On the other hand, Westbrook is shooting 330/696 2pt and 58/193 3pt.

Calculating the expected value from 2pt and 3pt for both Curry and Westbrook is a straightforward exercise.

Note: I encourage using semantically meaningful variable names such as curry or westbrook rather than c and w, but in the code example below we'll use the latter for brevity.

# Steph Curry
c_2pt_makes = 189
c_2pt_attempts = 378
c_2pt_percentage = round((c_2pt_makes / c_2pt_attempts), 3)
c_2pt_exp_value = round((2 * c_2pt_percentage), 2)

c_3pt_makes = 242
c_3pt_attempts = 643
c_3pt_percentage = round((c_3pt_makes / c_3pt_attempts), 3)
c_3pt_exp_value = round((3 * c_3pt_percentage), 2)

# Russell Westbrook
w_2pt_makes = 330
w_2pt_attempts = 696
w_2pt_percentage = round((w_2pt_makes / w_2pt_attempts), 3)
w_2pt_exp_value = round((2 * w_2pt_percentage), 2)

w_3pt_makes = 58
w_3pt_attempts = 193
w_3pt_percentage = round((w_3pt_makes / w_3pt_attempts), 3)
w_3pt_exp_value = round((3 * w_3pt_percentage), 2)

print(f'{c_2pt_exp_value:.2f}')  # Curry 2pt Expected Value: 1.00
print(f'{c_3pt_exp_value:.2f}')  # Curry 3pt Expected Value: 1.13
print(f'{w_2pt_exp_value:.2f}')  # Westbrook 2pt Expected Value: 0.95
print(f'{w_3pt_exp_value:.2f}')  # Westbrook 3pt Expected Value: 0.90

Python Application

Next, we want to find the average expected value of all field goal attempts from Curry and Westbrook. You can run the Python example application. The code is broken into several functions to handle segments of functionality and reduce code duplication. The application computes the 2pt, 3pt, and average expected values for Curry and Westbrook, then outputs the results for display.
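As a rough illustration of that structure, the sketch below is one possible layout and not the example application itself. The ShotLine class and average_expected_value function are hypothetical names. It also assumes "average" means the attempt-weighted average (total points divided by total attempts); the actual application may define it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotLine:
    """Makes and attempts for one shot type (hypothetical helper)."""
    makes: int
    attempts: int
    value: int  # points awarded for a made shot: 2 or 3

    @property
    def expected_value(self):
        # Shot value times make percentage.
        return self.value * self.makes / self.attempts


def average_expected_value(lines):
    """Attempt-weighted expected points per field goal attempt."""
    total_points = sum(line.value * line.makes for line in lines)
    total_attempts = sum(line.attempts for line in lines)
    return total_points / total_attempts


# Season totals quoted earlier in this tutorial.
players = {
    "Curry": [ShotLine(189, 378, 2), ShotLine(242, 643, 3)],
    "Westbrook": [ShotLine(330, 696, 2), ShotLine(58, 193, 3)],
}

for name, lines in players.items():
    for line in lines:
        print(f"{name} {line.value}pt expected value: {line.expected_value:.2f}")
    print(f"{name} average expected value: {average_expected_value(lines):.2f}")

# Curry average expected value: 1.08
# Westbrook average expected value: 0.94
```

Weighting by attempts keeps the average tied to the points a player actually produces per shot, so Curry's heavy 3pt volume counts in his favor here even though it lowers his FG%.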