Trust the Process: Predicting Philly’s Success in the 2020–2021 NBA Season

David Cortes
Jan 27, 2021

After the 2019–2020 NBA season, the Philadelphia 76ers weren’t exactly seen as a threat. They finished sixth in the Eastern Conference and were swept by the Boston Celtics in the first round of the playoffs. Their coaching staff and front office weren’t looking great either; the Sixers fired head coach Brett Brown right after they were eliminated from the playoffs. 76ers GM Elton Brand realized the franchise needed some big changes.

Fast forward a few months, and the Sixers currently sit first in the Eastern Conference and fourth overall in the league. The team looks like a well-oiled machine, Joel Embiid seems like a legitimate MVP contender, and Ben Simmons was even able to make a three-pointer. So how was this previously troubled franchise able to do a 180 and find early success in the new NBA season? It all stems from their new front office.

The 76ers made a lot of moves during the offseason, but most of them were actually off the court. Not only did they hire Doc Rivers to be their new head coach, but they also added Peter Dinwiddie and Prosper Karangwa to their basketball operations executive leadership. However, the most notable move, and arguably the most important one, was bringing in Daryl Morey to be the new president of basketball operations.

Before joining Philadelphia, Morey was the Houston Rockets’ general manager for 13 years. During his tenure there, the Rockets made eight consecutive playoff appearances, which was the league’s longest active streak at the time. Morey’s analytics-driven approach changed the way basketball is played in the NBA. It turns out that the most efficient spots to take a shot from are just beyond the three-point line and right under the basket. Morey’s play style, called “Moreyball”, takes advantage of these spots in order to maximize the number of points scored per possession. As a result, during the 2017–2018 NBA season, the Rockets broke the single-season record for most three-pointers made.
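To make the efficiency argument concrete: the expected points from a shot attempt is the shot's point value multiplied by the chance of making it. Here is a minimal sketch of that arithmetic. The shooting percentages are round numbers I made up for illustration, not real league figures, and the names `SHOT_PROFILES`, `expected_points`, and `rank_shots` are my own.

```python
# Expected points per attempt = point value * probability of a make.
# The percentages below are illustrative assumptions, not real league data.
SHOT_PROFILES = {
    "at the rim": (2, 0.60),
    "mid-range": (2, 0.40),
    "three-pointer": (3, 0.36),
}


def expected_points(value, make_probability):
    """Return the average points scored per attempt for one shot type."""
    return value * make_probability


def rank_shots(profiles):
    """Sort shot types from most to least efficient."""
    scored = [(name, expected_points(*profile)) for name, profile in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, points in rank_shots(SHOT_PROFILES):
    print(f"{name}: {points:.2f} expected points per attempt")
```

Under these assumed percentages, a three-pointer made a little over a third of the time is worth more per attempt than a mid-range two made 40% of the time, which is the intuition behind favoring shots at the rim and beyond the arc.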

Now that Morey has brought his talents to Philadelphia, how will his presence influence the way the Sixers play basketball? That’s exactly what I’m going to try to find out; I’m going to build a regression model with Houston Rockets data from the 2017–2018 season in order to predict how many three-point field goals the Philadelphia 76ers make in the 2020–2021 NBA season.
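As a rough picture of what "train on one team, predict for another" means, here is a minimal sketch of a one-feature least-squares fit written in plain Python. It is not the model I'll end up building. The numbers are toy values invented for illustration, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to new feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Toy data, invented for illustration: three-point attempts and makes per game.
train_attempts = [40, 42, 45, 38, 50]   # stands in for the training team
train_makes = [14, 15, 17, 13, 19]
test_attempts = [39, 44, 47]            # stands in for the testing team

model = fit_line(train_attempts, train_makes)
print(predict(model, test_attempts))
```

The key point is that the model's parameters come only from the training data, and the testing data is used only to generate and evaluate predictions.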

The first step of building this predictive model is gathering data. I’ll be scraping this data from Basketball-Reference.com, a website that has basketball statistics and history for the NBA, ABA, WNBA, and European leagues. For those who don’t know, web scraping is the process of collecting data from websites. I’ll begin by scraping the game logs for each player on the 2017–2018 Houston Rockets roster; this will be used as the training data. Then, I’ll scrape the game logs for each player on the 2020–2021 Philadelphia 76ers roster; this will be used as the testing data.
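If web scraping is new to you, the core idea is turning a page's HTML into structured rows. The sketch below parses a tiny hand-written table using only Python's standard library, so it runs without a network connection. The HTML snippet is invented and much simpler than the markup on a real stats page, and `TableParser` and `parse_table` are hypothetical names.

```python
from html.parser import HTMLParser

# A tiny hand-written table standing in for a downloaded page (not real data).
SAMPLE_HTML = """
<table>
  <tr><th>DATE</th><th>3P</th><th>3PA</th></tr>
  <tr><td>2017-10-17</td><td>4</td><td>9</td></tr>
  <tr><td>2017-10-18</td><td>2</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def parse_table(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = parse_table(SAMPLE_HTML)
print(records)
```

A scraping package wraps this kind of work (plus downloading the page and converting types) behind a single function call.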

Luckily, there is a Python package called basketball_reference_scraper that streamlines the scraping process and aggregates statistics on NBA teams, seasons, players, and games from Basketball-Reference.com. Using this package, I’ll create two dataframes: one containing the game logs for each Rockets player during the 2017–2018 NBA season and another containing the game logs for each 76ers player during the 2020–2021 season.

Installing and Importing Libraries

The first step is to install basketball_reference_scraper via pip.

pip install basketball-reference-scraper

Then, I’ll import Python libraries needed for web scraping, manipulating dataframes, and visualizing data.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import requests
import unidecode
from bs4 import BeautifulSoup
from basketball_reference_scraper.players import get_game_logs

Gathering Training Data: 2017–2018 Houston Rockets

Now that the necessary Python libraries are installed and imported, the next step is creating a list containing the names of the players on the 2017–2018 Houston Rockets roster.

rockets_17_18_roster = ['Ryan Anderson', 'Trevor Ariza', 'Tarik Black', 'Bobby Brown', 'Markel Brown', 'Isaiah Canaan', 'Clint Capela', 'Eric Gordon', 'Gerald Green', 'James Harden', 'Nenê Hilário', 'R.J. Hunter', 'Aaron Jackson', 'Demetrius Jackson', 'Joe Johnson', 'Luc Mbah a Moute', 'Chinanu Onuaku', 'Chris Paul', 'Zhou Qi', 'Tim Quarterman', 'P.J. Tucker', 'Briante Weber', 'Troy Williams', 'Brandan Wright']

Then, I’ll create an empty dataframe that contains the column titles for the game logs.

rockets_17_18_logs = pd.DataFrame(data=None, columns=['PLAYER', 'DATE', 'AGE', 'TEAM', 'HOME/AWAY', 'OPPONENT', 'RESULT', 'GS', 'MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS', 'GAME_SCORE', '+/-'])

With the roster list and empty dataframe in place, I’ll build a for loop that gathers the 2017–2018 game logs for each Rockets player and concatenates them into the empty dataframe.

# Aggregating 2017-2018 game logs for each Rockets player
for player in rockets_17_18_roster:
    try:
        # Retrieve 2017-2018 game logs for each player
        log = get_game_logs(player, '2017-10-17', '2018-04-11', playoffs=False)

        # Add a Player column to the front of the dataframe
        log.insert(0, 'PLAYER', player)

        # Concatenate rockets_17_18_logs and log
        rockets_17_18_logs = pd.concat([rockets_17_18_logs, log])
    except AttributeError:
        print(f"{player} has an AttributeError")
        print('')
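A variation on this loop that some readers may prefer is to collect each player's log in a list and call pd.concat once at the end, which avoids copying a growing dataframe on every iteration. The sketch below shows the pattern. So that it runs offline, it uses a made-up stand-in called `fake_get_game_logs` instead of the real scraper call, and `collect_logs` is a hypothetical helper name.

```python
import pandas as pd


def fake_get_game_logs(player):
    """Hypothetical stand-in for the scraper call; returns made-up rows."""
    if player == "Missing Player":
        raise AttributeError("no game log table found")
    return pd.DataFrame({"DATE": ["2017-10-17", "2017-10-18"], "3P": [1, 2]})


def collect_logs(roster, fetch):
    """Fetch each player's log, tag it with the player's name, and combine."""
    frames, failed = [], []
    for player in roster:
        try:
            log = fetch(player)
        except AttributeError:
            failed.append(player)     # remember who could not be retrieved
            continue
        # assign() appends PLAYER as the last column rather than the first
        frames.append(log.assign(PLAYER=player))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed


logs, failed = collect_logs(
    ["Player A", "Missing Player", "Player B"], fake_get_game_logs
)
print(logs.shape, failed)
```

Returning the list of failed names, rather than only printing them, makes it easier to retry or investigate those players afterwards.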

After running the for loop, the dataframe looks like this:

Five rows are shown; entire dataframe is 464 rows x 29 columns

Gathering Testing Data: 2020–2021 Philadelphia 76ers

Now that we have our training data, we can go ahead and gather the testing data. Just like I did for the Rockets roster, I’ll create a list containing the names of the players on the 2020–2021 Philadelphia 76ers roster.

sixers_20_21_roster = ['Tyrese Maxey', 'Danny Green', 'Dwight Howard', 'Ben Simmons', 'Joel Embiid', 'Tobias Harris', 'Shake Milton', 'Matisse Thybulle', 'Seth Curry', 'Isaiah Joe', 'Mike Scott', 'Furkan Korkmaz', 'Tony Bradley', 'Terrance Ferguson', 'Paul Reed', 'Vincent Poirier']

Then, I’ll create another empty dataframe that contains the game log column titles.

sixers_20_21_logs = pd.DataFrame(data=None, columns=['PLAYER', 'DATE', 'AGE', 'TEAM', 'HOME/AWAY', 'OPPONENT', 'RESULT', 'GS', 'MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS', 'GAME_SCORE', '+/-'])

Now, I’ll build another for loop that gathers the 2020–2021 game logs for each Sixers player and concatenates them into the empty dataframe.

# Aggregating 2020-2021 game logs for each 76ers player
for player in sixers_20_21_roster:
    try:
        # Retrieve 2020-2021 game logs for each player
        log = get_game_logs(player, '2020-12-23', '2021-01-23', playoffs=False)

        # Add a Player column to the front of the dataframe
        log.insert(0, 'PLAYER', player)

        # Concatenate sixers_20_21_logs and log
        sixers_20_21_logs = pd.concat([sixers_20_21_logs, log])
    except AttributeError:
        print(f"{player} has an AttributeError")
        print('')

The Sixers dataframe ends up looking like this:

Five rows are shown; entire dataframe is 83 rows x 29 columns

Next Steps

Now that I have both the training and testing data, I can move on to exploratory data analysis and modeling. In the follow-up to this post, I’ll go over:

  • Feature engineering
  • Data visualizations
  • Correlations between features
  • Linear, Lasso, and Ridge regression

Thanks for reading!
