NBA Salary Prediction
Predicting NBA player salaries with ML/DL models trained on multiple seasons of stats and salary data.




Tools
Overview
Objective: Accurately predict 2025–26 NBA player salaries using models trained on players' traditional per-game statistics (points, assists, rebounds per game, etc.) and salary data.
Two data scopes: single-season (2024–25) data for the smaller Linear Regression and Random Forest models; 15 seasons (2010–2025) of data for the Deep Learning models.
Data & Preparation
Collected player stats and salary data from Basketball Reference. Where extra data was needed, it was scraped from other sites that provide NBA-related data.
The data prep notebook (available through the GitHub link) includes code for data scraping, cleaning, joins, type fixes, and standardization, and writes output CSVs ready for modeling.
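The cleaning, type-fix, and join steps might look roughly like the sketch below. It is only an illustration: the tiny in-memory tables, the placeholder player names, the column names, and the `clean_and_join` helper are all assumptions, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical miniature inputs; the real notebook works on scraped tables.
stats = pd.DataFrame({
    "Player": ["Player A ", "Player B", "Player C"],
    "PTS": ["25.1", "12.4", "8.0"],  # scraped values often arrive as strings
    "AST": ["7.2", "3.1", "1.5"],
})
salaries = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player D"],
    "Salary": ["$40,000,000", "$12,500,000", "$2,000,000"],
})

def clean_and_join(stats: pd.DataFrame, salaries: pd.DataFrame) -> pd.DataFrame:
    """Normalize keys, fix column types, and join stats with salaries."""
    stats = stats.copy()
    salaries = salaries.copy()
    # Strip stray whitespace so the join key matches across tables.
    stats["Player"] = stats["Player"].str.strip()
    salaries["Player"] = salaries["Player"].str.strip()
    # Type fixes: stat strings to floats, dollar strings to integers.
    for col in ["PTS", "AST"]:
        stats[col] = pd.to_numeric(stats[col], errors="coerce")
    salaries["Salary"] = (
        salaries["Salary"].str.replace(r"[$,]", "", regex=True).astype("int64")
    )
    # Inner join keeps only players present in both tables.
    return stats.merge(salaries, on="Player", how="inner")

merged = clean_and_join(stats, salaries)
csv_text = merged.to_csv(index=False)  # pass a file path instead to write a CSV
```

An inner join drops players missing from either table; a left join with an explicit check for missing salaries would be the alternative if silent row loss is a concern.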
Feature Selection
Reduced multicollinearity using VIF (variance inflation factor) and removed low-signal features. Based on the VIF test, the Linear Regression model used a compact set of stats as predictors of salary: PTS, AST, REB, STL, BLK, Age.
The Random Forest and Deep Learning models used the entire set of stats, with the variables scaled.
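For reference, the VIF of a feature is 1 / (1 − R²), where R² comes from regressing that feature on the other features. The sketch below computes it with plain NumPy on synthetic data; the `vif_scores` helper and the `fga` column (made nearly collinear with `pts` on purpose) are illustrative assumptions, and the notebook may well use a library routine instead.

```python
import numpy as np

def vif_scores(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R^2_j) for each column j of X."""
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores

# Synthetic stand-ins for per-game stats (not real player data).
rng = np.random.default_rng(0)
pts = rng.normal(15, 6, 200)
ast = rng.normal(4, 2, 200)
fga = 0.8 * pts + rng.normal(0, 0.5, 200)  # deliberately collinear with pts
X = np.column_stack([pts, ast, fga])

vif = vif_scores(X)  # order: pts, ast, fga
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a feature is largely redundant with the others; the threshold used in this project is not stated.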
Models & Training
Linear Regression baseline trained on 2024–25 data only to minimize multicollinearity.
Random Forest compared against GBM, XGBoost, and Extra Trees models; RF outperformed them even before heavy tuning, then improved further with hyperparameter tuning.
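A comparison of this kind can be sketched with cross-validated RMSE, as below. The data is synthetic and the settings are arbitrary, so the ranking it produces says nothing about the project's results; XGBoost is left out to keep the sketch scikit-learn only, though `xgboost.XGBRegressor` could be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import cross_val_score

# Synthetic regression problem standing in for stats -> salary.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=300)

candidates = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in candidates.items():
    # scikit-learn returns negated RMSE so that higher is always better.
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    results[name] = -scores.mean()

for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: mean CV RMSE = {score:.3f}")
```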
Deep Learning model (PyTorch) used a fully connected network trained for 50 epochs on standard-scaled features.
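A minimal sketch of such a network is shown below. Only the 50 epochs and the standard scaling come from the description above; the layer sizes, optimizer, learning rate, full-batch training, and synthetic data are all assumptions made for illustration.

```python
import numpy as np
import torch
from torch import nn
from sklearn.preprocessing import StandardScaler

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic features and target standing in for stats and salary.
X_raw = rng.normal(size=(256, 8))
y_raw = 3.0 * X_raw[:, 0] - 2.0 * X_raw[:, 1] + rng.normal(scale=0.1, size=256)

# Standard scaling: zero mean and unit variance per feature.
X_scaled = StandardScaler().fit_transform(X_raw).astype(np.float32)
X = torch.from_numpy(X_scaled)
y = torch.from_numpy(y_raw.astype(np.float32)).unsqueeze(1)

# Fully connected regressor; the layer widths are arbitrary choices.
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(50):  # 50 epochs, full-batch for brevity
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

With real data, the scaler should be fit on the training split only and reused on validation data, and mini-batches via `torch.utils.data.DataLoader` would be the usual choice for 15 seasons of rows.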
Evaluation & Results
Metrics: RMSE (root mean squared error, in dollars) and R² (proportion of salary variance explained).
Linear Regression model — RMSE ≈ $9.24M, R² ≈ 0.526
Random Forest model — RMSE ≈ $4.20M, R² ≈ 0.744
Deep Learning model — RMSE ≈ $4.93M, R² ≈ 0.650 (±0.02 between runs)
Conclusion: Random Forest provides the best overall evaluation scores while remaining relatively interpretable through feature importance.
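For readers unfamiliar with the two metrics, the sketch below computes them from scratch on three made-up salaries. The numbers are illustrative only and unrelated to the results reported above.

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up salaries in dollars.
y_true = [10e6, 20e6, 30e6]
y_pred = [12e6, 18e6, 33e6]

rmse_value = rmse(y_true, y_pred)  # sqrt(17/3) million dollars
r2_value = r2(y_true, y_pred)      # 1 - 17/200
```

Because RMSE squares the errors before averaging, a few large misses on max-contract players can dominate it, which is worth keeping in mind when comparing models.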
Sample Predictions & Reproducibility
Python 3.8+ with pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost, plotly, torch.
Run instructions: download `data/` and `notebooks/` from the GitHub link, open the notebook for your preferred model (LR/RF/DL), and execute the final Prediction Function cell.
Each notebook includes a prediction function to query by player name; example narrative outputs for Harden, LeBron, and Ty Jerome are shown in the figure.
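A name-based lookup of this kind could be structured as in the sketch below. The `predict_salary` helper, the `FEATURES` list, the placeholder players, and the toy figures are hypothetical; the notebooks' own function may differ in name, inputs, and output format.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["PTS", "AST", "REB"]  # hypothetical feature subset

# Toy table with placeholder players and made-up numbers.
data = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "PTS": [27.0, 14.5, 8.2, 19.9],
    "AST": [8.1, 3.0, 1.2, 5.4],
    "REB": [7.5, 4.1, 2.9, 6.0],
    "Salary": [45e6, 12e6, 3e6, 25e6],
})

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(data[FEATURES], data["Salary"])

def predict_salary(name: str, data: pd.DataFrame, model) -> float:
    """Look up a player by case-insensitive name and predict their salary."""
    rows = data[data["Player"].str.lower() == name.strip().lower()]
    if rows.empty:
        raise KeyError(f"No player named {name!r} in the data")
    return float(model.predict(rows[FEATURES].iloc[[0]])[0])

prediction = predict_salary("player a", data, model)
print(f"Predicted salary: ${prediction:,.0f}")
```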