March Madness 2022

Import Libraries and Data into the Notebook

import pandas as pd
import numpy as np 
column_names = ["Team", "Conference", "Games Played","Games Won",
          "Adjusted Offensive Efficiency", "Adjusted Defensive Efficiency",
          "Power Ranking", "Effective Field Goal %","Effective Field Goal % (D)",
          "Turnover %", "Turnover % (D)", "Offensive Rebounds", "Defensive Rebounds",
          "Free Throw Rate", "Free Throw Rate (D)", "2-PT%", "2-PT% (D)",
          "3-PT%", "3-PT (D)", "Adjusted Tempo", "Wins above Bubble", "Postseason","Seed in Tournament", "Year"]

data = pd.read_csv("testdata.csv", header = None, names = column_names, skiprows = 1)
data
Team Conference Games Played Games Won Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % ... Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Postseason Seed in Tournament Year
0 Indiana B10 36 29 121.0 89.7 0.9692 54.7 44.0 19.3 ... 27.0 52.0 43.2 40.3 30.4 67.8 7.8 S16 1 NaN
1 Gonzaga WCC 34 31 118.9 90.2 0.9599 54.9 44.9 17.2 ... 29.9 55.0 42.1 36.5 32.9 65.1 7.6 R32 1 NaN
2 Kansas B12 37 31 111.6 86.2 0.9514 53.3 41.5 20.3 ... 32.0 52.9 39.3 36.4 30.3 67.7 7.5 S16 1 NaN
3 Louisville BE 40 35 115.9 84.5 0.9743 50.6 44.8 18.3 ... 34.9 50.8 43.4 33.3 31.8 67.1 9.0 Champions 1 NaN
4 Georgetown BE 32 25 107.6 85.0 0.9381 51.1 43.0 20.1 ... 35.3 50.2 41.4 35.3 30.7 62.5 6.6 R64 2 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
403 Creighton BE 28 20 114.4 94.3 0.9025 55.7 46.9 15.8 ... 25.7 56.3 46.1 36.7 32.1 69.1 3.5 S16 5 NaN
404 Oregon P12 26 20 113.1 98.2 0.8350 54.4 50.1 16.8 ... 27.1 52.9 49.9 37.9 33.6 67.2 3.0 S16 7 NaN
405 Loyola Chicago MVC 26 24 108.5 88.3 0.9136 56.3 46.7 18.5 ... 22.4 58.0 45.5 35.7 32.5 63.9 1.7 S16 8 NaN
406 Syracuse ACC 25 16 112.8 97.6 0.8402 50.7 48.5 16.0 ... 25.1 50.9 49.3 33.7 31.6 69.4 0.5 S16 11 NaN
407 Oral Roberts Sum 23 16 107.0 107.1 0.4981 53.6 50.4 15.7 ... 33.1 49.7 49.0 38.8 35.6 71.4 -5.1 S16 15 NaN

408 rows × 24 columns

data = data.drop(columns =["Year"])
data = data[data["Postseason"] != 'R68']
data
Team Conference Games Played Games Won Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % ... Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Postseason Seed in Tournament
0 Indiana B10 36 29 121.0 89.7 0.9692 54.7 44.0 19.3 ... 45.8 27.0 52.0 43.2 40.3 30.4 67.8 7.8 S16 1
1 Gonzaga WCC 34 31 118.9 90.2 0.9599 54.9 44.9 17.2 ... 40.8 29.9 55.0 42.1 36.5 32.9 65.1 7.6 R32 1
2 Kansas B12 37 31 111.6 86.2 0.9514 53.3 41.5 20.3 ... 39.5 32.0 52.9 39.3 36.4 30.3 67.7 7.5 S16 1
3 Louisville BE 40 35 115.9 84.5 0.9743 50.6 44.8 18.3 ... 40.0 34.9 50.8 43.4 33.3 31.8 67.1 9.0 Champions 1
4 Georgetown BE 32 25 107.6 85.0 0.9381 51.1 43.0 20.1 ... 36.8 35.3 50.2 41.4 35.3 30.7 62.5 6.6 R64 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
403 Creighton BE 28 20 114.4 94.3 0.9025 55.7 46.9 15.8 ... 26.9 25.7 56.3 46.1 36.7 32.1 69.1 3.5 S16 5
404 Oregon P12 26 20 113.1 98.2 0.8350 54.4 50.1 16.8 ... 26.8 27.1 52.9 49.9 37.9 33.6 67.2 3.0 S16 7
405 Loyola Chicago MVC 26 24 108.5 88.3 0.9136 56.3 46.7 18.5 ... 30.7 22.4 58.0 45.5 35.7 32.5 63.9 1.7 S16 8
406 Syracuse ACC 25 16 112.8 97.6 0.8402 50.7 48.5 16.0 ... 28.1 25.1 50.9 49.3 33.7 31.6 69.4 0.5 S16 11
407 Oral Roberts Sum 23 16 107.0 107.1 0.4981 53.6 50.4 15.7 ... 27.8 33.1 49.7 49.0 38.8 35.6 71.4 -5.1 S16 15

388 rows × 23 columns

data = data.replace({
    
    'Champions': 6,
    '2ND': 5,
    'F4': 4,
    'E8': 3,
    'S16': 2,
    'R32': 1,
    'R64': 0
})
data
Team Conference Games Played Games Won Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % ... Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Postseason Seed in Tournament
0 Indiana B10 36 29 121.0 89.7 0.9692 54.7 44.0 19.3 ... 45.8 27.0 52.0 43.2 40.3 30.4 67.8 7.8 2 1
1 Gonzaga WCC 34 31 118.9 90.2 0.9599 54.9 44.9 17.2 ... 40.8 29.9 55.0 42.1 36.5 32.9 65.1 7.6 1 1
2 Kansas B12 37 31 111.6 86.2 0.9514 53.3 41.5 20.3 ... 39.5 32.0 52.9 39.3 36.4 30.3 67.7 7.5 2 1
3 Louisville BE 40 35 115.9 84.5 0.9743 50.6 44.8 18.3 ... 40.0 34.9 50.8 43.4 33.3 31.8 67.1 9.0 6 1
4 Georgetown BE 32 25 107.6 85.0 0.9381 51.1 43.0 20.1 ... 36.8 35.3 50.2 41.4 35.3 30.7 62.5 6.6 0 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
403 Creighton BE 28 20 114.4 94.3 0.9025 55.7 46.9 15.8 ... 26.9 25.7 56.3 46.1 36.7 32.1 69.1 3.5 2 5
404 Oregon P12 26 20 113.1 98.2 0.8350 54.4 50.1 16.8 ... 26.8 27.1 52.9 49.9 37.9 33.6 67.2 3.0 2 7
405 Loyola Chicago MVC 26 24 108.5 88.3 0.9136 56.3 46.7 18.5 ... 30.7 22.4 58.0 45.5 35.7 32.5 63.9 1.7 2 8
406 Syracuse ACC 25 16 112.8 97.6 0.8402 50.7 48.5 16.0 ... 28.1 25.1 50.9 49.3 33.7 31.6 69.4 0.5 2 11
407 Oral Roberts Sum 23 16 107.0 107.1 0.4981 53.6 50.4 15.7 ... 27.8 33.1 49.7 49.0 38.8 35.6 71.4 -5.1 2 15

388 rows × 23 columns
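The cell above calls `replace` on the whole DataFrame, so any cell containing one of those labels would be converted. A minimal sketch of a narrower variant, assuming a made-up four-row table (the `toy` frame and the `wins_by_finish` name are hypothetical, not from the notebook), maps only the Postseason column:

```python
import pandas as pd

# Hypothetical miniature of the tournament table; values are illustrative only.
toy = pd.DataFrame({
    "Team": ["A", "B", "C", "D"],
    "Postseason": ["Champions", "S16", "R64", "R68"],
})

# Each finish label corresponds to the number of tournament wins it represents.
wins_by_finish = {
    "Champions": 6, "2ND": 5, "F4": 4, "E8": 3,
    "S16": 2, "R32": 1, "R64": 0,
}

# Drop play-in losers first, then convert only the Postseason column.
toy = toy[toy["Postseason"] != "R68"].copy()
toy["Postseason"] = toy["Postseason"].map(wins_by_finish)
print(toy)
```

With `map`, a label missing from the dictionary becomes NaN instead of passing through silently, which makes unexpected values easier to spot.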

y = data["Postseason"]
data = data.drop(columns = ["Postseason", "Team","Conference","Games Played","Games Won"]) # Drop columns we don't need
data.head()
Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % Turnover % (D) Offensive Rebounds Defensive Rebounds Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Seed in Tournament
0 121.0 89.7 0.9692 54.7 44.0 19.3 20.9 39.0 31.4 45.8 27.0 52.0 43.2 40.3 30.4 67.8 7.8 1
1 118.9 90.2 0.9599 54.9 44.9 17.2 20.8 37.8 29.8 40.8 29.9 55.0 42.1 36.5 32.9 65.1 7.6 1
2 111.6 86.2 0.9514 53.3 41.5 20.3 18.4 33.8 29.3 39.5 32.0 52.9 39.3 36.4 30.3 67.7 7.5 1
3 115.9 84.5 0.9743 50.6 44.8 18.3 27.0 38.2 33.3 40.0 34.9 50.8 43.4 33.3 31.8 67.1 9.0 1
4 107.6 85.0 0.9381 51.1 43.0 20.1 22.4 30.4 31.0 36.8 35.3 50.2 41.4 35.3 30.7 62.5 6.6 2

Standardize

data = ( data - data.mean())/data.std()

X = data

X
Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % Turnover % (D) Offensive Rebounds Defensive Rebounds Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Seed in Tournament
0 1.524498 -1.276534 1.046617 0.872923 -1.613747 0.928357 0.745842 1.804037 0.959349 1.725474 -1.010380 0.172274 -1.263146 1.774169 -1.250704 -0.010877 1.343025 -1.632740
1 1.186605 -1.176837 0.985872 0.946052 -1.220337 -0.181259 0.703565 1.521991 0.426549 0.856488 -0.499170 1.170414 -1.656514 0.265353 -0.033504 -0.880728 1.298588 -1.632740
2 0.012026 -1.974414 0.930352 0.361024 -2.706553 1.456745 -0.311085 0.581840 0.260049 0.630552 -0.128984 0.471716 -2.657813 0.225647 -1.299392 -0.043094 1.276369 -1.632740
3 0.703901 -2.313384 1.079929 -0.626209 -1.264049 0.399968 3.324746 1.616007 1.592048 0.717451 0.382226 -0.226983 -1.191624 -1.005229 -0.569072 -0.236394 1.609649 -1.632740
4 -0.631579 -2.213687 0.843480 -0.443388 -2.050869 1.351067 1.379999 -0.217289 0.826149 0.161300 0.452738 -0.426611 -1.906839 -0.211116 -1.104640 -1.718362 1.076401 -1.416675
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
403 0.462550 -0.359321 0.610950 1.238565 -0.346093 -0.921003 -0.564748 -1.533501 -0.106251 -1.559292 -1.239544 1.602942 -0.226085 0.344764 -0.423008 0.407940 0.387624 -0.768479
404 0.253378 0.418317 0.170058 0.763231 1.052699 -0.392614 0.492180 -0.710868 -0.439251 -1.576671 -0.992753 0.471716 1.132822 0.821232 0.307312 -0.204177 0.276530 -0.336349
405 -0.486768 -1.555686 0.683452 1.457950 -0.433517 0.505646 1.041782 -1.251456 -2.503850 -0.898863 -1.821265 2.168555 -0.440649 -0.052293 -0.228256 -1.267328 -0.012312 -0.120284
406 0.205108 0.298680 0.204023 -0.589645 0.353303 -0.815325 0.703565 -0.287800 1.658648 -1.350735 -1.345311 -0.193711 0.918258 -0.846407 -0.666448 0.504590 -0.278936 0.527912
407 -0.728120 2.192925 -2.030485 0.470717 1.183835 -0.973842 -0.395640 -1.909562 1.492148 -1.402874 0.064923 -0.592968 0.810976 1.178584 1.281071 1.148924 -1.523179 1.392172

388 rows × 18 columns
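To see what the standardization step does, here is a small sketch on a made-up two-column frame (the numbers are illustrative, not from the dataset). Each column is shifted to mean 0 and scaled to standard deviation 1, using the sample standard deviation that pandas computes by default:

```python
import pandas as pd

# Made-up values chosen so the arithmetic is easy to follow.
toy = pd.DataFrame({
    "Adjusted Tempo": [60.0, 65.0, 70.0],
    "Seed in Tournament": [1.0, 8.0, 15.0],
})

# Same formula as the notebook: subtract the column mean, divide by the column std.
z = (toy - toy.mean()) / toy.std()

print(z)
print(z.mean())  # approximately 0 for every column
print(z.std())   # approximately 1 for every column
```

After this step a tempo of 70 and a seed of 15 both sit one standard deviation above their column averages, so the features are on a comparable scale.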

Machine Learning Algorithm

from sklearn.ensemble import RandomForestRegressor

# Postseason is encoded as a number of wins, so we fit a regressor rather than a classifier
randTree = RandomForestRegressor(min_samples_split=20, random_state=5)
randTree.fit(X, y)
RandomForestRegressor(min_samples_split=20, random_state=5)

We can now test the trained model

  • Let's test it on data from the 2019 tournament!
column_names = ["Team", "Conference", "Games Played","Games Won",
          "Adjusted Offensive Efficiency", "Adjusted Defensive Efficiency",
          "Power Ranking", "Effective Field Goal %","Effective Field Goal % (D)",
          "Turnover %", "Turnover % (D)", "Offensive Rebounds", "Defensive Rebounds",
          "Free Throw Rate", "Free Throw Rate (D)", "2-PT%", "2-PT% (D)",
          "3-PT%", "3-PT (D)", "Adjusted Tempo", "Wins above Bubble","Postseason","Seed in Tournament"]


test_data_2019 = pd.read_csv("cbb19.csv", header = None, names = column_names, skiprows = 1)
test_data_2019
Team Conference Games Played Games Won Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % ... Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Postseason Seed in Tournament
0 Gonzaga WCC 37 33 123.4 89.9 0.9744 59.0 44.2 14.9 ... 35.3 25.9 61.4 43.4 36.3 30.4 72.0 7.0 E8 1.0
1 Virginia ACC 38 35 123.0 89.9 0.9736 55.2 44.7 14.7 ... 29.1 26.3 52.5 45.7 39.5 28.9 60.7 11.1 Champions 1.0
2 Duke ACC 38 32 118.9 89.2 0.9646 53.6 45.0 17.5 ... 33.2 24.0 58.0 45.0 30.8 29.9 73.6 11.2 E8 1.0
3 North Carolina ACC 36 29 120.1 91.4 0.9582 52.9 48.9 17.2 ... 30.2 28.4 52.1 47.9 36.2 33.5 76.0 10.0 S16 1.0
4 Michigan B10 37 30 114.6 85.6 0.9665 51.6 44.1 13.9 ... 27.5 24.1 51.8 44.3 34.2 29.1 65.9 9.2 S16 2.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
348 Alcorn St. SWAC 27 10 89.0 112.6 0.0628 45.7 52.7 24.1 ... 30.5 36.5 45.0 55.3 31.3 32.1 67.1 -16.7 NaN NaN
349 New Hampshire AE 27 5 83.7 106.1 0.0613 44.0 51.5 18.4 ... 21.9 38.0 39.4 52.1 32.6 33.6 67.1 -20.2 NaN NaN
350 Chicago St. WAC 30 3 88.5 117.3 0.0380 44.2 57.8 22.5 ... 33.1 33.9 43.5 57.9 30.7 38.5 71.9 -20.9 NaN NaN
351 Delaware St. MEAC 29 6 84.3 112.2 0.0358 40.0 52.4 19.0 ... 25.5 39.2 37.7 52.6 29.0 34.7 71.6 -21.7 NaN NaN
352 Maryland Eastern Shore MEAC 30 7 85.7 114.4 0.0346 43.5 54.4 20.7 ... 28.3 36.6 44.5 53.2 27.9 37.3 64.5 -19.9 NaN NaN

353 rows × 23 columns

Adjusting our 2019 Data

# Let's drop the teams that didn't make the 2019 tournament, along with any that didn't reach the round of 64

test_data_2019 = test_data_2019.dropna()

test_data_2019 = test_data_2019[test_data_2019["Postseason"] != 'R68']
 
## Let's replace the Postseason labels with the number of wins

test_data_2019 = test_data_2019.replace({
    
    'Champions': 6,
    '2ND': 5,
    'F4': 4,
    'E8': 3,
    'S16': 2,
    'R32': 1,
    'R64': 0
})

# Let's grab the teams that made the tournament
team_names = test_data_2019.get("Team")

# y
actual_outcomes = test_data_2019.get("Postseason")

# Let's drop the same columns we dropped from the training data
test_data_2019 = test_data_2019.drop(columns = ["Postseason", "Team","Conference","Games Played","Games Won"])

# Finally, let's standardize the data!

test_data_2019 = (test_data_2019 - test_data_2019.mean())/test_data_2019.std()

test_data_2019
Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % Turnover % (D) Offensive Rebounds Defensive Rebounds Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Seed in Tournament
0 2.042572 -1.132484 1.073843 2.460166 -1.566239 -1.786817 0.010948 0.338659 -0.258338 0.390565 -1.264927 3.016431 -1.347685 0.262195 -1.161525 1.304614 0.951017 -1.614218
1 1.972384 -1.132484 1.068618 0.941607 -1.361502 -1.929228 -0.645904 0.058170 -0.736583 -1.103902 -1.164796 0.040748 -0.543228 1.656936 -1.875394 -2.729147 1.772165 -1.614218
2 1.252960 -1.260798 1.009833 0.302213 -1.238660 0.064530 0.186108 1.384121 0.663992 -0.115625 -1.740545 1.879653 -0.788062 -2.135016 -1.399481 1.875766 1.792193 -1.614218
3 1.463523 -0.857526 0.968031 0.022479 0.358290 -0.149087 -0.295583 1.307624 -1.624753 -0.838755 -0.639113 -0.092990 0.226254 0.218609 0.313805 2.732493 1.551857 -1.614218
4 0.498442 -1.920698 1.022243 -0.497028 -1.607186 -2.498873 -0.426954 -1.395277 -0.941545 -1.489571 -1.715513 -0.193294 -1.032897 -0.653104 -1.780211 -0.872903 1.391633 -1.398989
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
61 -1.905487 0.847215 -1.982937 -0.377142 0.726817 -0.077881 1.893923 -0.808798 0.322389 0.077209 1.839111 -0.962291 0.890806 0.872394 0.218622 -0.444539 -0.971671 1.398989
62 -1.010593 1.928718 -2.121406 0.781758 1.013449 -0.220292 -0.339373 -1.981755 1.278879 1.113695 -0.839373 0.408529 1.100664 0.872394 0.408988 0.126613 -1.452343 1.614218
63 -0.905312 2.368651 -2.436881 0.262251 2.282819 -1.217172 -2.003398 -2.415239 -0.565781 -0.356669 -1.114731 -0.092990 1.870145 0.436537 1.979500 -0.658721 -2.173351 1.614218
64 -1.150969 2.331990 -2.649158 0.062441 1.750502 -0.077881 -0.208003 -1.293281 1.073917 0.752130 -0.614080 0.274791 1.065688 -0.260833 1.789134 1.268917 -2.533854 1.614218
66 -1.273797 2.606948 -3.008395 0.541986 1.627660 1.631054 0.536429 -0.171322 1.927926 0.486983 -0.388787 -0.460771 1.415452 1.918449 1.170448 -0.123266 -2.313546 1.614218

64 rows × 18 columns
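The cell above standardizes the 2019 data with its own mean and standard deviation. A common alternative, shown here only as a sketch and not what this notebook does, is to reuse the training set's statistics so both sets are scaled by the same numbers. The `train` and `test` frames below are made up for illustration:

```python
import pandas as pd

# Made-up single-feature frames standing in for the training and test sets.
train = pd.DataFrame({"Power Ranking": [0.2, 0.5, 0.8]})
test = pd.DataFrame({"Power Ranking": [0.5, 0.9]})

# Compute the statistics once, on the training data only.
train_mean = train.mean()
train_std = train.std()

# Apply the same shift and scale to both sets.
train_z = (train - train_mean) / train_std
test_z = (test - train_mean) / train_std

print(test_z)
```

Scaled this way, a raw value of 0.5 maps to the same standardized value in both sets, which is not guaranteed when each set uses its own statistics.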

Now we can run our model on it!

outcomes = randTree.predict(test_data_2019)

Let's look at the MSE and R-Squared Values

R-squared is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). The closer to 1 the better, with 1 being a perfect fit. A model that does worse than always predicting the mean scores below 0.

MSE (mean squared error) is the average of the squared differences between the predicted and actual values. The lower the number, the better.

The sklearn library has MSE and R-squared metrics built in, so let's first import them!

from sklearn.metrics import mean_squared_error, r2_score
print('Mean squared error:', mean_squared_error(actual_outcomes, outcomes)) 
print('Coefficient of determination:', r2_score(actual_outcomes, outcomes))
Mean squared error: 0.3543465603139662
Coefficient of determination: 0.8027716386674812
  • A coefficient of determination of 0.80 is solid!!
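For intuition, both metrics can be computed by hand from their definitions. The sketch below uses made-up win totals (not the 2019 results), and `mse_and_r2` is a hypothetical helper name:

```python
import numpy as np

def mse_and_r2(actual, predicted):
    """Return (MSE, R-squared) computed directly from their definitions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation around the mean
    return mse, 1.0 - ss_res / ss_tot

# Made-up win totals, for illustration only.
actual = [6, 2, 1, 0, 0]
good_mse, good_r2 = mse_and_r2(actual, [4, 2.5, 1, 0.5, 0])
bad_mse, bad_r2 = mse_and_r2(actual, [0, 0, 6, 6, 6])

print(good_mse, good_r2)
print(bad_mse, bad_r2)  # R-squared is negative for these poor predictions
```

The second call shows that R-squared is not bounded below by 0: predictions that are worse than simply guessing the average produce a negative score.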

Running Our Model on 2022 Data

column_names = ["Team","Adjusted Offensive Efficiency", "Adjusted Defensive Efficiency",
          "Power Ranking", "Record", "Effective Field Goal %","Effective Field Goal % (D)",
          "Turnover %", "Turnover % (D)", "Offensive Rebounds", "Defensive Rebounds",
          "Free Throw Rate", "Free Throw Rate (D)", "2-PT%", "2-PT% (D)",
          "3-PT%", "3-PT (D)", "Adjusted Tempo", "Wins above Bubble","Seed in Tournament"]

data_2022 = pd.read_csv("2022_adj.csv", header = None, names = column_names, skiprows = 1)
                    
data_2022.head()
Team Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Record Effective Field Goal % Effective Field Goal % (D) Turnover % Turnover % (D) Offensive Rebounds Defensive Rebounds Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Seed in Tournament
0 Gonzaga 121.405 88.3685 0.974731 26-3 59.4 43.2 15.9 17.0 29.0 23.0 29.7 22.2 60.9 41.6 37.9 30.7 72.7093 6.980363 1.0
1 Kansas 119.798 93.1202 0.947700 28-6 54.1 46.9 17.8 18.4 33.4 28.9 32.8 27.8 54.5 47.9 35.5 30.1 69.0450 10.116105 1.0
2 Arizona 118.578 93.2390 0.940735 31-3 55.9 44.4 18.0 17.7 34.5 28.3 35.1 22.8 57.5 41.9 35.4 32.7 72.2560 9.063691 1.0
3 Baylor 117.053 91.7082 0.943009 26-6 52.9 47.8 18.2 22.9 36.3 28.4 28.5 26.9 53.5 49.5 34.6 29.9 67.4767 8.628828 1.0
4 Duke 120.018 95.1092 0.935540 28-6 55.6 47.0 14.9 16.1 31.9 28.5 28.6 18.9 55.8 46.9 36.8 31.4 67.6327 6.480068 2.0

Cleanup Data

# Drop teams that didn't make the tournament

data_2022 = data_2022.dropna()

# Grab team names
team_names = data_2022.get("Team")

# Drop columns the model wasn't trained on

data_2022 = data_2022.drop(columns = ["Team", "Record"])

# Standardize the data

data_2022 =(data_2022 - data_2022.mean())/data_2022.std()
data_2022.head()
Adjusted Offensive Efficiency Adjusted Defensive Efficiency Power Ranking Effective Field Goal % Effective Field Goal % (D) Turnover % Turnover % (D) Offensive Rebounds Defensive Rebounds Free Throw Rate Free Throw Rate (D) 2-PT% 2-PT% (D) 3-PT% 3-PT (D) Adjusted Tempo Wins above Bubble Seed in Tournament
0 1.904057 -1.591607 1.222213 2.568833 -2.065698 -0.808917 -0.614141 -0.334286 -1.307253 -0.535455 -1.252683 2.780897 -2.070024 1.088533 -0.715641 2.463788 1.185066 -1.648667
1 1.631395 -0.555475 1.031333 0.657779 -0.309891 0.107417 -0.071630 0.702858 0.817306 0.266551 -0.074587 0.743341 0.241243 0.181995 -1.020038 0.921680 1.933918 -1.648667
2 1.424396 -0.529570 0.982149 1.306816 -1.496247 0.203873 -0.342886 0.962144 0.601249 0.861588 -1.126458 1.698445 -1.959963 0.144222 0.299016 2.273018 1.682589 -1.648667
3 1.165647 -0.863368 0.998207 0.225087 0.117198 0.300329 1.672155 1.386430 0.637259 -0.845909 -0.263924 0.424973 0.828232 -0.157958 -1.121504 0.261667 1.578739 -1.648667
4 1.668723 -0.121763 0.945464 1.198643 -0.262436 -1.291197 -0.962898 0.349286 0.673268 -0.820037 -1.946918 1.157220 -0.125624 0.673036 -0.360511 0.327319 1.065590 -1.431478

Now we can use our trained model to make predictions

outcomes = randTree.predict(data_2022)
team_names = team_names.to_numpy()

Predicting the Number of wins

for i in range(len(team_names)):
    print(team_names[i], outcomes[i])
Gonzaga 4.298450225081188
Kansas 3.3442196063367895
Arizona 2.8776303454815193
Baylor 3.3032326512844143
Duke 3.0235155961572433
Kentucky 3.046358917236067
Villanova 2.0461495232866493
Auburn 2.2014452057013236
Purdue 2.3204202532996803
Tennessee 2.710680094457952
Wisconsin 1.1603186666452192
Texas Tech 3.085000231714144
UCLA 2.860326343327394
Illinois 1.2330048020144462
Providence 0.8866655310039284
Arkansas 1.4588779583526954
Iowa 2.5911571179344737
Houston 4.090685790210517
Connecticut 0.8193585395350703
Saint Mary's 1.4544321557302027
Alabama 1.0773849084622729
Colorado St. 0.9204878962394024
Texas 1.3948944618441117
LSU 0.5968029485256523
Ohio St. 0.5588504011052923
Michigan St. 0.40531079854572005
Murray St. 0.9958768235705858
USC 0.4046188105875423
North Carolina 0.7319186823626208
Boise St. 0.513611232777687
Seton Hall 0.543717539557223
San Diego St. 0.38283954193826675
Memphis 2.019966984733968
Marquette 0.9110003967595597
TCU 0.5333828842756806
Creighton 0.5397218035252996
Davidson 0.6462122840870752
Miami FL 0.6425294259986538
San Francisco 1.277724887272622
Loyola Chicago 0.5786369140050699
Virginia Tech 1.2454582488421115
Michigan 1.1163138835408595
Iowa St. 1.1006914444528788
Notre Dame 0.9083503259174012
Rutgers 0.24443489708614496
UAB 0.3803614474751469
Richmond 0.39421107910334025
New Mexico St. 0.10003135313117716
Wyoming 0.15089250386005165
Indiana 1.0072659826219392
South Dakota St. 0.7654512567742268
Vermont 0.37248574466916684
Chattanooga 0.458839900837815
Akron 0.04589294258373206
Colgate 0.266381915407217
Longwood 0.22944397759103632
Yale 0.014089238638599254
Montana St. 0.07187918514234304
Delaware 0.027849486247588276
Jacksonville St. 0.07756203007518797
Cal St. Fullerton 0.15801090880796761
Saint Peter's 0.11311555360903186
Norfolk St. 0.008501003344481606
Georgia St. 0.31069314716980634
Bryant 0.13281253300475268
Texas Southern 0.026032138875617138
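The printed list is in the original row order, which makes the strongest predictions hard to pick out. A minimal sketch, using six of the values printed above, sorts the teams by predicted wins. It ranks by predicted wins only and ignores bracket regions and matchups:

```python
import pandas as pd

# A handful of (team, predicted wins) pairs copied from the output above.
predicted_wins = pd.Series({
    "Gonzaga": 4.298450225081188,
    "Kansas": 3.3442196063367895,
    "Arizona": 2.8776303454815193,
    "Baylor": 3.3032326512844143,
    "Texas Tech": 3.085000231714144,
    "Houston": 4.090685790210517,
})

# Highest predicted win totals first.
top_four = predicted_wins.sort_values(ascending=False).head(4)
print(top_four)
```

With the full notebook variables, the same idea would be `pd.Series(outcomes, index=team_names).sort_values(ascending=False)`.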

Finding Important Features in our Model

Importances = randTree.feature_importances_
print('Features | Importances:')
print('-------------------------------')
for i in range(len(Importances)):
    print(data_2022.columns[i], ":", Importances[i])
Features | Importances:
-------------------------------
Adjusted Offensive Efficiency : 0.020837837067937272
Adjusted Defensive Efficiency : 0.019878321896035844
Power Ranking : 0.6352742363889717
Effective Field Goal % : 0.02354902907386864
Effective Field Goal % (D) : 0.00834636668418663
Turnover % : 0.017603147031684505
Turnover % (D) : 0.024662559002028416
Offensive Rebounds : 0.029780766422328686
Defensive Rebounds : 0.030720724599930433
Free Throw Rate : 0.03103466259250237
Free Throw Rate (D) : 0.0112768168324684
2-PT% : 0.01959143305651005
2-PT% (D) : 0.02504113475899021
3-PT% : 0.011447156047409724
3-PT (D) : 0.015981116861988772
Adjusted Tempo : 0.01668413131261128
Wins above Bubble : 0.033070407747852616
Seed in Tournament : 0.025220152622694322
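Feature importances from a random forest are not regression coefficients: they are non-negative shares that sum to 1 and describe how much each feature contributed to the trees' splits. The sketch below illustrates this on synthetic data (the `signal` and `noise_*` columns are made up, and `n_estimators=50` is an arbitrary choice for speed), then sorts the importances so the dominant feature comes first:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends on "signal" only; the other columns are noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["signal", "noise_a", "noise_b"])
y = 3.0 * X["signal"] + 0.1 * rng.normal(size=200)

model = RandomForestRegressor(n_estimators=50, min_samples_split=20, random_state=5)
model.fit(X, y)

# Label each importance with its column name and sort from largest to smallest.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Applied to the notebook's model, the same sorting would put Power Ranking at the top, since it accounts for roughly 0.64 of the total importance in the output above.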

Predictions For Final 4

Gonzaga

"insert"

Baylor

"insert"

Houston

"insert"

Kansas

"insert"

Final Bracket

"insert bracket"
