Out of curiosity, I wanted to test how GPT-4 would perform in predicting the NCAA Men’s Division I Basketball Tournament (March Madness) compared to my own picks.
I don’t watch much college basketball throughout the season, but I read that there was substantial parity this year (perhaps more than most years).
The differential in talent and/or consistency of play between the top teams (#1 seeds) and lower seeds was low relative to previous years (based on my analyses).
If you don’t believe me, pick any top-ranked team (#1 seed) and look at its number of losses, the specific teams to which it lost, and/or its strength of schedule.
The Final Four teams serve as evidence to support my hypothesis of high parity: FAU (#9 seed); SDSU (#5 seed); Miami (#5 seed); UConn (#4 seed).
The seed total of the “Final Four” teams is 23 – the second-highest since seeding began in 1979 (and highest since 2011 – which had a total of 26).
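As a quick sanity check on that seed total, here is a minimal Python sketch that adds up the four seeds listed above. The variable names are just illustrative.

```python
# Seeds of the 2023 Final Four teams, as listed above.
final_four_seeds = {"FAU": 9, "SDSU": 5, "Miami": 5, "UConn": 4}

seed_total = sum(final_four_seeds.values())
print(seed_total)  # 23

# For comparison, a Final Four made up of four #1 seeds would total 4.
chalk_total = 4 * 1
print(seed_total - chalk_total)  # 19
```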
March Madness Predictions (2023)
When making my predictions for March Madness, I considered: (1) KenPom rankings; (2) matchups; (3) injuries; (4) coaching; (5) recent tournament performance (past few years); (6) which team I liked (just went with the team I liked more in coinflip-type matchups).
I filled out my bracket in ~20-30 minutes (didn’t spend too much time overthinking or overanalyzing) – and then completed a bracket with GPT-4 as a comparison.
I did NOT expect magical predictions from GPT-4. Why? Because it’s not necessarily any more accurate than humans in predicting March Madness brackets.
March Madness tournament game outcomes have a lot of randomness.
Even if GPT-4 used an “optimal” prediction algorithm, the accuracy of a single bracket could be worse than a bracket completed by someone who makes picks based on a combination of tournament seeding and the mascot they like most.
Experienced basketball analysts, statisticians, and coaches are regularly shocked by the unpredictability of the March Madness tournament.
Really all it takes is one team having an “off game” (e.g. cold shooting, low energy, fouls) and/or another team “getting hot” – and brackets are busted.
GPT-4 Bracket Predictions
Below are the predictions made by GPT-4 for each section of the bracket and each round of the tournament.
South
Round 1 (Winners)
- Bama (#1)
- WVU (#9)
- SDSU (#5)
- UVA (#4)
- Creighton (#6)
- Baylor (#3)
- Mizzou (#7)
- Arizona (#2)
Round 2 (Winners)
- Bama (#1)
- UVA (#4)
- Baylor (#3)
- Arizona (#2)
Round 3 (Winners)
- Bama (#1)
- Baylor (#3)
Midwest
Round 1
- Hou (#1)
- Iowa (#8)
- Miami (#5)
- Indiana (#4)
- Iowa St. (#6)
- Xavier (#3)
- Tex A&M (#7)
- Texas (#2)
Round 2
- Hou (#1)
- Indiana (#4)
- Iowa St. (#6)
- Texas (#2)
Round 3
- Hou (#1)
- Texas (#2)
East
Round 1
- Purdue (#1)
- Memphis (#8)
- Duke (#5)
- Tennessee (#4)
- UK (#6)
- K-State (#3)
- Mich St. (#7)
- Marq (#2)
Round 2
- Purdue (#1)
- Duke (#5)
- K-State (#3)
- Mich St. (#7)
Round 3
- Purdue (#1)
- Mich St. (#7)
West
Round 1
- Kansas (#1)
- Illinois (#9)
- Saint Mary’s (#5)
- UConn (#4)
- TCU (#6)
- Gonzaga (#3)
- N’West (#7)
- UCLA (#2)
Round 2
- Kansas (#1)
- UConn (#4)
- Gonzaga (#3)
- UCLA (#2)
Round 3
- Kansas (#1)
- Gonzaga (#3)
Final Four
- Bama (#1) vs. Purdue (#1)
- Hou (#1) vs. Gonzaga (#3)
Championship
- Purdue vs. Gonzaga
Champion: Gonzaga
How did I perform vs. GPT-4 in March Madness predictions?
GPT-4 beat me in points (48 vs. 44), but we tied in total number of correct picks (33 games predicted correctly).
If you aren’t familiar with the bracket scoring – points increase each round (such that it’s way more valuable to predict the champion correctly than an early round matchup).
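To make the scoring concrete, here is a minimal Python sketch. It assumes points double each round (1, 2, 4, 8, 16, 32), which is a common bracket-pool convention but only an assumption here, since the exact scheme isn't stated. `POINTS_PER_ROUND` and `bracket_score` are illustrative names.

```python
# ASSUMPTION: points per correct pick double each round (rounds 1 through 6).
POINTS_PER_ROUND = [2 ** i for i in range(6)]  # [1, 2, 4, 8, 16, 32]


def bracket_score(correct_by_round):
    """Total points given the number of correct picks in each round."""
    return sum(n * pts for n, pts in zip(correct_by_round, POINTS_PER_ROUND))


# One extra correct pick in round 3 (a Sweet 16 game) is worth 4 points.
print(bracket_score([0, 0, 1]))  # 4

# A perfect bracket: 32 + 16 + 8 + 4 + 2 + 1 correct picks.
print(bracket_score([32, 16, 8, 4, 2, 1]))  # 192
```

Under this assumed scheme, every round is worth the same 32 points in total, and a single correct Sweet 16 pick is worth 4 points, which is the same size as the 48 vs. 44 gap.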
GPT-4 Bracket
- Total points: 48
- Correct picks: 33
Drew’s bracket
Final 4: Alabama (#1) vs. Tennessee (#4) & Houston (#1) vs. Kansas (#1)
Championship: Alabama (#1) vs. Houston (#1)
Champion: Alabama (#1)
- Total points: 44
- Correct picks: 33
The point differential is basically explained by Gonzaga beating UCLA.
GPT-4’s bracket had Gonzaga beating UCLA – whereas my bracket had UCLA beating Gonzaga.
Neither bracket can earn additional points for the remainder of the tournament – neither has a team left in the Final Four.
We shared 2 of the Final 4 teams in our predictions (Alabama & Houston) but our championship matchups were different (Alabama vs. Houston for me & Purdue vs. Gonzaga for GPT-4).
How did GPT-4 make bracket predictions? (Methods)
I fed GPT-4 KenPom data for all teams in the tournament, the bracket setup/layout & seedings, coaching, and injury information (notable injuries).
I instructed GPT-4 to use its own judgment as well with the data it has from prior tournaments in the past 10-20 years.
- KenPom data: All publicly available data from KenPom – a statistical website.
- Bracket setup: Exact bracket setup, matchups, and locations of each game.
- Injury information: Information about injuries to critical players on each team and the severities of those injuries.
- Previous winners: I gave GPT-4 information about teams that made deep runs in the tournament over the past several years (as it only had information up to 2021).
- Tournament experience: GPT-4 factored in total tournament experience of various coaches (e.g. Tom Izzo from MSU) when making certain predictions – it did this unprompted.
- Use own judgment: I instructed GPT-4 to make predictions based on what it perceives to be the most accurate method/way to predict the NCAA D1 Men’s basketball tournament in the modern era (past 20 years).
Note: I did NOT feed GPT-4 information about recent trends (such as the last 10 games played by each team). I’m not sure whether this would’ve helped… I also did NOT feed GPT-4 entire schedules & scores for every team (would’ve taken too much time).
How a better prediction model might look…
According to GPT, it might be advantageous to develop a prediction algorithm for the March Madness tournament using a combination of statistical models and machine learning algorithms to analyze past tournament data & make predictions about future outcomes.
Developing a prediction algorithm for March Madness would involve:
- Data collection: Regular season performances, conference tournaments, past NCAA tournament performances. Data would include: offensive/defensive efficiency, strength of schedule, and other advanced statistics.
- Clean & preprocess data: Ensure that the data is in a clean & consistent format with missing values handled appropriately. Transform data to be used in prediction model.
- Build prediction model: Choose a machine learning algorithm and train it on historical data. Algorithms for sports predictions include: logistic regression, decision trees, and random forests.
- Validate the model: Test the model on historical tournaments that were held out of training to validate accuracy and adjust as necessary.
- Make predictions: After the model has been validated, use it to make predictions for the present-year’s tournament. Combine predicted probabilities of each game to create a bracket and simulate the tournament to generate outcomes.
- Refine predictions: As the tournament progresses, refine predictions based on the results of each game (a standard bracket is locked once the tournament starts, but this could be done for a second-chance bracket in the Sweet 16).
That said, even a near-perfect prediction model would not reliably work well in March Madness for a variety of reasons: variability in performance, single-elimination format, intangibles (not in the data), etc.
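The “make predictions” step above (turn per-game win probabilities into simulated tournaments) can be sketched in a few lines of Python. This is a toy illustration, not the method GPT-4 used: the logistic formula, the `scale` constant, the four-team field, and the team ratings are all made-up assumptions, and a real model would fit these from historical data.

```python
import math
import random
from collections import Counter


def win_probability(rating_a, rating_b, scale=10.0):
    """Logistic model: chance that team A beats team B.
    `scale` is a hypothetical tuning constant, not a fitted value."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def simulate_bracket(teams, ratings, rng):
    """Play one single-elimination bracket; adjacent teams meet first."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_probability(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p else b)
        alive = next_round
    return alive[0]


def title_odds(teams, ratings, n_sims=10_000, seed=0):
    """Estimate each team's title chance by simulating many tournaments."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(teams, ratings, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in teams}


# Hypothetical ratings (not real data); higher means stronger.
ratings = {"Team A": 25.0, "Team B": 18.0, "Team C": 15.0, "Team D": 12.0}
teams = list(ratings)

odds = title_odds(teams, ratings)
for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Even in this tiny four-team example, the clearly strongest team wins the title in well under 100% of simulated tournaments, which is the single-elimination problem in miniature.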
GPT-4 Sweet 16 Predictions (Second-Chance Bracket)
I had GPT-4 predict the Sweet 16 in a second-chance bracket on ESPN.
I compared this to my predictions that were made mostly based on eye-test of games I’d seen thus far (I wasn’t taking the bracket overly seriously).
GPT-4: 200 points (~69%)
Me: 120 points (~36%)
GPT-4 predictions (second-chance)
- Alabama, Creighton, Tennessee, KSU, Houston, Texas, UConn, Gonzaga
- Alabama, Tennessee, Houston, Gonzaga
- Alabama, Gonzaga
- Gonzaga
My predictions (second-chance)
- Alabama, Creighton, Tennessee, Michigan St., Houston, Texas, UConn, UCLA
- Alabama, Tennessee, Houston, UCLA
- Alabama, Houston
- Alabama
Final 4 Predictions
At this point, there are only 4 teams left – and I’m doing a “third chance” comparing my picks to GPT-4 (which is kind of ridiculous).
I fed GPT-4 box score data from the past 2 games and updated KenPom data (which accounts for tournament performance).
GPT-4
- UConn vs. Miami: UConn
- FAU vs. SDSU: SDSU
- Champion: UConn
My predictions
- UConn vs. Miami: UConn
- FAU vs. SDSU: SDSU
- Champion: UConn
I made my predictions prior to asking GPT-4 for its picks… in no way are GPT-4’s predictions influencing my own… I’m going by what I’ve seen thus far.
I think Miami has a chance to beat UConn if they maintain their high shooting % and level of play – but UConn has made pretty solid teams look weak (crushing them).
FAU and SDSU are the 2 weakest teams left in the tournament (in my opinion), and although SDSU probably has better overall defense – FAU is big and athletic and not intimidated by any team… both of these games are challenging to predict.
Which team do I hope wins the title? No preference. If pressed to pick one team… I’d probably go with Miami because I appreciate their shooting – but FAU winning would be crazy.
Update: GPT-4 predicts: 2023 NCAA Men’s Basketball Final.
Why it’s challenging to make accurate predictions in March Madness
The tournament is referred to as “March Madness” for a reason… excitement, unpredictability, and complete chaos with the single-elimination setup.
Nobody has ever had a perfect bracket – the odds of getting a perfect bracket are so low that it is sometimes referred to as a “quintillion-to-one” long shot.
For reference, a quintillion = 1 followed by 18 zeros… If you count the 63 games in the main 64-team bracket (ignoring the First Four play-in games among the 68 teams) and each game has a 50/50 chance of being won by either team (not entirely accurate), the odds of getting a perfect bracket are 1 in 2^63, or about 1 in 9.2 quintillion.
If one person filled out one unique bracket per second, it would take over 292 billion years to get through every possible bracket… and even if all ~8 billion people on Earth did the same, it would still take roughly 36 years… pretty insane to ponder.
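These numbers are easy to check. The sketch below assumes 63 games (the 64-team main bracket), a 50/50 coin flip for each game, and a rough world population of 8 billion. All three are simplifying assumptions.

```python
GAMES = 63  # main 64-team bracket; First Four play-in games ignored
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
WORLD_POPULATION = 8_000_000_000  # rough assumption

# Each game has 2 possible outcomes, so there are 2^63 distinct brackets.
possible_brackets = 2 ** GAMES
print(f"{possible_brackets:,}")  # 9,223,372,036,854,775,808

# One person filling out one unique bracket per second.
years_one_filler = possible_brackets / SECONDS_PER_YEAR
print(f"{years_one_filler / 1e9:.0f} billion years")  # 292 billion years

# Everyone on Earth filling out one unique bracket per second.
years_everyone = years_one_filler / WORLD_POPULATION
print(f"{years_everyone:.1f} years")  # 36.5 years
```

So the 292-billion-year figure corresponds to a single person working through every possible bracket at one per second.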
Others using GPT to predict March Madness (2023)
I found a few other individuals (Reddit, New York Times, Torrey Leonard) who utilized the power of GPT to make March Madness predictions.
Torrey Leonard put in some serious work and shelled out ~$28 to get all of his data fed & processed into GPT-4… and his bracket ended up being one of the worst I’ve seen haha.
Reddit user: A user (u/itsBrandteous) fed GPT information about each team in the March Madness tournament and let GPT make predictions. He posted the exact information he fed to GPT-4, which included entire rosters and player statistics (e.g. shot %, minutes played). (R)
- Final 4: Alabama, Purdue, Houston, Kansas
- Championship: Alabama vs. Houston
- Champion: Alabama
NYT: Jonathan Ellis & Michael Beswetherick used Bing Chatbot (GPT-4) to make tournament predictions. Bing’s GPT-4-based chatbot has access to all updated information about each team. (R)
- Final 4: Baylor, Duke, Texas, Gonzaga
- Championship: Baylor vs. Texas
- Champion: Baylor
Perhaps the prompting was somewhat questionable and/or vague. My guess is that NYT authors could’ve been more specific in instruction – make predictions as accurately as possible based on updated information with an algorithm you perceive as ideal for this tournament.
Torrey Leonard: Some guy on Medium used ChatGPT to fill out a bracket. Not sure whether he used GPT-4 or GPT-3.5. He trained a model and used a proof-of-concept prediction script. (R)
- Final 4: Charleston, Kentucky, Iowa, Kansas
- Championship: Kentucky vs. Iowa
- Champion: Kentucky
I did a double-take when looking at this bracket to verify that Charleston was even in the tournament… the combined seed total in this Final 4 was 27… a bit high, but this year was crazy.
Most of the predictions in this bracket were atrocious, but the model may improve with additional refinement.
GPT for next year’s March Madness
I plan to use GPT again next year to make March Madness predictions and compare against my own.
Perhaps GPT-4.5 or GPT-5 will be released by then and it’ll be able to gather all statistics it needs from the internet, develop its own algorithm/model, and increase in accuracy.
That said, even with an excellent prediction model, March Madness is mostly luck (especially if you limit yourself to filling out just one bracket – rather than hundreds like some people).