Hey Everyone,
These are some of the coolest models I have ever made. PLEASE, if you enjoy this content → share, RT, like, all that good stuff.
Judah Fortgang, otherwise known as ‘King Judah’ or ‘The SGP King’, is very good at predicting tail events. He creates stories of upsets and fringe/tail events using PFF data and his EDP models. The rest of the Sportsfolio Kings do the same, but Judah, to my knowledge, is the OG and the best at it. I adore this idea, but I am not in the same class of pure data analyst that Judah is. However, I am pretty good at statistical modeling.
So, I set out to create a model/process to replicate this methodology. For the uninitiated, this is a weird and difficult problem. It isn’t quite as simple as “good linear prediction” … I broke the problem down into 3 main things we care about:
What does a tail event for a team look like?
What is the probability that a team achieves their tail event in the current game?
How does a team respond to being pushed towards a tail event?
For the statistically curious, I am going to break down each of these steps and talk about how I attacked each one.
What does a tail event for a team look like?
This one is probably the easiest, but still required some fun data wrangling that will help us at every step of this process. First, we need to create variables, and boy did we create a lot of variables. I focused on 4 week moving averages, but incorporated some season long mins and maxes where relevant. I created variables like:
Series Conversion Rate: Probability a team converts one set of downs into another
Neutral PROE and Neutral PROE allowed: tells us how teams try and attack or how they are attacked
PROE when winning and PROE when losing: super helpful and tells us how a team either steps on the gas or takes their foot off
Pace of Play: tells us how fast teams are willing to play and potentially hold onto the ball
All Team PFF grades: tells us how good or bad teams are. Very stable.
Other Stuff: aDoT for and against, explosive play rates, etc… There are more but can’t give away the whole model ya know
Once we have all these variables for Team A we can join it to their opponent! Now, we have (almost) a complete picture of both the Team we are predicting and their opponent for the game we are predicting.
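Here is a minimal Python sketch of that wrangling step, just to make the idea concrete. The original work was done in R, and everything named here (the stat values, `trailing_mean`, the `proe_ma4` column) is made up for illustration. It also assumes the moving average only uses games BEFORE the one being predicted, which the post does not spell out.

```python
def trailing_mean(values, window=4):
    """Mean of the previous `window` values at each position (None until history exists)."""
    out = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]  # assumption: exclude the current game
        out.append(sum(history) / len(history) if history else None)
    return out


# Hypothetical inputs: one stat per team per week (numbers are invented).
proe_by_team = {
    "MIN": [0.02, 0.04, 0.06, 0.08, 0.10],
    "SF": [-0.03, -0.01, 0.01, 0.03, 0.05],
}
schedule = [("MIN", "SF", 5)]  # (team, opponent, week); week is 1-indexed

rolling = {team: trailing_mean(vals) for team, vals in proe_by_team.items()}

# Join each team's rolling stat to its opponent's for the same week.
rows = []
for team, opp, week in schedule:
    rows.append({
        "team": team,
        "opponent": opp,
        "week": week,
        "proe_ma4.x": rolling[team][week - 1],  # .x = the team we are predicting
        "proe_ma4.y": rolling[opp][week - 1],   # .y = that team's opponent
    })

print(rows[0])
```

The `.x` / `.y` suffixes mirror the naming used for the variable importance list later in this post.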
Lastly (for wrangling), we need a variable that we care about predicting (our response variable). In our case, we are going to choose Passing Yards. This works out well since Passing Yards can really only be accumulated by one player on the team and our projections are team based. Once we map passing yards to our independent variables, we are ready to create a model!
The model I chose was an Elastic Net. I’m not going to give you a detailed breakdown of it, but the basic benefit of using an elastic net is that it is VERY good at finding the variables that matter for the thing you care about predicting without overfitting. So you can throw 10000 variables into it and it will find the variables that matter (this is an oversimplification, but that’s the idea).
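To show what "finding the variables that matter" looks like mechanically, here is a toy elastic net fit by coordinate descent in plain Python. This is a sketch of the general technique, not the model from this post: the data, the `alpha` and `l1_ratio` values, and the function names are all assumptions, and a real fit would use a library and tune the penalty with cross validation.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.2, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + alpha*(l1_ratio*|b|_1 + (1 - l1_ratio)/2*|b|_2^2).

    Assumes the columns of X and y are already centered, so no intercept is fit.
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * coefs[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            coefs[j] = soft_threshold(rho, alpha * l1_ratio) / (scale + alpha * (1 - l1_ratio))
    return coefs


# Toy data: the first feature drives y, the second barely matters.
strong = [1, -1, 1, -1, 1, -1, 1, -1]
weak = [1, 1, -1, -1, 1, 1, -1, -1]
X = [[s, w] for s, w in zip(strong, weak)]
y = [2 * s + 0.05 * w for s, w in zip(strong, weak)]

coefs = elastic_net(X, y)
print(coefs)  # the weak feature's coefficient is exactly 0.0
```

The L1 part of the penalty is what zeroes out the weak feature, and the L2 part shrinks the surviving coefficient (here to a bit under the true value of 2).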
We let R do the work with some k-fold cross validation on our training set and bada bing bada boom → .15 R^2 on our testing data. This is not AMAZING by any means, but it’s really not half bad for a weekly prediction for a random week.
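For anyone who has not seen these two pieces before, here is a small Python sketch of how k-fold splits and an R^2 score are computed. It is only an illustration: the fold assignment (every k-th row) and the function names are assumptions, and the R tooling used for this post handles all of this internally.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k folds; returns (train, validation) pairs."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; 0 means no better than always predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Each row is used for validation exactly once across the k splits.
for train, validation in k_fold_indices(10, 5):
    print("train on", train, "validate on", validation)
```

Read this way, an R^2 of .15 means the predictions remove about 15% of the squared error you would get from guessing the average every week.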
Wait…. this is just the predicted value? This isn’t a tail event? NOW we gotta figure out the tail event. The good news is we can take the variables the Elastic Net selected and plug them into a Quantile Regression predicting the 85th percentile outcome.
Boom, NOW we have a tail event prediction.
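If quantile regression is new to you, the trick is the loss function. Instead of squared error it minimizes the "pinball" loss, which punishes under-predictions more than over-predictions when the target quantile is high. The Python sketch below shows the simplest possible case (no predictors, just one number) on made-up yardage values; a real quantile regression does the same thing with the selected variables on the right-hand side.

```python
def pinball_loss(actual, predicted, tau=0.85):
    """Average quantile loss: under-predictions cost tau, over-predictions cost 1 - tau."""
    total = 0.0
    for a, p in zip(actual, predicted):
        error = a - p
        total += tau * error if error >= 0 else (tau - 1) * error
    return total / len(actual)


# Made-up team passing yards for ten games.
yards = [150, 180, 200, 210, 225, 240, 260, 280, 310, 350]

# Intercept-only "regression": which single number minimizes the loss?
best = min(yards, key=lambda q: pinball_loss(yards, [q] * len(yards)))
print(best)  # 310, the 9th-highest of the 10 values
```

With tau set to 0.85 the minimizer lands near the top of the distribution instead of the middle, which is exactly what we want for a tail prediction.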
Probability of Tail?
This is easy. The 85th percentile means a team should reach this number about 15% of the time. Done.
Well…. yes and no? Yes, that statement is true on average, but no, some games are more likely to produce a tail event than others. Playing the Chiefs or Lions may raise a team’s probability of a tail event more than playing the Titans or Bears.
Some teams or team interactions naturally influence tail games and some teams are willing to push themselves into tail events more than others. The Vikings are a good example of this imo. When they played the 49ers they SLUNG it, knowing their only path to victory was play action air raid (and it worked). Other teams may have tried to just do what they do and lose slowly.
So now, what is a tail event? I classified a tail event as Actual Passing yards within 10% of 85th Percentile Prediction.
We use our dataset from earlier and logistic regression on this classification → boom probability of a team achieving their tail event.
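Here is a rough Python sketch of those two steps: labeling games, then fitting a logistic regression to the labels. Big caveats: it reads "within 10%" literally (so a game that overshoots the prediction by more than 10% is NOT labeled a tail event, which may not match the real model), it uses a single made-up matchup feature instead of the full dataset, and it fits with plain gradient ascent rather than a library.

```python
import math


def is_tail_event(actual_yards, q85_prediction, tolerance=0.10):
    """Assumption: literal reading of 'within 10% of the 85th percentile prediction'."""
    return abs(actual_yards - q85_prediction) <= tolerance * q85_prediction


def predict_probability(x, weight, bias):
    """Logistic function applied to a one-feature linear score."""
    return 1 / (1 + math.exp(-(weight * x + bias)))


def fit_logistic(xs, labels, learning_rate=0.1, n_iter=2000):
    """One-feature logistic regression via gradient ascent on the log-likelihood."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad_w = grad_b = 0.0
        for x, label in zip(xs, labels):
            p = predict_probability(x, weight, bias)
            grad_w += (label - p) * x
            grad_b += label - p
        weight += learning_rate * grad_w / n
        bias += learning_rate * grad_b / n
    return weight, bias


# Made-up games: (actual yards, 85th percentile prediction, hypothetical matchup feature).
games = [
    (200, 300, -2), (210, 310, -1), (190, 290, -1), (230, 320, 0),
    (300, 310, 0), (220, 300, 1), (295, 305, 1), (330, 315, 2),
]
labels = [1 if is_tail_event(actual, q85) else 0 for actual, q85, _ in games]
features = [feature for _, _, feature in games]

weight, bias = fit_logistic(features, labels)
print(labels)
print(predict_probability(2, weight, bias), predict_probability(-2, weight, bias))
```

The output of the fitted model is a game-specific probability, so two teams with the same 85th percentile number can still get very different chances of reaching it.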
How does a team respond to being pushed towards a tail event?
This one is super easy but important I think. I call it “Potential for Exceptional Performance”
We take the 85th Percentile Prediction and subtract the Elastic Net prediction from it.
Boom we now have an idea of teams that can/are willing to air it out if pushed.
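In code this is a one-liner. The Python snippet below uses invented predictions for three teams and assumes the gap is measured as the 85th percentile prediction minus the Elastic Net prediction, so a bigger number means more room above the expected outcome.

```python
# Hypothetical predictions (numbers invented for illustration).
predictions = {
    "MIN": {"elastic_net": 245.0, "q85": 325.0},
    "TEN": {"elastic_net": 205.0, "q85": 250.0},
    "KC": {"elastic_net": 270.0, "q85": 330.0},
}

# "Potential for Exceptional Performance": distance from the expected game to the tail game.
potential = {team: p["q85"] - p["elastic_net"] for team, p in predictions.items()}
ranked = sorted(potential, key=potential.get, reverse=True)

print(potential)
print(ranked)
```

Note that in this made-up example KC has the highest expected yardage but not the biggest gap, which is the whole point of tracking this number separately.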
This gives us an idea of a team’s passing potential. I also created a random forest to predict this number on our training data to get an idea of what is important in creating this difference. Here are the top 10 most important variables (.x means the team we are predicting. .y means the opponent of that team):
^^ this model had a .54 R^2 (it’s not predicting anything in the future).
Interpretation of top 5:
Opponent’s worst run defense grade of the year: THIS IS SO INTERESTING. My interpretation is that if a team thinks they can run it, they will, which ultimately limits a team’s passing upside.
Team’s worst series conversion rate: less interesting but makes sense. It sets a bar for the worst a team can be. A bit surprised this isn’t the max.
Team’s max QB grade: Makes a ton of sense. Good QB → high potential.
Opp’s worst coverage grade: intuitive. What’s the worst their coverage can be?
4-week offense grade moving avg: how good has the offense been? Again… makes sense.
Okay! That’s all of the “training and math talk”. Let’s look at how this has turned out in practice for 2023 (the training data is 2015-2022).
How has 2023 Looked:
So I looked at teams that finish in the top 10 for our Elastic Net prediction, Quantile prediction, and tail Probability. If they get top 10 in all 3, I consider them a very good tail betting opportunity. If they get top 10 in only 2, I consider them good. It is important to note that this is TEAM passing yards, not individual.
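That rule is simple enough to write down directly. The Python sketch below applies it to four made-up teams with a top-2 cutoff so the example stays small; the function names and scores are hypothetical, and the weekly cutoff described above is top 10.

```python
def top_n(scores, n=10):
    """Teams with the n highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def classify(elastic_net, q85, tail_prob, n=10):
    """'very good' = top n in all three outputs, 'good' = top n in exactly two."""
    tops = [top_n(elastic_net, n), top_n(q85, n), top_n(tail_prob, n)]
    labels = {}
    for team in elastic_net:
        hits = sum(team in top for top in tops)
        if hits == 3:
            labels[team] = "very good"
        elif hits == 2:
            labels[team] = "good"
    return labels


# Invented model outputs for four teams.
elastic_net = {"A": 280, "B": 260, "C": 240, "D": 220}
q85 = {"A": 340, "B": 300, "C": 330, "D": 280}
tail_prob = {"A": 0.25, "B": 0.12, "C": 0.22, "D": 0.15}

labels = classify(elastic_net, q85, tail_prob, n=2)
print(labels)
```

Teams that make one list or none simply do not get a label.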
I start in week 4 since that’s when we have enough data for the models.
Week 4:
There were five 300 yard passers. 3 out of 5 were identified as good or very good! We are off to a great start.
Week 5:
There were seven 300 yard passers. 5 out of 7 were identified as good or very good! The #1 and #2 teams were good or very good!
Week 6:
There were six 300 yard passers. 2 out of 6 were identified as good or very good. Not terrible but not great.
Week 7:
There were five 300 yard passers. 2 out of 5 were identified as good or very good. This includes the #1 passing team this week!
Week 8:
There were seven 300 yard passers. 4 out of 7 were identified as good or very good. This includes the #2 passing team this week!
Week 9:
There were four 300 yard passers. 2 out of 4 were identified as good or very good. Not terrible at all! but not special.
Week 10:
There were eight 300 yard passers. 6 out of 8 were identified as good or very good. This is fantastic! All teams classified as very good had 300 yard games. We also captured the #1 and #2 teams!
From weeks 4-10 there were 42 300 yard passers, and we identified 24 of them (57%!), without doing any manual analysis whatsoever!!!! I believe this % could be increased by being more deliberate about who we choose to bet on or by identifying teams with special edges in our selection. Around 9 teams each week get a classification of good or very good. Assuming this to be true, that is about 63 flagged teams over 7 weeks, so you have about a 38% chance (24 of 63) of picking a 300+ yard passing team if you are simply choosing at random from the model’s output.
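For anyone who wants to check the arithmetic, here it is in a few lines of Python using the weekly counts listed above. The only assumption is the "around 9 flagged teams per week" figure, which is an approximation.

```python
# 300 yard passers per week and how many of them were flagged good or very good.
passers = {4: 5, 5: 7, 6: 6, 7: 5, 8: 7, 9: 4, 10: 8}
flagged_hits = {4: 3, 5: 5, 6: 2, 7: 2, 8: 4, 9: 2, 10: 6}

total_passers = sum(passers.values())
total_hits = sum(flagged_hits.values())

# Share of all 300 yard passers that the models flagged.
capture_rate = total_hits / total_passers

# Share of flagged teams that went on to throw for 300 yards.
teams_flagged_per_week = 9  # approximate
hit_rate = total_hits / (teams_flagged_per_week * len(passers))

print(total_passers, total_hits)
print(round(capture_rate * 100), round(hit_rate * 100))
```

The first percentage answers "how many big games did we catch?" and the second answers "how often does a flagged team pay off?", which is the one that matters most for betting.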
THAT IS HUGE
I will be posting these model results for free on my Twitter, but please consider supporting my Patreon and subscribing to THIS free Substack!
I see this as very similar to Josh’s Buy Low model in that I do not know which team will explode, but I do know that creating a shortlist of teams with legit explosion potential and knowing a certain % will hit is a very good thing for SGPs and GPPs in DFS.
Thank you!
oh… Here is Week 11’s model output!
SF is VERY close to being ‘very good’.
And there you have it!
Please share this on Twitter if you think it’s cool and useful. It took a lot of work and it’s easily one of the cooler models I’ve ever made.