Background
For 130 years, pitchers have thrown a baseball overhand, and for 130 years, doing so has hurt them. Starter or reliever, left-handed or right-handed, short or tall, skinny or fat, soft-tossing or hard-throwing, old or young—it matters not who you are, what color your skin is, what country you’re from. The ulnar collateral ligament (UCL), a stretchy, triangular band in the elbow that holds together the upper and lower arms, plays no favorites. If you throw a baseball, it can ruin you.
-Jeff Passan, The Arm: Inside the Billion-Dollar Mystery of the Most Valuable Commodity in Sports
In February 2016, I presented my Tommy John surgery predictability findings from a half year of studying pitching, injury, geographic, and physiological data from thousands of Major League Baseball player-years. The results were simple and largely intuitive: Throwing really hard is dangerous for your elbow, doubly if you recently hurt your elbow.
After another year and a half of work, I’m proud to present an update to my injury research, and while the takeaway this year is perhaps more vague, the effectiveness of the model is more certain. Without further ado (the further ado section comes later), I present the results.
Results
The following table presents the Tommy John surgery risk for MLB players entering the 2017 season, as well as entering the 2018 season, based on data from 2016 through October 2017.
Risk+ is the percentage by which a player’s predicted TJS Power score (explained below) sits above or below the average. The average TJS Power score is about 0.3, so a player with a 0.6 prediction would have a Risk+ of 100%. Link to standalone Tableau.
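To make the Risk+ arithmetic concrete, here is a minimal sketch. The function name `risk_plus` and the constant `AVERAGE_TJS_POWER` are illustrative assumptions, not names from the actual model code, and the 0.3 average is only the approximate figure quoted above.

```python
def risk_plus(predicted_tjs_power: float, average_tjs_power: float) -> float:
    """Percentage above (+) or below (-) the average predicted TJS Power."""
    if average_tjs_power <= 0:
        raise ValueError("average_tjs_power must be positive")
    return (predicted_tjs_power / average_tjs_power - 1.0) * 100.0


# Assumption: the league-average prediction is roughly 0.3, as quoted in the text.
AVERAGE_TJS_POWER = 0.3

# A prediction twice the average is 100% above average.
print(round(risk_plus(0.6, AVERAGE_TJS_POWER)))   # 100
# A prediction half the average is 50% below average.
print(round(risk_plus(0.15, AVERAGE_TJS_POWER)))  # -50
```

Because the quoted average is rounded, values computed this way will only approximate the Risk+ figures shown in the Tableau.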
Because of playing time limits, some players will not appear in a 2016 or 2017 column even if they pitched in the given year. It is important to note that the 2016 column estimates the pitcher’s TJS risk given his data from the 2016 season, so the 2016 column predicts a 2017 injury. The 2017 column uses the (partial) 2017 data, striving to predict a player’s injury risk heading into the 2017 playoffs and the 2018 season.
Analysis
At first, the results from the previous TJS study appeared quite promising (if we can define a catastrophic injury as “promising”). Carter Capps, on March 8, 2016, became the first major leaguer of the 2016 season to have season-ending Tommy John Surgery (TJS). Capps ranked No. 10 on my list of high-risk pitchers (with a Risk+ of 156%, or 156% above average risk) going into the season.
When word reached me that Capps was going under the knife, my stomach sank. Part of me hated doing research that might negatively impact a person’s career, their reputation, and their confidence — especially if the predictions proved unreliable. Another part of me feared my predictions might be correct, and my work might not lead to better injury prevention, but rather a hangman’s march of certainty.
It is my hope that any tidbits of information I have uncovered with this ongoing effort will pay dividends in reducing arm injuries, though that is a task that will necessarily occur on the other end of this baton.
As the season went along, the real-world results of the research oscillated between successful and questionable. In early April, Manny Parra (-48% Risk+) succumbed to TJS. Four days later, another unlikely candidate, Felix Doubront (-75% Risk+), got the bad news. Then Chris Bassitt (5% Risk+) and Carson Smith (35% Risk+) went down.
All told, from February 2016 through August 2017, the players who fared worse in reality had higher risk rates in the formula:
Had TJS in 2016? | Average of Risk+ | Number of Players |
---|---|---|
FALSE | -2% | 410 |
TRUE | 26% | 25 |
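A summary table like the one above is a grouped average: split pitchers by whether they had TJS, then average Risk+ within each group. The sketch below shows the calculation on a handful of made-up rows; the numbers are toy values for illustration only, and the function name is hypothetical.

```python
from collections import defaultdict


def average_risk_by_outcome(rows):
    """Map each outcome (had TJS or not) to (mean Risk+, number of players).

    rows is an iterable of (risk_plus, had_tjs) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])  # outcome -> [sum of Risk+, count]
    for risk, had_tjs in rows:
        totals[had_tjs][0] += risk
        totals[had_tjs][1] += 1
    return {outcome: (total / count, count)
            for outcome, (total, count) in totals.items()}


# Toy rows, NOT real player data: (Risk+ in percent, had TJS?)
toy_rows = [(-40.0, False), (10.0, False), (0.0, False),
            (60.0, True), (20.0, True)]

for outcome, (mean_risk, count) in sorted(average_risk_by_outcome(toy_rows).items()):
    print(outcome, round(mean_risk), count)
```

A gap between the two group means is the signal the table is reporting; with only a few dozen injured pitchers, that gap can move a lot from sample to sample.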
Astute readers will note that 921 pitchers have taken the mound since Opening Day 2016, not the 435 observed above. This is in part due to sample-size requirements that might forever hinder this kind of research until complete, accurate, and public minor league PITCHf/x data exists.
Where this leaves us now is looking for room to improve. After almost two years of continuous work – cleaning data, writing code, updating data, re-rewriting code, giving up three times, receiving divine inspiration four times, and updating the data again – I am proud to present a revised model that I believe improves upon the previous model, and improves upon it considerably.
Using the same input time frame – the years 2007 through 2016, in order to predict UCL injuries in a random subset of that group – I have found an algorithm producing these results:
Had TJS in Following Year? | Average of Risk+ | Number of Players |
---|---|---|
FALSE | -3% | 1104 |
TRUE | 90% | 40 |
First, let’s start with what we believed was working with the last model:
There were some relative surprises, but also some intuitive ideas. Being old lessened the odds of TJS (though not hugely, and possibly because retirement or ineffectiveness is just as likely to hit the following season, especially if an elbow is already barking). Also, throwing fastballs was more dangerous than breaking balls, a finding confirmed in other studies, but not in baseball convention. Being left-handed seemed to matter, for the better, though the impact was small.
What mattered a great deal, and what makes the most intuitive sense, is the presence of recent arm trouble. This variable not only proved to be a reliable predictor of future Tommy John surgery (TJS), but also the most powerful (an important concept in good modeling). This is why, at the close of my article, I bemoaned the issues that plagued my limited injury and PITCHf/x database:
Without comprehensive dumps from the PITCHf/x data at Brooks Baseball or the Baseballic.com injury database, and without good information on late-career UCL injuries that result in retirement instead of TJS, and without medical records from these players themselves, we will always be playing catch-up with our prediction models.
This year, I am happy to announce we managed to lease data from Corey Dawkins’ baseballic.com database. Some may remember Dawkins for his work adding granular injury data to the player pages at Baseball Prospectus. And while that impressive dataset only spanned several recent seasons, the data available at baseballic.com extend as far back as the 1980s in some cases and all the way up to present day. It is the most impressive baseball injury database outside the server rooms of the Major League teams.
In addition, in June 2017, FanGraphs added the superb Pitch Info data – the heart of the Brooks Baseball PITCHf/x data – to their filterable and downloadable leaderboards. These two developments were nothing short of a breakthrough for this research, and specifically, they caused a major change to the model:
Previous injury history began to matter less. Intuitively, this does not make sense. One would think a previous arm injury – which is going to necessarily decrease the arm’s ability to resist future injuries – should lead to possible UCL injuries. And while that still may be the case, the model could not find a consistent relationship between previous arm injuries and the specific UCL injury. Previously, I found that when I combined the injury database classifications for wrist, forearm, elbow, and shoulder injuries into a single bucket, they loosely forecast a UCL injury – but I had also found the same injury database confusing a broken leg with a UCL injury. Since it was programmatically scraped from disabled list feeds, and not edited or reviewed after the fact, it was liable to have a number of UCL or even Tommy John events misclassified as elbow tightness or a shoulder injury – especially if a player initially hit the DL for something other than the eventual TJS.
So in other words, it’s quite possible that the old injury database was catching self-fulfilling prophecies. Joe Pitcher hits the 15-day DL and misses 30 games in 2011 with forearm tightness. Turns out, he needs TJS in the offseason. So he misses 2012 with TJS, even though his UCL blew up in 2011. A simple DL scraper is not going to catch that nuance – and since I’m most interested in finding out in 2010 that Joe Pitcher is going to have elbow fireworks in 2011, bad DL data undermines that effort enormously.
All this to say: Injury history did not play into this model.
There are 19 factors in total within this model, and they are:
There are three important points of interest here: 1) The four different risk quartiles all sit fairly closely to each other, and the correlation with following-year TJS is – at its strongest – quite low. Extremely low. We need to remember that if there were obvious, intuitive connections to UCL injuries, we would have already found them. The connections here are all loose, complicated, and non-linear.
2) That said, the low-risk group has a lot of yellow boxes – meaning they have the most middle values of the three groups. They led the pack in vertical release point standard deviation (vert_stdev), and they threw the ball the hardest (HardVelo), though not by much (0.2 mph faster than the middle group). That is especially curious because the previous model suggested fastballs were bad for an elbow’s overall health, and here we see guys with electric fastballs disconnecting from TJS risk. One possible explanation is that fastball velocity is up when a pitcher is healthy and dips when he is hurt. This makes a compelling case for examining velocity changes too, and that is the next area where I intend to push this inquiry (even though it showed little value in the previous study). But other than that, the low-risk group mostly fell between the two other risk groups. They barely had the worst xFIP-, but barely had the second best FIP-.
3) The high-risk group had a lot of soft-tossing inning-eaters. It is important to remember that relievers and starters are mixed together here, as well as pitchers who pitched partial seasons. Starters, by the very merit of pitching more innings, are logically at a higher risk of UCL (and all other) injuries, but since many injury-prone starters end up in the bullpen, it is not necessarily an obvious linear relationship. In fact, the medium-risk group threw fewer innings than the low-risk group.
However, the average fastball velocity for the high-risk group was over a full mph slower than the next closest group. They also struggled to induce whiffs out of the zone, leading the group in out-of-zone contact rates (O-Contact% (pi)). Taken all together, the model seems to suggest that pitchers who did not have great fastballs, but used them consistently and ate innings, were at higher risk of TJS.
Speaking in these kinds of generalities, of course, does little good when we look at the actual list of high-risk players forecast in 2017 – with names like Andrew Miller, Kelvin Herrera, and Zach Britton in the highest-risk groups. Consider this an emphasis on how the actual model is exploring non-linear relationships. At minimum, the model is… complicated:
Grueling Math Details
The previous model employed linear regression, with “TJS Power” as the dependent variable. I devised TJS Power when thinking of a way to test for red flags in the years leading up to a UCL injury rather than the single year preceding the event. For the five years leading up to a UCL tear, I assigned the pitcher a numerical value counting up to the year before the injury. So, 2012 Gavin Floyd (who would go under the knife May 7, 2013) had a TJS Power of 5. His 2011 season had a TJS Power of 4, and so on until he reached 0 back in 2007. Players who would never have TJS received a 0 across all years.
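The TJS Power labeling rule can be sketched in a few lines. This is an illustrative reconstruction from the description above, not the code used in the study. The function name and the `window` parameter are assumptions, and so is the choice to assign 0 to seasons in or after the surgery year, which the text does not specify.

```python
def tjs_power(season, surgery_year=None, window=5):
    """Label a pitcher-season with its TJS Power.

    The season immediately before surgery gets `window` (5), the one before
    that gets 4, and so on down to 0. Pitchers with no surgery get 0.
    """
    if surgery_year is None:
        return 0  # never had TJS
    if season >= surgery_year:
        return 0  # assumption: seasons at or after the surgery are not labeled
    years_before = surgery_year - season  # 1 for the season right before surgery
    return max(0, window + 1 - years_before)


# Gavin Floyd had surgery in 2013, so 2012 is the season right before it.
for year in range(2007, 2013):
    print(year, tjs_power(year, surgery_year=2013))
```

Run on the Floyd example, this counts up from 0 in 2007 to 5 in 2012, matching the values given in the text.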
There are certainly flaws in this methodology, but the guiding principle in my research is: “It has to work.” To borrow a phrase from Ben Lindbergh and Sam Miller’s 2016 book of the same title: The only rule is it has to work. And from what I can tell, the TJS Power method works.
Believing the TJS Power method helped a great deal in the previous project, I continued using it. The wrinkle I added – beyond the vastly improved data – was changing the simple linear regression into a tree of regressions, specifically a classification and regression tree (CART). The methodology creates a massive decision tree based on linear regressions and allows the machine learning algorithm to find non-linear relationships that might otherwise elude us.
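To show how a regression tree picks up a non-linear relationship, here is a minimal sketch of the core step: finding one split on one feature. It uses the textbook CART rule, in which each side of a split predicts the mean of its targets and the best split minimizes squared error. That is an assumption for illustration; the actual model may fit its leaves differently, and the data below are toy values, not pitcher data.

```python
from statistics import mean


def squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the single split on xs
    that minimizes the total squared error of ys around each side's mean."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best[0]:
            best = (error, (lo + hi) / 2, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


# Toy data with a step change that a straight line would fit poorly.
feature = [1, 2, 3, 10, 11, 12]
target = [0, 0, 0, 3, 3, 3]
threshold, left_mean, right_mean = best_split(feature, target)
print(threshold, left_mean, right_mean)  # 6.5 0 3
```

A full tree repeats this search on each side of the split, across every feature, which is how it can grow very large and why it needs checking against held-out data.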
It comes with its risks – and I’m sure some machine learning experts spit coffee on the monitor when they saw my enormous tree above. One of the biggest dangers of this method is over-fitting – allowing the algorithm to find minute and inconsequential variables with minimal correlations and make them into bigger deals than they are in reality. For instance, at one point in the model, the tree splits based on Total Batters Faced (TBF), and those who saw greater than 823 TBF were then further split based on how many innings they had pitched (IP). Since IP and TBF measure almost the same thing, it is hard to understand why or how such a hair-split would be useful or meaningful.
The effect of overfitting is diminished predictive power, but the model was derived with a random subset of the data and tested against a separate subset (representing a third of the pitchers in my 2007 through 2016 database). This represents the predictive power presented above – a risk rate 90 percent above average. I then used the same model to test against the 2017 UCL injuries that have already occurred, and I got the following result:
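One way to set up a hold-out test like this is to split by pitcher rather than by pitcher-season, so that no pitcher appears on both sides. The sketch below does that. It is an assumption about how such a split could be done, not a description of the study's code; the function name, the fixed seed, and the pitcher ids are hypothetical.

```python
import random


def split_by_pitcher(pitcher_ids, test_fraction=1 / 3, seed=0):
    """Randomly assign whole pitchers to a training or a test group.

    Returns (train_ids, test_ids) as two disjoint sets.
    """
    unique_ids = sorted(set(pitcher_ids))  # sort first for reproducibility
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_fraction)
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


# Hypothetical ids; repeated ids stand for multiple seasons by one pitcher.
ids = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p6", "p7", "p8", "p9"]
train_ids, test_ids = split_by_pitcher(ids)
print(len(train_ids), len(test_ids))  # 6 3
```

The tree is then fit only on seasons from `train_ids` and scored only on seasons from `test_ids`, so an overfit split shows up as a weaker result on the test group.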
Had TJS in 2017? | Average of Risk+ | Number of Players |
---|---|---|
FALSE | -4% | 427 |
TRUE | 144% | 11 |
So far, it appears to follow the platinum rule: It works. But since the model is attempting to forecast TJS injuries in the following five years, it merits continued study and improvement. One concern I have is the volatility in year-to-year Risk+ rates. Pitchers who, on the surface, do not change their results considerably from one year to the next can bungee from high risk numbers to minimal risk. Did they make just the right changes to their pitch mix, innings totals, and contact rates to ease the pressure on their UCL? Apparently so.
But what does that one year of high risk mean for the following years? In theory, the latest model results should reflect the current reality. But logic suggests a full year of abuse on the UCL is still a full year of abuse. Eventually, that should create problems, even if the pitcher’s current pitching approach has a lower risk rate.
Players
Let us take a moment to examine some of the specific players identified here.
- Anthony DeSclafani tops the list with the worst average Risk+ rating at 707% above average (a predicted TJS Power of 2.5). While that is not the highest single-season Risk+, it is noteworthy because he had no 2017 data to pull that high number down. Why? Because he has missed the whole season with elbow tightness, later diagnosed as a UCL strain. While he has not undergone TJS, it appears the model correctly predicted his impending elbow troubles.
- If you click the “1” button under the “TJS Next Year?” filter in the Tableau embedded above, you will find the list of players who underwent TJS in this 2017 season. Looking at their projected risk using the 2016 numbers, we see the model was very correct on Michael Pineda, Joe Ross, and Will Smith. It missed big on Drew Storen, Edinson Volquez, Shelby Miller, and Trevor May. This reinforces the idea that (a) the model will certainly miss a lot, but (b) it should ultimately hit better than it misses.
- When you filter “1” on the “TJS Next Year?” section, you will also note none of the 2017 season predictions appear. That is because they use 2017 data to predict 2018 injuries — which have not yet happened.
- Jeff Samardzija had one of the worst Risk+ numbers from the 2016 data, but his 2017 data suggests below-average risk. Looking at his pitch repertoire, he has thrown far fewer four-seam fastballs and cut fastballs this season, pushing his curveball across the plate at almost a 15% rate, according to Pitch Info stats. In 2016, he threw curves at a rate of 8%. He has also allowed less medium-strength contact, though I still struggle to find the intuitive connection between hitter contact strength and pitcher elbow injuries.
- That weirdo R.A. Dickey — who physically lacks a UCL in his right elbow — has been stricken from all the data, and did not influence the construction of the model.
- Martin Perez has earned two grim predictions in consecutive years. If I could persuade anyone to get a precautionary MRI or see a doc, it would be the Rangers’ lefty.
- Chad Green throws — almost exclusively — a 95 mph fastball. I would intuitively think he’s at risk of TJS, but the model assures me he’s the paragon of good elbow health. Maybe the model has discovered he’s an easy gas type, the sort of fellow who doesn’t have to work hard for his fastballs. I don’t know. But he ranks as the safest player on the list, and therefore the canary in the coalmine. If Green gets a UCL tear, it bodes quite poorly for the model.
Next Steps
While the development of this model spans almost two full seasons and an offseason, the late addition of Pitch Info data – and the ever growing wealth of on-field data – means there are still many rocks to turn over. If hitter contact truly does presage TJS, as the model here suggests, then HITf/x and FIELDf/x data might also provide greater clarity into future UCL injuries. Adding minor league PITCHf/x data would more than quadruple our dataset, too, and could provide powerful weight to the model, as well as possibly unlock knowledge about young players suffering from UCL injuries.
Also, even the data used in this dataset can be further sliced and prodded – looking at repertoire changes, velocity and movement changes, and so much more. The truth is an infinity of variables can and should be tested in the blank-slate vacuum of a new model. Until we can predict UCL injuries better than tomorrow’s weather, no variable is too unpromising to put on the table.
More specifically, biometric data — accurate heights and weights, fluctuations in year-to-year weight, bench press maxes, long jump numbers, stretching and flexibility measurements — all deserve examination. Unfortunately, none of that data exists in the public sphere, or possibly at all.
Another area for improvement, and something I hope to test more thoroughly next year, is the use of neural networks and other machine learning mechanisms that might uncover further non-intuitive connections to Tommy John Surgery. There are many risks with moving further from human hands and into the cold efficiency of machine learning. For one, we risk losing the “lesson,” so to speak. Last year, it appeared obvious that throwing lots of fastballs after recent arm injuries increased injury risks. This year, the moral of the story is hazier.
But these advanced techniques deserve consideration because, frankly, our intuition has failed to predict these injuries. And if we truly want to improve our treatment and prevention of these injuries, we first need to accept that the only rule is that it has to work.
A big, enormous thanks to Corey Dawkins from baseballic.com for opening the storehouse of his injury database to us. Also, a big thank you to FanGraphs for its ongoing support of free, publicly available baseball data. They pay a pretty penny for data from BIS, Pitch Info, and the like. Research like this is not possible without it. Thanks as well to Jeff Zimmerman and Jon Roegele for hosting and updating their respective injury and TJS databases.