With the last of the arbitration hearings in the books, we can now officially report that this was the most accurate year the MLB Trade Rumors Arbitration Model has ever had. The model estimated salaries within ten percent of actual salaries in 69% of cases, breaking the previous record of 65% and well above the 54% low point just three years ago.
When I began working on this model way back in 2011, I defined success based on how often my model was within ten percent of the actual arbitration salary for all arbitration-eligible players who signed one-year deals. The initial goal was to be within ten percent for half of such cases. For the 2011-12 arbitration season, the model was within ten percent on 55% of all cases. The model has consistently been in that range or higher, peaking at 65% in the 2014-15 arbitration season, while only dipping below it once with 54% in 2019-20. It averaged 58% over its first nine years.
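The success measure described above can be sketched in a few lines of Python. This is only an illustration, not the model's actual evaluation code: the function name, the salary figures, and the choice to measure the ten percent band relative to the actual salary (rather than the projection) are all assumptions made for the example.

```python
def share_within_tolerance(projected, actual, tolerance=0.10):
    """Fraction of cases where the projection lands within `tolerance`
    of the actual salary, measured as a share of the actual salary.

    Hypothetical helper for illustration; not the author's code.
    """
    if len(projected) != len(actual):
        raise ValueError("projected and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    hits = sum(
        abs(p - a) <= tolerance * a
        for p, a in zip(projected, actual)
    )
    return hits / len(actual)


# Made-up salaries in dollars, purely for demonstration.
projected = [4_000_000, 2_500_000, 7_000_000, 1_200_000]
actual = [4_300_000, 2_000_000, 7_500_000, 1_250_000]

# Three of the four projections fall inside the ten percent band.
print(share_within_tolerance(projected, actual))
```

By this definition, a 69% season means roughly seven of every ten one-year arbitration salaries landed inside that band.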
Over that time, I repeatedly ran tests on the model, considered new modeling techniques, and had discussions with agents and others with experience in the arbitration space about how to improve the model. There were steps forward, although after picking each piece of low-hanging fruit, the gains were smaller. Ultimately, I pivoted to a focus on more accurate and cleaner data. This was initially something that Bryan Grosnick helped with behind the scenes, and Darragh McDonald took over last year. They both helped tremendously.
One important process change that I incorporated into model updates in recent years is checking which players would have been the “biggest misses” after updating the model. In many cases, the salaries that “missed” were not reflective of the actual salaries earned. Yet the model was awkwardly contorting itself to fit those purported outcomes. Some of the process of improving data quality was just a matter of finding typos. But in many cases, it was about correctly identifying the “true” arbitration salary a player received. When players avoid arbitration via settlement, they often get performance bonuses, signing bonuses, options for future years, or multi-year agreements. These cases are incorporated into the modeling process where appropriate, but sometimes the “salary” a player literally earned was not really intended to account for the actual arbitration award he would have gotten at a hearing. Cleaning the data involved some subjectivity, but it was designed to better record the intended salary that teams and agents were treating as a baseline when they negotiated more complicated agreements.
Tedious updates to data accuracy are not the most thrilling part of model building. Coming up with creative mathematical methods or innovative variables is a more rewarding intellectual exercise for the researcher. But the truth is that better data is often more important than a slightly smarter model. I will continue to evolve the model based on the relevant statistics and factors used in the arbitration process, but in recent years I ultimately improved the model more through better data than through structural changes.
As a result, the model should be more accurate in future years than it has been in the past. See below for a graph showing the performance of the model each year.
Lefty_Orioles_Fan
When I began working on this model way back in 2011, I defined success based on how often my model was within ten percent of the actual arbitration salary for all arbitration-eligible players who signed one-year deals
Well, it was brought to you by the power of the Swartz
AverageCommenter
Congratulations Matt! Really puts into perspective how much goes into the content on this website, and that it’s not some nobodies making random guesses.
Sa'ed Faoul
bravo
BeansforJesus
69, nice.
Also, I love the generic excel graph. No frills, just info.
Johnny Shoe
I give you a well-deserved congratulations sir.
DogDays2
Now if we could just do something about the fans commenting!
Lol joking
Fraham_
Matt Swartz you’ve been missed for three years.
Kapler's Coconut Oil
Holy cow you’re right! It really has been that long
AdmiralPatton
Awesome!
In Seager/Hader We Trust > the 70 MM DH Ohtani
Great site! Congratulations! I do wonder if you overestimated salaries on average or underestimated them.
Saint Nick
Weird flex..
wmurphy24
Well done! Thank you for your contributions and keep up the great work!
raisinsss
Vik Paruchuri does a fun tutorial on WAR prediction using machine learning / ridge regression for Dataquest.
I adapted it for ERA, and determined that Trevor Bauer’s 2020 was the most unexpectedly good season in the past 5 years while Matt Boyd’s 2020 was among the worst. Small sample size caveat here.
This could easily be changed to predict arb numbers. Uses pybaseball for stats, which is really cool. I don’t care enough to do it myself, but maybe it’s another angle.
vtadave
Can’t argue with a 69.
jmoff
Outstanding job of data analysis and model refinement. You’re now the expert across MLB for projecting arbitration salaries. As a fellow engineer and geek, I applaud your efforts and respect the great work you’ve done.
Congratulations!
Poster formerly known as . . .
When you add the extra penny, 70%, it’s even more impressive. You’ve got a right to be proud.
lakeg
Another feather of excellence on your headdress!
Doug Dueck
Congratulations Matt Swartz! Job well done.
Jaysfan1981
We better not close comments when the article has to be made about former Cardinals employees supplying MLBTR with proprietary Arb software
In Seager/Hader We Trust > the 70 MM DH Ohtani
Censorship has been rather sporadic on this site. I wonder if directly commenting about the site gets you censored.
Steve Cohen Owns You
I think the writer mistakenly posted their annual performance review here. I wonder what the model predicts his pay raise will be?
krillin89
Congrats! You deserve it. You do great work
Braves Fan 85
The Swartz is with you
jorge78
On the app there is no chart down below or link thereof
jorge78
Found it on the app. Why not a breakdown of which player and which team the model was most accurate for? It could be a total and a percentage, and the same for the 31%. Is that too much to ask?