How AI-Based Sports Analytics Is Changing the Game
What's the best job in the world if you're an analytical software developer with a nose for statistics and a desire to work with cutting-edge technologies -- and a sports nut to boot?
You might start with being director of data science at STATS LLC, a prominent sports analytics company that counts some of the most popular sports teams and biggest media companies in the world among its customers.
"I feel very lucky to do what I do," said Dr. Patrick Lucey, who holds that position. "I tell people what I do, and, you know, I feel very lucky to work in this domain and work with very talented people."
Those people include seven direct reports in his data science group, all with Ph.D.s, which speaks to the caliber of programmer required to excel in a burgeoning field that leverages some of the hottest technologies in use. Those include machine learning and -- clocking in at No. 1 on the hotness scale right now -- artificial intelligence (AI).
"We've kind of maxed out what we can do with tracking data. We have a lot of great things, but kind of going beyond that into more granular data, I think is the next thing on the horizon in terms of sports analytics."
Dr. Patrick Lucey, STATS LLC
That latter much-hyped technology is already prevalent throughout the tech arena and, according to pundits far and wide, will become an even more pervasive force, likely to change the world in ways not yet imagined. One arena that it's surely impacting right now is sports-related analytics, which in turn is changing the business of sports itself.
"It's at the forefront of artificial intelligence," Lucey said of the sports analytics field in an interview with ADTMag.
STATS has been around for 35 years, well before the advent of mainstream AI, of course, though the company was quick to leverage the technology as it matured. STATS started out just collecting sports data and subsequently branched out into other areas, aided by new computing innovations.
Now it claims to serve more than 800 teams, leagues and brands, delivering near-real-time data from more than 600 sports leagues. It provides solutions for both fan engagement -- supplying data feeds to media outlets, for example -- and team performance, such as player tracking, athlete monitoring and video analysis.
It All Started with Moneyball
The sports analytics craze was jumpstarted by the publication of Moneyball in 2003. The book, written by Michael Lewis, depicted how the Oakland Athletics baseball team used obscure performance statistics to make data-driven personnel decisions rather than relying upon the subjective evaluations of minor league scouts.
Since then, the sports analytics arena has exploded, with dozens of companies getting in on the action. Along the way, innovations in computer vision, sight-tracking, machine learning, AI and so on have provided a business advantage to analytics firms that can best leverage cutting-edge technologies and techniques, some of which have only emerged in the past couple of years.
A key goal of using those technologies and techniques is establishing better representations of collected data, Lucey says. That provides a solid foundation upon which new techniques can be used to glean insights, drive decisions and even predict outcomes through more efficient analytics.
But obtaining those foundational data representations is no trivial task. Some of the best minds in the business are working on that problem every day.
"The biggest challenge is actually getting the right representation," Lucey told ADTmag. "And by representation, I mean, 'What is that underlying data source. What is that representation, what form do we have to get that raw data in to make a computer understand and actually do comparisons?'"
While working with structured data is comparatively easy, things get problematic when working with unstructured data, such as tracking the various movements of players or balls. The complexity of the applied analytics grows with the complexity of the questions being asked, Lucey says. Structured play-by-play feeds are fine for answering simple, coarse questions about overall performance, for example, but for more complex inquiries, analysts need to work with more granular, unstructured data. And therein lies the data representation challenge. "If you get that initial representation wrong, you're basically trying to fit noise," Lucey says.
He said STATS has been at this a long time, so the data science team has come up with ways to better represent data for more accurate and efficient analysis to answer more complex questions.
What kind of questions, exactly? A research paper co-authored by Lucey -- "Learning Fine-Grained Spatial Models for Dynamic Sports Play Prediction" -- provides some clues.
"Under what circumstances is a ball handler likely to take a shot or pass to a teammate?" the paper asks in the introduction. "What defensive formations best deter a player's preferred actions? Such questions are of central importance in the study of decision making in team sports, or 'sports analytics.' In particular, we are interested in developing interpretable predictive models that can efficiently predict (or forecast) the outcomes of various game situations."
STATS encourages its data scientists to publish research papers such as the above, and the company earlier this year was selected as a finalist in the 2017 MIT Sloan Sports Analytics Conference Research Track for two research papers co-authored by Lucey: "Body Shots: Analyzing Shooting Styles in the NBA Using Body-Pose Attributes" and "Data-Driven Ghosting Using Deep Imitation Learning."
The paper on body pose attributes represents some of the more advanced research being done in the field. "The way I like to think about sports data is that it's a reconstruction of the match being played," Lucey said. "In terms of storytelling, the deeper and more granular data that we have, the better reconstruction. We can form a better reconstruction of that match. And to do that, we have to start looking at body pose."
Going from Analyzing to the Next Frontier: Predicting
And along with reconstruction, predictive capabilities are at the forefront of AI-based sports analytics.
"We have the ability now to -- using all of the tracking data that we have -- to kind of simulate what players will do in a certain situation, so that's really exciting," Lucey said. That new capability lets analysts ask very specific questions, Lucey said, such as "how often does a team do this? How often does a player do this? What's the likelihood of a player making a shot in that situation? What if I switch that player with another player? What if I want to simulate how that team would react in that situation?"
Peeking into the future of the sports analytics field with newly available predictive capabilities, Lucey said, unsurprisingly, that it all depends on data. Despite the huge data sets available -- and more being streamed all the time -- data scientists will never have enough, he maintained.
Lucey admitted that the claim may sound bizarre, but "in terms of granular data, we're never going to have enough," he said. The field has basically maxed out on what it can do with tracking data, he said, so it needs to go beyond that to work with more granular data. That, he said, is the next big thing on the horizon of sports analytics.
"So to help teams find that winning edge, to get better context, we need to be able to have methods which can start to synthesize and generate new examples," Lucey continued. "So we think we have enough to do that, but the idea of generative models, which is kind of following on from a lot of work in deep learning, being able to synthesize more examples.
"And there's so many different examples in machine learning and artificial intelligence which shows that we can actually do better predictions based on that. So that's the next thing. Being able to synthesize new examples, which will give us more data to work with, that's what I think is the key. Not only getting more granular data, but being able to synthesize more data. I think that's the key point, because, again, we're very, very good at contextualizing that data, but once you kind of slice and dice the data, you really haven't got enough examples, and will never have enough. We need to be able to synthesize more examples."
So better generative models that can synthesize more examples lead to better predictions -- the next frontier of sports analytics.
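To make the idea of synthesizing examples concrete, here is a deliberately simple sketch. Lucey is talking about generative models that follow on from deep learning; this example swaps in a much simpler technique, a Gaussian mixture model from scikit-learn, purely to show the fit-then-sample pattern. The "observed" data is made up, and what the two columns represent is an assumption for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# A small, made-up slice of data: 40 examples with two features each
# (imagine shot distance and defender distance after heavy "slicing and dicing").
observed = np.vstack([
    rng.normal([5.0, 2.0], 0.8, size=(20, 2)),
    rng.normal([22.0, 6.0], 1.5, size=(20, 2)),
])

# Fit a simple generative model to the few examples we have ...
gmm = GaussianMixture(n_components=2, random_state=0).fit(observed)

# ... then draw new, synthetic examples from the learned distribution.
synthetic, component_ids = gmm.sample(500)

print("observed examples: ", observed.shape[0])
print("synthetic examples:", synthetic.shape[0])
print("observed mean: ", observed.mean(axis=0).round(2))
print("synthetic mean:", synthetic.mean(axis=0).round(2))
```

Synthetic samples are only as good as the model that produced them, so in practice they would need to be validated against held-out real data before being used to train a predictor.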
Data Science/AI Tools Used
In the meantime, Lucey and fellow data scientists are cranking away at predictive capabilities and other new developments in the field.
I asked Lucey what tools are typically involved in the day-to-day pursuit of sports analytics nirvana.
"At STATS, we use a varied number of languages -- our predominant one for prototyping is Python with Sci-kit learn (Numpy, Scipy and Matplotlib) and we use standard IDEs for that (Pycharm, Jupyter Notebooks and so on)," Lucey said.
He also expounded on how those tools are used in conjunction with cutting-edge AI technologies, technqiques and approaches.
"We refer to AI as emulating what a human domain expert does."
Dr. Patrick Lucey
"Within our group, we refer to AI as emulating what a human domain expert does," Lucey said. "Generally, through a machine learning lens, that could be seen as a supervised learning problem where given a large amount of data, we map the input features X to a label y (which is annotated by a human expert). In simple problems, an off-the-shelf classifier works quite well (SVM, Random Decision Forest), but for a majority of the time we are dealing with complex signals so using techniques which can learn the non-linearities (such as deep neural networks) are required.
"However, a caveat on that is we want our results to be interpretable so just applying these 'black boxes' is not enough, so we have to utilize the various tools we have in our tool kit (that is, decision trees, various forms of clustering -- aka unsupervised learning) to help explain our predictions."
How Can Developers Get In on the Action?
So how does a stats/sports-oriented programmer get to tap into such excitement in the field of sports analytics -- or even become a data scientist?
"A good way to start is actually try to answer a specific question," Lucey said. "A lot of people who get jobs in these industries have written a blog or written an article kind of showcasing their technical chops."
Those chops could be put to use in busting a sports myth, Lucey suggested, or reaffirming a commonly held belief about sports. "A recommendation that I have is just to start with a simple problem," Lucey said.
He also provided some skilling advice for developers who would become highly paid, much-in-demand data scientists -- a position continually touted as one of the best jobs in America and named "the sexiest job of the 21st century" back in 2012 by Harvard Business Review.
"Get a working interactive demo/visualization tool which can quickly allow you to debug and see if your model is doing what is expected."
Dr. Patrick Lucey
"There are two big things" that developers can do to skill up for data science, Lucey said. "First, with machine learning it's crucial to get a grounding in the basics. There are a lot of really good online courses available that can get you started in that area. Deep learning is a big area as well, but it is not everything depending on the type of data you have. Learning what tool to use and when is very important.
"The second thing is to get a working interactive demo/visualization tool which can quickly allow you to debug and see if your model is doing what is expected."
Lucey acknowledged that skilled data scientists can be hard to find, but indicated if candidates are good enough, companies like STATS will find a place for them. They might even be good enough to join the team led by Lucey, who repeatedly counts his blessings for being able to do sports analytics at STATS.
"I don't think there's any more interesting data than what we have in sports," Lucey said. "Sports is so much fun, in my opinion."
Mine too. Now if I can just learn to apply some machine learning and AI techniques when setting up next fall's fantasy football lineups, I might be a repeat league champion. Go team!
Posted by David Ramel on July 18, 2017