May 14, 2021 Sports Betting

Sports Data 101


Stephen Crystal, founder of SCCG Management, writes on driving data and opportunity in the US sports betting sector.

The availability of timely, high-quality sports data has never been more critical to gaming industry operators and consumers. Consider, though, the massive number of sporting events played every day around the world. It would be an unrealistically huge investment for even the most prominent sports media companies to cover and capture statistics from each of these events in real-time. This reality has created a new industry in real-time sports data firms, which can scale the collection of game events and sell access to that data to multiple consumers.



Not all data is considered equal. As sports betting expands across the US, there has been a lot of discussion about who controls data and which data sports betting operators can legitimately use to consummate wagers. Leagues have used these arguments to create additional revenue streams by monetizing their data. While the leagues themselves can't participate directly in sports wagering revenue, integrity fees imposed on operators have been their preferred method of driving revenue.

Some jurisdictions have gone beyond the binary distinction of official versus unofficial data and created tiers. For example, in Illinois:

  • Tier 1 sports wagers are made before the sports event has begun and determined solely by a final score or outcome of a sports event.
  • Tier 2 sports wagers are any other sports wagers. These are most commonly in-play sports wagers and proposition bets.
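The two definitions above reduce to a simple rule, which can be sketched in a few lines of Python. This is only an illustration: the `Wager` class, its field names, and the `illinois_tier` function are hypothetical and are not taken from any regulator's or operator's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wager:
    placed_before_start: bool      # was the bet made before the event began?
    settles_on_final_result: bool  # is it decided solely by the final score or outcome?


def illinois_tier(wager: Wager) -> int:
    """Return 1 for a Tier 1 wager and 2 for any other wager."""
    if wager.placed_before_start and wager.settles_on_final_result:
        return 1
    return 2


# Three illustrative wagers
moneyline = Wager(placed_before_start=True, settles_on_final_result=True)
in_play = Wager(placed_before_start=False, settles_on_final_result=True)
first_scorer_prop = Wager(placed_before_start=True, settles_on_final_result=False)

print(illinois_tier(moneyline), illinois_tier(in_play), illinois_tier(first_scorer_prop))
# prints: 1 2 2
```

Only a wager that meets both conditions is Tier 1; in-play bets and proposition bets fall through to Tier 2.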

This complexity raises the question of who is responsible for ensuring that the right data is used in each jurisdiction. Does the data feed provider include metadata within the results message indicating whether the information is official, Tier 1, or Tier 2? Does the operator bear the responsibility for interpreting the data on their side of the process?
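One way to picture the two options is a results message that carries an "official" flag from the provider, combined with a check the operator runs on its own side. The Python sketch below is hypothetical throughout: the `ResultMessage` fields, the `REQUIRES_OFFICIAL` table, and the jurisdiction code "XX" are placeholders chosen for illustration, not real feed formats or legal requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultMessage:
    event_id: str
    home_score: int
    away_score: int
    is_official: bool  # hypothetical metadata flag set by the feed provider


# Hypothetical operator-side rule table: does a given (jurisdiction, tier)
# require official data? Placeholder values only, not legal guidance.
REQUIRES_OFFICIAL = {
    ("XX", 1): False,
    ("XX", 2): True,
}


def can_settle(msg: ResultMessage, jurisdiction: str, tier: int) -> bool:
    """Operator-side check: may this message be used for a wager of this tier?"""
    # Unknown combinations default to requiring official data (the cautious choice).
    needs_official = REQUIRES_OFFICIAL.get((jurisdiction, tier), True)
    return msg.is_official or not needs_official


unofficial = ResultMessage("g1", 2, 1, is_official=False)
official = ResultMessage("g1", 2, 1, is_official=True)

print(can_settle(unofficial, "XX", 1))  # True
print(can_settle(unofficial, "XX", 2))  # False
print(can_settle(official, "XX", 2))    # True
```

In this sketch the provider only labels the data, and the operator decides what that label means in each jurisdiction. The opposite split, where the provider filters messages per jurisdiction, is equally possible.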



Most real-time sports event data, such as scores, are collected by human beings working for a data feed service. These employees or contractors are sometimes referred to as "scouts." These people enter this data into specialized software provided by the data feed service. Quality assurance processes manage this manual inflow of data using a combination of supervisors and systems that can flag data entries that fall outside likely norms.
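As a rough illustration of how a system might flag entries that fall outside likely norms, the Python sketch below checks a scout's score update against the previous value. The function name, the specific rules, and the `max_jump` threshold are assumptions made for the example; a real data feed service would tune such rules per sport.

```python
def flag_score_update(previous: int, new: int, max_jump: int = 3) -> list[str]:
    """Return the reasons an entered score looks suspicious (empty list if none)."""
    reasons = []
    if new < 0:
        reasons.append("negative score")
    if new < previous:
        # Could be a legitimate correction, so flag for review rather than reject.
        reasons.append("score decreased")
    if new - previous > max_jump:
        reasons.append("jump larger than expected")
    return reasons


print(flag_score_update(previous=1, new=2))   # []
print(flag_score_update(previous=2, new=1))   # ['score decreased']
print(flag_score_update(previous=0, new=10))  # ['jump larger than expected']
```

Flagged entries would go to a supervisor for confirmation instead of being dropped, since an overturned goal or a corrected typo can legitimately lower a score.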

Automation is also a significant part of this data collection process. Internet bots, or robots, are software applications that run scripts to perform tasks on the internet. It's estimated that half of all traffic on the internet is bot traffic. Sports data collection bots can scrape HTML from webpages containing the data. When that data isn't available directly on a crawled webpage, the bot can use specialized requests, such as Ajax crawling, which cause the website to respond with the data the data feed service needs. There are several other methods for pulling this data from third-party web pages. In most cases, these are just technically different ways of requesting data from a web service. They still result in a response message containing structured data or a rendered webpage that can be scraped or read using optical character recognition, also called OCR.
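To make the scraping step concrete, here is a minimal Python sketch that pulls scores out of an HTML table using only the standard library's `html.parser`. The page markup in `SAMPLE_PAGE`, the `team` and `score` class names, and the `ScoreParser` class are invented for the example. A real bot would first fetch the page over HTTP and would have to cope with far messier markup.

```python
from html.parser import HTMLParser

# Made-up page fragment standing in for a fetched scoreboard page
SAMPLE_PAGE = """
<table id="scores">
  <tr><td class="team">Team A</td><td class="score">2</td></tr>
  <tr><td class="team">Team B</td><td class="score">1</td></tr>
</table>
"""


class ScoreParser(HTMLParser):
    """Collect {team: score} pairs from <td class="team"> / <td class="score"> cells."""

    def __init__(self):
        super().__init__()
        self._field = None  # class of the <td> currently open, if any
        self._team = None   # last team name seen, waiting for its score
        self.scores = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._field == "team":
            self._team = text
        elif self._field == "score" and self._team is not None:
            self.scores[self._team] = int(text)
            self._team = None


parser = ScoreParser()
parser.feed(SAMPLE_PAGE)
print(parser.scores)  # {'Team A': 2, 'Team B': 1}
```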



Whether a human being or a system collects this data, it is then standardized and saved to a massive database. The database is connected to a web service, making this data available through the internet. This web service hosts the application programming interface, better known as an API, which returns specific data as requested from a third party – the customers of the sports data firm.

The API is essential because the customers of the sports data firms aren't always looking for a massive blob of all available data associated with a game. Instead, they're looking for a particular set of data needed for a business purpose, selected by the parameters of the API request. These requests range from the simple, "What is the current Game score?" to the more specific, "How many Games did Team A lose with a score of '0'?" These are all examples of pull requests for data.
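The two example questions can be sketched as small query functions over stored game records. The Python below is a stand-in for what would sit behind a pull API: the `Game` record, the sample data, and the function names are all hypothetical, and a real service would query a database and return the answer over HTTP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    game_id: str
    home: str
    away: str
    home_score: int
    away_score: int
    final: bool  # False while the game is still in progress


# Made-up records standing in for the provider's database
GAMES = [
    Game("g1", "Team A", "Team B", 0, 2, True),
    Game("g2", "Team C", "Team A", 1, 0, True),
    Game("g3", "Team A", "Team D", 3, 0, True),
    Game("g4", "Team B", "Team A", 1, 1, False),
]


def current_score(game_id: str) -> dict:
    """Answer: what is the current Game score?"""
    for g in GAMES:
        if g.game_id == game_id:
            return {g.home: g.home_score, g.away: g.away_score}
    raise KeyError(game_id)


def shutout_losses(team: str) -> int:
    """Answer: how many finished Games did the team lose with a score of 0?"""
    count = 0
    for g in GAMES:
        if not g.final:
            continue
        if g.home == team and g.home_score == 0 and g.away_score > 0:
            count += 1
        elif g.away == team and g.away_score == 0 and g.home_score > 0:
            count += 1
    return count


print(current_score("g4"))       # {'Team B': 1, 'Team A': 1}
print(shutout_losses("Team A"))  # 2
```

The customer asks a narrow question and receives only the answer, not the full set of game records.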

In addition to requests for specific API data, we have Push APIs, which don't wait for requests for information. They send time-sensitive information to a customer as soon as the information is available. Push APIs know to send this information to customers because customers subscribe or connect to the feed as a known client. These Push messages are essential for sportsbook operators and sports media companies who need this information immediately to resolve a bet or display an outcome to viewers.
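The subscribe-then-receive pattern can be illustrated with an in-process sketch in Python. The `PushFeed` class and its method names are hypothetical. In production the callbacks would typically be replaced by network connections to each known client, but the flow is the same: the customer registers once, and the provider sends updates without being asked.

```python
from collections import defaultdict


class PushFeed:
    """Minimal publish/subscribe hub keyed by event id."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_id, callback):
        """Register a known client for updates on one event."""
        self._subscribers[event_id].append(callback)

    def publish(self, event_id, update):
        """Send an update to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(event_id, [])
        for callback in callbacks:
            callback(update)
        return len(callbacks)


feed = PushFeed()
sportsbook_inbox = []
broadcaster_inbox = []
feed.subscribe("g1", sportsbook_inbox.append)
feed.subscribe("g1", broadcaster_inbox.append)

notified = feed.publish("g1", {"home": 1, "away": 0})
print(notified)          # 2
print(sportsbook_inbox)  # [{'home': 1, 'away': 0}]
```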

From the database to the web service, to the application programming interface, all of this technical infrastructure makes up the Content Delivery Network (CDN). The CDN serves the casino gaming, sportsbook, and igaming industries efficiently and at massive scale. What remains undiscussed here is the engineering required to make this system react with the shortest delays possible.



The CDN databases normalize and store data so that the APIs can answer data requests as quickly and easily as possible. Customers of the data feed services often need this historical data aggregated in different ways. The origin of the phrase, "Lies, damn lies, and statistics," is unclear, but its sentiment is that even perfectly accurate statistics require analytical thinking to use meaningfully. Analysts often push back on traditional practices by asking, "Why do we believe what we have always believed?" In this way, statistical analysis has identified flaws in conventional wisdom to improve decision-making. The movie Moneyball famously brought this thinking to the mainstream and has driven decision-making in areas such as:

  • Identifying players who outperform their peers in non-obvious ways and using those players to their peak effectiveness based on these talents
  • Predicting opponent behaviour by using historical data to understand future decision-making
  • Improving the way decisions are made, such as the timing for sending relief pitchers in baseball or when to rest players
  • Ranking players to determine line-ups
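As a small illustration of aggregating historical data to rank players, the Python sketch below compares a ranking by total points with a ranking by points per minute played. The game log values, player names, and the choice of metric are invented for the example and are not drawn from any real data set.

```python
from collections import defaultdict

# (player, minutes_played, points) for individual games; values are made up
GAME_LOG = [
    ("Player A", 30, 18),
    ("Player B", 12, 10),
    ("Player A", 34, 14),
    ("Player B", 10, 9),
    ("Player C", 36, 20),
]


def totals(log):
    """Aggregate minutes and points per player across all games."""
    minutes = defaultdict(int)
    points = defaultdict(int)
    for player, mins, pts in log:
        minutes[player] += mins
        points[player] += pts
    return minutes, points


minutes, points = totals(GAME_LOG)
rate = {p: points[p] / minutes[p] for p in minutes if minutes[p] > 0}

by_total = sorted(points, key=points.get, reverse=True)
by_rate = sorted(rate, key=rate.get, reverse=True)

print(by_total)  # ['Player A', 'Player C', 'Player B']
print(by_rate)   # ['Player B', 'Player C', 'Player A']
```

Player B scores the fewest total points but the most per minute, which is the kind of non-obvious signal that only shows up once the same data is aggregated a different way.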



The sheer volume of data generated by sports is astounding. Sources including the Wall Street Journal and major sports leagues have collected statistics on action generated within games across all major sports:

  • Soccer, the leader in action throughout a game, racks up almost an hour of activity (50%) in a two-hour match.
  • Close behind is hockey, with an average of 60 minutes of action (43%) in a two-hour and twenty-minute game.
  • A three-hour baseball game contains almost 18 minutes of activity (10%).
  • American football has around 11 minutes of action in a three-hour game (6%).
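The percentages above follow directly from dividing minutes of action by total game length. A short Python check, using the upper end of each "almost" figure, reproduces them. The dictionary and function names are just for illustration.

```python
# (minutes of action, total minutes) taken from the figures quoted above
SPORTS = {
    "Soccer": (60, 120),
    "Hockey": (60, 140),
    "Baseball": (18, 180),
    "American football": (11, 180),
}


def action_share(action_minutes, total_minutes):
    """Share of the game that is live action, as a whole-number percentage."""
    return round(100 * action_minutes / total_minutes)


shares = {sport: action_share(a, t) for sport, (a, t) in SPORTS.items()}
print(shares)
# {'Soccer': 50, 'Hockey': 43, 'Baseball': 10, 'American football': 6}
```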

In some cases, these may seem like relatively short windows of action, but consider that almost every relevant action taken by each player, at every moment of the game, can create a fact or statistic about the game:

  • How far did the player run?
  • At what speed did the player run?
  • Who scored, what is the current score and how many points has the scorer made in this period?
  • Who carried the ball, for how long and how far?
  • Did an action result in a warning?

Any 10-second period in a game can generate dozens of statistical data points, many of which the data feed provider captures through the fastest possible human and system processes. These systems deliver these facts on demand and push them to customers in real time, on average between seven and 11 minutes faster than they reach television and OTT channel viewers.

This invisible infrastructure empowers analysts, marketers, salespeople, broadcasters and fans worldwide. Demand for this data shows no signs of leveling off and continues to expand into new use cases such as cryptocurrency oracles and AI. As such, we expect to see new industries that we haven't even begun to understand spin off from the ready availability of this high-quality, real-time fountain of sports data.

SCCG Management is a consultancy that specializes in sports betting, igaming, sports marketing, affiliate marketing, technology, intellectual property protection, product commercialization, esports, capital formation, M&A, joint ventures, casino management, and governmental and legal affairs for the casino and igaming industry.

Stephen Crystal has spent over 25 years in all aspects of the casino and gaming technology industry and its intersection with esports and igaming. He has served as an attorney representing public and private gaming companies, a gaming executive employing thousands of employees, and an investor and advisor on over 1 billion dollars of project finance, mergers, and acquisitions in the casino gaming technology space.