Free Data: Hudl Statsbomb Release Five Top Women's Leagues
At Hudl Statsbomb, our commitment to the football analytics community is deeply connected with our dedication to elevating the women's game. To ensure that analysts and enthusiasts continue to have access to elite-level data, we are thrilled to announce our latest free public data release: five complete domestic women’s leagues, spanning 771 matches, 62 teams and 1500 players.
This major release features the complete 2023 NWSL season, alongside the full 2023/24 seasons for the Women’s Super League (WSL), Serie A Women, Frauen-Bundesliga, and Liga F.
Empowering the Women’s Football Community
Open-access data has always been the lifeblood of progress in football analytics. It sparks innovation and serves as a sandbox for aspiring analysts. However, access to comprehensive, public data for women’s football has faced significant challenges recently, leaving a distinct void for those looking to study the tactical and statistical nuances of the elite women's game.
We believe that the data community shouldn't have to take a step backward. Providing this data for free is our small way of standing up as active advocates for both the analytics community and women's football.
This is not a new frontier for us; for nearly a decade, Hudl Statsbomb has run a dedicated initiative providing women's teams across the globe with free access to our analytics platform and data for their own league. We are proud to be partnered with dozens of elite women’s football teams around the world, all of whom use the exact same unrivalled event data that we are making completely open to the public today.
The Data
This release represents one of our largest single drops of free football data to date, delivering the exact same high-spec event data used by professional clubs worldwide. Across 771 matches, you will have access to detailed data for every on-pitch action - including passes, carries, shots, and pressures - alongside our industry-leading Expected Goals (xG) model.
The seasons included in this release captured some of the most historic moments in modern women's football, offering a rich field for analysis. In the WSL, you can dissect the final, title-winning season of Emma Hayes’ legendary tenure at Chelsea. In Germany, there was an invincible campaign for FC Bayern, who won 19 and drew three of their 22 Frauen-Bundesliga games, while in Spain, Barcelona won the Liga F title as one of their four trophies in their Quadruple season.
How to Access the Data
Accessing the data is straightforward. You can pull the full dataset directly into R or Python using our open-source packages, StatsBombR and statsbombpy.
The specific Competition and Season IDs for this release are outlined below:
- Women’s Super League (Comp ID: 37, Season ID: 281)
- Serie A Women (Comp ID: 131, Season ID: 281)
- Frauen-Bundesliga (Comp ID: 135, Season ID: 281)
- Liga F (Comp ID: 182, Season ID: 281)
- NWSL (Comp ID: 49, Season ID: 107)
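Because the release is defined by pairs of IDs rather than by either ID alone, it can help to keep the pairs together when selecting data. The sketch below is one way to do that in plain Python. `RELEASE_PAIRS`, `in_release` and `select_release_rows` are hypothetical names introduced for illustration, and the records are assumed to carry `competition_id` and `season_id` keys.

```python
# (competition_id, season_id) pairs listed for this release.
RELEASE_PAIRS = {
    (37, 281): "Women's Super League",
    (131, 281): "Serie A Women",
    (135, 281): "Frauen-Bundesliga",
    (182, 281): "Liga F",
    (49, 107): "NWSL",
}


def in_release(competition_id, season_id):
    """Return True only if this exact ID pair is part of the release."""
    return (int(competition_id), int(season_id)) in RELEASE_PAIRS


def select_release_rows(rows):
    """Keep the records (dicts) whose ID pair belongs to the release."""
    return [
        row for row in rows
        if in_release(row["competition_id"], row["season_id"])
    ]
```

For example, assuming the listing returned by statsbombpy's `sb.competitions()` has `competition_id` and `season_id` columns, `select_release_rows(sb.competitions().to_dict("records"))` would keep only the five league seasons listed above. Matching on the pair avoids picking up an older free season of the same competition.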
To pull all five leagues simultaneously and clean the event data in R, you can use the following script:
# Install the package if you haven't already
# remotes::install_github("statsbomb/StatsBombR")
library(StatsBombR)
library(dplyr)

# Fetch all available free competitions and matches
Comp <- FreeCompetitions()
Matches <- FreeMatches(Comp)

# Filter for the specific competitions and seasons in this release
TargetMatches <- Matches %>%
  filter(competition.competition_id %in% c(37, 131, 135, 182, 49)) %>%
  filter(season.season_name %in% c("2023", "2023/2024"))

# Download and clean all event data
# (Note: Pulling all 771 matches in one go may take some time)
events <- free_allevents(MatchesDF = TargetMatches, Parallel = TRUE)
events <- allclean(events)
events <- get.opposingteam(events)
If you are working in Python, you can loop through the competition and season ID pairs using statsbombpy to compile the matches and events:
# Install the package if you haven't already
# pip install statsbombpy
import pandas as pd
from statsbombpy import sb

# Define the target competition and season ID pairs
target_leagues = [
    {"competition_id": 37, "season_id": 281},   # WSL
    {"competition_id": 131, "season_id": 281},  # Serie A
    {"competition_id": 135, "season_id": 281},  # Bundesliga
    {"competition_id": 182, "season_id": 281},  # Liga F
    {"competition_id": 49, "season_id": 107},   # NWSL
]

# Compile matches across all specified leagues
all_matches = []
for league in target_leagues:
    matches_df = sb.matches(
        competition_id=league["competition_id"],
        season_id=league["season_id"],
    )
    all_matches.append(matches_df)

combined_matches = pd.concat(all_matches, ignore_index=True)

# Example: Fetching events for the entire dataset
# Note: Pulling all 771 matches sequentially may take some time
all_events = []
for match_id in combined_matches['match_id']:
    try:
        match_events = sb.events(match_id=match_id)
        all_events.append(match_events)
    except Exception as e:
        print(f"Error fetching match {match_id}: {e}")

combined_events = pd.concat(all_events, ignore_index=True)
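Fetching one match at a time is the slow part of the loop above. As a rough sketch, and not part of statsbombpy itself, the requests can be issued concurrently with a thread pool from the Python standard library. `fetch_all` is a hypothetical helper; the `fetch` argument is whatever function downloads a single match, which in the script above would be `sb.events`. Whether this is actually faster depends on your connection, and it assumes the fetch function is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(match_ids, fetch, max_workers=8):
    """Call fetch(match_id) for every ID using a pool of threads.

    Returns (results, errors): results maps match_id to the fetched
    object, errors maps match_id to the exception that was raised.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which match each pending future belongs to.
        futures = {pool.submit(fetch, mid): mid for mid in match_ids}
        for future in as_completed(futures):
            mid = futures[future]
            try:
                results[mid] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the other matches.
                errors[mid] = exc
    return results, errors
```

With the objects from the script above, this might be called as `results, errors = fetch_all(combined_matches["match_id"], lambda mid: sb.events(match_id=mid))`, followed by `pd.concat(list(results.values()), ignore_index=True)`. Results are stored in completion order, so sort by `match_id` afterwards if order matters.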
We recommend keeping the event data specification handy while working with the data. It lists every column name and variable in the data, with definitions. To help you navigate the data further, we have also created the Using Hudl Statsbomb Data In R and Using Hudl Statsbomb Data In Python guides, which are ideal for those just getting started.
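As a first look at a compiled events table, the sketch below counts shots and totals xG per team. It assumes the statsbombpy column names `type`, `team` and `shot_statsbomb_xg`; check these against the event data specification before relying on them. `xg_by_team` is a hypothetical helper, not part of either package.

```python
import pandas as pd


def xg_by_team(events: pd.DataFrame) -> pd.DataFrame:
    """Count shots and sum xG per team, highest xG first."""
    # Keep shot events only; other event types carry no xG value.
    shots = events[events["type"] == "Shot"]
    summary = shots.groupby("team").agg(
        shots=("shot_statsbomb_xg", "size"),
        xg=("shot_statsbomb_xg", "sum"),
    )
    return summary.sort_values("xg", ascending=False)
```

Calling `xg_by_team(combined_events)` on the events compiled earlier would give one row per team across all five leagues; filter the events by match first if you want a single league or fixture.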
We hope this dataset serves as a welcome boost to the football analytics community, and we remain fiercely committed to championing the women's game. Whether through providing elite clubs with the data they need to gain an edge, or opening up our archives to the public, we will continue to actively push for greater visibility, equity, and insight across every level of women’s football.
We cannot wait to see what you build with this data.
The Hudl Statsbomb Team