The Full Boston Marathon Cutoff Time Tracker Dataset

In November, I launched the Boston Marathon Cutoff Time Tracker to project the expected cutoff for the 2026 Boston Marathon. Since then, I’ve made regular updates to incorporate additional race results as they become available.

For those of you who are interested in making your own prediction and are technically savvy, I’ve cleaned up the dataset and shared it on Kaggle. I will update this dataset each week, adding new results from the previous weekend’s races.

Please note that this is a large, raw dataset. Performing a useful analysis will require familiarity with a coding language like Python or R or with a data visualization tool like Tableau.

If that’s you, keep reading below for the details. If not, you may want to stick with the tracker. I plan on making a few enhancements over the summer so that you can explore how various assumptions could impact the final outcome.

Where’s the Dataset and What’s In It?

You can access the dataset on Kaggle here.

The dataset includes over a million individual results from marathons in the qualifying periods for the 2025 and 2026 Boston Marathons.

The dataset consists of three files:

  • Races
  • Results
  • BQStandards

The Races File

The races file contains a list of the individual races contained in the sample. Each race includes the name, year, date, and location of the race.

My original sample included races in the United States and Canada with more than 200 finishers, as well as London and Berlin. There are some additional races listed, and those additional races may or may not have matching results. But there is a column (Include) indicating whether that race was included in the core analysis.

There’s also an age band column. Some races report a precise age for a runner, while others only report an age group, effectively giving a five-year span for the runner’s possible age. This is relevant if you are attempting to match multiple race results to a single individual or to estimate the impact of a runner aging up before Boston.

There are some additional dimensions in this file (country, continent) that are currently null. I plan to add that data in a future update if and when I add more international races.

Note that some of the races from the 2024 qualifying period are listed here – but the results are not included in the results file. I may add them later, but they require some additional preparation to play nicely with the data from the 2025 and 2026 qualifying periods.

The Results File

The results file includes the actual race results. Each row contains the name, age, gender, and finish time of an individual runner.

Each result is identified by a race and a year, and that can be used to match the results file to the races file. I find it useful to first filter the race list to only include the races that I want to focus on and then join that to the results file to effectively filter the results as well.
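The filter-then-join approach can be sketched in plain Python as below. This is a minimal illustration, not the code from my notebook: the column names (Race, Year, Include, Name, ChipTime) and the sample rows are hypothetical, so check the headers in the actual CSV files before adapting it.

```python
# Hypothetical rows; in practice these would be read from the races and results CSV files.
races = [
    {"Race": "Chicago", "Year": 2024, "Include": True},
    {"Race": "Example City Marathon", "Year": 2024, "Include": False},
]
results = [
    {"Race": "Chicago", "Year": 2024, "Name": "Runner A", "ChipTime": 11700},
    {"Race": "Example City Marathon", "Year": 2024, "Name": "Runner B", "ChipTime": 12600},
    {"Race": "Chicago", "Year": 2023, "Name": "Runner C", "ChipTime": 13000},
]


def filter_results(races, results):
    # Step 1: narrow the race list to the races of interest (here, the Include flag).
    keep = {(race["Race"], race["Year"]) for race in races if race["Include"]}
    # Step 2: keep only results whose (race, year) key is in that filtered list.
    # This behaves like an inner join that does not copy the race columns over.
    return [row for row in results if (row["Race"], row["Year"]) in keep]


filtered = filter_results(races, results)
print([row["Name"] for row in filtered])  # ['Runner A']
```

If you prefer pandas, the same idea is a boolean filter on the races DataFrame followed by `merge(..., on=["Race", "Year"])`, again assuming those column names.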

The chip time for each runner is listed in seconds. A few people may have a “0” for their finish time, which indicates that they did not finish.
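Because chip times are stored in seconds, it helps to drop the non-finishers and convert the rest to a familiar H:MM:SS format. A small sketch, assuming a hypothetical ChipTime column holding integer seconds:

```python
def format_chip_time(seconds: int) -> str:
    """Convert a chip time in seconds to an H:MM:SS string."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def finishers(results):
    # A chip time of 0 marks a runner who did not finish, so drop those rows.
    return [row for row in results if row["ChipTime"] > 0]


results = [
    {"Name": "Runner A", "ChipTime": 11700},
    {"Name": "Runner B", "ChipTime": 0},
]
for row in finishers(results):
    print(row["Name"], format_chip_time(row["ChipTime"]))  # Runner A 3:15:00
```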

Age is an integer with the runner’s age, while age group is a five-year group that lines up with the Boston Marathon qualifying times.
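If you need to derive an age group from a precise age (for example, to see which group a runner would fall into after aging up), a mapping like the one below works. The label strings, especially "80+" for the oldest group, are hypothetical; compare them against the actual values in the age group column before relying on them.

```python
def age_group(age: int) -> str:
    # Hypothetical labels: the open category is split into Under 20, 20-24, 25-29,
    # and 30-34, followed by five-year groups up to an assumed "80+" bucket.
    if age < 20:
        return "Under 20"
    if age >= 80:
        return "80+"
    lower = age - age % 5
    return f"{lower}-{lower + 4}"


print(age_group(34), age_group(35))  # 30-34 35-39
```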

Runners are generally identified as M (Male), F (Female), or X (Non-binary). A handful of runners are identified as U, and their gender is unknown. They remain in the results, but you may want to filter them out.
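Filtering out the unknown-gender rows is a one-line check. This sketch assumes a hypothetical Gender column containing the single-letter codes described above:

```python
from collections import Counter

results = [
    {"Name": "Runner A", "Gender": "F"},
    {"Name": "Runner B", "Gender": "M"},
    {"Name": "Runner C", "Gender": "U"},
    {"Name": "Runner D", "Gender": "X"},
]

KNOWN_GENDERS = {"M", "F", "X"}


def known_gender(results):
    # Drop runners whose gender is recorded as U (unknown).
    return [row for row in results if row["Gender"] in KNOWN_GENDERS]


# Count the remaining runners by gender as a quick sanity check.
print(Counter(row["Gender"] for row in known_gender(results)))
```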

The BQ Standards File

The BQ Standards file includes the three most recent sets of Boston qualifying times.

The gender and age group columns can be matched to a runner’s individual result. Note that the open age group for Boston is under 35, but I’ve broken it out into smaller categories. Those four categories (Under 20, 20-24, 25-29, and 30-34) all have the same qualifying times.
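Matching a result to its standard amounts to a lookup keyed on gender and age group, followed by a time comparison. In the sketch below, the standards dictionary holds placeholder values in seconds for illustration only; in practice you would load the real times from the BQStandards file, whose layout and units are assumed here. It also assumes that a chip time at or under the standard counts as a qualifying time.

```python
# Placeholder standards in seconds, keyed by (gender, age group).
# Load the actual values from the BQStandards file instead of using these.
STANDARDS = {
    ("M", "30-34"): 2 * 3600 + 55 * 60,
    ("F", "30-34"): 3 * 3600 + 25 * 60,
    ("M", "35-39"): 3 * 3600,
}


def is_qualifier(result, standards) -> bool:
    standard = standards.get((result["Gender"], result["AgeGroup"]))
    if standard is None or result["ChipTime"] <= 0:
        # No matching standard (e.g. gender U) or a DNF: not a qualifier.
        return False
    # Assumption: a time at or under the standard qualifies.
    return result["ChipTime"] <= standard


print(is_qualifier({"Gender": "M", "AgeGroup": "30-34", "ChipTime": 10499}, STANDARDS))  # True
print(is_qualifier({"Gender": "F", "AgeGroup": "30-34", "ChipTime": 12360}, STANDARDS))  # False
```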

The 2013BQ column contains the qualifying times that were effective from 2013 through 2019. They would be relevant if you were comparing with older sets of results. Note that the non-binary standards did not actually apply at that time; they were not implemented until 2024.

The 2020BQ column contains the qualifying times that were effective from 2020 through 2025. You should use these qualifying times when identifying qualifiers from last year’s qualifying period (9/1/2023 to 9/13/2024).

The 2026BQ column contains the qualifying times that were effective beginning with the 2026 Boston Marathon. You should use these qualifying times when identifying qualifiers from this year’s qualifying period (9/1/2024 to 9/12/2025).
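One way to pick the right column is to check a race date against the two qualifying windows listed above. The windows overlap from 9/1/2024 to 9/13/2024, so a race in early September 2024 can count toward both years; the sketch below therefore returns every applicable column rather than a single one. The function name is hypothetical.

```python
from datetime import date

# Qualifying windows as stated above, paired with the matching BQ Standards column.
QUALIFYING_WINDOWS = [
    (date(2023, 9, 1), date(2024, 9, 13), "2020BQ"),  # 2025 Boston Marathon
    (date(2024, 9, 1), date(2025, 9, 12), "2026BQ"),  # 2026 Boston Marathon
]


def standards_columns(race_date: date) -> list[str]:
    # Return every standards column whose window contains the race date (inclusive).
    return [column for start, end, column in QUALIFYING_WINDOWS if start <= race_date <= end]


print(standards_columns(date(2024, 9, 7)))  # ['2020BQ', '2026BQ']
```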

So What Do I Do With This?

Well, that’s up to you.

I’ve published a notebook on Kaggle to demonstrate how to filter the data, load the results, and identify qualifiers. That should walk you through the basics of getting started. It’s written in Python.

With Kaggle, you can create your own notebook to directly interact with a dataset on their website. Kaggle notebooks are basically a cloud version of Jupyter notebooks.

You can also download the full dataset as a set of CSV files. You could then use your own development environment to analyze the data in Python, R, or another language of your choosing.

Tableau Public is a free data visualization tool, and that’s another option. You can use the CSV files as a data source, perform the necessary calculations in Tableau, and then create visualizations to support your analysis. This is what I used for the tracker, although I prepared the data in Python first to make the Tableau portion simpler.

I typically update the dataset on Monday or Tuesday each week. I’ll add a note here when races are added. I share a weekly update on Threads, and I also include the updates in my weekly newsletter, which goes out every Sunday morning.

Feel free to use this data any way that you see fit. If you end up publishing online, give me a shout on Threads, and I’d appreciate a link back here.
