Critique of a Research Study of Strava Data and What It Means for Marathon Runners

Towards the end of last year, a group of researchers published a study based on a large sample of runners’ Strava data: The Training Intensity Distribution of Marathon Runners Across Performance Levels by Daniel Muniz-Pumares et al.

I saw this article shared in a couple of subreddits, including r/Marathon_Training. I’m pretty sure there was a thread in r/AdvancedRunning, too, but I can’t find it now. And there was some media coverage, including articles in Runner’s World and Outside Online.

On the one hand, the study is great. It offers some insight into training patterns – which we don’t have a lot of data on. And it correlates that to race performance to give us a little peek into what does and doesn’t work.

On the other hand, this study – like all research – is easy to misinterpret when it gets into a layperson’s hands. Especially if you haven’t read the source material, it’s easy to read a journalist’s perspective and then run with that mediated perspective – which highlights the wrong things.

So today I wanted to take some time to critique this study – not because it’s bad, but because the way that the results have been understood by some runners is faulty. I want to highlight some of the shortcomings of the study and the dangers of misinterpreting it, and then I’ll highlight what I think are the appropriate lessons to take away from this.

Weekly Mileage – A Simple, Misleading Chart

In this case, much of the conversation focused on the average weekly mileage of the runners and how it was relatively low compared to what most training plans recommend.

The study divided the runners into groups, based on their finish times, and then calculated a variety of data points for each group. The groups spanned thirty minutes – i.e. 2:00 to 2:30, 2:30 to 3:00, 3:00 to 3:30, etc.
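To make the grouping concrete, here is a minimal sketch of how a finish time could be assigned to a thirty-minute group. The function name and the choice to treat the lower boundary as inclusive are my own assumptions; the study may handle boundary cases differently.

```python
def finish_time_group(finish_seconds: int, width_minutes: int = 30) -> str:
    """Return a label like '2:30 to 3:00' for the group containing a finish time.

    Assumption: the lower bound is inclusive, so exactly 3:00:00 lands in
    '3:00 to 3:30'. The study's own boundary handling is not known here.
    """
    width = width_minutes * 60
    start = (finish_seconds // width) * width  # round down to the bin start
    end = start + width

    def fmt(seconds: int) -> str:
        hours, rem = divmod(seconds, 3600)
        return f"{hours}:{rem // 60:02d}"

    return f"{fmt(start)} to {fmt(end)}"


# A 2:45:10 marathon falls in the 2:30 to 3:00 group.
print(finish_time_group(2 * 3600 + 45 * 60 + 10))
```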

One of the stats they shared – a stat that the average runner can easily understand – was the average weekly mileage for each group. For the journalist covering the study, it’s easy to put this into a chart and say that runners with x:yy time ran z miles per week. And for the reader, it’s an easy thing to latch on to.

In this case, the 2:30 to 3:00 group ran an average of 42.6 miles per week. That seems a bit low.

There could be reasons for that. And a critique of the study can help illuminate some of those.

But if the lesson runners take away is that, “Oh, I only have to run z miles per week to reach x:yy time,” then they’ve taken away the wrong lesson. And not just because this is an observational study, and it deals in correlation and not causation.

Is This Sample Representative?

The first question you should ask when you read a research study is, “Is this sample representative?” And a related follow-up is, “Is this an appropriate sample to draw transferable conclusions from?”

On one level, this study is more representative and transferable than many others. Many studies about running deal with trained, elite runners – and there are good reasons to question whether amateur runners should simply replicate the training of the elites.

In this case, the sample included approximately 120,000 runners who use Strava to record their marathon training from 2014 to 2017.

Here are a few observations about this sample.

It’s old. The time period – 2014 to 2017 – is from a different era in running. This was pre-COVID and it was before super shoes. The sport has gone through major changes in the last five years, and that is problematic for drawing conclusions that are directly transferable to today.

The sample is 75% men. Although the running world still tilts male, the American marathon running population is closer to 60-40. The rest of the world might be closer to 65-35 or even 70-30. But the sample is more heavily tilted towards men than a random sample.

A majority of the sample (55%) are over 40. The age distribution varies around the world, but this definitely skews older than the American running population in 2014-17.

If you zoom in on the men – the largest part of the sample – and take a look at their time distributions, these are also uncharacteristically fast. About 10% of the men finished under three hours, 70% finished under four hours, and only 15% finished above five hours.

That distribution is much faster than the actual distribution of finishers in that time period. In this detailed analysis I did of American runners, the top 10% mark is around 3:10 and the median (50%) mark is about 4:20. The slowest 25% of men finished above 5 hours.

The point is … the group of runners in this sample overemphasizes a) men, b) older runners, and c) faster runners.

Does This Study Look at the Whole Picture?

The study is looking at sixteen weeks of training data.

Yes, that’s the time period that we’d normally consider “marathon training”. But it’s not the full equation.

Zooming out and looking at a runner’s history is important. The sixteen weeks leading up to a race is about sharpening what you’ve built over years. The miles you’ve put in over the year leading up to that training block are equally important – as are the previous years, if you have a long history of running.

Consider for a moment that this sample includes a lot of men in their 40’s and 50’s. They have likely built up years of training. At this point, they could potentially be banking on that previous training – putting in fewer miles today to maintain the fitness that they’ve built over the years.

If you’re trying to go from 3:30 to 3:00, training at 40 miles per week probably isn’t going to get you there. But if you spent your 30’s training hard and could easily go sub-3:00 … then you will likely have an easier time maintaining that fitness into your 40’s on lower mileage.

The training data also doesn’t include cross training. And as runners age, it’s common to try to reduce stress on the body by swapping out running for something like cycling. Again, it may not be the best way to build fitness. But it’s a good way to maintain it.

Take, for example, the FIRST marathon training plan. It includes three hard workouts per week. But it also includes lots of cross training, and I suspect it’s a lot more effective with experienced runners who are cutting back than with newer runners who are trying to build themselves up.

As Paul Harvey would say, you need to know the rest of the story. And the sixteen weeks of training documented on Strava is just the beginning.

Did These Runners Reach Their Potential?

Finally, there’s the question of how the study defines marathon performance.

It correlates these training metrics to finish time. This is a simple, easy to understand metric. And the correlations work out – higher training volume is associated with better times.

But that’s not really the best way to measure whether training is effective or not.

It’s impressive (to most people) to run a marathon in under three hours. But what if you’re capable – with proper training – of running 2:30 or 2:45? Is that sub-3:00 still a success?

One way to measure this is to look at equivalent race times. Based on a 5k or 10k race, you can predict what marathon time a runner should be capable of. If they fall short, then there’s a good chance they didn’t put in sufficient mileage and their training was sub-par.
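As a rough illustration of an equivalent race time, here is a minimal sketch using Riegel's formula, a common rule of thumb for scaling a time from one distance to another. The formula and its 1.06 exponent are my choice for the example, not something the study uses, and the prediction assumes the runner has trained adequately for the longer distance.

```python
def riegel_prediction(known_seconds: float, known_km: float,
                      target_km: float = 42.195, exponent: float = 1.06) -> float:
    """Predict a time at target_km from a known race, using Riegel's formula:
    T2 = T1 * (D2 / D1) ** exponent. The 1.06 exponent is the usual default."""
    return known_seconds * (target_km / known_km) ** exponent


def format_hms(total_seconds: float) -> str:
    """Format a number of seconds as h:mm:ss."""
    total = round(total_seconds)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Hypothetical runner with a 17:30 5k.
five_k_seconds = 17 * 60 + 30
print(format_hms(riegel_prediction(five_k_seconds, 5.0)))
```

For that hypothetical 17:30 5k, the prediction comes out under 2:50. A runner with that speed who finishes just under three hours has left time on the table.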

For the younger runners, especially, this is a critical question. A young guy with an athletic background – say a former high school and/or college athlete – may be able to knock out a quick 5k with minimal training. Converting that time to a full marathon is more difficult, and these runners in particular benefit the most (and quickest) from the usual advice to just run more.

Conversely, if they have sufficient speed, a young guy can achieve a decent result – say sub-3:00 – with subpar training. If you can run a 17:XX 5k, then yeah. You can probably eke out a sub-3:00 marathon with less mileage than someone who spends a few years knocking their 5k down from 20:00 to 19:00 to 18:00.

This wasn’t the focus of the study, so I won’t knock the authors for not looking at it. But they do have the data. They used runners’ best performances at other distances to calculate their critical speed. They could do the same to predict what their marathon pace is expected to be – and then use that prediction to understand whether the training was adequate.

I suspect that if you modified the study in this way and correlated training volume to both a) marathon finish time and b) percent of potential marathon time, the average weekly mileage of the more successful runners would be higher.
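Here is a minimal sketch of what a "percent of potential" measure could look like. The definition (predicted time divided by actual time) and the two runners are hypothetical, made up for illustration; the study does not report this metric.

```python
def percent_of_potential(predicted_seconds: float, actual_seconds: float) -> float:
    """Return how close a runner came to a predicted marathon time.

    100 means the runner matched the prediction; lower values mean they
    finished slower than predicted. This definition is an assumption made
    for illustration, not a metric from the study.
    """
    if actual_seconds <= 0:
        raise ValueError("actual_seconds must be positive")
    return 100.0 * predicted_seconds / actual_seconds


def to_seconds(hours: int, minutes: int, seconds: int = 0) -> int:
    return hours * 3600 + minutes * 60 + seconds


# Two hypothetical runners who both finish in 2:59:00.
actual = to_seconds(2, 59)
fast_5k_runner = percent_of_potential(to_seconds(2, 45), actual)  # predicted 2:45
grinder = percent_of_potential(to_seconds(2, 58), actual)         # predicted 2:58

print(f"Fast 5k runner: {fast_5k_runner:.1f}% of potential")
print(f"Grinder:        {grinder:.1f}% of potential")
```

Both runners land in the same finish time group, but the second one got far more out of their training. Correlating weekly mileage against this number, rather than against finish time alone, is the modification I have in mind.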

Here Are the Key Takeaways For Me

The goal of the study was not to measure and determine the mileage required to reach a certain time. That’s just a convenient byproduct of the study – and one which readers tended to latch on to.

Here are my key takeaways from the research study – things that I think hold true even if there are questions of how representative this sample is.

More Volume Correlates to Faster Times

The authors took every measure of training volume – total mileage, total time, total runs, total long runs – and they all correlated to faster finish times.

So rather than focusing on how many miles it took to get to x:xx time, the takeaway is that more is better. Pretty much always. As long as you stay healthy.

Here, I’m going to take a small logical leap. But the time you actually reach is going to be very much dependent on a) your total training history and b) your speed before you start your training block. And total training volume is the key determinant as to how much of that potential you can eke out.

Time Spent In Zone 2 and Zone 3 Stayed Consistent

The goal of the study was to look at the amount of time runners spent in different zones. They used a three zone model, which you can basically understand as a) easy running, b) around threshold running, and c) faster than threshold running.

When they graphed the percent of time spent in each zone, things look a little funky. But when they graphed the total time spent in each zone, it made perfect sense.

Whether runners were fast or slow, they spent a similar total amount of time at threshold and at faster than threshold paces. The key variable was how much additional time they spent on easy running.
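A quick bit of arithmetic shows why the percentages look odd while the totals make sense. In this sketch the weekly minutes are invented for illustration and are not taken from the study: the hard time (threshold plus faster than threshold) is held constant and only the easy time changes.

```python
def hard_share(easy_minutes: float, hard_minutes: float) -> float:
    """Return the percentage of weekly running time spent at hard effort."""
    total = easy_minutes + hard_minutes
    if total <= 0:
        raise ValueError("total weekly time must be positive")
    return 100.0 * hard_minutes / total


# Hypothetical numbers: every runner does 60 hard minutes per week,
# and only the amount of easy running differs.
HARD_MINUTES = 60
for easy_minutes in (120, 240, 480):
    share = hard_share(easy_minutes, HARD_MINUTES)
    print(f"{easy_minutes} easy min + {HARD_MINUTES} hard min "
          f"-> {share:.1f}% hard")
```

With the same hour of hard running, the hard share drops from about a third to about a ninth as the easy volume grows. The intensity distribution changes a lot even though the hard work itself doesn't.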

The basic idea of 80/20 running works fine if you run 6-7 days a week. At that volume, spending 20% of your time at hard effort makes sense. But if you run significantly more or less than that, it doesn’t hold up.

Often, I see people ask, “If I can only run three to four days a week, how hard should those days be?” And the answer is probably harder than 80/20 would suggest. Trying to back off and spend four out of four days on easy running isn’t going to be effective. With only four days of running, it still makes sense to put in two hard days and keep the other two days easy.

Training Volume and Intensity Distribution Changed Over Time

One of the findings in the study that wasn’t discussed much in the reporting was that the key variables changed a lot over time.

They broke each variable out into four week periods – 13-16 weeks pre-race, 9-12 weeks, 5-8 weeks, and 1-4 weeks. That roughly covers the early part of the training cycle (or the tail end of a base period), the build to the peak, and then the taper.

The trends here are interesting.

On the one hand, volume goes up as you get into the peak period. From 13-16 to 9-12 to 5-8, training volume goes up – especially among the fastest runners.

At the same time, the training distribution shifts towards more easy running. At 13-16 weeks, the amount of time spent doing moderate or hard running is higher. And as they get into peak training, they shift towards more easy running.

What Do You Think About This Study?

Have you read the study? Or have you read some of the reporting around it?

What are your thoughts?

At the end of the day, I think this is a useful study that adds some context to what people are actually doing. I’d love to see a similar study done today – in the post-COVID, super shoe era. And I’d love to see some more nuance to the data, breaking things out further into subgroups and disaggregating further along age and pace.

But perhaps the single most important takeaway for me is that many amateur runners are undertrained – and they would do better if they ran more.
