2021 U of T Training Study, p/b TrainerRoad (2022 update: results posted!)

For me, the key was “on the trainer”. So I said less than FTP, because honestly I think I would have a hard time mentally holding my FTP inside for that length of time.

Outside is a whole different story. I’ve done multiple 40ish minute climbs and averaged my FTP over the entire time.


That was my rationale. I’m a relatively fast twitch cyclist. Not a track sprinter, but definitely not a time trialist. If you asked me to write down a range myself, I’d write 95-102% FTP. I don’t think I can maintain my FTP estimate for 60 minutes, and I would guess that I actually can’t. The question asked about 40 minutes. It obviously has to be less than 105% of my FTP, because one of the common estimators of FTP is 95% of your 20 minute power. I think I responded 95-99%, as that’s the closest option to the range I gave.
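For reference, the arithmetic behind those rules of thumb, as a quick Python sketch (the 95% multiplier is the common estimator mentioned above, and the 300 W figure is just a made-up example, not anyone's real numbers):

```python
# Common rule-of-thumb FTP arithmetic from the discussion above.
# These are heuristics, not measurements; individual variation is large.

def ftp_from_20min(p20_watts: float) -> float:
    """Estimate FTP as 95% of best 20-minute power."""
    return 0.95 * p20_watts

def pct_ftp(power_watts: float, ftp_watts: float) -> float:
    """Express a power as a percentage of FTP."""
    return 100.0 * power_watts / ftp_watts

p20 = 300.0                   # hypothetical best 20-min power
ftp = ftp_from_20min(p20)     # ~285 W
# By construction, 20-min power comes out to ~105.3% of the estimated FTP,
# which is why a 40-min answer above 105% would be surprising:
print(round(ftp), round(pct_ftp(p20, ftp), 1))
```

The 105% ceiling in the post above falls straight out of the estimator: if FTP is defined as 95% of 20-minute power, then 20-minute power is 1/0.95 ≈ 105.3% of FTP, and a 40-minute effort should sit below that.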

What I really wonder about is the people who responded 105-109%. Again, we think the average person can do about 105% of their FTP for 20 minutes. Some of us think we can do that or more for 40 minutes. Now, there have got to be some people who can do 105% of their actual FTP for 40 minutes. I know one cyclist in my area who is basically made of badass and slow twitch muscle. She can ride all day, just not super duper fast, but literally all day. She has said that she probably has one fast twitch fiber in her whole body. That sort of athlete could perhaps pull off 105% for 40 minutes. The thing is, that means the 20-minute FTP test might mis-estimate her actual FTP.


Hey, this study was mentioned in Canada’s largest national newspaper, The Globe and Mail.

Paywall (article below)


Back in November, a University of Toronto doctoral candidate in exercise science named Michael Rosenblat started recruiting cyclists for a study on interval training. Of course, none of them could actually come into the lab – but that was no problem. Subjects from around the world could train in their own basements using their own smart trainer and a piece of software called TrainerRoad, while Rosenblat monitored their performance from afar.

Other technologies are also making it easier to take research out of the lab. A study of the Apple Watch’s heart rhythm detection app, for example, collected data from over 400,000 participants. For a field like exercise science that has long been plagued by unreliable studies with a dozen or fewer subjects, the workarounds necessitated by the pandemic may turn out to have long-lasting benefits.


Heard Seiler mention this study in a recent podcast (FastTalk). Have there been any updates on the study results?


Have not seen anything on Jem’s blog. He has posted a few 2022 articles on the blog, and appears active on Twitter (I don’t tweet but he has a feed on the blog page). Maybe he’ll pop in here and let us know if the study is on track and if data are still being generated or if they are on to the analysis phase.


Blog Link: https://sparecycles.blog



I haven’t listened to the FastTalk podcast with Seiler, but Michael Rosenblat (https://www.evidencebasedcoaching.ca/) has been hard at work on this project. Expect to hear something in the next couple of weeks.

Yes I’m more active on twitter these days where I post preliminary analyses & anecdata from my PhD work with NIRS & muscle oxygenation


Jem, Thanks for hopping in and best of luck with your ongoing research!!


Hi everyone,

Michael (Dr. Rosenblat) just posted a video summary of this study

And a summary article posted to his website

Stephen Seiler just posted about it on twitter

Thanks again to everyone who participated as a subject, and big thanks to @Nate_Pearson and Corey for their help setting things up on the back end.

I think the biggest takeaway for us was learning the many logistical challenges of setting up a remote randomised study with enough scientific rigour to control as many variables as possible while maintaining high ecological validity.

The challenge we saw, and ultimately the biggest limitation of this study, was a high subject drop-out rate. We received very good feedback from some of you who did and did not complete the study, but I’d be interested in hearing from anyone willing to share here (or DM me privately): how do you think we can improve the subject participation experience for next time?

If you participated, whether you completed the study or not, what were some of the hurdles you encountered? And any advice for us for how to overcome those hurdles?

Would you be interested in participating in future virtual training research? Under what conditions? (eg. only outside of my competitive season; only if I got individual feedback that I could use in my training; only if I could perform the training sessions outside, etc.)

Also happy to discuss anything else. This is crowdsourced science! The best way to improve the process is by getting more input from you the subject-stakeholders.



As someone in the field of research, I am very excited to see this collaboration. Way to go!

I recently read an article, “Long covid may set you back a decade in exercise gains”, and thought it would be cool if the TR team worked with Matthew Durstenfeld and his team to integrate an informed-consent agreement and additional opt-in survey questions, to provide more long-term data towards these types of initiatives within the health community.



Just as a caveat - I haven’t had a chance to watch the video above yet

I participated in the study and had several (maybe 4 total?) 30 minute zoom calls with Michael both before, during, and afterwards. I also participated in a case study for him that tested a different methodology

I had no issues following the protocol - although when I saw the cohort I was randomized into I wasn’t particularly stressed by the workouts I had to complete - they were very easy for me

While I had no issues - and would happily try another trial of this type during my winter months - I would suggest that you be as open and communicative as possible both about the purpose of the trial, the importance of strict adherence, and what, if any, flexibility was allowed before the data became meaningless (i.e. if you miss an interval session in a given week - that’s ok, but it can only happen once during the trial period vs. if you miss a session once we cannot use the data)

Happy to discuss in more detail - if you’d like PM me here or Michael has my contact information.


That’s really useful feedback, thanks.

Yeah, I think making clear up front what the ‘costs’ of dropping out are is crucial. There is really no cost to the subject for dropping out (and there shouldn’t be, for ethical voluntary-participation reasons), but the cost to the research is quite high. So we need to communicate what the obligations of participation are, and where the boundaries lie for acceptable deviations from those obligations.

I think we would have benefited from a better face to face selection/filtering process for subjects. Maybe the process was too automated? It worked well for subjects who reached out to us with questions, but not for the majority of ‘quiet quitters’ (if I can appropriate that term :slight_smile:). That also shouldn’t be an obligation of the subjects, so it’s something we need to be more proactive on.


Are all the spots filled? The link seems not to be working

Sorry, yes this study was conducted in 2020-2021. We have just produced the results, hence resurrecting the thread. I’ve edited the title now


Interesting, though a bit disappointing in that lack of power constrained the possibility of good inter-group comparisons. However, I thought that this was a good first try, with plenty of ‘learnings’.

Three issues come to my mind:
[1] Why was there a target for the number of participants? I understand the need for a minimum, but the maximum could surely be more flexible. Traditional methods for this kind of research impose high costs on the researcher of having more participants + the cost to participants is high, since they have to turn up to a lab for their training sessions. But in this kind of research, the major cost to the researcher is clerical [recording the data, keeping track of people] together apparently with some one-to-one discussion.
[2] I presume that the ethics committee was worried about older people having intervention-induced health issues. Is that so? But surely those concerns can be overcome by participants getting some sign-off from their doctor. [I speak as someone close to twice your age limit.] Given the existing interest in age-related training effects, it would seem to me valuable to try such an extension, though of course any one study can only test so many variables.
[3] You have the traditional female participation problem. That seriously needs to be addressed in future work of this kind.

And finally, a question: what was the locational distribution of the participants? I mean: country of domicile.


Jem (and Michael if reading),

Congrats on giving this a shot. As a clinical trialist in a former life, am all too familiar with the difficulties of studying humans. A few comments / suggestions after reading the summary. Apologies if the questions are answered in other areas but am not following the socials.

The dropout rate, or perhaps differently termed, the non-initiator rate post sign-up, is very large. It looks like at least half of the signed up individuals simply disappeared? It would be worth trying to contact those folks to find out what happened. You guys likely have tried, but if not, perhaps provide an anonymous way for non-initiators to tell you why they didn’t start. It’s going to be a combination of best intentions not panning out, life intervened, protocol too hard, injury, etc. But knowing the answer rather than guessing, might help with future design.

Am wondering if non-initiation could be partially age related? e.g. perhaps a study of older individuals (45-65 or something) might provide a more stable or committed cohort. Was there an IRB/ethics reason for capping age at 45, or was that a study-specific decision to try and limit variability? If the IRB would permit it, and you guys are up for study #2, give it a shot with an older group and see what happens! (TL;DR Kids are unreliable and retired people have more time to do things like crowd-sourced physiology studies.)

Non-centralized trials, site independent trials, other flavors of studying people without requiring a ton of site visits, are a hot topic in drug development (and other forms of medical research). While drug development is not ex phys, if you do additional studies in the future, would suggest reaching out to some of the companies with growing expertise in the non-centralized study arena.

Human touch matters. Designing in interaction with a study coordinator or PI before starting and during the study for updates could be useful. One of the things we hear from patients in drug studies is they want to know more about what is happening. Blinding is critical for those studies as is avoiding bias and other issues. But your study is not going to a health authority, and you need completers foremost, so consider that engagement and completing is critical and think of ways to enable and encourage.

While pure altruism is great, incentivizing is even more great (LOL). Consider offering something of extremely low monetary value but extremely high “swag” factor to participants for initiating and completing. For example, a T-Shirt at the end for folks who initiate, complete the tests and complete an end of study questionnaire. A reasonable IRB should permit that type of thing and it wouldn’t be hard to find a sponsor for the reward.

There are other ways to approach the reward and engagement topic that doesn’t require any physical object. Will save those thoughts for another day though as this is getting long already.

For cyclists, I think the original question posed is important, and it would be great to see some guidance. The null hypothesis is that any six-week VO2max block in a reasonably trained person will yield a similar benefit. But wouldn’t it be fun to have actual data on whether, say, 30/15s vs 6x3 vs the Empirical Cycling VO2max block are similar or actually different? With data collection including (a) completion/compliance difficulty (e.g. the best block ever created is useless if riders can’t/won’t complete the training block) and (b) relative benefit on the measure of interest (a VO2max surrogate like a 5-min test, Wpeak, etc.). You don’t need hundreds of subjects to do this; you need maybe 50-75 motivated subjects who will complete the study. Depends where you set alpha and what you assume for effect size.
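To put rough numbers on the alpha / effect-size point, here is a standard two-sample sample-size sketch using only the Python standard library (the effect sizes and 80% power are illustrative assumptions on my part, not values from the study):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sample comparison
    (normal approximation): n = 2 * ((z_{1-a/2} + z_{power}) / d)^2."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)   # two-sided critical value (~1.96 at alpha=0.05)
    zb = z.inv_cdf(power)           # power quantile (~0.84 at 80% power)
    return math.ceil(2 * ((za + zb) / effect_size) ** 2)

# A "medium" effect (d = 0.5) needs ~63 per group; a large effect (d = 0.8) ~25.
print(n_per_group(0.5), n_per_group(0.8))
```

That is roughly where the 50-75 figure lands: if you only expect to detect large training-response differences between protocols, a few dozen completers per arm is in the right ballpark, but a medium effect across four arms needs a lot more.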

TL;DR - You guys are definitely onto something decent here. It might be that random crowd-sourcing is not effective and you need to do some preselection to find subjects more likely to complete.

I hope you’ll give it another try.

Good luck and best success in your studies and academic careers.

-Darth (LOL, have to preserve the anonymity but Jem has my contact info from a prior PM discussion)



Well said! What is the point of an age restriction? Also, it’s usually dudes they want, and us females don’t get a look in :frowning_face:


How does the intensity in the table correspond to FTP (roughly)?


From the link to study on Evidence Based Coaching website:

The IET in the current study incorporated shorter stage increments (12.5 watt increases every 30-seconds). Interval work-bout intensity was programmed for each participant at a percentage of the difference between Wpeak and the WTT (15% for groups 1 and 2, and 30% for group 3 and 4); where WTT was used as a proxy for the metabolic steady-state threshold. The group mean power output for WTT was 70% ± 4% of Wpeak. This is consistent with CP and MLSS as shown in previous literature where CP can occur at 67% of Wpeak [17] and MLSS at 70% of Wpeak [18]. The mean power output during the interval sessions for all subjects in the current study was 77% ± 4% of Wpeak (75% ± 3% for groups 1 and 2, and 79% ± 3% for groups 3 and 4).

They used 40-min time trial power (TT in that screenshot) as a proxy for metabolic steady state.

If you pull up Table 4:

HIIT work intensity by group, using Pre values of watts for TT (FTP) and Wpeak (MAP/pVO2Max)

Group     TT (“FTP”)   Wpeak (MAP/pVO2max)   HIIT interval target   HIIT as % “FTP”
Group 1   250          344                   264                    105.6%
Group 2   221          329                   237                    107.2%
Group 3   257          371                   291                    113.2%
Group 4   242          345                   273                    112.8%

Check my math, I’m on a conf call and whacked that out.
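A quick Python check of that math, using the interval-intensity formula from the quoted methods (target = WTT + fraction × (Wpeak − WTT), with 15% for groups 1-2 and 30% for groups 3-4); the watts are the Pre group means quoted above. Note the percentages can differ from the table by ~0.1% depending on whether you round the target watts first:

```python
# Pre-test group means (watts) and the gap fraction from the quoted methods:
# interval target = WTT + fraction * (Wpeak - WTT)
groups = {
    "Group1": (250, 344, 0.15),
    "Group2": (221, 329, 0.15),
    "Group3": (257, 371, 0.30),
    "Group4": (242, 345, 0.30),
}

targets = {g: wtt + f * (wpeak - wtt) for g, (wtt, wpeak, f) in groups.items()}

for g, t in targets.items():
    wtt = groups[g][0]
    print(f"{g}: target {t:.0f} W = {100 * t / wtt:.1f}% of 40-min TT power")
```
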


  • about 106% FTP for the 4x6-min (4-min RBI) and 12x2-min (2-min RBI) groups 1 and 2
  • about 113% FTP for the 12x2-min (2-min RBI) group 3 and 16x30/15-sec group 4

Wow, great feedback everyone. I really appreciate it.

Forgive brief comments for now. I’m definitely reading, thinking about, and considering all your points.

Ethics doesn’t like open-ended recruitment, on the premise that excessive recruitment beyond the number of subjects predicted with a power analysis to find a significant or meaningful outcome, is undue burden and risk to patients and lab resources.

Yeah I’m really not a fan of the age limit in the 40’s by most ethics boards for these kinds of studies. But there is a strong caution around the risks of maximal exercise from ethics boards and insurance providers. I have strong opinions about this. Exercise is medicine and one of the most effective treatments we know of for the broadest range of health issues. And yet, this is the system we work within.

It’s a cost-benefit decision made in a different time, then baked into the system. It’s tough to overcome that inertia in a naturally conservative, “first, do no harm” field like medicine & human clinical research. But I have to say, there are good historical reasons why the current system exists. It’s just hard to overcome path dependency, even when we “know better”.

I absolutely agree female representation in sport science is one of the largest issues limiting applicability of research (overlooking ~50% of the population is not acceptable). And we were not able to alleviate that disparity in this study. Specifically here, though, I can’t say I fully understand why we didn’t have more females participating. So I’d be interested to hear especially from any female athletes maybe what you think the limiter was? (whether you participated, had the opportunity and couldn’t/decided not to, or just have an opinion on the topic)

We did not have any exclusion criteria that should have lowered female participation (that I am aware of). We assumed we would get more male volunteers at first asking, so we specifically reached out to female coaches, coaches of female athletes, and women’s teams. Maybe, just as we underestimated the drop-out rate, we also overestimated the engagement rate from females, and needed to make more of an effort at recruiting female athletes? I will have to look back at how we promoted recruitment.

There are some other systemic issues here. In a predatory, publication-obsessed field with limited resources, you have to minimise variability to manage project scope, actually complete projects, get degrees, meet deadlines, pay bills and such. Add the historical prejudices, which I don’t mean to minimise but think it should be table stakes at this point to recognise still exist, and the result is that females are severely under-represented despite individual best efforts at inclusion. So yeah, the demonstrated preference of sport science as a whole is still that “we just want dudes” :man_facepalming: Although that is gradually shifting.

Yes, roughly FTP ≈ Wtt for the group. But let’s consider individual variability in fatigue resistance at FTP (time to exhaustion in the FTP/CP range is ≈ 15-70 min) to put some 95% prediction-interval error bars on that estimate. So it’s probably more like:

FTP = Wtt ± 0.5 W/kg

Please don’t consider this advice to use a 40-min TT to predict FTP. There is zero way for an individual to know their precise FTP/CP/metabolic steady state power by performing a single 40-min TT.

But a 40-min TT is a pretty great performance indicator on its own: if it goes up (above some minimum detectable change) then you can say you’ve improved your performance!
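To put rough numbers on that ±0.5 W/kg band, a minimal Python sketch (the 70 kg rider and 250 W TT are made-up example values):

```python
def ftp_band_from_tt(wtt_watts: float, mass_kg: float, band_w_per_kg: float = 0.5):
    """Plausible FTP range from a single 40-min TT, using the rough
    +/- 0.5 W/kg uncertainty band suggested above (a heuristic, not a
    validated prediction interval)."""
    delta = band_w_per_kg * mass_kg
    return (wtt_watts - delta, wtt_watts + delta)

# Hypothetical 70 kg rider with a 250 W 40-min TT
print(ftp_band_from_tt(250, 70))   # (215.0, 285.0)
```

A 70 W spread on a 250 W rider illustrates the point above: the TT is a great performance benchmark, but a poor instrument for pinning down FTP precisely.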


Reasonable, but to be clear, you only estimated Wpeak (from 30-sec ramp steps) and 40-min TT. I’ve said “estimated” Wpeak because I’ve seen meta-analyses point out that Wpeak can change based on ramp protocol. In other words, Wpeak from a 30-sec ramp can be different from Wpeak from a 1-min or a 2.5-min ramp, is that right?

Why did you use 30-sec ramp steps instead of 1-min or say 2.5-min steps?

On a related topic, what are your thoughts on predicting FTP in a ramp test by using a fixed percentage of best power over time = step size? For example, 70% of best 30-sec power in the study’s 30-sec-step ramp, or 75% of best 1-min power in a TR ramp.
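That kind of estimator is just a one-line multiplication; as a sketch (the 70%/75% multipliers are this thread’s rules of thumb, not validated constants, and the 400 W input is a made-up example):

```python
def ftp_from_ramp(best_step_power: float, multiplier: float) -> float:
    """Rule-of-thumb FTP estimate from a ramp test: a fixed fraction of the
    best power held over one step length (e.g. 0.70 of best 30-sec power on
    a 30-sec-step ramp, or 0.75 of best 1-min power on a 1-min-step ramp)."""
    return multiplier * best_step_power

# Hypothetical rider on a 1-min-step ramp with a 400 W best step:
print(ftp_from_ramp(400, 0.75))
# Group 1 pre-test mean Wpeak from the study's 30-sec-step ramp:
print(ftp_from_ramp(344, 0.70))
```
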

You use terms like “zero way”, “precise FTP”, and “single 40-min TT” as qualifiers, which I agree with and can’t debate, but on the other hand, if you are well trained and have some experience doing threshold intervals, I’d argue you can learn a lot more about your FTP and TTE from a single 40-min TT attempt than you can from a ramp test.

Fun with math…

Group          40-min TT (“FTP”)   Wpeak (“30-sec ramp MAP”)   “FTP” as % ramp
Group 1 Pre    250 ±22%            344 ±20%                    72.7%
Group 1 Post   262 ±19%            382 ±18%                    68.5%
Post vs Pre    104.8%              111.0%
Group 2 Pre    221 ±18.6%          329 ±20.6%                  67.2%
Group 2 Post   219 ±16.9%          334 ±17.1%                  65.6%
Post vs Pre    99.1% (decrease)    101.5%
Group 3 Pre    257 ±14.8%          371 ±12.1%                  69.3%
Group 3 Post   263 ±15.3%          388 ±11.6%                  67.8%
Post vs Pre    102.3%              104.6%
Group 4 Pre    242 ±23.4%          345 ±18.6%                  70.1%
Group 4 Post   247 ±22.8%          359 ±18.3%                  68.8%
Post vs Pre    102.1%              104.1%

Not only is there a fair amount of variance in 40-min TT (the FTP proxy) as a % of ramp, there is considerable variance in group-average TT power and Wpeak. I had to ignore all that and use averages, as I don’t have per-individual data.

Looking at that, it’s hard not to question the logic of using a 70% multiplier to estimate FTP from the average watts of the last step in a ramp.

What if the participants used 70% of the ramp’s best last-step power to pace a 40-min TT? Would it matter in practice? Group 1 data:

  • group1-pre average of 344W * 70% = 241W target for TT, while they actually did 250. Target about 4% too low.
  • group1-post average of 382W * 70% = 267W target for TT, while they actually did 262. Target about 2% too high.

It’s OK as a rough guess at TT pacing, but without the TT effort it looks a bit like a random variable around a mean, somewhat like what I would expect from the % data comparing TT and ramp.
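The two bullet-point checks above, as a quick Python sketch (group means only; per-individual error would be larger, per the variance noted above):

```python
def ramp_pacing_error(wpeak_mean: float, tt_actual: float, k: float = 0.70) -> float:
    """Percent error of a k*Wpeak pacing target versus actual 40-min TT power
    (negative = target too low, positive = target too high)."""
    target = k * wpeak_mean
    return 100 * (target - tt_actual) / tt_actual

# Group 1 means from the tables above
print(round(ramp_pacing_error(344, 250), 1))   # pre:  target ~3.7% too low
print(round(ramp_pacing_error(382, 262), 1))   # post: target ~2.1% too high
```

The sign even flips between pre and post, which is the “random variable around a mean” behaviour: the 70% multiplier lands near the right answer on average but misses in either direction for any given test.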

If you don’t have a lot of experience training, or doing TTs, there is absolutely a lot to like about coming up with a rough estimate of FTP as first pass at setting zones, training around threshold, and pacing a longer 40-min TT.

Yes, and my point of view is that you can learn a lot from it (or any long TT effort over 30 min) because it’s basically a similar effort to one at your current FTP / metabolic steady state.