2021 U of T Training Study, p/b TrainerRoad (2022 update: results posted!)

Hey, this study was mentioned in Canada’s largest national newspaper, The Globe and Mail.

Paywall (article below)
https://www.theglobeandmail.com/life/health-and-fitness/article-2021-health-and-fitness-preview-four-predictions-about-what-will-go/

CROWDSOURCED RESEARCH

Back in November, a University of Toronto doctoral candidate in exercise science named Michael Rosenblat started recruiting cyclists for a study on interval training. Of course, none of them could actually come into the lab – but that was no problem. Subjects from around the world could train in their own basements using their own smart trainers and a piece of software called TrainerRoad, while Rosenblat monitored their performance from afar.

Other technologies are also making it easier to take research out of the lab. A study of the Apple Watch’s heart rhythm detection app, for example, collected data from over 400,000 participants. For a field like exercise science that has long been plagued by unreliable studies with a dozen or fewer subjects, the workarounds necessitated by the pandemic may turn out to have long-lasting benefits.

1 Like

Heard Seiler mentioning this study in a recent podcast (FastTalk). Have there been any updates on the study results?

3 Likes

Have not seen anything on Jem’s blog. He has posted a few 2022 articles on the blog, and appears active on Twitter (I don’t tweet but he has a feed on the blog page). Maybe he’ll pop in here and let us know if the study is on track and if data are still being generated or if they are on to the analysis phase.

@SpareCycles

Blog Link: https://sparecycles.blog

4 Likes

Hi!

I haven’t listened to the FastTalk podcast with Seiler, but Michael Rosenblat (https://www.evidencebasedcoaching.ca/) has been hard at work on this project. Expect to hear something in the next couple of weeks.

Yes, I’m more active on Twitter these days, where I post preliminary analyses & anecdata from my PhD work with NIRS & muscle oxygenation.

9 Likes

Jem, Thanks for hopping in and best of luck with your ongoing research!!

1 Like

Hi everyone,

Michael (Dr. Rosenblat) just posted a video summary of this study

And a summary article has been posted to his website

Stephen Seiler just posted about it on Twitter

Thanks again to everyone who participated as a subject, and big thanks to @Nate_Pearson and Corey for their help setting things up on the back end.

I think the biggest takeaway for us was learning the many logistical challenges of setting up a remote randomised study with enough scientific rigour to control as many variables as possible while maintaining high ecological validity.

The challenge we saw, and ultimately the biggest limitation of this study, was a high subject drop-out rate. We received very good feedback from some of you who did and did not complete the study, but I’d be interested in hearing from anyone willing to share here (or DM me privately): how do you think we can improve the subject participation experience for next time?

If you participated, whether you completed the study or not, what were some of the hurdles you encountered? And any advice for us for how to overcome those hurdles?

Would you be interested in participating in future virtual training research? Under what conditions? (e.g. only outside of my competitive season; only if I got individual feedback that I could use in my training; only if I could perform the training sessions outside, etc.)

Also happy to discuss anything else. This is crowdsourced science! The best way to improve the process is by getting more input from you the subject-stakeholders.

Jem

11 Likes

As someone in the field of research, I am very excited to see this collaboration. Way to go!

I recently read an article claiming that long COVID may set you back a decade in exercise gains, and thought it would be cool if the TR team would work with Matthew Durstenfeld and their team to integrate an informed consent agreement and additional survey questions for those who opt in, to provide more long-term data towards these types of health initiatives within the health community.

Reference

2 Likes

Just as a caveat - I haven’t had a chance to watch the video above yet

I participated in the study and had several (maybe 4 total?) 30-minute Zoom calls with Michael before, during, and after. I also participated in a case study for him that tested a different methodology

I had no issues following the protocol - although when I saw the cohort I was randomized into I wasn’t particularly stressed by the workouts I had to complete - they were very easy for me

While I had no issues - and would happily try another trial of this type during my winter months - I would suggest that you be as open and communicative as possible about the purpose of the trial, the importance of strict adherence, and what flexibility, if any, is allowed before the data become meaningless (i.e. if you miss an interval session in a given week, that’s OK, but it can only happen once during the trial period vs. if you miss a session once, we cannot use the data).

Happy to discuss in more detail - if you’d like PM me here or Michael has my contact information.

2 Likes

That’s really useful feedback, thanks.

Yeah, I think making it clear up-front what the ‘costs’ are for dropping out is crucial. There is really no cost to the subject for dropping out (and there shouldn’t be, for ethical voluntary-participation reasons), but the cost is quite high for the research. So we need to communicate what the obligations are for participation, and where the boundaries are for acceptable deviations from those obligations.

I think we would have benefited from a better face-to-face selection/filtering process for subjects. Maybe the process was too automated? It worked well for subjects who reached out to us with questions, but not for the majority of ‘quiet quitters’ (if I can appropriate that term :slight_smile:). That also shouldn’t be an obligation of the subjects, so it’s something we need to be more proactive on.

1 Like

Are all the spots filled? The link seems not to be working

Sorry, yes this study was conducted in 2020-2021. We have just produced the results, hence resurrecting the thread. I’ve edited the title now

@SpareCycles

Interesting, though a bit disappointing in that lack of power constrained the possibility of good inter-group comparisons. However, I thought that this was a good first try, with plenty of ‘learnings’.

Three issues come to my mind:
[1] Why was there a target for the number of participants? I understand the need for a minimum, but the maximum could surely be more flexible. Traditional methods for this kind of research impose high costs on the researcher of having more participants + the cost to participants is high, since they have to turn up to a lab for their training sessions. But in this kind of research, the major cost to the researcher is clerical [recording the data, keeping track of people] together apparently with some one-to-one discussion.
[2] I presume that the ethics committee was worried about older people having intervention-induced health issues. Is that so? But surely those concerns can be overcome by participants getting some sign-off from their doctor. [I speak as someone close to twice your age limit.] Given the existing interest in age-related training effects, it would seem to me valuable to try such an extension, though of course any one study can only test so many variables.
[3] You have the traditional female participation problem. That seriously needs to be addressed in future work of this kind.

And finally, a question: what was the locational distribution of the participants? I mean: country of domicile.

1 Like

Jem (and Michael if reading),

Congrats on giving this a shot. As a clinical trialist in a former life, am all too familiar with the difficulties of studying humans. A few comments / suggestions after reading the summary. Apologies if the questions are answered in other areas but am not following the socials.

The dropout rate, or perhaps differently termed, the non-initiator rate post sign-up, is very large. It looks like at least half of the signed-up individuals simply disappeared? It would be worth trying to contact those folks to find out what happened. You guys likely have tried, but if not, perhaps provide an anonymous way for non-initiators to tell you why they didn’t start. It’s going to be a combination of best intentions not panning out, life intervening, the protocol being too hard, injury, etc. But knowing the answer rather than guessing might help with future design.

Am wondering if non-initiation could be partially age related? E.g. perhaps a study of older individuals (45-65 or something) might provide a more stable or committed cohort. Was there an IRB/ethics reason for capping age at 45, or was that a study-specific decision to try and limit variability? If the IRB would permit it, and you guys are up for study #2, give it a shot with an older group and see what happens! (TL;DR Kids are unreliable and retired people have more time to do things like crowdsourced physiology studies.)

Non-centralized trials, site-independent trials, and other flavors of studying people without requiring a ton of site visits are a hot topic in drug development (and other forms of medical research). While drug development is not ex phys, if you do additional studies in the future, would suggest reaching out to some of the companies with growing expertise in the non-centralized study arena.

Human touch matters. Designing in interaction with a study coordinator or PI before starting and during the study for updates could be useful. One of the things we hear from patients in drug studies is that they want to know more about what is happening. Blinding is critical for those studies, as is avoiding bias and other issues. But your study is not going to a health authority, and you need completers foremost, so recognize that engagement and completion are critical and think of ways to enable and encourage both.

While pure altruism is great, incentivizing is even more great (LOL). Consider offering something of extremely low monetary value but extremely high “swag” factor to participants for initiating and completing. For example, a T-Shirt at the end for folks who initiate, complete the tests and complete an end of study questionnaire. A reasonable IRB should permit that type of thing and it wouldn’t be hard to find a sponsor for the reward.

There are other ways to approach the reward and engagement topic that don’t require any physical object. Will save those thoughts for another day though, as this is getting long already.

For cyclists, I think the original question posed is important, and it would be great to see some guidance. The null hypothesis is that any six-week VO2max block in a reasonably trained person will yield a similar benefit. But wouldn’t it be fun to have actual data on whether, say, 30/15s vs 6x3 vs the Empirical Cycling VO2max block are similar or actually different? With data collection including (a) completion/compliance difficulty (e.g. the best block ever created is useless if riders can’t/won’t complete the training block) and (b) relative benefit on the measure of interest (a VO2max surrogate like a 5-min test, Wpeak, etc.). You don’t need hundreds of subjects to do this; you need maybe 50-75 motivated subjects who will complete the study. Depends where you set alpha and what you assume for effect size (rough sketch below).
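For the curious, here’s a back-of-envelope sketch of that alpha / effect size / sample size trade-off (illustrative numbers only, not anyone’s actual power analysis; assumes a simple two-group comparison):

```python
# Back-of-envelope sample size for a two-group comparison (illustrative only)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.5, 0.8, 1.0):  # assumed standardized effect sizes (Cohen's d)
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                       alternative="two-sided")
    print(f"Cohen's d = {d}: ~{n_per_group:.0f} completers per group")

# Prints roughly 64, 26, and 17 per group -- i.e. if you believe the training
# effect is moderate-to-large, 50-75 motivated completers across two arms is
# in the right ballpark.
```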

TL;DR - You guys are definitely onto something decent here. It might be that random crowdsourcing is not effective and you need to do some preselection to find subjects more likely to complete.

I hope you’ll give it another try.

Good luck and best success in your studies and academic careers.

-Darth (LOL, have to preserve the anonymity but Jem has my contact info from a prior PM discussion)

@SpareCycles

5 Likes

Well said! What is the point of an age restriction? Also it’s usually dudes they want, and us females don’t get a look in :frowning_face:

2 Likes

How does the intensity in the table correspond to FTP (roughly)?

1 Like

From the link to study on Evidence Based Coaching website:

The IET in the current study incorporated shorter stage increments (12.5 watt increases every 30-seconds). Interval work-bout intensity was programmed for each participant at a percentage of the difference between Wpeak and the WTT (15% for groups 1 and 2, and 30% for group 3 and 4); where WTT was used as a proxy for the metabolic steady-state threshold. The group mean power output for WTT was 70% ± 4% of Wpeak. This is consistent with CP and MLSS as shown in previous literature where CP can occur at 67% of Wpeak [17] and MLSS at 70% of Wpeak [18]. The mean power output during the interval sessions for all subjects in the current study was 77% ± 4% of Wpeak (75% ± 3% for groups 1 and 2, and 79% ± 3% for groups 3 and 4).

They used 40-min time trial power - TT in that screenshot - as a proxy for metabolic steady state.

If you pull up Table 4:

HIIT work intensity by group, using Pre values of watts for TT (FTP) and Wpeak (MAP/pVO2Max)

| Group | TT / FTP (W) | Wpeak / MAP / pVO2max (W) | HIIT Interval Target (W) | HIIT as % “FTP” |
|---|---|---|---|---|
| Group 1 | 250 | 344 | 264 | 105.6% |
| Group 2 | 221 | 329 | 237 | 107.2% |
| Group 3 | 257 | 371 | 291 | 113.2% |
| Group 4 | 242 | 345 | 273 | 112.8% |

Check my math - I’m on a conf call and whacked that out (quick sketch below).

Summary:

  • about 106% FTP for the 4x6-min (4-min RBI) and 12x2-min (2-min RBI) groups 1 and 2
  • about 113% FTP for the 12x2-min (2-min RBI) group 3 and 16x30/15-sec group 4
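
If anyone wants to double-check the table without a spreadsheet, here’s a quick sketch that reproduces it from the formula quoted above (group-mean watts from Table 4; matches the table to rounding):

```python
# Interval target = WTT + frac * (Wpeak - WTT), per the study's description:
# frac = 0.15 for groups 1 and 2, frac = 0.30 for groups 3 and 4
groups = {  # name: (WTT a.k.a. 40-min TT watts, Wpeak watts, frac)
    "Group1": (250, 344, 0.15),
    "Group2": (221, 329, 0.15),
    "Group3": (257, 371, 0.30),
    "Group4": (242, 345, 0.30),
}
for name, (w_tt, w_peak, frac) in groups.items():
    target = w_tt + frac * (w_peak - w_tt)
    print(f"{name}: HIIT target {target:.0f} W = {100 * target / w_tt:.1f}% of 'FTP'")
# Group1: 264 W = 105.6%, Group2: 237 W = 107.3%,
# Group3: 291 W = 113.3%, Group4: 273 W = 112.8%
```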
3 Likes

Wow, great feedback everyone. I really appreciate it.

Forgive brief comments for now. I’m definitely reading, thinking about, and considering all your points.

Ethics boards don’t like open-ended recruitment, on the premise that recruiting beyond the number of subjects a power analysis predicts is needed to find a significant or meaningful outcome is an undue burden and risk to patients and lab resources.

Yeah, I’m really not a fan of the age limit in the 40s imposed by most ethics boards for these kinds of studies. But there is strong caution around the risks of maximal exercise from ethics boards and insurance providers. I have strong opinions about this. Exercise is medicine and one of the most effective treatments we know of for the broadest range of health issues. And yet, this is the system we work within.

It’s a cost-benefit decision made in a different time, then baked into the system. It’s tough to overcome that inertia in a naturally conservative, “first, do no harm” field like medicine & human clinical research. But I have to say, there are good historical reasons why the current system exists. It’s just hard to overcome path dependency, even when we “know better”.

I absolutely agree that female representation in sport science is one of the largest issues limiting the applicability of research (overlooking ~50% of the population is not acceptable). And we were not able to alleviate that disparity in this study. Specifically here, though, I can’t say I fully understand why we didn’t have more females participating. So I’d be especially interested to hear from any female athletes: what do you think the limiter was? (Whether you participated, had the opportunity and couldn’t/decided not to, or just have an opinion on the topic.)

We did not have any exclusion criteria that should have lowered female participation (that I am aware of?). We assumed we would get more male volunteers at first asking, so we specifically reached out to female coaches, coaches of female athletes, and women’s teams. Maybe, just like we underestimated the drop-out rate, we also underestimated the engagement rate from females, and needed to make more of an effort at recruiting female athletes? I will have to look back at how we promoted recruitment.

There are some other systemic issues here. In a predatory, publication-obsessed field with limited resources, researchers have to minimise variability in order to manage project scope: to actually complete projects, to get degrees and meet deadlines, to pay bills and such. Add all the historical prejudices, which I don’t mean to minimise but which I think it should be table stakes at this point to recognise still exist, and the result is that females are severely under-represented despite individual best efforts at inclusion… so yeah, the demonstrated preference of sport science as a whole is still that “we just want dudes” :man_facepalming: Although that is gradually shifting.

Yes. Roughly FTP ≈ Wtt for the group. But let’s consider individual variability vis-à-vis fatigue resistance at FTP (time to exhaustion in the FTP/CP range ≈ 15-70 min) to put some 95% prediction interval error bars on that estimate. So it’s probably more like:

FTP = Wtt ± 0.5 W/kg

Please don’t consider this advice to use a 40-min TT to predict FTP. There is zero way for an individual to know their precise FTP/CP/metabolic steady state power by performing a single 40-min TT.

But a 40-min TT is a pretty great performance indicator on its own: if it goes up (above some minimum detectable change) then you can say you’ve improved your performance!
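
To make those error bars concrete, a toy sketch (rider masses and TT watts here are made up for illustration):

```python
# Toy illustration of FTP = Wtt +/- 0.5 W/kg (all rider numbers invented)
for mass_kg, w_tt in ((60, 220), (75, 260), (90, 300)):
    err = 0.5 * mass_kg  # the +/- 0.5 W/kg band, expressed in watts
    print(f"{mass_kg} kg rider, 40-min TT of {w_tt} W: "
          f"FTP somewhere in {w_tt - err:.0f}-{w_tt + err:.0f} W")
# e.g. for the 75 kg rider that's a 75 W wide window -- far too wide to
# prescribe zones from a single 40-min TT alone
```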

2 Likes

Reasonable, but to be clear, you only estimated Wpeak (from 30-sec ramp steps) and 40-min TT power. I say ‘estimated’ Wpeak because I’ve seen meta-analyses point out that Wpeak can change based on ramp protocol. In other words, Wpeak-30secRamp can differ from Wpeak-1minRamp and Wpeak-2.5minRamp, is that right?

Why did you use 30-sec ramp steps instead of 1-min or say 2.5-min steps?

On a related topic, what are your thoughts on predicting FTP in a ramp test by using a fixed % of best power over time = StepSize? For example, 70% of best 30-sec power in the study’s 30-sec ramp, or 75% of best 1-min power in a TR ramp?

You use terms like “zero way”, “precise FTP/…”, and “single 40-min TT” as qualifiers, which I agree with and cannot debate. But on the other hand, if you are well trained and have some experience doing threshold intervals, I’d argue you can learn a lot more about your FTP and TTE from a single 40-min TT attempt than you can from a ramp test.

Fun with math…

| Group | 40-min TT / “FTP” (W) | Wpeak / “30-sec Ramp MAP” (W) | “FTP” as % Ramp |
|---|---|---|---|
| Group 1 Pre | 250 ±22% | 344 ±20% | 72.7% |
| Group 1 Post | 262 ±19% | 382 ±18% | 68.5% |
| Group 1 Pre vs Post | 104.8% | 111.0% | |
| Group 2 Pre | 221 ±18.6% | 329 ±20.6% | 67.2% |
| Group 2 Post | 219 ±16.9% | 334 ±17.1% | 65.6% |
| Group 2 Pre vs Post | 99.1% (decrease) | 101.5% | |
| Group 3 Pre | 257 ±14.8% | 371 ±12.1% | 69.3% |
| Group 3 Post | 263 ±15.3% | 388 ±11.6% | 67.8% |
| Group 3 Pre vs Post | 102.3% | 104.6% | |
| Group 4 Pre | 242 ±23.4% | 345 ±18.6% | 70.1% |
| Group 4 Post | 247 ±22.8% | 359 ±18.3% | 68.8% |
| Group 4 Pre vs Post | 102.1% | 104.1% | |

Not only is there a fair amount of variance in 40-min TT (FTP proxy) as % ramp, there is considerable variance in group average TT power and Wpeak. I had to ignore all that and use averages, as I don’t have data per individual.

Looking at that, it’s hard not to question the logic of using a 70% multiplier to estimate FTP from the average watts of the last step in a ramp.

What if the participants had used 70% of the ramp’s best last-step power to pace a 40-min TT? Would it matter in practice? Group 1 data:

  • group1-pre average of 344W * 70% = 241W target for TT, while they actually did 250. Target about 4% too low.
  • group1-post average of 382W * 70% = 267W target for TT, while they actually did 262. Target about 2% too high.

It’s OK as a rough guess at TT pacing, but without the TT effort it looks a little like a random variable around a mean - somewhat like I would expect from the % data seen using a TT and ramp. (A quick sketch for all four groups is below.)
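
Here’s that same check run for all four groups, pre and post (a sketch using only the group means from the tables above, so the same caveat about individual variance applies):

```python
# How far off is 70% of ramp Wpeak as a 40-min TT pacing target?
data = {  # group: (Wpeak pre, TT pre, Wpeak post, TT post) -- group-mean watts
    "Group1": (344, 250, 382, 262),
    "Group2": (329, 221, 334, 219),
    "Group3": (371, 257, 388, 263),
    "Group4": (345, 242, 359, 247),
}
for name, (wp_pre, tt_pre, wp_post, tt_post) in data.items():
    for label, wp, tt in (("pre", wp_pre, tt_pre), ("post", wp_post, tt_post)):
        target = 0.70 * wp
        miss = 100 * (target - tt) / tt  # + means target above actual TT power
        print(f"{name} {label}: 70% of {wp} W = {target:.0f} W "
              f"vs actual TT {tt} W ({miss:+.1f}%)")
# Misses range from about -3.7% to +6.8% across groups and timepoints
```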

If you don’t have a lot of experience training or doing TTs, there is absolutely a lot to like about coming up with a rough estimate of FTP as a first pass at setting zones, training around threshold, and pacing a longer 40-min TT.

Yes, and my point of view is that you can learn a lot from it (or any long TT effort over 30 min) because it’s basically a similar effort to one at your current FTP / metabolic steady state.

2 Likes

Correct, mostly. Wpeak is Wpeak, however you operationally define it. We defined Wpeak as “the highest 30-sec power attained from an incremental test of 30-sec steps at 12.5 W/30-sec ramp rate”.

Wpeak could be considered an estimate of some other construct, like Max Aerobic Power (MAP), for which we use the rough definition “the highest constant workload power that will allow attainment of VO2max before task intolerance occurs”. Then we could say the agreement of Wpeak with MAP will depend on the IET ramp rate and duration, as you suggest. Discussed well here (and elsewhere).

We weren’t concerned with measuring steady-state physiological responses, which is the primary reason I can think of for using longer stage durations. Test duration mattered more to us than stage duration. Without going deeper into the rationale, a 25 W/min ramp rate is fairly standard for a heterogeneous sample group with unknown fitness. Smaller discrete jumps are preferred to large jumps, hence 30-sec steps at 12.5 W/step. (A tiny sketch of the step sequence is below.)
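
If it helps anyone picture the protocol, the step sequence is trivial to generate (a sketch; the 100 W starting load is an illustrative assumption, not necessarily what we used):

```python
# Incremental exercise test: 12.5 W steps every 30 s = 25 W/min ramp rate
def iet_steps(start_w=100.0, step_w=12.5, step_s=30, n_steps=20):
    """Return (elapsed seconds, target watts) for each stage."""
    return [(i * step_s, start_w + i * step_w) for i in range(n_steps)]

for t, w in iet_steps(n_steps=5):
    print(f"t = {t:3d} s: {w:6.1f} W")
# t = 0 s: 100.0 W ... t = 120 s: 150.0 W, continuing until task failure
```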

Depends what definition of FTP you are using (how ‘performance-ish’ is it, how ‘physiological-ish’ is it), how much prior information you have, and whether you are interested in group mean results or individually accurate intensity domain / training zone prescription. That last one is the big one.

40-min TT and IET Wpeak are fantastic performance indicators in their own right, and of course they are strongly related to physiology and physiological outcome measures like the maximal metabolic steady state (MMSS). We can say with high confidence that if your 40-min TT & Wpeak numbers are higher, your FTP, CP, MMSS, and overall fitness are probably higher. They move together (relatedness), but that doesn’t tell us how close they are (agreement).

Assuming a group is normally distributed, let’s say the group mean 40-min TT power agrees well with the group mean FTP/CP/MMSS (we don’t know that, because we didn’t test FTP, CP, or MMSS). But with no prior information, I, as an individual in that group, have no idea where on that normal distribution I fall (how close my own 40-min TT is to my own FTP/CP/MMSS). So I can use it as a starting estimate, but I have to consider the uncertainty in the group from the 95% prediction interval (PI).

Handwavy but important definitions: the 95% confidence interval (CI) is the uncertainty around the sampled group mean within which the ‘true’ population mean is expected to fall. But we’re still talking about means. The 95% PI is the uncertainty within the group data for where the ‘true’ value for any one (or ‘the next sampled’) individual might fall. 95% PIs are much larger than CIs. (Apologies to any statisticians for these oversimplifications… don’t trust me, check my work! :sweat_smile:)
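
If you want a feel for how much wider a PI is than a CI, here’s a minimal simulated sketch (made-up normally distributed data, not our study data):

```python
# 95% CI for a group mean vs 95% PI for the next individual, simulated
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=0.70, scale=0.04, size=20)  # e.g. WTT as a fraction of Wpeak
n, mean, sd = len(x), x.mean(), x.std(ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)

ci = t_crit * sd / np.sqrt(n)          # uncertainty in the group mean
pi = t_crit * sd * np.sqrt(1 + 1 / n)  # uncertainty for one new individual
print(f"mean = {mean:.3f}, 95% CI = +/-{ci:.3f}, 95% PI = +/-{pi:.3f}")
# The PI is sqrt(n + 1) times wider than the CI -- about 4.6x with n = 20
```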

So that’s why I can simultaneously propose that yes, the group mean 40-min TT is a (reasonably) good proxy for group mean FTP/CP/MMSS, but no, no single individual can use it to estimate their own FTP/CP/MMSS unless they are willing to embrace the uncertainty of very wide 95PI error bars on their thresholds/zones.

If prior information is known - and we all know our own power data, PD curve, maybe even prior physiological tests, or at least repeated testing - then we can probably largely reduce our individual uncertainty and gain even more performance information from the 40-min TT & Wpeak, like TTE at FTP/CP/MMSS, as you say.

Absolutely! Well said. TR and other training software already do it (reasonably) well for a first pass. And btw, I really appreciate your thinking here. Some great concepts to digest!

2 Likes

Thanks! Much appreciated.

1 Like