TrainerRoad's Big Data

So, TrainerRoad has loads of data, LOADS.

I’ve no idea how many users there are or how many workouts get completed every day, but with every one there’s probably something we can learn. Success or failure, years of dedication or just starting out - everything that’s collected can tell us something.

Put it all together and there could be new ways of looking at training.

I’m sure the team have ideas about what they would do with the data, but if it was yours, what smart things would you do with it that may (or may not) make us all faster?


Oooh, without saying too much, this is definitely something we’ve been looking into. Along with machine learning to leverage the data, we’re hoping to learn a lot and build innovative tools to make you faster. I look forward to hearing what some of your ideas are!


In looking at my own data (pre-TR), I’ve seen different cycles, like 2/1 over 9 weeks, produce results. Anything that helps tune recovery days/weeks into the stock plans would help individualize the plans.


Yea, I think the dream is you plug in the days/times you plan to ride and TR puts in the correct intensities. Then adjusts if you wander off plan. Try to find correlations between HR, power variability, cadence, etc. to predict when you are starting to go too deep and start putting in recovery rides until you come back.


I would love to play with that data, so if you open source it a bit, I would be more than happy to take a look and try to do something interesting.


Has TrainerRoad ever thought of sponsoring a Kaggle contest? It might not be a great fit, as I imagine the largest challenge is creating meaningful variables, but if you ever need to optimize a model, Kaggle is great.


Being a cyclist and a data scientist, this data set is my wet dream! I think the key area is personalisation - what’s the best training for me, not what tends to work well in general. Off the top of my head…

  • Are the traditional rider profiles, such as sprinters and climbers, valid (cluster analysis)?
  • Dynamically adapt intervals - a bit like Xert but using machine learning rather than a physiological model. You’d probably want some reinforcement learning here.
  • Identify an athlete’s fatigue state from heart rate + power e.g. looking at heart rate variability, decoupling of heart rate and power. This would contribute to dynamic intervals.
  • Determine an athlete’s optimal cadence at different power outputs and speeds (i.e. low speed climbing vs. high speed flat).
  • Optimal weekly workout structure for an athlete.
  • Optimal time of day to train for an athlete.
  • Determine athlete’s aerobic and anaerobic thresholds without the need for lab tests.
  • Tune the intensity of different zones to the individual - I might be able to handle VO2 repeats at 120% whereas you might struggle at 110%.
  • Warn users if their FTP is set too high.
  • What is the potency of specific sessions for individuals/types of rider - session X will be the best session to improve your VO2max given your current fitness, fatigue, etc.
  • Comparison of smart trainers & power meters - which is the “best” combination, e.g. fewest dropouts, closest match to prescription, closest power match.
  • Rather than phases (base, build, etc.): given I have X weeks until my race and my fitness is Y, what is the optimal plan? Adjust this dynamically after setbacks (missed sessions, illness, etc.).
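
To make the heart rate/power decoupling idea from the list concrete, here is a minimal sketch. It assumes the common definition (compare average power per heartbeat in the first half of a steady ride with the second half); the function name and the sample numbers are made up for illustration, and this is not anything TR is known to use.

```python
def decoupling_pct(power, heart_rate):
    """Percent drop in power-per-heartbeat from the first half of a ride to the second."""
    if len(power) != len(heart_rate) or len(power) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mid = len(power) // 2

    def efficiency(p, hr):
        # Average power divided by average heart rate for one half of the ride.
        return (sum(p) / len(p)) / (sum(hr) / len(hr))

    first = efficiency(power[:mid], heart_rate[:mid])
    second = efficiency(power[mid:], heart_rate[mid:])
    return (first - second) / first * 100.0


# Made-up example: steady 200 W while heart rate drifts from 140 to 150 bpm.
power = [200.0] * 120
heart_rate = [140.0] * 60 + [150.0] * 60
print(round(decoupling_pct(power, heart_rate), 2))
```

A higher number means heart rate is drifting up for the same power. What counts as "too much" drift would have to come from the data or from coaching practice; the sketch does not pick a threshold.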

The holy grail would be to essentially have an “AI coach” that dynamically adjusts your training plan such that you’re “fittest” for whatever A race you designate. Think BestBikeSplit, but instead of allocating energy stores for a single race, it would be allocating energy stores for an entire season. I think that would be too difficult, and at the least would probably require outside information such as food intake and sleep - the ability to do workouts depends too much on food and sleep, even for the same individual. Other ideas (many already stated):

  • A model to link ride history over an interval of time to FTP changes over that interval. A neural network of some nature might fit the bill here, though I bet a lot of data pre-processing might be necessary to get decent results. It may also be such that riders would have to be clustered into similar groups based on lifestyle, etc and have separate NN parameters to get accurate enough results.

  • If the above was proven to be accurate enough (unlikely, but you never know), you could just simulate through all the possibilities of workouts in the TR library given user time and date constraints to find the “best” plan and re-run every time the user goes off schedule.

  • A more realistic option may be using user workout failure and intensity decrease history to advise whether or not a user should stick to a planned workout, rest, or do an easier version/different type. A recurrent network of some nature might work well for this, with the architecture varying based upon how far you think one should look back in time to determine these things. Again though, I think more information such as food intake, sleep, and general stress would be necessary to get accurate results.

  • Most granular idea would be to detect if the workout is going well or not and then choose whether or not to stop, extend/lengthen intervals, etc. You’d have to assume what the best option is based on human knowledge (is it really best to do more if you can? is it better to stop or reduce intensity?), but this is probably most realistic. Maybe a CNN with heart rate and power history could infer user state during a workout fairly well.
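
The "simulate through all the possibilities" idea above can be sketched as a brute-force search. Everything here is hypothetical: `LIBRARY` is a made-up four-workout library, and `score` is a hand-written placeholder standing in for the learned model that would predict how good a plan is.

```python
from itertools import product

# Hypothetical mini-library: name -> (duration in minutes, training stress).
LIBRARY = {
    "rest": (0, 0),
    "endurance": (90, 60),
    "sweet_spot": (60, 70),
    "vo2max": (60, 85),
}


def score(plan):
    """Placeholder for a learned model: rewards stress, penalises back-to-back VO2max days."""
    total = sum(LIBRARY[w][1] for w in plan)
    penalty = sum(30 for a, b in zip(plan, plan[1:]) if a == "vo2max" and b == "vo2max")
    return total - penalty


def best_plan(minutes_available):
    """Try every workout combination that fits the rider's daily time limits."""
    options = [
        [w for w, (mins, _) in LIBRARY.items() if mins <= limit]
        for limit in minutes_available
    ]
    return max(product(*options), key=score)


# Four days with 60, 0, 90 and 60 minutes available.
print(best_plan([60, 0, 90, 60]))
```

Re-running `best_plan` with updated constraints is the "re-run every time the user goes off schedule" step. The number of combinations grows exponentially with the number of days, so a real library and a full season would need a smarter search than this.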


Nice… something tells me Xert and The Sufferfest should be nervous :wink:


I remember hearing @Nate_Pearson talk about a method of automatically adjusting a training plan based on the multitude of data which is collected by us (not just HR, power or adherence to a plan/workout but lifestyle tracking devices like step counters or sleep monitors etc).

That would be ideal, particularly as it takes the thought out of things. I would highlight a note of caution around machine learning/AI applications to this though. In my experience there is a tendency to overstate the effectiveness of these approaches and they are often limited by the ‘training data’ used to develop the algorithms.

If they are looking into making use of their data I would recommend getting involved with proper academics who are used to analysing large administrative datasets.


Based on the data they have available now the big one for me would be looking at cadence trends (@occasionalathlete touched on this briefly), as I imagine TR have one of the biggest cadence versus target/actual power data sets in the world.

When researching optimum cadence a while ago everything I found was really vague (it’s personal to each cyclist etc.). It’d be really interesting to see if any strong cadence trends actually emerge from all the data TR have.

Do people performing well on the ramp test all share a similar cadence range when completing the test? Are there any cadence trends shared by the cyclists with FTPs above 300W? When completing intervals at 110%/120% etc., does any particular cadence seem to be more effective for holding the target power?

If any decent data comes out of this, the workout instruction text could be much more specific: instead of recommending a cadence range of, say, 85-100 for a particular intensity, it could state that, for example, 90-92 rpm is the most effective cadence for that intensity.
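
A rough sketch of how such a cadence question could be asked of the data: bin intervals by average cadence and look at how often the target power was held in each bin. The record layout, the 5 rpm bin width and the sample values are all assumptions for illustration.

```python
from collections import defaultdict


def success_rate_by_cadence(intervals, bin_width=5):
    """Group intervals into cadence bins and return the completion rate per bin."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [completed, total]
    for cadence_rpm, completed in intervals:
        bin_start = int(cadence_rpm // bin_width) * bin_width
        counts[bin_start][0] += int(completed)
        counts[bin_start][1] += 1
    return {b: done / total for b, (done, total) in sorted(counts.items())}


# Made-up records: (average cadence in rpm, did the rider hold target power?)
intervals = [
    (83, False), (87, True), (88, False),
    (91, True), (92, True), (94, True), (97, False),
]
print(success_rate_by_cadence(intervals))
```

A bin with a high success rate only shows a correlation; riders who are already fitter may simply prefer that cadence, so this would be a starting point rather than a recommendation.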


Here’s a bold idea. Open source the whole dataset - or at least data from those athletes who opt in to sharing - and provide an API to access it. Scientific advancement works best when it’s open and collaborative.


I think saying ‘give up your competitive advantage’ is a bit of an ask. It would be nice if, for whatever metrics they come out with, they published patented/copyrighted white papers describing how they do it, as Coggan has done.


Yeah, there’s very little chance TR will share the core data, for a host of reasons. Just not going to happen.


Likely starting with privacy laws, and going from there…

For anyone that wants open source, a serious suggestion: find a few like-minded folks and put up a website where people can build a database of completed workouts for analysis. To encourage people to join you might need incentives; for someone like me, at a minimum I would want to remove GPS data before ever considering participating.


I did say it was a bold idea!!

@bbarrera you mean Strava?

@occasionalathlete not following, what did I say that made you think Strava?

Great list. One to add:

  • Predict FTP trajectory based on previous FTP tests, the training you have done between successive tests, and your future training plan

On your first bullet, you may have intended this, but I think TR should augment/enhance rider profiles. For example, one thing I’ve figured out through my own research/testing is that my aerobic fitness is lacking (high HR/power decoupling during endurance rides). This is something TR could easily have a test for.

I’m not a patent lawyer, but I don’t think Coggan could patent his formulas. I do think the terms “TSS”, “IF”, etc. are copyrighted by TrainingPeaks, but they can’t stop anyone from using the actual formulas.
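
For reference, the formulas in question as they are commonly published, written out as a small sketch. Normalized power is taken as an input here rather than computed from raw power data, and the function names are just for illustration.

```python
def intensity_factor(normalized_power, ftp):
    """IF: normalized power as a fraction of FTP."""
    return normalized_power / ftp


def training_stress_score(duration_s, normalized_power, ftp):
    """TSS: scaled so that one hour at exactly FTP scores 100."""
    if_ = intensity_factor(normalized_power, ftp)
    return (duration_s * normalized_power * if_) / (ftp * 3600) * 100


# One hour at FTP (250 W), then 90 minutes at a normalized power of 200 W.
print(round(training_stress_score(3600, 250, 250), 1))
print(round(training_stress_score(5400, 200, 250), 1))
```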


Ya, the terms TSS/CTL/etc only have a ™ as far as I know