Which I think brings it full circle to where this all started… AI FTP feels more like it comes from deep data analysis and it is eerily accurate (from a TR workout perspective) in many cases; whereas PLs feel like they come from some simple standard algorithms and many people (I’m guilty) end up gaming the AT system to get to accurate PLs more rapidly after a new FTP is assigned.
AT definitely doesn’t get PLs to what I think is the right level. VO2 are always too easy if I don’t do them for a while due to the natural decay they built in.
Honestly, I think it is the opposite way around: I reckon AI FTP is a much simpler problem because it is very clear what you are optimizing for and how to score that.
With AT that is much less clear, because you can define success in many, many different ways and weigh them differently as well. E. g. does AT take specific aims of the various plans into account and e. g. give more emphasis to short power in the crit plan? I don’t know. This is probably one of the most closely guarded secrets by TR.
I’m not quite sure what you mean by gaming. Do you just mean that for energy systems you haven’t trained in a while (where PLs have decayed to ridiculously low values), you do “breakthrough workouts” that simply reset the PL to a more sensible value?
If that’s what you mean, then I am doing that, too. (Two common ones being endurance PL and currently, my sweet spot PL: I have been replacing my sweet spot Sunday workouts with endurance rides.) Although, I wouldn’t call it “gaming the system”, you just understand how it works and work within the current limitations.
Again, that doesn’t mean that AT is “simpler” or “more primitive”, it just means that AT as it currently works has quite a few limitations that will need to be addressed in the future. IMHO AT is simply trying to solve a much more complicated problem than AI FTP.
PS My experience with AI FTP is quite mixed. During regular blocks the predictions are fine (I wish I could compare with ramp test results afterwards). But it had significant trouble estimating my FTP after a longer training pause: it was off by 15–20 W. (My lactate threshold had decayed less than expected; instead, my endurance went to shirt.)
I’m with you @Pbase. I often use AT’s suggestions, but I definitely game it. Sometimes it’s off because it missed some outdoor rides, or it decayed my levels too fast, or has my anaerobic PL down at 1.3 when my VO2 PL (which is sometimes set by doing intervals in the anaerobic zone) is at 7.
AT and AI FTP detection are great, but the real genius is in the PL algorithms. Knowing if one workout is harder or easier than another and by how much is an amazing luxury. It makes AT possible and allows us to fine tune our training.
I think we disagree because you believe in the marketing. Which is fine, but from a practical point of view there is no reason the two need to be interconnected. By their own marketing speak, PLs mean that FTP doesn’t need to be that close; the system will fix itself shortly based on your workouts. You can set a wildly inaccurate FTP and it will fix itself, and in theory it can do so despite whatever value is there, because it can simply compare previous work done against the whole database to suggest a correct level. It’s just a simple percentage of AI FTP or ramp test FTP or whatever input.
Conversely, AI FTP can be calculated well without any workouts at all, as long as it has enough historical power data.
And that’s the thing. There are already programs that can do right now what TR envisions: real-time FTP updates with fitness insights beyond just a single FTP number based solely on unstructured power data, and tailored workouts and progressions based on overall fitness signatures to recommend the right level of workout for the day, inside or outside. But even then people choose to use only the aspects of the whole package that fit their specific goals, potential biases in philosophy, knowledge of their own strengths and weaknesses, etc. Do I think TR can eventually do a better job of it with their vast database? Sure, definitely possible. But if we put marketing and semantics aside, both systems are independent algorithms at different stages of development because they are tasked with very different goals, as you’ve stated.
No, I don’t think so. No need to believe in marketing, just listen to what they say and compare it with reality. When they made AT public, the TR team told us to anticipate AI FTP, and now that feature is in public beta. They explained how they see it as an integral part of an ML-based training platform, and at least the logic makes sense to me.
To be honest, I think you mistake my enthusiasm for interesting stuff happening at the intersection of different fields of science with enthusiasm for TR. I am curious to see what interesting stuff will happen.
Maybe there is no need in your mind, but in this case history tells us it was.
I think that fundamentally misunderstands what PLs try to quantify, your endurance in a zone/the number of matches/whatever you want to call it. This is distinct from your lactate threshold. Your FTP = lactate threshold may be set correctly, but your endurance may be higher or lower.
Yes, you can use PLs to accommodate changes in FTP (which is not a fixed number for 4–6 weeks, but changes anyway). But that’s not what they are fundamentally there to do.
When it comes to FTP, maybe. And this shows one important point: users don’t and shouldn’t care how a device, an app or a platform estimates things like VO2max or your FTP. I don’t know whether TR is any better or worse because they use ML and a very big, unique dataset. Personally, I don’t care about FTP estimation: I like ramp tests, and if they are an option, I’ll probably keep doing them. I don’t know whether any of the other platforms I currently use can estimate my FTP in some fashion. Probably the biggest challenge in even deciding whether these different platforms have better or worse algorithms is that we need to sample a large cross section of people.
But when it comes to adapting training plans, I am only aware of one other competitor that uses methods from ML (thanks to these forums), but I have no experience with them. So I cannot say whether they work any better or worse. If you go broader and do not restrict yourself to platforms that leverage ML, you can include platforms like Wahoo’s SYSTM (spelling?). Again, I don’t know if they are better.
I think you fundamentally mistake my enthusiasm for science and ML-based methods for TR fanboi-ism. I’m very enthusiastic about ML, because I have seen it grow up in my vicinity. E. g. in the late 2000s my best friend’s Master’s thesis in computer science was on using Big Data techniques to improve automated translations for software, partially developed during an internship with a certain Fruit Company. About 8 years ago I had the privilege to meet the who-is-who in the mathematical theory behind it at a research institute, go to lunch with them several times a week and absorb knowledge by osmosis (that doesn’t make me an expert!). One of the participants was offered a six-figure consulting gig with an MLB team to use ML to analyze player movements. If she had wanted to get out of academia (she’s a professor now), that would have been a good ticket.
A few years later it was instrumental in the search for, discovery of and imaging of the event horizons of black holes, a discovery I am sure will be awarded a Nobel Prize in the future. (There is an excellent talk on YouTube by Katie Bouman, one of the lead researchers of one of the competing groups for the ML algorithms that were used to reconstruct the image data: “Imaging a Black Hole with the Event Horizon Telescope”.) If I sound enthusiastic, that’s because I am: it lies at the intersection where I am at home, the intersection between mathematics and natural sciences (I’m a mathematical physicist). However, ML is used even in small ways. I’ve seen more and more physics papers from experimentalists who have used ML-based methods in their data analysis. (Often senior researchers would answer me “my Master student did this and I don’t fully understand it …” when I asked about it.)
So you asked whether I think “TR can eventually do a better job with their vast database?” Yes, absolutely, if they play their cards right, they can leapfrog the competition. My reasoning is simple: they are the biggest platform of its kind (Strava does not have all the information and cannot recreate a lot either), which gives it a strong platform effect (in the same way that Google dominates search or Facebook has eliminated a lot of other social networks). The second big factor is the quality and amount of data, which is also unique. The smaller competitor (I have forgotten the name) has to optimize its algorithms with far less.
The biggest limiter I am sure is talent. Some friends and former colleagues who went into industry are working on applying ML to all sorts of problems, some as slimy, yet lucrative as “predicting a customer’s journey” (aka ads, this was a quote from a friend who got offered a job with Japan’s biggest ad company; he declined). Others are much more interesting and can rely on cutting-edge research (e. g. image recognition for satellite imagery). Modern APIs are at a point where many people can do the “mechanics” with shake-to-bake solutions.
But problems like AT require extensive domain knowledge and knowledge of methods in ML. This I am sure is the limiter in TR’s case. Clearly, they have attracted plenty of people with domain knowledge in cycling, finding the few people at the intersection is hard. I am not sure how many people like @WindWarrior are out there on this planet, but I am going to bet it is very, very few. Perhaps you can count their number on two hands, and I am not exaggerating (speaking from experience in research, the more you want depth of knowledge, the fewer people there are).
In a sense, ML is relatively simple: it is the search for local minima or maxima of functions in a vast, vast space. The space is so vast that in many cases you wouldn’t have enough memory even if you made the entire universe into a computer (that’s a way of saying that the number of configurations is larger than the number of atoms in the universe). Technically, the biggest challenge is finding the right optimization function, i. e. a way to quantify how much better or worse algorithm A is compared to algorithm B. Facebook and YouTube famously want to maximize engagement with their selection algorithms for items in your timeline or in the list of suggested videos. You can see the results very clearly.
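To make the “search for local minima” idea concrete, here is a toy sketch in one dimension using plain gradient descent. The function `loss` is a made-up stand-in for an optimization function, and the step size and starting points are arbitrary assumptions; real ML problems have millions of parameters, but the mechanics are the same in spirit.

```python
def loss(w):
    # Hypothetical optimization function with two valleys (lower is better).
    return w**4 - 3 * w**2 + w


def grad(w):
    # Derivative of loss with respect to w.
    return 4 * w**3 - 6 * w + 1


def gradient_descent(w0, lr=0.01, steps=2000):
    # Repeatedly take a small step downhill from the starting point w0.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


# Different starting points end up in different valleys.
left = gradient_descent(-2.0)
right = gradient_descent(2.0)
print(left, loss(left))
print(right, loss(right))
```

Starting on the right, the search settles in the shallower valley and never finds the deeper one on the left. That is exactly the “local” in local minima, and it is why the choice of optimization function and starting conditions matters so much.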
Even without AT, just having a quantitative measure of how much harder one workout is compared to another has significantly improved my training. Of course, you should never leave your brain at the door and use things blindly. E. g. I cannot directly compare VO2max workouts like versions of Spanish Needle with steady state VO2max workouts, I might be better at the former than the latter. But still, it is hugely helpful.
If that’s all you think is out there in terms of quantifying fitness beyond FTP, then maybe that is where the disconnect is. ML is a tool that can be used to ask interesting questions, but you have to know which questions to ask it, and feed it the right data to approach the question.
You talk about matches and power bars, but TR can’t currently predict real-time points of failure in unstructured power data; even in what is front-facing, they can’t predict the point of failure in structured workouts. Multiple programs can do that now pretty accurately without any machine learning involved (some with). You can’t currently do anything with PLs to guess at TTE at a certain wattage over threshold, for example, but others can, to a pretty close approximation.
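For reference, a minimal sketch of how a TTE estimate above threshold can work without any ML. I am assuming the classic two-parameter critical power model here, TTE = W′ / (P − CP), which is not necessarily what any particular program uses; the CP and W′ values below are made-up illustration numbers, not anyone’s real data.

```python
def time_to_exhaustion(power_w, cp_w, w_prime_j):
    """Estimated seconds until failure at a constant power.

    Two-parameter critical power model (an assumption for illustration):
    cp_w is critical power in watts, w_prime_j is the work capacity
    above CP in joules.
    """
    if power_w <= cp_w:
        # At or below CP the model predicts no failure point at all.
        return float("inf")
    return w_prime_j / (power_w - cp_w)


# Hypothetical rider: CP = 250 W, W' = 20 kJ, riding at 300 W.
tte_seconds = time_to_exhaustion(300, 250, 20_000)
print(tte_seconds)
```

With these made-up numbers the 50 W surplus burns through 20,000 J in 400 seconds. Tracking the remaining W′ during a ride is the “matches” idea in numeric form.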
Which again is fine: they are focusing on the aspects of training they are good at, and other programs focus on what they are good at. Throwing ML at it doesn’t change the area of focus, though.
I guess maybe it’s also because ML doesn’t wow me. I see its potential every day in my field of immunology, and I’m sure TR has tons of talent, but I also work with Nature and Science publication-level bioinformaticians who work with huge datasets. ML is still just a tool at the end of the day; the brains behind it are still what drive the outcomes.
For what it’s worth, from their description it doesn’t seem like AT is one single model anyways. It’s probably a few models underneath, such as one for predicting progress levels and another to predict long term outcomes.
Look at the problem they want to solve: they started with training plans that were often too hard and led to burnout and demotivated users. It seems like AT is definitely biased towards making things as easy as possible while maintaining good outcomes. With that criterion in mind, I think it’s been great for most users.
So perhaps when it errs, it makes some workouts too easy, but that’s hard to objectively measure, especially when the outcome is just supposed to be better fitness. Perhaps not pushing every workout to the limit is by design; certainly if I can get the same adaptations I’d prefer it easier.
Please don’t put words in my mouth. Of course, I don’t think that, and I never wrote that.
Yes, which is why applying ML is difficult and fraught with risks. Even independently of what techniques you use, you want to distill some essential information from a whole host of data. Deciding which metrics are important and which aren’t is key.
For example, as best as I can tell, the purpose behind PLs is not to quantify performance, but to select workouts and achieve progressive overload at prescribed rates. I suspect this is the reason why TR hasn’t done much with them yet (at least publicly). For this limited purpose, PLs work well for me even without AT.
Of course, this means AT’s functionality is quite limited at present, and I am missing good analysis tools to judge my progress.
I’m not quite sure what you mean. Yes, the public version of AT cannot ingest unstructured rides at present. But I struggle to understand the rest of the sentences.
Can you be more precise? What can multiple programs predict currently? And from what data?
Yes, and? TTE is not necessarily a relevant measure. For shorter VO2max and higher efforts, you might want repeatability rather than TTE. Certain smart watches also estimate your VO2max. But what does that number tell you? How does it inform your training? Ditto for TTE. Why should I care, i. e. how does it relate to performance outcomes that are relevant to me?
To be honest, that’s one important issue we haven’t talked about in this thread at all: what information do you expose to the user? I have ranted about TR’s poor performance analysis tools in several other threads (e. g. here), but I understand the problem isn’t simple.
Basically, people can only track a few metrics, and it is the choice of metrics that is the tricky bit. This is really where a good coach can help an athlete: they find out where the strengths and weaknesses lie, what the athlete wants (e. g. be good at a certain cycling discipline) and then weigh whether to focus on strengths or weaknesses (limiters). TR could (and should) take a stab at this, but simply predicting numbers from data might not be helpful at all.
Ideally, I want TR to analyze an athlete’s past performance, identify strengths and weaknesses and tell athletes what a particular plan emphasizes. Athletes should know why they should track certain numbers (and not others). Perhaps it does surface TTE as a metric for people choosing the 40k TT plan or the tri plan. But it exposes other metrics to athletes from other disciplines.
I completely agree, and I wrote as much above. End users don’t care whether “the computer” got the result by a traditional algorithm written by a human, an ML-based algorithm or a Ouija board.
ML should also be used with a lot of caution. If you listen to the talk I linked to, a vast part of the research was to ensure that the algorithms were not biased, i. e. that they do not simply reconstruct what we expect the event horizon of a black hole to look like because we want to see a reconstructed image of a black hole.
I know full well it is not a panacea: Amazon used ML carelessly when pre-screening applications: they trained their algorithms to reproduce sexist bias, where e. g. being a member of a women’s chess club was counted as a negative. Another one is Google’s image recognition snafu: they apparently had too few (or no) black people in their training set and black people were categorized as gorillas. I also don’t quite like it when colleagues cannot answer what ML did to their data and why it was necessary.
However, treading carefully and resisting the urge to simply produce numbers (e. g. TTE or whatever other metric you want) without thinking about whether this is useful for the user and in what way it is useful seem like good ideas.
Most likely. From what I remember, they have a Quantifier that judges how hard workouts of different energy zones are, although if memory serves there was some human input as well. So in this sense it is also a matter of semantics whether AI FTP is just a feature of AT or not.
What is smarter or best is in the eye of the beholder. Four years ago I figured out how to do my own FTP detection, and then I bought a Garmin 530 in 2019 and the ML stuff using HRV, HR, and power has also been surprisingly valuable and useful. If you’ve never done FTP estimation to set sweet spot and threshold intervals, then AI FTP is an eye opener.
The other part of this discussion is getting into AT/ML vs physiology models. I’m all ears for the day that AT understands that I need more aerobic base development, that I respond better to more endurance work (vs SSB), and that I respond better to certain types of intervals. The physiology models provide metrics/data to inform these types of training decisions. If you are interested in physiology models, perhaps watch a few key WKO webinars on YouTube (go hit up the WKO thread for recommendations).
Not always. I think you could in principle benchmark the different approaches very easily, there is nothing subjective about them. It’s just that the different companies wouldn’t want to let you do that (at least with an N value that makes your results matter).
FTP estimation is for me a non-issue: once you get into the habit of validating your FTP (as necessary), it doesn’t really matter how FTP is determined, whether the baseline is suggested by eFTP, AI FTP, a ramp test or a 20-minute test. Perhaps that’s why I am personally not wowed by AI FTP or algorithms by competitors that do the same. It seems to me that rather than finding the perfect algorithm or testing methodology, I’d rather verify and be done with it. Certainly seems easier than a multistep FTP testing protocol or cooking up my own FTP estimator. (Still, I can very much appreciate that you rolled your own.)
But I can see that it is a good feature for the broader masses, especially people who don’t yet know what all-out feels like. (I think this is something more experienced athletes take for granted.)
Are these mutually exclusive? I don’t see a reason why in the future you can’t combine both.
IMHO the critical missing feature is that AT v4.0 — with athlete/coach input! — selects and surfaces certain key performance metrics. This is also what is missing on all other platforms I have seen (and I do not claim to have seen them all): performance analysis is a bit of a mess. Yes, some will spit out things like estimates for TTE at certain powers, but is that a relevant metric for you? Maybe? But maybe not. Spitting out numbers without knowing how relevant they are for that particular individual means you could flood people with irrelevant numbers.
There has to be some user input, I don’t expect it all to be automatic. E. g. I don’t expect that AT will be able to tell you whether you should focus on a weakness or a strength this season. That is something human coaches are needed for. My current weakness is resilience: I can put out decent power, but I fatigue more quickly than before the month-long hiatus. So I would like to focus on that. This has to be translated into training goals and metrics.
Of course, having the right metrics will also improve AT since you can really optimize the training plan to optimize these metrics (and perhaps some additional metrics under the hood).
AT v7.0 could then give you feedback and make broader changes to your training plan: @WindWarrior, looks like your intensity is too high. I’m going to prescribe you more endurance work and less intensity in the current block, let’s see if that improves things. But IMHO it all starts with choosing the right set of metrics for the various training goals. The first metric to start with is consistency — it is easy to define, easy to implement and independently of your training philosophy, staying consistent is one of the fundamental principles that will make you faster.
I’ll have a look.
At the moment, I am skeptical of physiology models, but remain open-minded. Still, I have signed up for the newsletter of Frank Overton’s HRV-based solution; I love that someone knowledgeable is trying. There are features similar to that (e. g. some Garmin smartwatches will tell you “how full your battery is” and how well you have recovered), which you might use to inform your training. But I am not sure if this is really useful input for making training decisions at this point.
Let me briefly chime in… have been reading this largely with interest, mostly because I can follow the TO’s comment and idea.
Through the thread, got a bit tired by @OreoCookie ’s long and somewhat – seemingly – overconfident posts, but they did trigger some interesting thoughts and responses;-)
But now it matters how FTP is validated… same issue;-)
Uhm, well, the “subjects” (human beings) are …
But something along the lines of AI-AT should be able to do, for instance, exactly that.
But I am with you, for now we are stuck with ML, which is still conceptually and effectively much closer to linear regression (pun intended;-) than artificial intelligence…
I am truly looking forward to much improved robots here, and times and current progress are exciting.
However, all this relies not only on software and modeling, it very fundamentally relies on large sets of high quality data “of all kinds”. Including very intimate health data collected with minute detail over very long times.
This very clearly also requires truly new and very much improved approaches to data protection and safety, (temporary and permanent) data ownership, retraction and deletion options, etc. All this in an international setting…
Don’t want to be too pessimistic, we’ll likely halfway figure this out more or less in time, but I don’t see how this is actively worked on in this community yet:-o and without it we will end up with severely dangerous versions of /big brother/ or the brave new world.
It is not the same issue, because knowing what being right at lactate threshold feels like is a vital skill when you ride and train. And it isn’t a difficult skill to learn.
In terms of techniques, there is no distinction between “AI-AT” and AT, the AI in AI FTP is a marketing name. Under the hood it uses the exact same techniques as Adaptive Training.
ML is a name for a vast array of techniques, most of which have zero to do with linear regression.
Outside of research, AI and ML are often used interchangeably. For example, Google’s chess engine AlphaZero is based on machine learning techniques, but is often referred to as AI.
The fundamental problem today with AT is that it is trying to optimize your workout on a specific day, within very tight constraints: it is only allowed to select a workout with a similar Zone focus and roughly the same duration as the one on your schedule. It can’t (doesn’t have the intelligence today?) fundamentally alter the workout. In addition, AT is constrained by the structure of TR plans:
- Work to recovery ratio (number of weeks of work compared to rest)
- Number of days with intensity vs endurance vs active recovery
- Workout length
It doesn’t matter how sophisticated or not AT is, it is fundamentally hamstrung by the above constraints. So you can’t make the logical leap that AI FTP detection is using a more sophisticated ML model than AT is using. AT’s model could be super sophisticated, but at the end of the day it is solving a comparably much simpler problem: given how you’ve performed on previous days, and your perceived exertion, should I give you a harder / easier / same workout?
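To illustrate how narrow that search is, here is a rough sketch of workout selection under those constraints. Everything in it is hypothetical: the `Workout` fields, the 15-minute tolerance, the level scale and the workout names are my assumptions for illustration, not how TR actually implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workout:
    name: str
    zone: str       # e.g. "vo2max", "threshold"
    minutes: int
    level: float    # difficulty on a PL-like scale (hypothetical)


def pick_alternate(scheduled, library, target_level, max_minutes_diff=15):
    # Only workouts with the same zone and roughly the same duration qualify.
    candidates = [
        w for w in library
        if w.zone == scheduled.zone
        and abs(w.minutes - scheduled.minutes) <= max_minutes_diff
    ]
    if not candidates:
        return scheduled  # nothing fits, keep the plan as written
    # Among those, take the one closest to the difficulty we are aiming for.
    return min(candidates, key=lambda w: abs(w.level - target_level))


library = [
    Workout("VO2 A", "vo2max", 60, 3.0),
    Workout("VO2 B", "vo2max", 60, 4.2),
    Workout("VO2 C", "vo2max", 90, 4.0),        # too long to qualify
    Workout("Threshold A", "threshold", 60, 4.0),  # wrong zone
]
chosen = pick_alternate(library[0], library, target_level=4.0)
print(chosen.name)
```

Note that the workout with exactly the target level is never considered because it is 30 minutes longer. However clever the model that produces `target_level` is, the final choice is a filter plus a nearest match.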
Hard agree. It may be really clever, but it’s solving a problem that isn’t all that difficult. If we as athletes were able to look at our workout performances objectively, we could do the job ourselves easily. It’s “just” a matter of assessing how you laid down the watts and how that felt, and then serving up something of the same flavour but between 20% easier and 20% harder next time.
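Something like this toy rule is what I have in mind. The 90% completion cut-off, the 1 to 5 RPE scale and the 0.1 step per RPE point are numbers I made up for illustration; I have no idea what AT actually does under the hood.

```python
def next_difficulty_factor(completed_fraction, rpe):
    """Scale factor for the next workout of the same flavour.

    completed_fraction: share of the prescribed work you actually did (0 to 1).
    rpe: how it felt, from 1 (easy) to 5 (all out). Hypothetical scale.
    """
    if completed_fraction < 0.9:
        # Could not lay down the watts: back off as far as allowed.
        factor = 0.8
    else:
        # RPE 3 keeps things the same; easier feel bumps it up, harder down.
        factor = 1.0 + 0.1 * (3 - rpe)
    # Never move more than 20% in either direction.
    return max(0.8, min(1.2, factor))


print(next_difficulty_factor(1.0, 1))   # nailed it and it felt easy
print(next_difficulty_factor(1.0, 3))   # nailed it, felt about right
print(next_difficulty_factor(0.7, 4))   # bailed early
```

The hard part is not the rule itself, it is being honest about the two inputs.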
The main point for me is that for 20 weeks it’s served me up workouts that I’ve consistently nailed (and very much enjoyed in the main) while still progressing me and given me the flexibility to choose something more challenging on days when I get to the bike and feel strong. It’s all good.
As a matter of building out functionality of AT, you probably want to start with something whose functionality is quite modest, learn from building it out and optimizing it, and then include more functionality in a revision (e. g. allowing AT to also change volume).
Maybe here – in the mutually diverging backgrounds – is the reason for me seeing problems with your posts…?
The rest of this post confirms my opinion about
Completely agree. The more you train outdoors without using TR structured rides … the more useless Progression Levels become. That said, at least FTP seems to remain accurate based on outdoor rides. I simply ignore progression levels this time of year, and pick harder workouts as needed if I train indoors.
Exactly my feelings. AT is doing what it was programmed to do, which is to purposefully underestimate the workout, get feedback from the user and then adjust accordingly. I personally don’t like the super slow ramp, as I can do PL 5 VO2 work almost any time of the year without any prep. I don’t see the point of using AT if it doesn’t adapt to me and I need to keep manually adjusting workouts.
TR does have a huge dataset, but mostly of new riders who adapt to any new stimulus.