You left out the 5th variable - pizza.
I guess for WLv2 they’re going to have to try to validate by looking at past datasets where people did unstructured rides/races and then subsequently attempted structured TR workouts, since the aim is to derive a PL from those unstructured rides that is a good predictor of which TR workouts you should then be able to complete.
Maybe an interim fix (to get WLv2 out faster) could be some kind of power variability threshold in the model: above it, either no valid score is given (for now), or the model’s confidence shrinks so the output is downscaled. But that’s just my amateur thinking.
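A minimal sketch of that interim idea, assuming 1 Hz power data and using variability index (VI = normalized power / average power) as the variability measure. The `scale_confidence` function and its cutoff values are hypothetical, not anything TR has described:

```python
from statistics import mean


def normalized_power(power, window=30):
    """Coggan-style NP: 4th-power mean of a trailing 30-sample rolling average."""
    if len(power) < window:
        raise ValueError("ride shorter than the rolling window")
    rolled = [mean(power[i - window + 1:i + 1]) for i in range(window - 1, len(power))]
    return mean(p ** 4 for p in rolled) ** 0.25


def variability_index(power, window=30):
    avg = mean(power)
    if avg <= 0:
        raise ValueError("average power must be positive")
    return normalized_power(power, window) / avg


def scale_confidence(power, base_confidence, vi_floor=1.05, vi_cutoff=1.3):
    """Hypothetical gate: full confidence for steady rides, no score above the cutoff."""
    vi = variability_index(power)
    if vi >= vi_cutoff:
        return None  # too variable: give no score for now
    if vi <= vi_floor:
        return base_confidence
    # Shrink confidence linearly between the floor and the cutoff
    fraction = (vi - vi_floor) / (vi_cutoff - vi_floor)
    return base_confidence * (1 - fraction)
```

The idea is that a steady endurance ride passes through untouched, while a stop-and-go ride is held back until the model can handle it.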
@Nate_Pearson It would be interesting to hear how TR plans to adjust the WLv2 model after release. Will the model always output zone level scores of 1-10? What happens to past rides if the model gets updated and produces different scores than before?
Hmm, I didn’t even think about how TR probably has the perfect dataset to validate their ML model for WLv2 against: all the athletes who performed workouts outside. TR knows what the workout level should have been based on the workout parameters. They could run their model against the outdoor workout as performed to see whether the WLv2 algorithm comes close.
That’d be a super tricky problem to solve though, because what if the athlete did the workout at the beginning of a long ride? Or if the athlete abandoned the workout after only a few intervals and then just did endurance instead?
Jeez, there would be so many corner cases for someone to explain and classify as outliers. How would one even go about evaluating the success of a trained model with so many outliers? You’d have to massively trim and clean up your dataset.
And that’s why it’s taking so long.
One way you could (emphasis on could) do this is to have a group of very good athletes who can execute workouts outside very precisely, and do something like this:
- Have this group do specific workouts outside “straight”, then ask WLv2 to evaluate what each athlete did without giving it the workout that was supposed to be done. This is the simplest case: you know ahead of time what the answer should be (again assuming the athletes executed the workout very close to the prescription), so you are checking that WLv2’s answer matches it.
- Repeat the above, but add extra endurance, change the rest/interval lengths, do some intervals slightly above or below target, etc. These should be “easy” changes where you can work out a known PL ahead of time. See how well WLv2 handles this more complex dataset.
- Keep adding complexity until you start trying to score things like group rides that have no known answer. This is the most complex case, as you are sanity-checking after the fact and trying to figure out which perturbations cause WLv2 to come up with the “wrong” answer. As you figure these out, you can create scorable workouts to train WLv2 on. Rinse and repeat.
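The first two steps could be sketched as a small test harness, assuming 1 Hz power files and a `scorer` function standing in for WLv2. Its real interface isn’t public, so `scorer`, the perturbation helpers, and the tolerance below are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    name: str
    power: List[float]   # 1 Hz power samples (assumed format)
    known_level: float   # level the prescribed workout should earn


def build_intervals(reps: int, on_s: int, on_w: float, off_s: int, off_w: float) -> List[float]:
    """Step 1: an athlete riding the prescribed workout 'straight'."""
    ride: List[float] = []
    for _ in range(reps):
        ride += [on_w] * on_s + [off_w] * off_s
    return ride


def add_endurance(ride: List[float], seconds: int, watts: float) -> List[float]:
    """Step 2 perturbation: pad the workout with steady endurance before and after."""
    return [watts] * seconds + ride + [watts] * seconds


def scale_work(ride: List[float], threshold: float, factor: float) -> List[float]:
    """Step 2 perturbation: ride every work interval slightly above/below target."""
    return [p * factor if p >= threshold else p for p in ride]


def evaluate(scorer: Callable[[List[float]], float], cases: List[Case],
             tolerance: float = 0.5) -> Tuple[float, List[str]]:
    """Return mean absolute error and the names of cases outside the tolerance."""
    errors = []
    failures = []
    for case in cases:
        error = abs(scorer(case.power) - case.known_level)
        errors.append(error)
        if error > tolerance:
            failures.append(case.name)
    return sum(errors) / len(errors), failures
```

Step 3 is where it gets hard: every mis-scored group ride would become a new perturbation helper and a new labeled case.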
Step 3 above is the real magic: identifying the perturbations that cause WLv2 to “mis-score” workouts, and then creating workouts that include them so you can train WLv2 to handle them. But since the space of possible perturbations is very large, it’s possible you never come up with a model that handles enough of them to make WLv2 worthwhile in the wild.
TR has a huge dataset of rides both structured and unstructured.
That’s why they can do it.
We’re not yet at the stage of having explainable neural networks. Still, we’re curious.
Any rides can be represented mathematically with a bit of crunching. That’s the easy part.
A workout-by-workout comparison/matching (this outside ride looks close to that structured workout) is useful but only gets you so far. It’s just matching; you want a correlation with results (you get faster).
So I’m pretty sure they also took a large-sample validation approach along these lines:
Take two groups of riders: group A does a bunch of unstructured rides (captured by power data), while group B does more of a structured mix. For riders with similar outcomes, their set of A rides produced the same result as whatever those in group B did. Groups A and B could even be the same riders’ summer vs. winter seasons.
You use some of the dataset to create the model and the rest to test that it predicts reliably, then improve it.
Once you have rinsed and repeated that over and over, you can validate with new rides, and that’s when you discover edge cases and/or weird outcomes. That’s when the head-scratching gets intense, and then they could decide to:
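A minimal sketch of that train/test split, assuming each ride record carries a rider ID. Holding out whole riders (rather than random rides) is my own assumption about good practice, so the model gets tested on athletes it has never seen; the record layout is hypothetical:

```python
import random
from typing import Dict, List, Tuple

Ride = Dict[str, object]  # hypothetical record, e.g. {"rider": 7, "ride_id": 123}


def split_by_rider(rides: List[Ride], test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[Ride], List[Ride]]:
    """Put every ride from a held-out rider in the test set, never in training."""
    riders = sorted({ride["rider"] for ride in rides})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(riders)
    n_test = max(1, round(len(riders) * test_fraction))
    test_riders = set(riders[:n_test])
    train = [ride for ride in rides if ride["rider"] not in test_riders]
    test = [ride for ride in rides if ride["rider"] in test_riders]
    return train, test
```

You would fit on `train`, measure prediction error on `test`, and only then move on to brand-new rides.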
- avoid the issue by narrowing down what is in scope (e.g. a minimum of 10 rides is required, PLv2 doesn’t work yet with this kind of workout)
- smooth over the weird cases (if x happens, exclude it as an outlier or limit the bump to 1%)
- get more samples covering the problematic use case (to cover this we need 50 more tempo rides at very low cadence)
- iterate the ML
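The first two options could look something like this guard around the model’s output. The 10-ride minimum and 1% cap come from the examples above; the function itself, and applying the cap to the level value, are hypothetical:

```python
def guarded_update(current_level: float, proposed_level: float, ride_count: int,
                   min_rides: int = 10, max_bump: float = 0.01) -> float:
    """Hypothetical guard: skip thin histories, clamp big jumps."""
    if ride_count < min_rides:
        return current_level  # out of scope: not enough rides yet, keep the old level
    cap = current_level * max_bump
    # Limit the change to +/- max_bump of the current level
    change = max(-cap, min(cap, proposed_level - current_level))
    return current_level + change
```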
Data scientists always say they want more datasets, so let’s keep training.
(Full disclosure: I’m in product management and I’ve worked on AI/ML. I’m sure there’s more to it than the above, but hopefully it helps explain a bit to those not familiar with this recent magic tech.)
Training a machine learning model is easy. Feeding it useful data so the neural network it creates can make valid predictions is hard. How a ride is described (the numbers fed into the ML) is the hard part. Raw power data is kind of useless on its own, as it’s the outcome of how hard you’re pushing, which combination of energy systems is being used, how worn out those energy systems are, how recovered they are… What happens at one moment in time is highly dependent on what happened before. Yes, much of that judging is what the ML is supposed to do, but how you describe the ride to the ML is important. Do you feed it the W’ value during your ride? Maybe, but what if W’ isn’t really describing what it’s supposed to describe and so throws off the predictions the ML would make?
Just hope this takes less time than Duke Nukem Forever :-p
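As an example of what “describing the ride” can mean, here is a sketch that turns raw power into a W’ balance series using one common differential form of the W’ balance model, assuming 1 Hz data and already-known CP and W’. Whether TR uses anything like this is unknown, and the output is only as good as the CP and W’ estimates, which is exactly the concern above:

```python
from typing import List


def w_prime_balance(power: List[float], cp: float, w_prime: float) -> List[float]:
    """Differential W' balance (joules), assuming 1 Hz samples and known CP / W'.

    Above CP the tank drains 1:1 with the excess power; below CP it refills at a
    rate proportional to how far below CP you are and how empty the tank is.
    """
    balance = w_prime
    series = []
    for p in power:
        if p > cp:
            balance -= p - cp  # (watts above CP) * 1 s = joules spent
        else:
            balance += (cp - p) * (w_prime - balance) / w_prime
        series.append(balance)
    return series
```

Feeding `series` (or just its minimum) to a model is a design choice: if CP or W’ is off, every feature derived from it inherits the error.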
Quick link to the section in podcast 405 today, where Nate discusses Workout Levels Version 2.0 status and more:
It was a bummer to hear they’re still battling issues, but, as always, I love Nate’s transparency. He calls himself an over-sharer, but honestly, I appreciate it.
The only downside is that the competition knows your plans better. I love when @Nate_Pearson is on the podcast, because his willingness to share more than the minimum in general, not just about upcoming TR plans, gives a better view of what is being talked about.
I doubt that is much of a downside. V2 is well into development, so I doubt a competitor could release something similar sooner.
“They know TR’s plans better?”, what do you mean?
You can easily describe at a high level what TR is doing, the same way you can easily describe very hard things in simple words. Knowing the strategy is not a big deal; the algorithms and the dataset are, and I think TR has (understandably) been very cagey about details. We all know that TR is working on scoring outdoor rides. If it were simple and straightforward, they would have released it a while ago. Ditto for workout levels v2.
I’m not saying it is a big deal, but that is the biggest downside to the way he overshares (as in, I don’t think it’s really a downside).
Companies sharing future plans always comes with pros and cons. With physical goods there is famously the Osborne effect. (Osborne was a computer company, and they ginned up so much excitement around their next computer that sales tanked and the company went bankrupt.) With services, that is not an issue.
You raise another issue: giving competitors a heads-up and potentially allowing them to leapfrog you. I think this is what you are focussing on. It depends very much on the nature and level of detail of the information, and the timing. The general ideas are quite old. The first time I heard of people wanting to use ML-based methods was in 2015 (a friend of mine was approached by a major league baseball team to work for them as a consultant). So TR’s move was “obvious”. Executing on it is hard, though, and I feel like it is the equivalent of revealing that you are working on a rocket whose first stage returns to earth and can be reused. Simple idea, hard problem. I think workout levels and scoring of outdoor workouts are in the same category: a problem that is easy to explain and hard to implement. All the difficulties lie in the details.
We also shouldn’t forget the upside of disclosing some of the plans: TR has a vibrant community that provides a lot of feedback, a lot of it good.
Here’s a thought exercise: what is the value to Zwift / Systm / etc. of developing something like WL v2? Without all of the other infrastructure built up around plans to progress WL, WL v2 serves zero purpose. So I wouldn’t worry about a competitor beating TR to the punch. The bigger near-term risk is that the approach TR has gone down is a dead end. That is exactly what happened with the original version of “adaptive training”: a bunch of us tested it before TR scrapped the approach and went with the current iteration.
I remember hoping Workout Level 2 would be ready in time for the outdoor season… last year. So that’s 1.5 years and counting, and meanwhile workout levels are still completely useless for me, since the majority of my rides in base season aren’t per plan. I’m doing about 5x3h of endurance per week, and my endurance level is 3.6.
My transition to structured build will be very bumpy.
Well, a toast to hoping WL2 will be out this season.
Really would like to hear an update on V2. Even a target date would be great, e.g., Q2 ’23 or whatever. As I’m transitioning out of JUST structured programs and about to start outdoor riding and racing, I would value Adaptive Training understanding my outdoor efforts and suggesting Train Now rides accordingly. As it stands today, I’m planning to cancel my subscription in 30 days and likely resume it in the fall.