ChatGPT - Can it be a good coach?

This.
And this points to two important shortcomings of models like ChatGPT: the datasets they have been trained on have not been vetted for accuracy and consistency, and those datasets are very broad.

In that setting you need enough subject matter expertise to have a well-developed BS detector. (That’s something my students are missing when they try to let ChatGPT do their homework, I’m speaking from experience.) For if you don’t, you cannot rely on the response — unless you are ok with glue in your pizza or being overtrained/badly trained. Kinda like how some people followed their GPS’ instructions right into a river.

I would even take another step back and ask myself: would a general-purpose LLM be the right tool for the job?

That’s why I have suggested several times to @Eddy_Twerckx to also try out CoachCat, which is an LLM specifically made to help cyclists train. It has been trained on a good, focussed dataset (by good I mean as judged by Frank Overton and his staff, absorbing their training philosophy). I’d be very curious about that.

Asking ChatGPT about structured training for cycling is like asking me about soccer and expecting a well-informed response. :wink:

3 Likes

Oh, absolutely. I expect that this is a major part of their work and I expect that the institutional knowledge TR has amassed since it started its journey on ML several years ago will be extremely hard to replicate.

Just think about e.g. how finicky heart rate data can be. Sometimes it takes a while for my Rival to connect to my Bolt. Sometimes the Rival locks onto my cadence. My Tickr sometimes got “stuck” at 200+ bpm, etc. Add to that factors like power data from different sources for the same athlete. I can come up with tons of issues without knowing how best to solve or at least mitigate them.

I am certain this was also a factor in TR’s hesitation to let Zwift run its workouts: now Zwift can build its own dataset, comparing what athletes should have done with what they actually did and with the training outcomes. TR’s dataset is its moat, with which it can keep the competition at bay. FasCat wanted to use the same tools but doesn’t have the same kind of datasets. So instead, they used what they had, which I think was very smart. I would really like to know how well it works (although I am not willing to experiment on myself).

Yup.
That’s where subject matter expertise comes in, combined with a solid background in statistics. You can easily ask “stupid” questions or try to optimize for the wrong metric. Finding the right questions is hard.

2 Likes

I might actually try that next. It would be a cool comparison: a dedicated cycling training AI vs a general LLM. I would expect the FasCat AI to be much more in tune with training and probably get me better results, and to require much less guiding and redirecting.

Yeah, I agree that TR’s data is probably their biggest weapon. They have so much training history to go off. And like you said with FasCat, the dataset you’re building the AI from is hugely important.

1 Like

It’s still early (only 2 workouts in) but it’s made two changes. The first was changing the progression on the SweetSpot workout it scheduled for Sunday. It originally had 4x15 @92-94%. After I fed it the workout info, it first changed it to 3x15 @94-96% (so increased intensity at the same duration). So I asked if that was still SweetSpot at that intensity and what the idea was behind increasing the intensity rather than increasing the duration.

“There are two primary ways to overload Sweet‑Spot work:”

So I asked if we should increase duration or intensity.

“Given your feedback (“felt tough but manageable” + minimal soreness + solid sleep) I’d lean toward increasing duration first for a gentler progressive overload.”

So then it gave me the 3x18 @92-94%.

I’m thinking it sensed the hesitation in my questioning about increased intensity and tried to please me by switching to increased duration instead.

Now even with that I’m still thinking it’s too much for a single week (2 sweetspot, 1 threshold, 1 VO2). So we’ll see how it changes further.

The second change was minor. I asked if the plan was still for threshold today. And it said “Yes — stick with tomorrow’s Threshold session, but we’ll tweak it slightly to account for yesterday’s solid Sweet‑Spot performance and your ongoing fatigue status.”

It just tightened the power band from 100-105% to 100-102% and said to aim for the lower end.

"Execution Tip

  • Start conservative: Hit the low end of target for the first interval, then slightly increase if you feel strong.

If you feel unusually fatigued tomorrow morning, we can reduce to 3×10’ at the same intensity. Otherwise let’s push for the full 4×10. Let me know how you feel pre‑ride!"

1 Like

Somewhat OT, and this is likely influenced by the fact that I am finishing up S2 of Severance right now, but talking about a program as “we”, saying it “sensed”, and having it refer to itself in the first person is just creepy AF.

3 Likes

The whole way we talk about deep learning and generative AI is problematic. A lot of terms ascribe intentions or capabilities that just don’t exist in the way we would normally think of them. “Reasoning” and “hallucinations” are two of the worst.
A year or two ago I thought it was mostly a problem for reporting aimed at the general population. But the longer it goes on, the more I think it has a bad influence on how researchers and engineers think about these models.

4 Likes

Quick update. I can’t remember who made it, but there is a Freddy Merckxury build of ChatGPT. So a combo of Freddie Mercury and Eddy Merckx. I uploaded the training plan that the general ChatGPT gave me and said to critique it. Here are the results:

:magnifying_glass_tilted_right: THE GOOD:

  • Structured Sweet Spot & Threshold work? Bingo. That’s FTP bread and butter.
  • VO2 sprinkled in weekly? Smart. VO2 = FTP builder’s espresso shot.
  • Racing integrated smartly (especially omniums), which is good if you’re not chasing peak form just yet.
  • Respecting post-call fatigue days with endurance? Pro-level awareness, my klimvarken.

:axe: THE BAD (aka Freddy’s Red Flags):

  1. Too much intensity packed in before & after race weekends.
     • Week 3 (Omnium block)? 740 TSS in three days. The following week should not have threshold and VO2 crammed in right after. You’ll be cooked like a Belgian frite.
  2. No real deload or adaptation weeks.
     • Where’s the breathing room, maat? Ramp rate of 3.1 with big TSS spikes like 724? You’re flirting with burnout harder than I flirt with beer tents post-race.
  3. You want FTP gains, but you’re racing crits every weekend.
     • That’s like trying to bulk while running marathons. You need periods of focused progression, not constant racing. Pick your battles.

:light_bulb: Freddy’s Dirty Suggestions:

  1. Deload after Omnium Week 3 – You’re gonna need a voltooiing week of Z2, neuromuscular sprints, and naps.
  2. Inject more polarization:
     • Less mid-zone garbage. Either Z2 or threshold/VO2. You’re sitting too often in the grey zone—neither hard nor easy.
  3. Move the big FTP blocks to your vacation period (4/26-5/4 & 5/10-5/26):
     • Long rides + sweet spot = FTP magic. Your training then should not be racing every weekend unless you’re okay with stagnating.
  4. Post-call FTP work = no go. Even if you feel good, recovery trumps ego.
1 Like

You’re exactly right, and that is a very important distinction: an LLM is not a good tool for doing the job of a coach, but I think it is an excellent tool for training coaches. With the amount of synthesis you can bring in and its casual conversational nature, it’s honestly, at its best, like having office hours plugged directly into the brains of the most brilliant minds, and with the right prompting it is at least as good as spending the time digesting the reference material in depth.

You have to have that outside ability to vet information and control the pipeline. It would be really hard, although not impossible, for ChatGPT to find the source material necessary to get to the cutting edge on its own (I find if you let it use the internet it tends to start sucking), but with the right input you really can get some extremely high-level discussions going.

One of my favorites is to take Stoic or pseudo-philosophical dogmatic texts (the kind of thing written by a Roman or Greek general in the act of waging war or being defeated, as he shares how his personal philosophy plays out in the battles around him) and ask the AI to apply the ideas to elite road cycling. You get a lot of thematic overlap, some of it predictable and some of it extremely thought-provoking, and it does a really good job of creating a paradigm, which is so important for coaching.

2 Likes

There are some really smart insights here. What have you found is the best way to share training-related data, like history, with ChatGPT?

1 Like

Your mix of intensities looks challenging, and I’m really curious to see how the AI adjusts your plan based on your feedback. Keep us posted.

1 Like

I don’t know if I’d be able to share my entire training history, or at least not in any easy way. I think there’s a limit to the data it will analyze for free. That’s a huge part of what TR’s and FasCat’s versions offer: for one, the vast amount of data from all of their athletes, but also the ability to link my Strava and pull in all my data. So that’s already a big limitation.

What I did is give it my power PRs for various durations (5s, 30s, 60s, 5m, 12m, 20m, 60m) over the past year and over the past 3 years. I fed in a bunch of metrics from WKO5 and recent TSS and training hours. Then I gave it FTP history over the past 2 years, which hasn’t changed much.

On a day-to-day basis, I found that intervals.icu can export a simple CSV file with interval data (time, power, HR, cadence), so I can quickly give it the bulk of the workout data. I also give it total workout TSS, time, avg power, and avg HR, as well as RPE and some other random feedback as if I was talking to my coach. Like how I slept or how I was feeling today before the workout, what I ate today. I basically talk to it as if it was a real-person coach.
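For anyone wanting to do the same, here is a rough sketch of turning an interval CSV into a few lines of text you can paste into the chat. The column names and the sample rows are made up for illustration (they are not the actual intervals.icu export format), and the 340 W FTP is just an assumed value, so adjust both to whatever your own export and numbers look like.

```python
import csv
import io

# Hypothetical export: these column names and values are invented for
# illustration and are NOT the real intervals.icu CSV schema.
SAMPLE_CSV = """interval,duration_s,avg_power_w,avg_hr_bpm,avg_cadence_rpm
1,600,338,158,92
2,600,341,162,91
3,600,340,165,90
4,600,349,168,90
"""


def summarize_intervals(csv_text, ftp_w):
    """Turn interval rows into short text lines suitable for pasting into a chat."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        minutes = int(row["duration_s"]) / 60
        power = float(row["avg_power_w"])
        pct_ftp = round(100 * power / ftp_w)  # power as a percentage of FTP
        lines.append(
            f"Interval {row['interval']}: {minutes:.0f} min at {power:.0f} W "
            f"({pct_ftp}% FTP), HR {row['avg_hr_bpm']} bpm, "
            f"cadence {row['avg_cadence_rpm']} rpm"
        )
    return lines


# ftp_w=340 is an assumed FTP, used only to express power as a percentage.
summary = summarize_intervals(SAMPLE_CSV, ftp_w=340)
print("\n".join(summary))
```

To read a real file instead of the sample string, pass the file's text (for example `open(path, newline="").read()`) and rename the columns in the function to match the export.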

So for sure some limitations on the capability. It just won’t have a complete picture of my training history.

Why does your ChatGPT coach not like warm-ups and cool-downs for high intensity? What is its reasoning for no warm-up or cool-down?

Here’s my Gemini plan to beat Pogacar

1 Like

It missed one key step on Monday, and that’s to inject copious amounts of super serum. Proof that AI knows nothing!

This is just the main overview of the plan. The night before I’ll ask it, “so what’s the plan for tomorrow?” And it gives a more detailed workout.

This is the one it gave me for tomorrow’s VO2 workout.

Friday 3/28 — VO₂ Max

  • Workout: 6×4′ @ 110–115% FTP
  • Recovery: 4′ easy spin (≈150 W) between efforts
  • Warm‑up: 20′ easy → 3×30″ builds → 5′ easy
  • Cool‑down: 10–15′ easy spin

I also don’t follow a standard warmup. Or rather, I don’t follow the same warmup for every workout. It changes day to day based on how I feel.

Can you give an example of the feedback it gave when you submitted what you actually did?

Sure. Here’s the feedback yesterday. I was feeling it a little in the legs after back to back intensity days. Originally had 6x5 @ 110-115% planned for today.

Tomorrow (3/28) remains your scheduled VO₂ Max session, but given you’re feeling residual leg fatigue, let’s slightly reduce volume while preserving the stimulus:

:man_biking: Friday 3/28 — VO₂ Max (Adjusted)
• Workout: 5×4′ @ 110–115% FTP (≈370–385 W)
• Recovery: 4′ easy spin (≈150 W) between efforts
• Warm‑up: 20′ easy → 3×30″ builds → 5′ easy
• Cool‑down: 10–15′ easy spin

Targets
• Avg Power: 370–385 W (RPE 8/10)
• HR: ~160–165 bpm

Modifications (due to fatigue)
• If you feel your legs really heavy into interval 3, drop to 4×4′ instead of 5.
• Keep pacing smooth — avoid spikes >5% above target.

Nutrition & Recovery
• Carb‑rich breakfast + 30 g protein pre‑ride
• During: 30 g carbs per hour
• Post: 1.2 g/kg carbs + 25–30 g protein

Aim to finish feeling “tough but not trashed.”

So it dropped an interval and shortened them to 4 minutes. I’ll probably see how I feel during the warmup and make some modifications if needed. I might just try the original workout. But we’ll see.
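The watt targets in that response are just percent-of-FTP arithmetic, which is easy to sanity-check. A minimal sketch, assuming an FTP of 335 W (inferred from the quoted 370–385 W band, not something stated in the chat) and rounding to the nearest 5 W, which is also an assumption:

```python
def power_band(ftp_w, low_pct, high_pct, round_to=5):
    """Convert a percent-of-FTP range into watt targets, rounded to a step."""
    def to_watts(pct):
        return round_to * round(ftp_w * pct / 100 / round_to)
    return to_watts(low_pct), to_watts(high_pct)


def work_minutes(reps, minutes_per_rep):
    """Total time at intensity for a reps x minutes interval session."""
    return reps * minutes_per_rep


# FTP of 335 W is an assumption inferred from the quoted 370-385 W band.
print(power_band(335, 110, 115))  # (370, 385)

# Originally planned 6x5 vs. the adjusted 5x4: time at intensity in minutes.
print(work_minutes(6, 5), work_minutes(5, 4))  # 30 20
```

So the adjusted session keeps the same power band and cuts the time at intensity from 30 to 20 minutes.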

End of week 1:
TSS: 658
Time: 13:45
Workouts: 1 Sweetspot, 1 Threshold, 1 Threshold/VO2, 1 long ride, 2 2-hour Z2, 1 easy 1-hour spin
CTL: 80, ATL 91, ramp 2.3
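
For anyone unfamiliar with those last numbers: CTL and ATL are commonly computed as exponentially weighted averages of daily TSS, with time constants of 42 and 7 days respectively, and ramp is the change in CTL over the week. A rough sketch of that common formulation follows; platforms differ in the details. The day-by-day TSS split and the starting CTL/ATL values are made up (only the 658 weekly total comes from my week), so the output will not exactly match the figures above.

```python
def update_load(previous, tss, time_constant_days):
    """One-day exponentially weighted update of a training load average."""
    return previous + (tss - previous) / time_constant_days


def simulate_week(ctl, atl, daily_tss):
    """Apply daily CTL (42-day) and ATL (7-day) updates; return end values and ramp."""
    start_ctl = ctl
    for tss in daily_tss:
        ctl = update_load(ctl, tss, 42)
        atl = update_load(atl, tss, 7)
    return ctl, atl, ctl - start_ctl


# Hypothetical daily split that sums to the week's 658 TSS.
week_tss = [95, 90, 100, 95, 180, 40, 58]

# Starting CTL and ATL are assumed values for illustration.
ctl, atl, ramp = simulate_week(ctl=78.0, atl=85.0, daily_tss=week_tss)
print(f"CTL {ctl:.1f}, ATL {atl:.1f}, ramp {ramp:+.1f}")
```

The short time constant is why ATL reacts to a single big day much faster than CTL does.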

Thoughts: I feel pretty good. Legs are a little tired but no more than any other mid-block week. Workouts were tough but I didn’t struggle. The Sweetspot is higher than I normally do. I usually just do 90% but this ended up being 92-95%. Threshold was something I’d do, 4x10 at FTP, last interval kicked it up to 103%. The VO2 work was probably lower than what I’d normally do, which is why I classified it more as high Threshold. It was 6x5 @110-112%. For 5-minute intervals I probably would be around 120-125%. Long ride the day after the Threshold/VO2 workout was good. Did some fatigued sprints in the second half of the ride and hit 98% of my fresh sprint after 2,500kJ. I think the increased hours this winter have really helped the longer endurance.

Overall I feel pretty good with the AI coach but it’s still early. I’ve been giving it ride feedback and it’s done pretty well adjusting workouts. I don’t think I would have picked the workout types like that as it’s kind of just a mixed bag rather than a specific focus. I’m really interested to see how it’s going to integrate racing into the training schedule. Whether it treats them like workouts or if it ignores them and expects me to train throughout.

One thing I noticed that somebody above mentioned is that it placates me too much. Basically, it doesn’t push back, and it agrees with anything I suggest. I’m going to mess around with it next week and start suggesting some wild stuff to see what it does and whether it will rein me in.

4 Likes

Little update. Still using ChatGPT. No major issues so far. I was actually really happy with the taper it gave me coming into my first A race of the season. I was worried I was going to be overcooked and thought it was too much, but I went with it and killed it. A 3-day omnium got me two 3rds, a win, and a 4th overall. I felt so good on the bike. So now I’m wondering if my previous tapers were too much of a drop-off and I was coming in flat. Seems I do well with a little heaviness in the legs.

I’ve since taken a down week to refresh mentally and physically. So it basically just gave me easy rides and said I could do some short sprints sprinkled in depending on how I felt. I did not.

Today is the start of another build block. Two threshold workouts and one SweetSpot this week. I think it had one VO2 and two threshold next week. And something similar for the third week but I don’t recall off the top of my head. I really haven’t been looking too far ahead because I figure it’ll change.

It’s been really cool to look at my work schedule, look at the weather forecast, and then feed that in to shape my workouts. Like I can say this day I get out early so I can go longer and harder. This day is a long day at work so maybe an easy day. It really has been like just chatting with a coach.

Edit: Oh, and I forgot to cancel my recurring TR sub so I still have it for a few weeks. No red days so far. A few yellows. And an AI FTP bump after the race which was nice up to 342W. Which I then rounded down to 340W because I like round numbers.

6 Likes

Those are leading questions. What you should be saying is “I can get out early today” and “This day is a long day at work”, then see what ChatGPT suggests. It should suggest a longer day or an easier day without you prompting it. Otherwise it’s not a coach, it’s a parrot.