ChatGPT - Can it be a good coach?

I’ve been using IntervalCoach for the last week or so and am really enjoying it. I left TR as I didn’t feel it was playing nicely with commuting, and have been using TrainerDay. I’m loving the feedback and suggestions from IntervalCoach. I think I will be subscribing, and I feel that TD, intervals and IntervalCoach is a stack that will work pretty well for me.

Only if you’re talking about math that is considered “fact”. Chatbot/LLM will analyse discussions around math and construct sentences based on probable responses, so a wonderful way to avoid speaking to your math teacher perhaps.

If that is considered a benefit.

Depends on whether the student has done enough on their own, including mistakes, to follow the “reasoning”. It is through the individual struggle that much learning sinks in. If you have not had that struggle, then you have not really learnt anything about solving problems. Your critical thinking, logical reasoning and mental agility have not advanced.

I will give it a try as I think the TR bug will not be fixed before my second A-race of 2026. Will see how this ends. The questions ChatGPT asked to get to a good plan give me trust that it will be a good plan, and the suggestions of what to work on (the get-better points) are spot on as my weaknesses…

Math is special because it’s entirely based on deductive reasoning. There’s really no other field of human knowledge where this is true.

It means there is generally a way to tell if a statement is true or not, regardless of who makes it.

It’s very different than say medicine or history, where “truth” is extremely uncertain and often boils down to expert opinion. And if you’re not already deeply familiar with the field, there is no reliable way to know if the answer you’ve been given is erroneous or not.

Whereas with math, deductive reasoning provides an independent scaffolding to compare output to.

If an LLM told me that most 17th century Estonian peasants ate less than 2000 calories a day, it would be difficult for me to verify that statement.

If an LLM told me a function was a bijection, I can use deductive reasoning to double check that it’s true.
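
To make that concrete, here is a minimal sketch of such a double check for a function on a finite set. The helper name `is_bijection` and the two example maps are made up for illustration; for infinite domains you would need an actual proof rather than enumeration.

```python
def is_bijection(f, domain, codomain):
    """Return True if f maps the finite domain one-to-one and onto the codomain."""
    domain = set(domain)
    codomain = set(codomain)
    images = [f(x) for x in domain]
    injective = len(set(images)) == len(images)  # no two inputs share an output
    surjective = set(images) == codomain         # every target is hit, nothing lands outside
    return injective and surjective

# Hypothetical claims to verify on {0, ..., 6}
claim_a = is_bijection(lambda x: (3 * x + 1) % 7, range(7), range(7))
claim_b = is_bijection(lambda x: (x * x) % 7, range(7), range(7))
print(claim_a, claim_b)  # True False
```

The second map fails because 3 and 4 both square to 2 modulo 7, which is exactly the kind of thing you can confirm without trusting whoever made the claim.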

What I meant more so is that Gemini is worse at specificity than other LLMs, and one common place I see this is in numbers that serve as identifiers or key concepts. So for example, if you Google search whether a very specific bike component (identified by some sort of alphanumeric code) is compatible with another component y and another one z, it will often treat similar but not exact components as interchangeable. Not always. A lot of it just seems due to pulling simplistic sources such as Reddit. I would similarly not trust it to be adequately specific for any sort of training advice. Regardless, I have reviewed thousands of prompt responses and Gemini is always the worst performing across a range of scientific and medical concepts.

IMO LLMs are good for giving you some inside intel on training approaches and plans that otherwise is not common knowledge. Also good for help with injuries and bike fitting. Also good for learning more about training.

Always important to challenge the LLM to provide alternative recommendations and challenge its assumptions.

I think AI can be just as good as the average $250/month coach. In my experience most coaches in this range have too little experience and dedicate very little time to each athlete. Of course there are exceptions, but in general, I think AI is going to crush these people who are charging way too much for what they provide.

I don’t think you’re wrong and as you noted there are exceptions. Some people need a coach to hold them accountable, reassure them, and be an amateur psychologist.

$250/month is at the lower end for a human coach, which means you get limited interactions, otherwise they won’t make any money off of athletes.

Moreover, the strength of a human coach is (in my view) not things like workout selection, but taking human factors into account, picking the right goals and metrics for athletes and such. That’s not something an LLM can do.

Interesting here is that I’ve been using Gemini as more of a consultation service instead of having it completely build my training plan. I wanted it to critique my history from the perspective of the Sebastian Weber/Jan Olbrecht model for how threshold is impacted by VO2max and VLamax.

Perhaps because in my work we have extensive prompt training and always follow the rule of keeping a human in the loop, I’ve been getting very good feedback from Gemini. So far it is far superior to TR AI, which is still almost random workout recommendation. There have been a number of times that it has suggested how to find the “hard, but not too hard” intensity level with proper logical justification.

I would be wary of any plan that is purely generated by AI, unless of course your prompt was sufficiently detailed for it to get it right on the first try, but most people don’t prompt at that level as the full prompt could be several paragraphs long.

Sounds to me as if you are rubber ducking your training plan. Accurate?

My experience is very mixed at best: LLMs are good at summarizing, but they can easily deceive the untrained eye. For example, any model I have tried produces decent general recommendations (e. g. a training plan should be based on principles like specificity and progressive overload).

The more you insist on logical reasoning (I’m in math and physics), the more it stumbles and breaks down. Sports science, which is based more on correlations might be more amenable, but even there I’d be cautious. LLMs pretend to present logical arguments, but ultimately they have no idea what logic is.

In my experience, the tasks LLMs fare best at are tweaking, rewriting and summarizing text; you can get referee reports on your own texts and such. Even then not all reports are created equal. Another strength is generating scripts.

Might this be because of the way you have been using LLMs, namely to figure out reasons behind you making certain choices?

TR AI is based on ML, i. e. its predictions are statistical. It literally doesn’t know cause and effect, it can just make predictions based on your (and others’) training history. That has advantages as it can go way, way beyond what is currently known. It’s quite likely that TR AI has stumbled across things that no scientist has come across, not least because TR has a unique data set. The downside is that you have no mechanistic understanding based on these ML-based algorithms alone.

However, you could put an LLM on top of TR AI (and its successors) and use the LLM as a script generator. You could use it to generate several training plans and compare the predictions.

Personally (N = 1), TR AI’s prescriptions have been eerily accurate; the Bayesian neural net that predicts workout difficulty is scarily accurate. And it has suggested workouts many times that I thought were e. g. way, way too hard, only for them to be exactly as hard as it predicted (i. e. entirely doable).

You won’t get that from any traditional ML algorithm as statistics has no idea about cause and effect, all it encapsulates are correlations.

Very interesting. And 100% agree that LLMs have their limitations here. ML models are also quite limited though, and as you say a combination of both would be interesting. ML depends on a lot of training data, though. And the question is whether any individual just becomes the average of all athletes, so that the plan then lacks individualization.

If you have enough data, the answer is a clear no. That’s because you are not doing statistics in a classical sense, where you characterize a distribution by things like mean, median, standard deviation and such. Machine Learning is a fitting problem where you approximate the “true” function from data with a parameter-dependent function. The parameters are “knobs” you can turn to change what the function does.

ML algorithms (the computer science-side of the problem) are good at efficiently finding good sets of parameters based on (comparatively little) data.
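
A toy sketch of what “fitting with knobs” means, purely for illustration and in no way a description of how TR AI actually works: the “true” function is assumed to be y = 2x + 1, the parameter-dependent function is a·x + b, and plain gradient descent turns the two knobs `a` and `b` until the fit matches the data. The function name, learning rate, step count and data are all made up.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y ≈ a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # the two "knobs", starting from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # Gradient of mean((a*x + b - y)^2) with respect to a and b
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Made-up data generated from the assumed "true" function y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))
```

With this data the knobs should end up very close to a = 2 and b = 1. Nothing in the procedure computes a mean or a standard deviation of the athletes; it only adjusts parameters until the function reproduces the observations, which is the point being made above.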

A local LLM could work well for this. Running multiple models with them checking each other and setting strict rules. There is plenty of literature on training, even just using TSS and a Strava API and any rules from your experience (like only 4 days cycling per week, or never ride over 100 miles).
Basically using multistep reasoning.
I use a local setup using QWEN3.6 and deepseek r1 and llama3.3:70b, around Haiku, maybe Sonnet, level. But it is the fact they can save info in their own workspace and iterate freely which makes them powerful and keeps hallucinations at bay. I use it as an assistant for my day job, but I think it could make a cool little cycling setup. The only downside is you need an M-series Mac with 64 GB of RAM at minimum.
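
One way to make the “strict rules” part concrete is to check every plan a model proposes in plain code before accepting it, and hand any violations back to the model for another iteration. A minimal sketch, assuming a proposed week arrives as a list of rides; the `Ride` type, `check_week` and the sample week are hypothetical, and the two limits are just the example rules mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Ride:
    day: str      # e.g. "Mon"
    miles: float  # planned distance

MAX_RIDE_DAYS = 4     # "only 4 days cycling per week"
MAX_RIDE_MILES = 100  # "never ride over 100 miles"

def check_week(rides):
    """Return a list of rule violations for one proposed week (empty if none)."""
    problems = []
    ride_days = {r.day for r in rides}
    if len(ride_days) > MAX_RIDE_DAYS:
        problems.append(f"{len(ride_days)} ride days, limit is {MAX_RIDE_DAYS}")
    for r in rides:
        if r.miles > MAX_RIDE_MILES:
            problems.append(f"{r.day}: {r.miles} miles, limit is {MAX_RIDE_MILES}")
    return problems

# Hypothetical week as a model might propose it
week = [Ride("Mon", 20), Ride("Wed", 35), Ride("Thu", 25),
        Ride("Sat", 110), Ride("Sun", 40)]

for problem in check_week(week):
    print(problem)
```

The checker is deterministic, so it does not matter which model wrote the plan or how confident it sounded: a week with five ride days or a 110-mile ride gets flagged either way.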

Call me a luddite, but with the energy and water resource issues surrounding data centers, I can’t see the possible benefits of an “AI coach” being worth the trade-offs. There are so many established, free, and scientifically verified methods and resources for training plans already available that I think we are getting ahead of ourselves.

I don’t want a data center in my area, so I’m not going to inflate the demand for building more by using those resources for what I consider an unnecessary novelty.

That argument is worthless, with all respect. I don’t want a sewage treatment plant next to me but I still use the toilet. I don’t want a bike factory next to me but I still ride a bike. The energy demand is real but AI usage is provided, sold and used. There’s no changing that.

Sewage treatment plants and bike manufacturers provide essential services or environmentally positive alternative transportation methods.

AI data centres are being built in places that are already struggling with water and energy availability, many of them in poorer communities. Then they drive up the prices of these essential resources for what, a bit more shareholder value? Absolutely worthless to the average individual.

Not going deeper into that here, but this might be the wrong thread for being against technology and AI. It’s not adding anything to the topic.

Not exactly. I dictated which days are easy and which are harder, plus the main design of the monthly cycle, but the hard sessions were planned through a couple of factors. Basically I gave it my history, specifics about my previously measured VLamax, and how to do intensity without overstimulating VLamax, and I let it pick my VO2 and sweet spot workouts.

As others have said, an LLM or agent can be trained towards specific athletic training strategies and make a deterministic conclusion.

I think my recommendations from TR are so off because it doesn’t know how to handle my run training. It tries with the fatigue prediction, but it is very simple. Plus, as I said earlier, I’m trying to minimize VLamax stimulation, but their suggested workouts are usually going to stimulate an over-reliance on glycolytic energy production.