WHOOP AI Coach Bias

A few months ago I reupped my WHOOP subscription due to getting back into training and having problems sleeping / recovering correctly. I used to use my Garmin Fenix (no subscription at the time) but found the sleep metrics to be a little lacking in accuracy or depth and preferred WHOOP’s recovery metrics to it.

Recently, WHOOP launched an AI chatbot integrated with your data, named “Coach,” which can prescribe all sorts of things and do a deep dive into your data. This interested me, but really it boiled down to telling me things I already knew, or not having a long enough leash to continue with a prompt outside of its own domain, so to speak.

I completed a TrainerRoad workout today, an hour-long Z2 endurance ride, which for me resulted in 430 Calories burned. I use a Kickr v5 indoors and I believe it to be fairly accurate. My WHOOP, however, using only HR, said I burned 220. To my knowledge, the total work registered by the power meter is a fairly solid basis for estimating caloric expenditure, especially in a controlled environment. So in an effort to dispel any biases, I asked Coach why the stark difference.
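For anyone curious how a power-based number is usually derived, here is a minimal sketch. It assumes the common approach of dividing mechanical work by a gross efficiency figure; the 24% default and the 430 kJ input are illustrative assumptions (the post reports 430 Calories, not kilojoules), and the function names are made up for this example. It is not how TrainerRoad or WHOOP necessarily compute anything.

```python
KJ_PER_KCAL = 4.184  # 1 kcal (food Calorie) = 4.184 kJ


def kcal_from_work(work_kj: float, gross_efficiency: float = 0.24) -> float:
    """Estimate metabolic energy (kcal) from mechanical work at the pedals (kJ).

    gross_efficiency is the assumed fraction of metabolic energy that
    becomes mechanical work; 0.24 is an illustrative default, not a
    measured value for any particular rider.
    """
    if not 0 < gross_efficiency < 1:
        raise ValueError("gross_efficiency must be between 0 and 1")
    return work_kj / KJ_PER_KCAL / gross_efficiency


def average_power_watts(work_kj: float, duration_s: float) -> float:
    """Average power implied by total work over a duration (1 W = 1 J/s)."""
    return work_kj * 1000 / duration_s


ride_kj = 430.0  # hypothetical total work for a one-hour ride
print(round(kcal_from_work(ride_kj)))             # 428
print(round(average_power_watts(ride_kj, 3600)))  # 119
```

With these assumed numbers, the kilojoule and kilocalorie figures come out nearly equal, which is why many head units treat 1 kJ of work as roughly 1 kcal burned.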

It told me that “power meters can overstate calories for less efficient riders or understate them for highly efficient ones.” I brought up that HR can be affected by extraneous things too, like caffeine, sleep, and stress. Its response was that “WHOOP’s algorithm is conservative to avoid overestimating, especially for steady, low-intensity efforts,” and that I should make sure my strap was properly placed. It is; it just wasn’t a very hard ride.

It then went on to describe that HR is a better indicator of overall strain since it isn’t just coming from the pedals but instead the whole body (upper body, sweat, etc). And while I can see that holding some credence, when there is a power meter in the mix for a steady state ride indoors, I have to believe it is more accurate. I told it that I needed accurate readouts so I can track my caloric deficit and plan nutrition accordingly and it instead said that I may need help from member services and ended the chat.

I asked ChatGPT the same thing, and it said (what I am assuming is true) that power-based caloric expenditure estimates are more accurate than HR-based ones for exactly the reasons I listed. It makes me wonder if WHOOP has trained their model (inadvertently or otherwise) to favor their brand over proven science, even stretching the truth in some cases to ensure it doesn’t get shown in a bad light. To me, that makes it pretty useless, and it is kind of a troubling trend in AI, especially with the proliferation of chatbots that are supposed to be used in unbiased analytical contexts. To be fair, it is in “Beta,” but in the months I’ve occasionally used it, it doesn’t really seem to be any different in its responses or ability.

Love TR though and their AI integration is fairly useful. Rant over, thoughts?

LLMs are designed to give you a good conversation, not accurate answers. Asking ChatGPT is about as good as writing your question on a bathroom wall.

If you listen to the podcast, you will have heard frequently, over many years of episodes, that power is accurate and heart rate is not. Stick with power.

7 Likes

That experience would make me want to drop Whoop immediately.

12 Likes

Indeed LLMs are just very good at the game of what comes next.

As of yesterday it was the nail in the coffin :headstone:

EVEN IF whoop was correct in all the silly things they said about power meters, in my experience with whoop going all the way back to 2016, the HR data from actual workouts is often WILDLY off. So like everyone is saying, disregard the AI coach.

Whoop 5.0 did not improve the HR sensor at all, from what I can tell by looking at HR data from a Polar H10 and the whoop 5 sensor, on the same ride.

3 Likes

The few times I played with ChatGPT to see if I could get anything useful, it has failed. I’ve had people argue with me about it, but it’s just the truth.

If I can’t get it to give me a correct answer on something I know, I can’t trust it to give me an answer I don’t know.

2 Likes

As an engineer, I find ChatGPT, Google Gemini, etc. to be quite useful for coding tasks, sometimes useful or at least mildly entertaining for image generation, and super useless for most else, especially for questions where objective correctness and small nuanced details matter. I’ve had friends rave to me about it generating training plans for them, but after all I’ve learned from the TR podcast, this forum, the Kolie Moore podcast, and my own physiology, I feel like the things it generates in this vein are shallow, genericized, self-contradictory, and generally lacking all around. Even with all the latest new models, they still fall short in answering questions with scientific correctness and nuance.

2 Likes

So WHOOP’s “Coach” basically said “trust me bro” and logged off when you asked a real question. Love that for a product charging a monthly fee to be wrong with confidence.

3 Likes

I honestly think it’s because these models have been exposed to reddit and every other forum out there plus every article ever written about training by any hack coach. They don’t know what to think.

There is so much debate in training, and scientific studies usually run 6 weeks and are performed on 20-year-old undergraduates. Papers never bring out the long game and multi-year periodization. So, in the end, what is an LLM going to actually “know” or be able to regurgitate about training?

I’d like to train an LLM on Empirical’s podcast and then ask it questions. That might be fun. Kolie is probably annoyed just reading that. :slight_smile:

The fact is that training debates are perpetual. For every person that had amazing success with LSD, there is another that made major gains doing sweet spot, another doing polarized, and another doing HIIT etc.

Mike Joyner said on the Inside Exercise podcast, “all roads lead to Rome, er Tokyo”. The 5000m race in the 1964 Tokyo Olympics was decided by 1 second. Essentially, the medal athletes were the same after years of training. The coaches were Igloi, Van Aaken, and Bowerman. All had very different training philosophies and coaching styles, yet they delivered three different athletes to the finish, all within a second of each other.

I’m sure businesses with a specific goal will train models by feeding them the exact information they want them to digest, rather than relying on random bits of text found on the internet.

4 Likes

Hmmmm. It depends. :rofl:

4 Likes

I tried using Whoop for both sleep tracking and training tracking for a year. What I found is, it’s great for sleep tracking. I can run experiments with food types and timing, meditation and mobility/stretching work before bed, bedroom temperature (65 degrees is best!) and other things and see the results really clearly. And optimizing sleep “performance” is the top thing I can do to improve.

However, trying to use my Whoop for any kind of training stress tracking and advice was a total failure. It would tell me things like, you need to get ten hours of sleep. Um no that’s not gonna work for me with training, a full time job and a family. Or, I would be on the 2nd of 8 HIIT intervals and it would tell me to stop right now, you’ve reached your strain limit – stop! Stop I tell you :slight_smile: Um no, I’m perfectly capable of doing all 8 intervals, and yes, it’s hard but I can do it.

So, bottom line: for me, the sleep tracking is accurate, actionable and highly useful. Now I only wear my Whoop strap at night. For activity/strain tracking, it’s not useful at all for me.

1 Like

If you’re quoting the Whoop Coach accurately, this is backwards lol.
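A quick way to check the direction, as a rough sketch: assume the head unit uses the common 1 kJ ≈ 1 kcal shortcut, and compare that against what a rider would actually burn at different gross efficiencies. The 430 kJ figure, the efficiency values, and the function names are all illustrative assumptions, not anything WHOOP or a power meter vendor has published.

```python
KJ_PER_KCAL = 4.184  # 1 kcal = 4.184 kJ


def kcal_burned(work_kj: float, gross_efficiency: float) -> float:
    """Metabolic cost (kcal) of producing work_kj at a given gross efficiency."""
    return work_kj / KJ_PER_KCAL / gross_efficiency


def estimate_bias(work_kj: float, gross_efficiency: float) -> float:
    """Shortcut estimate (1 kJ ~ 1 kcal) minus the modeled actual burn.

    Negative means the power-based shortcut understates calories;
    positive means it overstates them.
    """
    return work_kj - kcal_burned(work_kj, gross_efficiency)


work_kj = 430.0  # hypothetical one-hour ride
for eff in (0.20, 0.26):  # assumed "less efficient" and "more efficient" riders
    bias = estimate_bias(work_kj, eff)
    direction = "understates" if bias < 0 else "overstates"
    print(f"efficiency {eff:.0%}: shortcut {direction} by {abs(bias):.0f} kcal")
```

Under these assumptions the less efficient rider burns more than the shortcut reports (understated) and the more efficient rider burns less (overstated), which is the opposite of the quoted claim.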

1 Like