Consider the 220 minus age formula for maximum HR. This formula *does not* calculate *your* max HR. It is an *estimate* of the *average* maximum HR for a given age.

If you know regression, 220 minus age is a regression with intercept = 220 and slope = -1. In the original paper, the authors said their fitted values were *close* to those numbers, so they published the rounded ones so people would remember them easily.
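To make the "it's just a regression" point concrete, here is a minimal sketch of fitting an intercept and slope of max HR on age with ordinary least squares. The data points are made up for illustration (I've chosen them to land near the 220 − age line); the real study pooled many samples.

```python
def fit_line(ages, max_hrs):
    """Ordinary least squares for a single predictor: returns (intercept, slope)."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(max_hrs) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, max_hrs))
             / sum((x - mean_x) ** 2 for x in ages))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sample that happens to sit near the 220 - age line:
ages = [20, 30, 40, 50, 60]
max_hrs = [201, 189, 181, 169, 161]
intercept, slope = fit_line(ages, max_hrs)
# intercept comes out near 220, slope near -1
```

The formula is just this fitted line, rounded: predicted max HR = intercept + slope × age.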

But back to my point. Any formula for max HR is going to come from some sort of regression on some sample of people. The main thing is that you will be getting *the average* max HR given age and whatever other characteristics might be in the regression. Some people will be above that average. My actual max HR is at least 6 beats/min higher than the average max HR for my age from the 220-age formula, and some people will have their max HR under-predicted by even more than 6 beats. Conversely, some people will have their own max HR come out lower than the formula’s prediction. This is the nature of averages.
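The same point in code: the formula predicts the *average*, and individual residuals (actual minus predicted) scatter around it, some positive, some negative. The riders below are invented for illustration.

```python
def predicted_max_hr(age):
    """Population-average prediction from the 220 - age formula."""
    return 220 - age

# (age, actual max HR) pairs -- hypothetical riders, all aged 40
riders = [(40, 188), (40, 174), (40, 180)]

# residual = actual - predicted; the formula predicts 180 for all three
residuals = [actual - predicted_max_hr(age) for age, actual in riders]
# residuals are [8, -6, 0]: one rider above the average, one below, one right on it
```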

OK, then what even is the point of estimating average max HR? Well, imagine that maximum HR is a useful training parameter, imagine that it isn’t easy to test, and assume that most people don’t know their own max HR. In that situation, plugging in the average max HR for your age *until you have a better estimate* is not a terrible idea. Just like with an FTP test, you would have to adjust your predicted max HR according to outside inputs, e.g. if you feel you’re working very hard but you’re still 10 bpm below your max HR.
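One way to sketch that "use the average until you have something better" idea: start from the population estimate, and replace it whenever your own observed data beats it. The function name and structure here are my own illustration, not a standard method.

```python
def working_max_hr(age, observed_peaks=()):
    """Start from the 220 - age average; let your own observed peaks override it."""
    estimate = 220 - age  # population-average starting point
    for peak in observed_peaks:
        if peak > estimate:
            estimate = peak  # your own data beats the average
    return estimate

working_max_hr(38)                            # no data yet: 220 - 38 = 182
working_max_hr(38, observed_peaks=[176, 189]) # observed 189 exceeds 182, so use it
```

The downward adjustment ("I'm at max effort but 10 bpm below the predicted number") is harder to automate, since "feels maximal" isn't something you can read off a head unit.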

What counts as a better estimate? Suppose you record a peak HR during an event. If the effort that produced it was something like a 1-5 min max effort, then I think that should get you pretty close to your actual max HR. Your max HR during a ramp test is also going to be close to your actual max HR - I suspect I could go a few bpm harder than my last ramp test max, but you get the point. If the effort was more like a threshold or sweet spot effort, then the peak you saw is still an underestimate.

Going a bit further from the topic, regression can have two uses. First, how much do we think the average max HR declines with age? Well, we know from the slope that it’s about 1 bpm per year. (I think most of the ‘better’ formulas have it as a bit less than 1 bpm, and in that original paper it should have been a bit less than 1.) Answering that question is a reasonable goal in itself. That’s using regression for inference.

Second, we could use regression for prediction. Some of you with entry-level statistics training might remember that you want the regression’s R^2 to be high. You might see a paper, notice that the R^2 is low, and decide that’s the main critique of the paper. Actually, that’s a shallow understanding of R^2: it doesn’t matter as much if your only goal is inference. However, if your goal is a reasonably accurate prediction of some quantity, then yes, you need the R^2 to be high. I don’t think that original paper published its R^2. If you look at papers on predicting max HR, you can check the R^2. I am guessing they won’t be that high, maybe in the 0.3-0.4 range. R^2 ranges from 0 to 1, and predicting biological parameters in this context is probably going to produce low R^2s. R^2s in psychology are also low.
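For anyone who wants the definition rather than the vibe, R^2 is one minus the ratio of residual variance to total variance. The data below are invented, with enough individual scatter that R^2 lands around 0.3, roughly the range I guessed above.

```python
def r_squared(y_actual, y_predicted):
    """R^2 = 1 - SS_residual / SS_total: share of variance explained."""
    mean_y = sum(y_actual) / len(y_actual)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(y_actual, y_predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in y_actual)
    return 1 - ss_res / ss_tot

# Hypothetical measured max HRs with lots of person-to-person scatter:
ages = [30, 30, 40, 40, 50, 50]
actual = [200, 178, 190, 168, 182, 158]
predicted = [220 - a for a in ages]  # the formula's predictions
r2 = r_squared(actual, predicted)
# r2 comes out around 0.3: age explains some variance, but most is individual
```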

What are some other contexts where we might use regression for prediction? The Framingham heart risk score is one. It predicts coronary heart disease from various risk factors. That’s a binary event, whereas your standard regression is for continuous outcomes, but the principles are still similar. Screening tools for things like depression or cognitive impairment aren’t regressions, but they are also trying to predict the presence of something, and we validate them in different ways.
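For a binary outcome, the usual move is logistic regression: a linear combination of risk factors on the log-odds scale, pushed through a logistic function to get a probability. The sketch below is in the spirit of a risk score like Framingham, but the coefficients are invented for illustration, not the published ones.

```python
import math

def predicted_risk(age, systolic_bp, smoker):
    """Probability of an event from a fitted logistic model (coefficients invented)."""
    # linear predictor on the log-odds scale: intercept + coefficient * risk factor
    log_odds = -9.0 + 0.06 * age + 0.02 * systolic_bp + 0.7 * smoker
    # logistic link maps log-odds to a probability in (0, 1)
    return 1 / (1 + math.exp(-log_odds))

risk = predicted_risk(age=55, systolic_bp=140, smoker=1)
# with these made-up coefficients, risk comes out around 10%
```

As with max HR, the output is an average: the predicted risk for people *like you* on these factors, not your personal fate.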