I recently decided to try to cross test my power meters for accuracy. I was hoping to find pretty consistent results like we see on DCR/GPLama vids, but no such luck. Every PM calibrated (zeroed) on head unit prior to each ride. Crank lengths consistent (175mm) and set-up correctly for pedals.
Some comparisons (all watts are average power over entire duration):
Indoor
Assioma Duo 187, Stages LR 187, Tacx Neo 2T 180 - Everything OK here, as I would expect some drivetrain losses.
Outdoor - Road Bike
5/10: Assioma Duo 198, Stages LR 196 - OK
5/11: Assioma Duo 198, Stages LR 192 - 6 watts, but had some dropouts on this ride
5/12: Assioma Duo 194, Stages LR 190 - OKish
5/17: Assioma Duo 181, Stages LR 175 - 6 watts (3.4% difference)
5/18: Assioma Duo 194, Stages LR 185 - 9 watts (5% difference)
These two PMs seem to mostly agree, but if each is supposedly accurate to +/- 1-1.5%, differences of 3.5-5% indicate one or both is not.
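For reference, the percentages above can be reproduced with a few lines of Python. This is only a sketch: `percent_diff` and `max_spec_gap` are names made up here, the second-listed meter is treated as the reference (which matches the figures quoted), and the +/-1.5% tolerance is the advertised figure mentioned above, not something measured.

```python
def percent_diff(reading, reference):
    """Gap between two average powers as a percentage of the reference meter."""
    return abs(reading - reference) / reference * 100


def max_spec_gap(tolerance_pct):
    """Largest gap two in-spec meters could show, relative to the lower one:
    one meter reads high by the tolerance, the other reads low by it."""
    t = tolerance_pct / 100
    return ((1 + t) / (1 - t) - 1) * 100


# Assioma Duo vs Stages LR ride averages quoted above (Stages as reference)
road_rides = {
    "5/10": (198, 196),
    "5/11": (198, 192),
    "5/12": (194, 190),
    "5/17": (181, 175),
    "5/18": (194, 185),
}

for date, (assioma, stages) in road_rides.items():
    gap = percent_diff(assioma, stages)
    verdict = "within" if gap <= max_spec_gap(1.5) else "outside"
    print(f"{date}: {gap:.1f}% ({verdict} the worst case for two +/-1.5% meters)")
```

Two meters that are each within +/-1.5% can still disagree with each other by roughly 3%, so only gaps beyond that point to at least one meter being out of spec.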
Outdoor - Gravel Bike (most rides on road except noted)
5/19: Assioma Duo 165, Quarq 189 - 24 watts!
DCR mentioned doing some sprints when installing Assiomas, so I did that before next ride.
5/24 (mostly gravel): Assioma Duo 174, Quarq 184 - 10 watts (5.4%)
5/25: Assioma Duo 160, Quarq 175 - 15 watts! (8.6%)
Looking at the charts, it’s not dropouts or big spikes causing the differences. There is a consistent difference in power between meters within a ride. However, the delta does not appear to be consistent from ride to ride for any of the PMs, and the Quarq seems wildly optimistic… but oddly it does not ‘feel’ like I am riding easier when using that vs the others.
All of the above used Compare the Watts (except the first, which used ZwiftPower). Even using different comparison tools can create significant deviations depending on how they calculate the numbers.
Anyone else gone down this rabbit hole? Are PMs just not as accurate as advertised? A couple of watts isn’t a big deal, but 10-15 watts can be significant for workouts or race pacing.
Thoughts on what might be done next to find ‘the truth’?
Have you done systematic indoor testing like the following - multiple workouts of each type:
Power Meter Combo 1: Assioma Duo vs. Stages LR vs. Tacx Neo 2T:
Structured (ERG MODE for the Tacx) Sweet Spot Intervals
Structured 30x30 intervals
Structured Sprint intervals like 300% - 500% of FTP, depending upon what you can hit
Power Meter Combo 2: Assioma Duo vs. Quarq vs. Tacx Neo 2T:
Structured (ERG MODE for the Tacx) Sweet Spot Intervals
Structured 30x30 intervals
Structured Sprint intervals like 300% - 500% of FTP, depending upon what you can hit
And then used something like the DCR Analyzer to compare workouts and see where the power differs between power meters? For the Assioma Duos compared to the Stages LR especially, it’s critical to see how left power compares to left, and right to right, because dual-sided power meters on Shimano cranks are notorious for poor right (drive-side) power accuracy due to the crank arm design.
I’ve done a bunch of tests like this, both indoor and out.
You’ve made a good start by comparing averages over the entire duration of a ride, but power meters shouldn’t be judged on how close they are on average. They should be judged on the conditions under which they differ, and by how much.
That means you want to compare moment-by-moment when you’re sprinting, riding steady state, climbing, descending, and just tootling around at low power; at low cadence and at high; and all combinations in-between.
I do this by comparing virtual elevation profiles, but there are other ways.
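One of those other ways, sketched very roughly: line up the two recordings second by second and look at the gap per power band instead of one ride-wide average. Everything named here (`diff_by_power_bin`, the 50 W bin width, the sample numbers) is made up for illustration, and it assumes the two files are already time-aligned at 1 Hz.

```python
from collections import defaultdict
from statistics import mean


def diff_by_power_bin(ref_watts, test_watts, bin_width=50):
    """Group time-aligned samples by reference power and average the gap per bin.

    Returns {bin_lower_edge_in_watts: mean(test - ref)}.
    """
    bins = defaultdict(list)
    for ref, test in zip(ref_watts, test_watts):
        lower_edge = int(ref // bin_width) * bin_width
        bins[lower_edge].append(test - ref)
    return {edge: mean(gaps) for edge, gaps in sorted(bins.items())}


# Hypothetical samples: the meters agree at low power and drift apart at high power
ref = [100, 120, 260, 280]
test = [100, 118, 270, 292]
for edge, gap in diff_by_power_bin(ref, test).items():
    print(f"{edge}-{edge + 50} W: test meter reads {gap:+.1f} W vs reference")
```

The same grouping works with cadence (or gradient, or speed) as the bin key, which is the point: you see where the meters part ways, not just that they do.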
To back this up: averages from a recent ride I did were 183W (Quarq AXS) / 172W (Quarq AXS). Same ride. Same power meter. Same duration. Two different head units. Selecting a 7 minute section of the ride with no coasting: 238/238.
The iGS800 GPS from iGPSport has some attachment issues with power data when coasting… aka sticky watts.
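To show how that can move a ride average without touching the pedaling sections, here is a toy simulation. It is not the head unit's actual algorithm (that isn't public as far as I know); `hold_last_value`, the 2-second hold and the sample numbers are all assumptions for illustration.

```python
from statistics import mean


def hold_last_value(samples, hold_seconds):
    """Simulate 'sticky watts': when power drops to zero, keep repeating the
    last non-zero reading for up to hold_seconds samples before showing zero."""
    out = []
    last = 0
    held = 0
    for watts in samples:
        if watts > 0:
            last, held = watts, 0
            out.append(watts)
        elif last > 0 and held < hold_seconds:
            held += 1
            out.append(last)
        else:
            out.append(0)
    return out


def mean_while_pedaling(samples):
    """Average of the non-zero samples only."""
    return mean(w for w in samples if w > 0)


clean = [200, 200, 0, 0, 0, 0, 200]  # hypothetical 1 Hz data with a coast
sticky = hold_last_value(clean, hold_seconds=2)

print("whole-ride average:", round(mean(clean), 1), "vs", round(mean(sticky), 1))
print("pedaling-only average:", mean_while_pedaling(clean), "vs", mean_while_pedaling(sticky))
```

The whole-ride averages differ while the pedaling-only averages match, which is the same pattern as the 183/172 vs 238/238 numbers above.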
I went down this rabbit hole a couple of years ago. I found out I had a significant L/R imbalance. Not only did it vary with intensity and fatigue, it was also completely different indoors and outdoors. I put that bit down to the bike not moving around on the turbo trainer the way it does on the road. I eventually moved on from this as insanity beckoned.
Right. Years ago when we were collecting all the packets sent from an ANT+ power meter, we saw that different head units did (slightly) different things with the data they received. This iGPSport head unit sounds excessive, but the general issue that every manufacturer has slightly different proprietary algorithms for what they do is annoying as hell.
(This, btw, is one of the reasons why I don’t do tests with long steady state stretches at a single power or cadence – quite often, these sorts of issues are easier to spot when you’re accelerating or decelerating, or at rapidly changing cadence)
I went down this rabbit hole four years ago. My only conclusion was to use one power meter for training. Bypass the trainer’s power meter at all costs. If you really train on two bikes, then get power meters (like pedals) that can be adjusted to match each other.
I also found out that my Stages performs way better with more expensive, retail pack Duracells. Cheap 10 packs off of Amazon sucked. The Stages is probably voltage sensitive.
I haven’t done extensive indoor testing on these because 1) it’s outdoor riding season and 2) I’m not motivated to swap cassettes & drivers on the trainer to figure this out. I have, however, done outdoor intervals, and the discrepancies are fairly consistent at around 10%.
I am aware of the challenges with Shimano crankside PMs, but I have the stages with the ‘arm’ and the Stages vs. Duo is pretty good. The spider based Quarq is the outlier… unless it’s right and both the Duo and Stages LR are whack.
I got the Duos because the Stages said I had a significant LR imbalance. I was afraid it was inaccurate and so got the Duos to check. I basically confirmed the LR imbalance, so any L-only PMs will give me wonky numbers, but an LR (pedal or crank) and a spider should be similar.
The cadences appear to agree pretty well, so the difference must come from the torque measurements. It also looks like the torque difference might grow with higher torque?
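The reasoning there is just power = torque x angular velocity: if both meters report the same cadence, any gap in power has to be a gap in torque. A rough sketch using the 5/25 ride averages from above; the 85 rpm cadence is an assumed number (the thread doesn't give one), and dividing average power by average cadence only approximates average torque.

```python
import math


def crank_torque_nm(power_w, cadence_rpm):
    """Crank torque implied by a power and cadence pair (P = torque * omega)."""
    omega = 2 * math.pi * cadence_rpm / 60  # angular velocity in rad/s
    return power_w / omega


assumed_cadence = 85  # rpm, hypothetical
for label, watts in (("Assioma Duo", 160), ("Quarq", 175)):
    torque = crank_torque_nm(watts, assumed_cadence)
    print(f"{label}: {watts} W at {assumed_cadence} rpm -> {torque:.1f} Nm")
```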
Check that you have the right crank length for the Assiomas, and do a static check on the Quarq. Quarq used to let you do a calibration and re-set your slope if needed.
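On the crank length point: a pedal-based meter measures force at the pedal and multiplies by the configured crank length to get torque, so the reported power scales in proportion to that setting. A minimal sketch (the function name is made up here, and 172.5 mm is a hypothetical wrong setting, not something reported in this thread):

```python
def crank_length_error_pct(configured_mm, actual_mm):
    """Percent error in reported power when a pedal-based meter is configured
    with the wrong crank length (negative means it reads low)."""
    return (configured_mm / actual_mm - 1) * 100


# Actual cranks are 175 mm; suppose the pedals were left at 172.5 mm
error = crank_length_error_pct(172.5, 175)
print(f"Pedals would read {error:+.2f}% vs. the correct setting")
```

So a one-size crank length mistake is worth about 1.4%, which would not explain a 10-15 watt gap on its own.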
I’ve mentioned this before, but it really would be great if TrainerRoad looked into the data and answered this question once and for all.
They have millions(?) of rides where users have used PowerMatch, and I believe from interactions with TrainerRoad support staff that power data is recorded and stored from both the power meter and the smart trainer, so it could be analysed after the fact.
If they crunched the numbers and did a bit of multivariate data analysis they would be able to determine…
Which models of power meters read higher/lower
Which models of smart trainers read higher/lower
Which models of power meters are most consistent unit to unit
Which models of smart trainers are most consistent unit to unit.
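As a very rough idea of what that number crunching could look like, here is a sketch on made-up data. The model names, unit IDs and wattages are invented, the record layout is an assumption (I have no idea how TrainerRoad actually stores this), and it treats the trainer as the reference purely for convenience. A real analysis would need to model trainer and power meter effects together.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up paired ride averages: (meter model, unit id, meter watts, trainer watts)
rides = [
    ("PM-A", "unit1", 204, 200),
    ("PM-A", "unit1", 153, 150),
    ("PM-A", "unit2", 196, 200),
    ("PM-B", "unit3", 202, 200),
    ("PM-B", "unit4", 203, 200),
]


def offsets_by_model(records):
    """Return {model: (mean offset %, unit-to-unit spread %)} vs the trainer."""
    per_unit = defaultdict(list)
    for model, unit, meter_w, trainer_w in records:
        per_unit[(model, unit)].append((meter_w - trainer_w) / trainer_w * 100)

    # Average each unit first so one heavily used unit does not dominate its model
    per_model = defaultdict(list)
    for (model, _unit), offsets in per_unit.items():
        per_model[model].append(mean(offsets))

    return {m: (mean(units), pstdev(units)) for m, units in per_model.items()}


for model, (bias, spread) in offsets_by_model(rides).items():
    print(f"{model}: reads {bias:+.2f}% vs trainer, unit-to-unit spread {spread:.2f}%")
```

In this toy data PM-A looks unbiased on average while its two units disagree with each other by 4%, which is exactly the kind of thing a brand-level average hides.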
I’ve now tested 3 Quarqs, 1 Assioma, and 1 Garmin Rally, all against my Tacx Neo 2T.
The most recent Quarq is the only one outside 2% variance. Just calibrated it so it reads the same as my trainer, verified with the DC Rainmaker Analyzer at multiple power levels, multiple times. Done and Done.
They’re unlikely to do it though, because I think the majority of TrainerRoad users are blissfully unaware of these difficulties with power meters and multiple devices. If TrainerRoad published data showing the high levels of variability shown by Maier et al., more TR users would see that, and it would undermine people’s confidence in power meters and power-based training, which is TR’s foundation.
There’s nothing wrong with power-based training of course, you just have to be careful and aware of the difficulties discussed in this thread.
What’s interesting about the study (when read) is that SRM actually had more variation (deviation), but because they had 12 units worth (versus 1 unit for Garmin, or two for Power2Max), the averages make it look like SRM was more accurate than the others.
I do appreciate they updated the table at some point, previously they didn’t list any models, they just had brands (which, way-back-when, when it came out, I gave them crap for).
That said, the challenge for most power meters today isn’t indoor steady-state. It’s outdoors, and all the variables there.
But I think in some ways, despite the study’s flaws, if you ignore names, it shows the challenges of power meters in terms of variability across units, in this case units taken from a bunch of random cyclists who volunteered to bring their bikes in. All too often people obsess over 2-3w from their power meter, when in reality, their power meter is just being 2-3w (or 5-7w) high or low for no particular obvious reason that day.