That’s a big difference in numbers - did you use the same power meter across the various tests?
150W for six hours seems very low to me, and I have a much lower FTP. If you can hold 200w for six hours with a relatively low VI and without collapsing at the end, then that's evidence it is in the middle of what I would call Z2.
Even more certain if you can run a marathon afterwards.
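In case "low VI" is unclear: VI (variability index) is usually taken as normalized power divided by average power, so a perfectly steady ride comes out at 1.0. Below is a rough sketch of that calculation, assuming 1 Hz power samples and the common 30-second rolling-average method for normalized power. The function names are made up for illustration and are not from any particular app.

```python
def normalized_power(watts, window=30):
    """Normalized power from 1 Hz power samples (assumed sampling rate).

    Method: 30 s rolling average, raise to the 4th power, take the mean,
    then take the 4th root.
    """
    if len(watts) < window:
        raise ValueError("need at least one full rolling window of samples")
    running = sum(watts[:window])
    rolling = [running / window]
    for i in range(window, len(watts)):
        # Slide the window forward by one sample.
        running += watts[i] - watts[i - window]
        rolling.append(running / window)
    mean_fourth = sum(p ** 4 for p in rolling) / len(rolling)
    return mean_fourth ** 0.25


def variability_index(watts):
    """VI = normalized power / average power."""
    average_power = sum(watts) / len(watts)
    return normalized_power(watts) / average_power


# Hypothetical rides: one dead steady, one alternating 60 s easy / 60 s hard.
steady_ride = [200] * 3600
surgy_ride = ([100] * 60 + [300] * 60) * 30

vi_steady = variability_index(steady_ride)
vi_surgy = variability_index(surgy_ride)
print(f"steady VI: {vi_steady:.2f}, surgy VI: {vi_surgy:.2f}")
```

Both example rides average 200w, but only the steady one has a VI of about 1.0, which is why the VI matters when judging whether a six-hour ride was really at one intensity.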
Well, that is confusing. Testing is all well and good, but ultimately performance is the best predictor of performance, so I'm going to go against the grain and say that 180-200 is a great range for you. If you can extend that out gradually to six hours, with fuel and water etc., then you'll have a fine engine indeed.
Could you show those tests or the graphs? Would be interesting to see.
Maybe you're a bit on the glycolytic side. In that case, I would follow the lower value if there is an intensity workout or hard ride planned for the next day. Otherwise, at first glance, the 180 watt figure does seem more plausible. Keep in mind that it's all a continuum, and there is also day-to-day variation.
Zone 2 is by definition a percentage of FTP, so I’m not sure how you’d measure the same thing with metabolic or VO2max testing.
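To put numbers on that definition, here is a quick sketch. The 56-75% band is an assumption on my part (the commonly used Coggan-style Z2 range); other zone models use different percentages. The 318w FTP is the figure quoted in this thread, and the function name is hypothetical.

```python
def zone2_range(ftp_watts, low_pct=0.56, high_pct=0.75):
    """Return (low, high) Zone 2 power in watts as fractions of FTP.

    The default percentages are an assumed Coggan-style band, not the
    only possible definition of Z2.
    """
    return ftp_watts * low_pct, ftp_watts * high_pct


low, high = zone2_range(318)
print(f"Z2 for a 318w FTP: {low:.1f}w to {high:.1f}w")
```

Under that assumption the band runs from roughly 178w to 238.5w, which is wide enough to contain both of the higher ranges mentioned in the thread. A percentage-of-FTP definition alone will not settle which part of the band to ride in.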
At the end of the day, Z2 in practice is just a pace you can ride for a long time without accumulating too much fatigue. Any test you do is just a surrogate for this hand-wavy but pragmatic definition.
With all that in mind, your discrepancy here is just an artifact of testing protocols that are never perfectly accurate and that use different definitions of "Z2" from one test to the next.
To resolve that, think about what you can actually do in real life:
Is your FTP set right? I.e., can you ride at least ~35-40min at 318w?
Can you ride 6h at 210-220w without getting into threshold heart rate territory? If so, then that’s quite obviously and literally a pace you can comfortably hold for a long time without too much fatigue.
Put less stock in the tests, and more in what you can actually do IRL! Tests are just a starting point and in no way produce perfectly accurate data. They're data to consider, but never more important than reality. When you get a discrepancy in the data like this, resolve it based on what you can do in real life.