I’m in a similar position to you, and it’s probably mostly because I’m stronger anaerobically than I am aerobically.
What has worked for me is to stop caring about FTP that much. FTP-based power zones assume everyone’s power-duration curve relates to FTP the same way at every duration, which isn’t true. Sure, it’s a reasonable starting point if you know nothing else, but that’s about it. TR tries to solve this by using progression levels.
After several years, I have some idea how my power zones typically relate to “FTP” as measured by each kind of FTP test: the ramp test gives me reasonable Z2 and VO2max/anaerobic targets, and the 20min test gives me reasonable threshold and sweet spot (SS) targets.
If you aren’t that familiar with what works for you… just ignore AI FTP completely. Reflect on yourself and your riding, and decide on a target power you think you could complete 2x20 at (or at least 2x15). Then go do it and see how it goes. If you can’t complete it, your target power is too high; try again in a few days at a lower power. If it’s too easy (you’re not getting close to or over LTHR by the last few minutes, especially of the second interval), try again in a few days at a higher power. Once you’ve figured that out, start your threshold progression and just follow whatever adaptations are recommended.
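The trial-and-error loop above can be written as a small decision rule. This is a minimal sketch, not anything TR provides: `SessionResult`, `next_target`, the 5 W step, and the 3 bpm “close to LTHR” margin are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionResult:
    """Outcome of one 2x20 (or 2x15) attempt at a given target power."""
    completed: bool   # finished both intervals at the target power?
    peak_hr_end: int  # highest HR (bpm) in the last few minutes of interval 2
    lthr: int         # your lactate threshold heart rate (bpm)


def next_target(current_watts: float, result: SessionResult,
                step: float = 5.0) -> tuple[float, bool]:
    """Return (next target power, settled?) using the feel/HR rule of thumb.

    - Couldn't finish: target is too high, so drop it by `step`.
    - Finished but HR stayed well under LTHR at the end: too easy, raise it.
    - Finished and HR got close to or over LTHR: treat it as your threshold target.
    The 5 W step and 3 bpm margin are hypothetical, not from TR.
    """
    if not result.completed:
        return current_watts - step, False
    if result.peak_hr_end < result.lthr - 3:  # hypothetical "close to LTHR" margin
        return current_watts + step, False
    return current_watts, True
```

For example, `next_target(250, SessionResult(True, 160, 172))` returns `(255.0, False)`: the session was finished but HR never got near LTHR, so the next attempt goes a bit higher.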
I recognize that this is essentially arguing that going off feel/HR is more accurate for threshold determination than FTP testing… and yes, that’s what I’m saying. This isn’t true for everyone; some people get really good estimates of threshold from FTP tests (and most people can, once they have some idea of what their personal corrective factor is, but that takes experience)…
FTP tests all work by taking some power measurement and multiplying it by a corrective factor (0.95 for the 20min test, 0.75 for the ramp test, etc.). The issue is that, because people vary, this corrective factor should really be a range (e.g. ~0.88 to 0.95 for 20min, ~0.65 to 0.8x for ramp).
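To see how much that range matters, here is a minimal sketch using the 20min numbers from the post (the ramp range’s upper end is only given as “0.8x”, so it’s left out). The 300 W test result is a hypothetical rider, and `ftp_range` is just an illustrative helper.

```python
def ftp_range(test_power_watts: float, factor_low: float,
              factor_high: float) -> tuple[float, float]:
    """FTP estimate = measured test power x corrective factor, as a (low, high) range."""
    return test_power_watts * factor_low, test_power_watts * factor_high


twenty_min_avg = 300.0  # hypothetical 20-min average power, watts

single = twenty_min_avg * 0.95                    # the usual single factor
low, high = ftp_range(twenty_min_avg, 0.88, 0.95)  # the range suggested above

print(f"Single-factor FTP: {single:.0f} W")        # prints "Single-factor FTP: 285 W"
print(f"Plausible FTP range: {low:.0f}-{high:.0f} W")  # prints "Plausible FTP range: 264-285 W"
```

Same test, same number on the screen, but the “real” FTP could be anywhere in a ~20 W window, which is the whole difference between a hard threshold session and one you can’t finish.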
TR gets around this by using progression levels. But that runs into issues if your predicted FTP is way off from your actual FTP. For example, you might get prescribed a “4x5min @104%” threshold workout, which is a 1.7 PL workout… and then not even be able to complete it, because for you it’s actually a 4x5min VO2max workout. Then you’re completely hooped, since none of the threshold work you’re prescribed will be anywhere close to threshold for you.
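Here’s the arithmetic behind that example, as a rough sketch. It assumes a ramp test that sets FTP at 0.75 of your best 1-minute power, and a hypothetical rider whose personal factor is actually 0.68 (inside the ~0.65 to 0.8x range mentioned earlier). The 400 W figure and the `percent_of_true_ftp` helper are illustrative only.

```python
def percent_of_true_ftp(prescribed_pct: float, predicted_ftp: float,
                        true_ftp: float) -> float:
    """Intensity relative to your real FTP when targets are set from a predicted FTP."""
    target_watts = predicted_ftp * prescribed_pct / 100
    return 100 * target_watts / true_ftp


best_1min = 400.0             # hypothetical best 1-min power from a ramp test, watts
predicted = best_1min * 0.75  # standard ramp factor -> 300 W "FTP"
true_ftp = best_1min * 0.68   # assumed personal factor for this rider -> ~272 W

pct = percent_of_true_ftp(104, predicted, true_ftp)
print(f"{pct:.0f}% of true FTP")  # prints "115% of true FTP"
```

So “104%” on paper is roughly 115% of this rider’s real threshold, which sits inside the 106 to 120% band usually labelled VO2max in Coggan-style power zones.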
This lack of precision is why, if you aren’t the “average” person, you’ll get recommended power targets that don’t work for you. It’s also why, if you can’t at least almost complete 2x20, your threshold target is too high and you need to look at other metrics to get a sense of where your FTP actually is.