I personally had awful repeatability issues with 2 Kickr Snaps. I could compare them to a PowerTap wheel, and any time I found the devices not matching well (which was often) I would fall down a hole of running the normal calibration and then the factory one. Getting them to match within 5% seemed more like blind luck than anything scientific in my situation.
I can’t tell, but it sounds like you did a TR test, then calibrated, then used the Wahoo app. One would hope the factory calibration can only improve accuracy, but I’m not sure that’s a certainty.
I’d assume after your factory cal, going back to TR would yield at least similar measurements, so the answer to the question “is it actually the TrainerRoad app” is probably no.
I think many have SNAPs and like them, but in my case I couldn’t bear the constant mental toll of needing to keep it calibrated and accurate (at the time I didn’t have a PowerTap to use with every ride). I ended up getting rid of one in favor of a 2nd gen Kickr, and my wife uses the other with the built-in ANT+ power meter connection (via the trainer settings in the Wahoo app), in which case it seems to perform fine.