I went for the factory-fitted option as it removed any user error or ambiguity with fitting. Until the reviews are in, I would stick with that if I were purchasing another.
That said, the first test above showed parity, whilst the second showed almost identical graphs but with a constant offset between them, which could point to calibration.
If the variation in power between the graphs were inconsistent, I would be really worried.
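
For anyone wanting to check this on their own files, here is a rough sketch of the idea in Python. It assumes you have already exported the two power traces as equal-length lists of watts sampled at the same moments; the function name `offset_summary` and the sample numbers are made up for illustration and are not from my tests. A steady mean offset with a small spread would look like a calibration difference, while a large spread would be the inconsistent case.

```python
from statistics import mean, stdev

def offset_summary(power_a, power_b):
    """Return (mean offset, spread of offset) in watts between two power traces."""
    # Per-sample difference between the two meters.
    diffs = [a - b for a, b in zip(power_a, power_b)]
    # The mean is the typical gap; the standard deviation shows how steady it is.
    return mean(diffs), stdev(diffs)

# Made-up readings in watts, for illustration only.
meter_a = [200, 250, 300, 280, 220]
meter_b = [190, 240, 290, 270, 210]

offset, spread = offset_summary(meter_a, meter_b)
print(f"mean offset: {offset:.1f} W, spread: {spread:.1f} W")
```

What counts as a "small" spread is a judgment call and depends on the accuracy claimed for the meters, so treat the numbers as a guide rather than a verdict.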