Comparisons of the ability of different computational cognitive models to simulate empirical data should ideally take the complexity of the compared models into account. Although several comparison methods have been proposed to achieve this, little is known about their differential strengths and weaknesses. In this contribution we present the results of a systematic comparison of five model comparison methods. Employing model recovery simulations, we examine each method's ability to identify the model that actually generated the data across three pairs of models and a range of comparison situations. The simulations reveal several interesting properties of the considered methods, for instance that in certain situations some methods perform worse than model comparison that neglects model complexity altogether. Based on the identified method characteristics, we derive a preliminary recommendation for when to use which of the five methods.
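To illustrate the logic of a model recovery simulation: data are repeatedly generated from one model, both candidate models are fit to each simulated data set, and a complexity-sensitive criterion selects a winner; the recovery rate is the fraction of data sets for which the generating model is selected. The following minimal sketch is not from the paper; it uses a toy pair of hypothetical models (a Bernoulli model with fixed p = 0.5 vs. one with a free rate parameter) and BIC as one example comparison method:

```python
import math
import random

def log_lik(k, n, p):
    # Bernoulli log-likelihood of k successes in n trials with rate p
    return k * math.log(p) + (n - k) * math.log(1 - p)

def bic(loglik, n_params, n_obs):
    # Bayesian information criterion: lower is better
    return n_params * math.log(n_obs) - 2 * loglik

def recovery_rate(true_model, n_trials=200, n_sims=500, seed=0):
    """Fraction of simulated data sets for which BIC selects the
    model that actually generated the data (hypothetical toy setup)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_sims):
        # Generate data from the designated "true" model
        p_true = 0.5 if true_model == "fixed" else rng.uniform(0.6, 0.8)
        k = sum(rng.random() < p_true for _ in range(n_trials))
        # Model "fixed": p = 0.5, zero free parameters
        bic_fixed = bic(log_lik(k, n_trials, 0.5), 0, n_trials)
        # Model "free": p estimated by maximum likelihood, one parameter
        p_hat = min(max(k / n_trials, 1e-6), 1 - 1e-6)
        bic_free = bic(log_lik(k, n_trials, p_hat), 1, n_trials)
        chosen = "fixed" if bic_fixed <= bic_free else "free"
        correct += (chosen == true_model)
    return correct / n_sims
```

Running the simulation once with each model as the generator yields a 2x2 recovery table; a method that favors the simpler model regardless of the generator would show high recovery for "fixed" but poor recovery for "free", which is the kind of asymmetry such simulations are designed to expose.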