Visual memory for naturalistic scenes is mediated by the amount of exposure, semantic content, and type of encoding. These factors might contribute interactively to scene memorability. We therefore tracked computer-mouse movements during an encoding phase in which participants verified the congruency of sentence-scene pairs whose scenes varied in plausibility. The presentation time of the scenes was also manipulated. Subsequently, in an unexpected recognition phase, participants indicated whether they remembered each scene (old or new). Recognition improved when participants had verified the pair correctly during encoding, especially when the scene was implausible, the sentence-scene pair was congruent, and the presentation time was longer. Comparing mouse trajectories between encoding and recognition, we found greater hesitancy during encoding, especially for implausible scenes in incongruent pairs; this hesitancy decreased as presentation time increased. These results provide novel insights into the factors modulating the memorability of naturalistic scenes.