Research in visual cognition has demonstrated that scene understanding is influenced by the contextual properties of objects, and a number of computational models have been proposed that capture specific context effects. However, a general model that predicts how well an arbitrary object fits the context established by the rest of the scene has so far been lacking. In this paper, we model the contextual fit of objects in visual scenes using Bayesian topic models, which we induce from a database of annotated images. We evaluate our models first on synthetic object-intrusion data, and then on eye-tracking data from a spot-the-difference task and from an object naming experiment. On the synthetic data, we find that our models detect object intrusions accurately. On the eye-tracking data, we show that context scores derived from our models are associated with fixation latencies on target objects.
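To make the approach concrete, the following is a minimal sketch of how a topic model can yield a contextual-fit score for an object. It assumes topics (distributions over object labels) and a scene's topic mixture have already been inferred; the toy topics, labels, and the `context_score` function are illustrative assumptions, not the paper's actual model or data.

```python
# Hypothetical sketch: scoring the contextual fit of an object under a
# topic model. Topics and the scene's topic mixture are assumed given
# (in practice they would be induced from a database of annotated images).

# Two toy "topics": distributions over object labels (illustrative values).
TOPICS = {
    "kitchen": {"stove": 0.4, "pot": 0.3, "sink": 0.25, "sheep": 0.05},
    "farm":    {"sheep": 0.5, "fence": 0.3, "tractor": 0.15, "pot": 0.05},
}

def context_score(obj, theta):
    """Contextual fit as p(obj | scene) = sum_k p(obj | topic k) * p(topic k | scene)."""
    return sum(theta[k] * TOPICS[k].get(obj, 0.0) for k in theta)

# A scene whose inferred topic mixture is dominated by "kitchen":
theta = {"kitchen": 0.9, "farm": 0.1}

# An intruding object ("sheep") scores lower than a fitting one ("pot"):
print(context_score("pot", theta))    # 0.275
print(context_score("sheep", theta))  # 0.095
```

Under this formulation, a low score marks an object as a likely intrusion, and the same score can be compared against behavioral measures such as fixation latencies.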