Statistical learning research often assumes that learners collect global statistics across the entire set of stimuli they are exposed to. In naturalistic settings, this assumption of global access to training data is problematic because it implies that the cognitive system must keep track of an exponentially growing number of relations while determining which of those relations is significant. We investigated a more plausible assumption, namely that learning proceeds incrementally, using small windows of opportunity in which the relevant relations are assumed to hold over temporally contiguous objects or events. This local statistical learning hypothesis was tested on the learning of novel word-to-world mappings under conditions of uncertainty. Results suggest that temporal contiguity and contrast are effective in multimodal learning, and that the order of presentation of data can therefore make a significant difference.