Statistical learning is assumed to play a vital role in language acquisition, yet it is unknown whether it is guided by a unitary, modality-general mechanism or by several sensory-specific mechanisms. Consistent with the latter view, Seitz et al. (2007) tested learners with multimodal input and found that statistical learning in one modality is independent of input to other modalities. We tested this assertion of independence by presenting learners with speech streams synchronized with a video of a speaker's face. We used the McGurk illusion to manipulate the underlying statistical structure of the speech streams. Contrary to the independence hypothesis, our results suggest that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning, thereby altering the pattern of segmentation. These findings suggest that auditory and visual inputs are not processed independently, but leave unresolved whether the resulting representations are integrated across modalities.