For decades, implicit learning researchers have examined a variety of cognitive tasks in which humans seem to automatically extract structure from the environment. Similarly, statistical learning studies have shown that humans can use the repeated co-occurrence of words and referents to build a lexicon from individually ambiguous experiences (Yu & Smith, 2007). Building on these findings, the present paper investigates whether adult cross-situational learners must make an explicit effort to learn word-object mappings, or whether such learning can occur incidentally, requiring only attention to the audiovisual stimuli. In two implicit learning experiments using incidental tasks that directed participants' attention to different aspects of the stimuli, we found evidence of learning, suggesting that cross-situational learning can proceed without an explicit intention to learn. However, learning was superior under explicit study instructions, indicating that strategic inference may also play a role.