To learn the meaning of words such as bin or pin, language learners must not only perceive the relevant differences in the speech signal but also learn mappings from words to referents. Prior work in native (Stager & Werker, 1997) and second (Pajak, Creel, & Levy, 2016) language acquisition has found that the ability to perceptually discriminate between words does not guarantee successful word learning: when learning words, learners fail to apply knowledge that they can otherwise use in speech perception. To explore possible mechanisms accounting for this phenomenon, we developed a probabilistic model that infers both a word’s phonetic form and its associated referent. By analyzing different versions of the model fitted to experimental results from Pajak et al. (2016), we argue that a mechanism for spoken word learning needs to incorporate both perceptual uncertainty and additional, task-specific sources of uncertainty.