What Company Do Semantically Ambiguous Words Keep? Insights from Distributional Word Vectors
- Barend Beekhuizen, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Saša Milić, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Blair Armstrong, Department of Psychology, University of Toronto, Toronto, Ontario, Canada
- Suzanne Stevenson, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Abstract: The diversity of a word's contexts affects its acquisition and processing. Can differences between word types such as monosemes (unambiguous words), polysemes (multiple related senses), and homonyms (multiple unrelated meanings) be related to distributional properties of these words? We tested for traces of number and relatedness of meaning in vector representations by comparing the distance between each word type and vector representations of various "contexts": their dictionary definitions (an extreme disambiguating context), their use in film subtitles (a natural context), and their semantic neighbours in vector space (a vector-space-internal context). Whereas dictionary definitions revealed a three-way split between our word types, the other two contexts produced a two-way split between ambiguous and unambiguous words. These inconsistencies align with discrepancies in behavioural studies and present a paradox regarding how models learn meaning relatedness despite natural contexts lacking such relatedness. We argue that viewing ambiguity as a continuum could resolve many of these issues.
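As a rough illustration of what comparing a word vector with a vector representation of a context can look like, the sketch below averages the vectors of the context words and takes the cosine distance to the target word. The abstract does not state which distance measure or composition method was used, so cosine distance, averaging, the function names, and the toy three-dimensional vectors are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_to_context(word, context_words, vectors):
    """Distance between a word's vector and the mean vector of a context.

    Averaging the context words is an assumption of this sketch, not
    necessarily the composition method used in the paper.
    """
    context = mean_vector([vectors[w] for w in context_words])
    return cosine_distance(vectors[word], context)


# Toy vectors invented for illustration only; real distributional
# vectors would have hundreds of dimensions and come from a corpus.
toy_vectors = {
    "bank": [0.6, 0.6, 0.1],
    "money": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.1],
    "river": [0.1, 0.9, 0.2],
    "shore": [0.0, 0.8, 0.3],
}

# One ambiguous word compared against two hypothetical contexts.
financial = distance_to_context("bank", ["money", "loan"], toy_vectors)
riverside = distance_to_context("bank", ["river", "shore"], toy_vectors)
print(f"bank vs. financial context: {financial:.3f}")
print(f"bank vs. riverside context: {riverside:.3f}")
```

Under these assumptions, a word whose vector blends several meanings would tend to sit at an intermediate distance from each disambiguating context, which is the kind of trace the comparison is designed to pick up.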