Predicting Lexical Norms Using a Word Association Corpus

Abstract

Obtaining norm scores for subjective properties of words can be cumbersome because the required investment grows with the size of the word set. We present a method to predict norm scores for large word sets from a word association corpus. We use similarities between word pairs, derived from this corpus, to construct a semantic space. Starting from norms for a subset of the words, we retrieve the direction in the space that optimally reflects the norm data associated with those words. All other words in the semantic space are then orthogonally projected onto this direction, which yields predictions for those words on the variable of interest. In this study, we predict valence, arousal, dominance, age of acquisition, and concreteness and show that the predictions correlate strongly with the judgments of human raters. Furthermore, we show that our predictions are superior to those derived using other methods.
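
The projection step can be illustrated with a minimal sketch. It is not the implementation used in the study. It assumes that word coordinates in a semantic space are already available (here they are random synthetic values, not derived from association data) and that the direction is found by ordinary least squares, which is one plausible reading of "optimally reflects". The function names fit_direction and project, the dimensionality, and the train/test split are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_direction(coords, norms):
    """Return a unit vector fitted by least squares (an assumed criterion)
    so that projections onto it track the given norm scores."""
    centered = coords - coords.mean(axis=0)
    weights, *_ = np.linalg.lstsq(centered, norms - norms.mean(), rcond=None)
    return weights / np.linalg.norm(weights)

def project(coords, direction):
    """Orthogonally project each word onto the direction (scalar coordinate)."""
    return coords @ direction

# Synthetic stand-in for a semantic space: 200 words in 5 dimensions.
rng = np.random.default_rng(0)
coords = rng.normal(size=(200, 5))

# Synthetic "norms": a linear function of the coordinates plus noise.
true_direction = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
norms = coords @ true_direction + rng.normal(scale=0.1, size=200)

# Words 0-99 have known norms; words 100-199 are to be predicted.
train = np.arange(100)
test = np.arange(100, 200)

direction = fit_direction(coords[train], norms[train])
predicted = project(coords[test], direction)

# Agreement between predictions and held-out norms (Pearson correlation).
r = np.corrcoef(predicted, norms[test])[0, 1]
print(f"held-out correlation: {r:.3f}")
```

Because the direction is normalized to unit length, the projections are on an arbitrary scale; they can be compared with human ratings through correlation, or rescaled to the rating scale with a simple linear fit on the words with known norms.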

