Simpler structure for more informative words: a longitudinal study


As new concepts and discoveries accumulate over time, the amount of information available to speakers increases as well. One would therefore expect an utterance today to be more informative than an utterance 100 years ago (measuring information as surprisal; Shannon 1948), given the growth of technology and scientific discovery. This prediction, however, is at odds with recent theories of information in human language use, which suggest that speakers maintain a roughly constant information rate over time. Using the Google Ngram corpus (Michel et al. 2011), we show for multiple languages that changes in lexical information (estimated with a unigram model) are in fact negatively correlated with changes in structural information (estimated with a trigram model), supporting recent proposals on information-theoretic constraints.
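
To make the two measures concrete, the following is a minimal sketch, not the procedure used in the study. It assumes maximum-likelihood estimates computed from a plain token list, takes lexical information to be mean unigram surprisal, and takes structural information to be the mean surprisal of a word given its two predecessors. The function names (`unigram_surprisal`, `trigram_surprisal`, `change_correlation`) are hypothetical, and the use of Pearson correlation over period-to-period differences is likewise an assumption made for illustration.

```python
import math
from collections import Counter


def unigram_surprisal(tokens):
    """Mean surprisal in bits per token under an MLE unigram model."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def trigram_surprisal(tokens):
    """Mean surprisal in bits of a word given its two predecessors (MLE)."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count each two-word context by summing over the words that follow it.
    contexts = Counter()
    for (w1, w2, _), c in trigrams.items():
        contexts[(w1, w2)] += c
    total = sum(trigrams.values())
    return -sum(
        c / total * math.log2(c / contexts[(w1, w2)])
        for (w1, w2, _), c in trigrams.items()
    )


def change_correlation(xs, ys):
    """Pearson correlation between successive differences of two series.

    Assumption: xs and ys hold one value per time period, in order.
    """
    dx = [b - a for a, b in zip(xs, xs[1:])]
    dy = [b - a for a, b in zip(ys, ys[1:])]
    mx, my = sum(dx) / len(dx), sum(dy) / len(dy)
    cov = sum((a - mx) * (b - my) for a, b in zip(dx, dy))
    var_x = sum((a - mx) ** 2 for a in dx)
    var_y = sum((b - my) ** 2 for b in dy)
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, one would compute both surprisal values for each time slice of a corpus and pass the two resulting series to `change_correlation`; a negative value would correspond to the trade-off described above. Unsmoothed estimates of this kind are only suitable for illustration, since real n-gram data require handling of unseen events.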

