Estimating Semantic Transparency of Constituents of English Compounds and Two-Character Chinese Words using Latent Semantic Analysis

Abstract

The constituents of English compounds (e.g., butter and fly in butterfly) and of two-character Chinese words may differ in meaning from the whole word. Furthermore, the meanings of words that share a constituent (e.g., butter in “butterfingers” and “buttermilk”) may or may not be consistent. These uncertainties and ambiguities make estimating the semantic transparency of a constituent difficult and subjective. It remains largely unexplored why raters consider a constituent transparent or opaque, and how its polysemy correlates with its transparency. We propose a computational method for predicting semantic transparency based on Latent Semantic Analysis. We computed the primary meaning of a constituent by a clustering analysis and compared it to the whole-word meaning. The proposed method successfully predicted participants’ transparency ratings, and may explain the cognitive processes of raters when they classify the semantic transparency of English compounds and two-character Chinese words.
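The general idea of comparing a constituent's primary meaning with the whole-word meaning can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the three-dimensional toy vectors stand in for LSA vectors, the clustering is a small spherical k-means with k = 2, the "primary meaning" is taken to be the centroid of the largest cluster, and the function names `primary_meaning` and `transparency` are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def primary_meaning(family_vecs, k=2, iters=20):
    """Centroid of the largest cluster among the words sharing a constituent.

    Assumption: the largest cluster is used as a proxy for the primary meaning.
    """
    X = family_vecs / np.linalg.norm(family_vecs, axis=1, keepdims=True)
    # Deterministic farthest-first initialisation of the k centroids.
    idx = [0]
    while len(idx) < k:
        sims = (X @ X[idx].T).max(axis=1)
        idx.append(int(sims.argmin()))
    centroids = X[idx].copy()
    for _ in range(iters):
        # Assign each word to its most similar centroid, then update centroids.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            if np.any(labels == j):
                c = X[labels == j].mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    labels = np.argmax(X @ centroids.T, axis=1)
    largest = int(np.bincount(labels, minlength=k).argmax())
    return X[labels == largest].mean(axis=0)

def transparency(whole_word_vec, family_vecs, k=2):
    """Similarity between the whole word and the constituent's primary meaning."""
    return cosine(whole_word_vec, primary_meaning(family_vecs, k=k))

# Toy stand-ins for LSA vectors of words containing "butter" (made-up values).
butter_family = np.array([
    [0.90, 0.10, 0.00],   # buttermilk
    [0.80, 0.20, 0.10],   # butterscotch
    [0.85, 0.15, 0.05],   # buttercream
    [0.10, 0.90, 0.20],   # butterfingers
])
buttermilk = np.array([0.90, 0.10, 0.00])
butterfly = np.array([0.05, 0.20, 0.95])

t_milk = transparency(buttermilk, butter_family)
t_fly = transparency(butterfly, butter_family)
print(f"butter in buttermilk: {t_milk:.2f}")
print(f"butter in butterfly:  {t_fly:.2f}")
```

With these toy values, butter scores higher for buttermilk than for butterfly, which matches the intuition that the constituent is more transparent in the former. In practice the vectors would come from an LSA space built on a corpus, and the number of clusters would need to be chosen for the data.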

