Unsupervised Clustering of Morphologically Related Chinese Words

Chia-Ling Lee, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Ya-Ning Chang, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Chao-Lin Liu, Department of Computer Science, National Chengchi University, Taipei, Taiwan
Chia-Ying Lee, Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Jane Yung-jen Hsu, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Abstract

Many linguists consider morphological awareness a major factor in children's reading development. A Chinese character embedded in different compound words may carry related but distinct meanings. For example, "商店" (store), "商品" (commodity), "商代" (Shang Dynasty), and "商朝" (Shang Dynasty) can form two clusters: {"商店", "商品"} and {"商代", "商朝"}. In this paper, we aim to cluster a given family of morphologically related Chinese words without supervision. Successfully differentiating these words can contribute to both computer-assisted Chinese learning and natural language understanding. In Experiment 1, we combined computational linguistics methods that draw on linguistic factors at the word, syntactic, semantic, and contextual levels to perform the clustering task. In Experiment 2, we recruited adults and children to perform the same task. Experimental results indicate that our computational model achieved the same level of performance as the children.
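
To make the task concrete, the following is a minimal sketch of grouping the four example words into two clusters. It is not the model described in the paper: the context words in CONTEXTS are invented toy data, and the Jaccard similarity, the single-linkage merging rule, and the 0.5 threshold are all assumptions chosen only for illustration.

```python
from itertools import combinations

# Hypothetical context words for each compound (toy data, NOT from the paper).
CONTEXTS = {
    "商店": {"buy", "sell", "price", "customer"},
    "商品": {"buy", "sell", "price", "quality"},
    "商代": {"dynasty", "king", "bronze", "ancient"},
    "商朝": {"dynasty", "king", "bronze", "history"},
}


def jaccard(a, b):
    """Jaccard similarity between two sets of context words."""
    return len(a & b) / len(a | b)


def cluster(contexts, threshold=0.5):
    """Single-linkage grouping (an illustrative assumption).

    Two clusters are merged whenever any pair of words drawn from them
    has a context similarity of at least `threshold`.
    """
    clusters = [{word} for word in contexts]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            linked = any(
                jaccard(contexts[a], contexts[b]) >= threshold
                for a in c1
                for b in c2
            )
            if linked:
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


if __name__ == "__main__":
    for group in cluster(CONTEXTS):
        print(sorted(group))
```

With this toy data, the two commercial words share most of their context words, as do the two dynasty words, while no context word is shared across the two senses, so the family of "商" words splits into the two clusters given in the example above.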
