Incremental Models of Natural Language Category Acquisition


Learning categories from examples is a fundamental problem faced by the human cognitive system, and a long-standing topic of investigation in psychology. In this work we focus on the acquisition of natural language categories and examine how the statistics of the linguistic environment influence category formation. We present two incremental models of category acquisition -- one probabilistic, one graph-based -- which encode different assumptions about how concepts are represented (i.e., as a set of topics or as nodes in a graph). Evaluation against gold-standard clusters and against human performance in a category acquisition task suggests that the graph-based approach is better suited to modeling the acquisition of natural language categories.

