Order matters: Distributional properties of speech to young children bootstraps learning of semantic representations

Abstract

Some researchers claim that language acquisition critically depends on experiencing linguistic input in order of increasing complexity. We tested this hypothesis using a simple recurrent neural network (SRN) trained to predict word sequences in CHILDES, a 5-million-word corpus of speech directed to children. First, we demonstrated that age-ordered CHILDES exhibits a gradual increase in linguistic complexity. Next, we compared the performance of two groups of SRNs trained on versions of CHILDES that were either age-ordered or not. Specifically, we assessed learning of grammatical and semantic structure and showed that training on age-ordered input facilitates learning of semantic, but not sequential, structure. Follow-up analyses suggest that this effect could be explained by higher noun density in speech to younger children combined with weight entrenchment. The persistent learning improvement is consistent with the neural commitment hypothesis from the second language acquisition literature, which asserts that L1 representations reduce the neural resources available for L2 learning. Similarly, exposure to noun-rich input first rather than last (as in age-ordered CHILDES) may induce a representational advantage for lexical semantic acquisition.
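As a rough illustration of the setup described above, the sketch below shows the forward pass of a generic Elman-style SRN that predicts the next word from the current word plus a copy of its previous hidden state, together with a helper that produces the two training conditions (age-ordered vs. shuffled transcripts). This is not the authors' implementation: the class and function names, the `(age_in_days, token_ids)` transcript format, the layer sizes, and the weight initialization are hypothetical, and weight updates (backpropagation) are omitted for brevity.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class ElmanSRN:
    """Hypothetical minimal Elman-style SRN (forward pass only).

    The hidden layer receives the current word (one-hot) plus the
    previous hidden state (the context layer); the output layer is a
    probability distribution over the next word.
    """

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; the range is an arbitrary choice here.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.V, self.H = vocab_size, hidden_size
        self.W_xh = init(hidden_size, vocab_size)   # input -> hidden
        self.W_hh = init(hidden_size, hidden_size)  # context -> hidden
        self.W_hy = init(vocab_size, hidden_size)   # hidden -> output

    def step(self, word_id, h_prev):
        # With one-hot input, W_xh @ x is simply column `word_id` of W_xh.
        x_contrib = [row[word_id] for row in self.W_xh]
        h_contrib = matvec(self.W_hh, h_prev)
        h = [math.tanh(a + b) for a, b in zip(x_contrib, h_contrib)]
        y = softmax(matvec(self.W_hy, h))  # predicted next-word distribution
        return h, y

    def sequence_loss(self, word_ids):
        """Mean cross-entropy of predicting word t+1 from words 0..t."""
        h = [0.0] * self.H
        total = 0.0
        for cur, nxt in zip(word_ids, word_ids[1:]):
            h, y = self.step(cur, h)
            total -= math.log(y[nxt])
        return total / max(1, len(word_ids) - 1)


def training_order(transcripts, age_ordered, seed=0):
    """Return transcripts in the order they would be presented to a model.

    `transcripts` is a list of (child_age_in_days, token_ids) pairs, a
    hypothetical format. Age-ordered training sorts by the target child's
    age; the control condition shuffles the same transcripts.
    """
    if age_ordered:
        return sorted(transcripts, key=lambda t: t[0])
    shuffled = list(transcripts)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    data = [(900, [1, 2, 3]), (400, [0, 1]), (600, [2, 4])]
    net = ElmanSRN(vocab_size=5, hidden_size=8)
    for age, tokens in training_order(data, age_ordered=True):
        print(age, round(net.sequence_loss(tokens), 3))
```

The two conditions see exactly the same transcripts and differ only in presentation order, which is what allows an ordering effect to be isolated. Before any training, the loss of such a network sits close to log(vocabulary size), because small random weights yield a nearly uniform next-word distribution.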

