Templatic features for modeling phoneme acquisition

Abstract

We describe a model for the coding of speech sounds inspired by data on early speech acquisition in human infants. The code is obtained by computing the similarity between incoming speech sounds and a large number of stored syllable-sized templates. We show that this code yields a better linear separation of phonemes than the standard mel-frequency cepstral coefficient (MFCC) code. Additional experiments show that the code is tuned to a particular language and can exploit temporal cues for phoneme recognition. Optimal templates seem to correspond to chunks of speech of around 120 ms that contain transitions between phonemes or syllables.
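
As a rough illustration of the template-similarity idea, the following minimal sketch maps one window of feature frames to a vector holding one similarity score per stored template. The abstract does not specify the similarity measure or the underlying frame features, so the use of cosine similarity, the flattening of each window into a single vector, and the names `flatten`, `cosine_similarity`, and `templatic_code` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def flatten(window):
    """Concatenate a list of feature frames into one flat vector."""
    return [value for frame in window for value in frame]

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def templatic_code(window, templates):
    """Return one similarity score per stored template for a single speech window.

    Assumption: cosine similarity stands in for the unspecified similarity measure.
    """
    flat = flatten(window)
    return [cosine_similarity(flat, flatten(t)) for t in templates]

# Toy data: each window is 2 frames x 2 coefficients.
# With a hypothetical 10 ms frame hop, a ~120 ms template would span about 12 frames.
templates = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
window = [[2.0, 0.0], [0.0, 2.0]]
code = templatic_code(window, templates)
```

In this toy case the window is a scaled copy of the first template and orthogonal to the second, so the first score is close to 1 and the second is 0. The resulting vector, with one dimension per template, is the kind of representation on which a linear phoneme classifier could then be trained.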

