Three humans interacted with a spiking neural network model that controls the lip and jaw muscles of a speech synthesizer. The model learns through spike-timing-dependent plasticity, which is stronger when reinforcement is received. First, each human selectively reinforced the model, encouraging it to produce vocalizations with speech-like syllabic elements more frequently. Afterwards, a yoked control simulation was generated for each human-reinforced simulation. All sounds produced during learning, in every human-reinforced and every yoked control simulation, were then presented in random order and judged by all three listeners on a four-point syllable quality scale. In all cases, these judgments indicated that the human-reinforced models produced more sophisticated babbling over the course of learning, whereas the yoked control simulations did not. The results support this model of how canonical babbling, a major developmental milestone, may develop.
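As a rough illustration of the learning rule and control design described above, the following Python sketch shows one way reinforcement could scale a spike-timing-dependent weight update, and how a yoked control could reuse a reinforcement schedule. It is not the model's actual implementation. The pair-based exponential window, the function names, and all parameter values (`tau_ms`, `base_rate`, `reward_gain`) are assumptions chosen for illustration, as is the reading of "yoked" as replaying the human-reinforced run's reward schedule independently of the control's own output.

```python
import math


def stdp_window(dt_ms, a_plus=1.0, a_minus=1.0, tau_ms=20.0):
    """Pair-based STDP kernel (assumed form); dt_ms = t_post - t_pre."""
    if dt_ms > 0:
        # Presynaptic spike precedes postsynaptic spike: potentiation.
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        # Postsynaptic spike precedes presynaptic spike: depression.
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0


def weight_change(dt_ms, reinforced, base_rate=0.01, reward_gain=5.0):
    """STDP update whose magnitude is larger when reinforcement is received.

    base_rate and reward_gain are hypothetical values, not taken from the paper.
    """
    rate = base_rate * (reward_gain if reinforced else 1.0)
    return rate * stdp_window(dt_ms)


def yoked_schedule(human_rewards):
    """Yoked control (assumed meaning): copy the reward schedule of the
    human-reinforced run, so rewards no longer depend on what the control
    simulation itself vocalizes."""
    return list(human_rewards)


if __name__ == "__main__":
    # Same spike pairing, with and without reinforcement.
    print(weight_change(10.0, reinforced=True))
    print(weight_change(10.0, reinforced=False))
    # The control receives the same per-trial rewards as its human-reinforced pair.
    print(yoked_schedule([False, True, False, False, True]))
```

Under these assumptions, both runs receive the same amount and timing of reinforcement, and only the human-reinforced run has reinforcement contingent on the sounds it produces.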