Different hypotheses have been proposed concerning the role of talker variability in lexical learning. It remains unclear whether new phonetic categories are acquired as episodic memory traces that preserve talkers’ voice information or as abstract categories. The current study investigated the role of voice similarity in the perceptual learning of Cantonese tones. Twelve Mandarin speakers completed six high-variability training sessions. Voice similarity was controlled in the training and in the pre- and posttests. Results indicate that the training transferred positively to both similar and dissimilar talkers. However, whereas pretest performance did not differ significantly between similar and dissimilar voices, posttest performance was significantly better for similar voices. These results suggest that learners retained talker information during learning and made use of it in subsequent perception. This implies that lexical tones are probably encoded episodically in the mental representations of Mandarin-speaking L2 learners.