Past theoretical studies of word learning sought to explain the speed of children’s word learning using models that sample words from a uniform word frequency (WF) distribution. We consider more realistic nonuniform, long-tailed WF distributions (i.e., Zipfian or power-law). Our new mathematical analysis of a recently proposed simple learning model suggests that the model cannot account for word learning in feasible time when word frequencies follow a Zipfian distribution. Since children do learn words under such distributions, we propose a type of self-directed learning in which the learner can help construct the contexts from which they learn words. We show that active learners choosing optimal situations can learn words hundreds of times faster than learners given randomly sampled situations. In agreement with past empirical studies, we find theoretical support for the idea that statistical structure in real-world situations (perhaps influenced by a self-directed learner, by a teacher, or by both) is a potential remedy for learning words with Zipf-distributed frequency.
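
To give a rough feel for why a long-tailed WF distribution slows a passive learner, the following minimal sketch compares how many random draws are needed before every word has been encountered at least once under a uniform and under a Zipfian distribution. It is an illustration only and is not the learning model analyzed here: it ignores referential ambiguity and counts bare exposure, and the function names, the vocabulary size of 50, the Zipf exponent of 1, and the number of trials are all assumptions chosen for the example.

```python
import random
from itertools import accumulate


def zipf_probs(n_words, exponent=1.0):
    """Zipfian probabilities: p(rank) proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def samples_until_all_seen(probs, rng):
    """Draw words at random until each word has appeared at least once.

    This is only an exposure proxy (an assumption of this sketch),
    not a model of how word meanings are disambiguated.
    """
    cum_weights = list(accumulate(probs))
    words = range(len(probs))
    unseen = set(words)
    draws = 0
    while unseen:
        word = rng.choices(words, cum_weights=cum_weights, k=1)[0]
        unseen.discard(word)
        draws += 1
    return draws


def mean_time(probs, trials=200, seed=0):
    """Average number of draws over several independent simulated learners."""
    rng = random.Random(seed)
    total = sum(samples_until_all_seen(probs, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n_words = 50  # hypothetical vocabulary size
    uniform = [1.0 / n_words] * n_words
    zipfian = zipf_probs(n_words)
    print("uniform:", mean_time(uniform))
    print("zipfian:", mean_time(zipfian))
```

Under these assumptions the Zipfian learner needs more draws on average, because the rare words in the tail are seldom sampled. The gap in this exposure-only proxy is much smaller than the effect described for the analyzed model, in which the learner must also resolve which referent each word picks out.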