In language-mediated visual search, listeners must allocate memory and attentional resources to process verbal instructions while simultaneously navigating a visual scene to locate linguistically specified targets. We investigate when and how listeners use object names in visual-search strategies across three visual world experiments, varying the presence and location of an added visual memory demand. The results suggest that as long as objects in the display can be visually inspected throughout the trial, participants do not encode those objects linguistically. We suggest that they instead use the visual environment as an external memory, mapping the spoken word onto potential referents via perceptual visual routines automatically triggered by the spoken word. The results are discussed in terms of flexible and efficient allocation of memory resources in natural tasks that combine language and vision.