The reported study examined whether processing spatial verbal information interferes, within the visuo-spatial sketchpad, with the execution of eye movements associated with viewing pictures and reading. Seventy-four students were randomly assigned to six groups resulting from a 2×2×2 mixed design with spatial secondary task (with vs. without), text contents (visual vs. spatial), and text modality (spoken vs. written) as independent variables. Consistent with our assumptions, learners who received text with spatial contents showed worse recall performance than those who received text with visual contents. Furthermore, written presentation of text with spatial contents loaded the visuo-spatial sketchpad more heavily than spoken presentation. Implications of these results for learning with multimedia are discussed.