Infants are bombarded with a bewildering array of events to learn about. In such an environment, referential cues (e.g., gestures or symbols) highlight which events infants should learn. Although many studies have documented which referential cues guide attention and learning during infancy, few have investigated how this learning occurs. The present eye-tracking study provides clear evidence for a social scaffolding process: When arbitrary cues were repeatedly preceded by communicative signals (i.e., a face addressing the infant), 9-month-olds learned that these cues predicted the appearance of an audio-visual event. Importantly, the arbitrary cues continued to guide learning of these events even after the face disappeared from the screen. A control condition confirmed that infants did not learn from arbitrary cues alone and that their eventual success was not simply due to extended practice. These results are discussed in terms of a theory of cue scaffolding.