Prior research has shown that adults can make rapid use of visual context information (e.g., visual referential contrast and depicted agent-action-patient events) for syntactic structuring and disambiguation. By contrast, little is known about how visual context influences children’s language comprehension, and some results even suggest that children cannot use visual referential context for syntactic structuring (e.g., Trueswell et al., 1999). We examined whether children, unlike adults, also struggle to use other kinds of visual context information (e.g., depicted events) for real-time language comprehension. In two eye-tracking studies, we directly compared real-time effects of depicted events on children’s (Experiment 1) vs. adults’ (Experiment 2) processing of spoken German subject-verb-object (SVO) and object-verb-subject (OVS) sentences. Both word orders are grammatical, but OVS is non-canonical. Five-year-olds are at chance in understanding even unambiguous OVS sentences in the absence of visual context (Dittmar et al., 2008). If children can use depicted events rapidly for syntactic structuring, we should find visual context effects for them similar to those reported for adults (Knoeferle et al., 2005), and gaze patterns similar to those of the adults in the present studies. Gaze patterns in the present studies suggested that events depicting who-does-what-to-whom incrementally influenced both adults’ and 5-year-olds’ visual attention and thematic role assignment. Depicted-event information helped children overcome their initial preference for the canonical SVO structure when interpreting OVS sentences. However, visual context effects were subtly delayed in children (vs. adults) and varied as a function of their accuracy and cognitive capacity.