Speakers and listeners in a dialogue establish mutual understanding by coordinating their linguistic responses. When a visual scene is present, their scan patterns on that scene are also coordinated. However, it remains an open question which linguistic and scene factors affect this coordination. In this paper, we investigate the coordination of scan patterns during the comprehension and generation of scene descriptions. We manipulate the animacy of the subject and the number of visual referents associated with it. Using Cross-Recurrence Analysis, we show that coordination emerges only during linguistic processing, and that it is especially pronounced for inanimate, unambiguous subjects. When the subject is referentially ambiguous (i.e., more than one visual object is associated with it), scan-pattern variability increases to the extent that the animacy effect is neutralized.
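The abstract does not say how cross-recurrence was computed. The following is a minimal sketch under stated assumptions. It assumes that each participant's scan pattern is discretized into fixed-size time bins and that each bin is labeled with the fixated object, or with `None` when there is no fixation. Under that assumption, the recurrence rate at lag k is the proportion of aligned bins in which the speaker at time t and the listener at time t + k fixate the same object. The function name `cross_recurrence_profile` and the object labels are hypothetical illustrations, not the authors' implementation.

```python
from typing import Hashable, Optional, Sequence


def cross_recurrence_profile(
    speaker: Sequence[Optional[Hashable]],
    listener: Sequence[Optional[Hashable]],
    max_lag: int,
) -> dict:
    """Return the categorical cross-recurrence rate for each lag in [-max_lag, max_lag].

    Hypothetical sketch: each sequence holds one fixated-object label per time bin.
    At lag k, speaker[t] is compared with listener[t + k], so a positive lag means
    the listener fixates the same object k bins after the speaker.
    Bins labeled None (no fixation) never count as recurrent.
    """
    n = min(len(speaker), len(listener))  # compare only the common time span
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        hits = 0   # bins where both fixate the same object
        pairs = 0  # bins that can be aligned at this lag
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                pairs += 1
                a, b = speaker[t], listener[u]
                if a is not None and a == b:
                    hits += 1
        profile[lag] = hits / pairs if pairs else 0.0
    return profile


# Toy example: the listener follows the speaker's gaze one bin later.
speaker = ["man", "man", "cup", "cup", "table"]
listener = ["door", "man", "man", "cup", "cup"]
profile = cross_recurrence_profile(speaker, listener, max_lag=1)
print(profile)
```

In this toy example, the profile peaks at lag +1, which is the pattern one would read as the listener's gaze trailing the speaker's. Averaging such profiles across trials and conditions (for example, animate versus inanimate subjects) is one way to compare coordination. This sketch does not claim that the study aggregated the data this way.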