Previous research has shown that listeners follow speaker gaze to mentioned objects in a shared visual environment to ground referring expressions, for both human and robot speakers. What is less clear is whether listeners exploit speaker gaze to infer referential intentions (Staudte & Crocker, 2010), or whether the benefits of gaze can be explained more simply by (reflexive) gaze following (Friesen & Kingstone, 1998). To investigate this issue, we conducted two eye-tracking studies that directly contrast the speech-aligned gaze of a virtual agent with an arrow cue. Our findings show that speaker gaze benefits listeners only when the order of gaze cues matches the order in which objects are mentioned in the utterance. Similarly timed arrow cues, however, benefit listeners regardless of this order. These findings are consistent with the view that gaze is interpreted as reflecting the speaker's referential intentions, while other visual and referential cues can be exploited more flexibly.