To date, little research has addressed the cognitive resources underlying gesture comprehension. Here we used a dual-task paradigm to test the relative importance of verbal and visuo-spatial working memory (WM). Participants encoded either a series of digits (verbal load) or a series of dot locations in a grid (visuo-spatial load) and rehearsed them while performing a discourse comprehension task, in which they watched a video of a man describing household objects, viewed a picture probe, and judged whether the picture was related to the video. Participants then recalled the verbally or visuo-spatially encoded information. Performance on the discourse comprehension task was always better when the speaker’s gestures were congruent than when they were incongruent. However, this congruency advantage was smaller when the concurrent memory task imposed a visuo-spatial load than when it imposed a verbal load. These results suggest that taxing the visuo-spatial WM system reduced participants’ ability to benefit from gestural information.