When we listen to music, we can mentally control how we perceive the beat. This ability is thought to be subserved by sensorimotor imagery, which exerts top-down effects on attentional allocation and perception. Here, we examine whether imagined “up and down” gestures can support the internal generation of metrical accents in rhythmic sequences. We also examine how this type of motor imagery interacts with metrically congruent or incongruent auditory imagery. We explore these questions using EEG with a frequency-tagging approach, quantifying the strength of metrical accent as the amplitude of beat-related steady-state evoked potentials (SSEPs). Gesture supports our ability to think and learn by fostering an alignment between sensorimotor representations and more abstract conceptual structures. Imagined gestures may therefore act as a bridge between perceptual and action-oriented understandings of metrical structure and the more abstract conceptual understandings that musicians struggle with in their training. Such imagery strategies could thus benefit music education.
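The frequency-tagging measure described above can be illustrated with a minimal sketch. It assumes a single averaged EEG time series whose length spans an integer number of cycles of every frequency of interest. The function names (`noise_corrected_spectrum`, `amplitude_at`) and the stimulus frequencies used in the demo (2.4 Hz event rate, 1.2 Hz binary meter, 0.8 Hz ternary meter) are hypothetical and serve only as examples. The noise correction subtracts the mean amplitude of bins 2 to 5 on either side of each bin. This is one common convention in the frequency-tagging literature and an assumption here, not necessarily the procedure used in this study.

```python
import numpy as np


def noise_corrected_spectrum(signal, fs, n_skip=1, n_neighbors=4):
    """Single-sided amplitude spectrum with local-noise subtraction.

    Hypothetical helper: for each bin, subtract the mean amplitude of
    `n_neighbors` bins on each side, skipping the `n_skip` bins closest
    to it (defaults give bins -5..-2 and +2..+5, an assumed convention).
    """
    n = len(signal)
    amp = np.abs(np.fft.rfft(signal)) * 2.0 / n  # amplitude in signal units
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    corrected = np.empty_like(amp)
    for i in range(len(amp)):
        below = range(i - n_skip - n_neighbors, i - n_skip)
        above = range(i + n_skip + 1, i + n_skip + n_neighbors + 1)
        idx = [j for j in list(below) + list(above) if 0 <= j < len(amp)]
        corrected[i] = amp[i] - amp[idx].mean() if idx else amp[i]
    return freqs, corrected


def amplitude_at(freqs, spectrum, target_hz):
    """Return the spectrum value at the bin nearest to target_hz."""
    return spectrum[np.argmin(np.abs(freqs - target_hz))]


if __name__ == "__main__":
    # Synthetic "EEG": a response at the event rate plus a weaker response
    # at the imagined binary meter, buried in Gaussian noise.
    fs = 250                       # sampling rate in Hz (assumed)
    t = np.arange(fs * 25) / fs    # 25 s gives 0.04 Hz bin spacing
    rng = np.random.default_rng(0)
    x = (np.sin(2 * np.pi * 2.4 * t)
         + 0.5 * np.sin(2 * np.pi * 1.2 * t)
         + 0.5 * rng.standard_normal(t.size))
    f, s = noise_corrected_spectrum(x, fs)
    for hz in (2.4, 1.2, 0.8):
        print(f"{hz} Hz: {amplitude_at(f, s, hz):.3f}")
```

Choosing a recording length that places each frequency of interest exactly on an FFT bin avoids spectral leakage. This keeps the noise-corrected amplitude close to the true response amplitude. A meter-related peak (here at 1.2 Hz) then stands out from a meter that was not imagined (0.8 Hz).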