Although common in conversation, sarcasm can be difficult to identify in writing. For example, "Oh wow!" can be expressed either sincerely or ironically. Research suggests that the interpretation of ambiguous sarcasm may be context-dependent, though precisely how visual and auditory cues contribute to comprehension remains unclear. To explore this, we recorded an actor performing 45 phrases both sarcastically and sincerely. In two calibration studies, we selected 16 ambiguous phrases that were identifiable with all cues available. In the main study, participants classified these phrases as sarcastic or sincere in one of three conditions: audio-video (N=32), audio-only (N=29), or video-only (N=26). Performance was high (91%) with both cues available. Intriguingly, the drops in performance with audio only (82%, p<.001) and video only (80%, p<.0001) were small and not significantly different from one another (p>.5), suggesting that auditory and visual cues contribute similarly to sarcasm comprehension. We replicated these results in a separate within-subjects experiment (N=77).