Physical Inference for Object Perception in Complex Auditory Scenes

Abstract

Perception is often modeled as a pattern-recognition process, with data-driven mechanisms that learn the statistics of experience without regard for the causal processes underlying the data we perceive. But perception also involves causal inference, and indeed physical inference. We introduce a novel task paradigm that allows us to study the mechanisms of physical causal inference at work in auditory perception and in joint visual-auditory scene understanding. We call this task the "box-shaking game": people have to figure out what is inside a cardboard box, in particular how many objects of a given type it contains, just by listening to the sounds of the box being shaken. We present three experiments showing that even naive observers readily perform this task, that they do so using information beyond the statistics of sound textures (potentially involving representations of events and dynamics), and that they benefit from cross-modal visual data that reveals the box motion but provides no direct information about the box contents. The results suggest that listeners have an internal causal model of object interactions and use it to infer the physical events giving rise to sound.

