Auditory scene analysis as Bayesian inference in sound source models
- Maddie Cusimano, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
- Luke Hewitt, MIT, Cambridge, Massachusetts, United States
- Josh Tenenbaum, Brain and Cognitive Sciences, MIT, Cambridge, Massachusetts, United States
- Josh McDermott, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Abstract
Inferring individual sound sources from the mixture of soundwaves that enters our ear is a central problem in auditory perception, termed auditory scene analysis (ASA). A diverse set of ASA illusions suggests general principles underlying perceptual organization. However, most explanations for these illusions remain intuitive or are narrowly focused, without formal models that predict perceived sound sources from the acoustic waveform. Whether ASA phenomena can be explained by a small set of principles is therefore unclear. We present a Bayesian model based on simple acoustic sources, for which a neural network is used to guide Markov chain Monte Carlo inference. Given a sound waveform, our system infers the number of sources present, the parameters defining each source, and the sound produced by each source. This model qualitatively accounts for perceptual judgments on a variety of classic ASA illusions, and can in some cases infer perceptually valid sources from simple audio recordings.
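The abstract does not specify the source models, the network, or the proposal scheme, so the following is only an illustrative sketch of the general idea of guided MCMC, not the authors' system. It assumes a single pure-tone source with unknown frequency, a uniform prior, and Gaussian observation noise. It runs Metropolis-Hastings with a mixture proposal: a local random walk plus a data-driven jump centred on a bottom-up estimate. The zero-crossing estimator `bottom_up_guess` is a hypothetical stand-in for a learned recognition network, and all function names and parameter values are assumptions. Inferring the number of sources, as the abstract describes, would additionally require trans-dimensional moves (for example reversible-jump MCMC), which this sketch omits.

```python
import math
import random


def synth_tone(freq, n, sr):
    """Toy (assumed) source model: unit-amplitude, zero-phase sine at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * t / sr) for t in range(n)]


def log_posterior(freq, y, sr, sigma, lo=50.0, hi=2000.0):
    """Uniform prior on [lo, hi] Hz plus i.i.d. Gaussian likelihood, up to a constant."""
    if not lo <= freq <= hi:
        return -math.inf
    pred = synth_tone(freq, len(y), sr)
    return -sum((a - b) ** 2 for a, b in zip(y, pred)) / (2 * sigma ** 2)


def bottom_up_guess(y, sr):
    """Hypothetical stand-in for a recognition network: zero-crossing frequency estimate."""
    crossings = sum(1 for a, b in zip(y, y[1:]) if (a < 0) != (b < 0))
    return crossings * sr / (2 * len(y))


def log_normal(x, mu, s):
    return -0.5 * ((x - mu) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))


def log_proposal(new, old, guess, w, s_rw, s_g):
    """Log density of the two-component mixture proposal, via log-sum-exp."""
    a = math.log(w) + log_normal(new, old, s_rw)
    b = math.log(1 - w) + log_normal(new, guess, s_g)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def guided_mh(y, sr, sigma, n_iter=2000, w=0.7, s_rw=5.0, s_g=20.0, seed=0):
    rng = random.Random(seed)
    guess = bottom_up_guess(y, sr)
    f = rng.uniform(50.0, 2000.0)
    lp = log_posterior(f, y, sr, sigma)
    samples = []
    for _ in range(n_iter):
        # Pick a proposal component: local random walk or data-driven jump.
        if rng.random() < w:
            f_new = rng.gauss(f, s_rw)
        else:
            f_new = rng.gauss(guess, s_g)
        lp_new = log_posterior(f_new, y, sr, sigma)
        if lp_new > -math.inf:
            # Hastings ratio must include the asymmetric mixture proposal density.
            log_alpha = (lp_new - lp
                         + log_proposal(f, f_new, guess, w, s_rw, s_g)
                         - log_proposal(f_new, f, guess, w, s_rw, s_g))
            if rng.random() < math.exp(min(0.0, log_alpha)):
                f, lp = f_new, lp_new
        samples.append(f)
    return samples


# Usage: a noisy 440 Hz tone, 400 samples at 8 kHz.
sr, n = 8000.0, 400
noise = random.Random(1)
y = [s + noise.gauss(0.0, 0.05) for s in synth_tone(440.0, n, sr)]
samples = guided_mh(y, sr, sigma=0.05)
print(round(samples[-1], 2))
```

The data-driven component lets the chain jump directly into the narrow posterior mode, which a small random walk started far away would rarely find. The random-walk component then refines the estimate locally. Because the jump proposal ignores the current state, its density must appear in the acceptance ratio for the chain to target the correct posterior.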