Deciphering the Transformation of Sounds into Meaning: Insights from Disentangling Intermediate Representations in Sound-to-Event DNNs
39 Pages Posted: 9 Oct 2024
Date Written: June 25, 2024
Abstract
Neural representations estimated from functional MRI (fMRI) responses to natural sounds in non-primary auditory cortical areas resemble those in intermediate layers of deep neural networks (DNNs) trained to recognize sounds. However, the nature of these representations remains poorly understood. In the current study, a convolutional DNN (Yamnet), pre-trained to map sound spectrograms to semantic categories, is used as a computer simulation of the human brain's processing of natural sounds. A novel sound data set is introduced and employed to test the hypothesis that sound-to-event DNNs represent basic mechanisms of sound generation (here, human actions) and physical properties of the sound sources (here, object materials) in their intermediate layers. Systematic changes are made to these latent representations with the help of a disentangling flow model, and the manipulations are shown to have a predictable effect on the DNN's semantic output. By demonstrating this mechanism in silico, the current study paves the way for neuroscientific experiments aiming to verify it in vivo. Code available at https://github.com/TimHenry1995/LatentAudio.
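The edit-then-invert principle behind the latent manipulations can be sketched in a toy form. The paper's actual disentangling normalizing flow is available in the linked repository; the snippet below merely substitutes a random orthogonal matrix as a trivially invertible stand-in for that trained flow, and the "material" axis named in the comments is hypothetical, chosen only to illustrate editing one disentangled factor and mapping the result back to the latent space.

```python
import numpy as np

# Hypothetical stand-in for a trained invertible (flow) model:
# a random orthogonal matrix Q defines an exactly invertible linear map.
rng = np.random.default_rng(0)
d = 8
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def to_disentangled(z):
    """Map a latent vector into 'disentangled' coordinates."""
    return Q @ z

def to_latent(y):
    """Invert the map (Q is orthogonal, so Q.T is its inverse)."""
    return Q.T @ y

z = rng.standard_normal(d)      # stand-in for an intermediate DNN activation
y = to_disentangled(z)
y_edited = y.copy()
y_edited[0] += 2.0              # shift the (hypothetical) 'material' factor
z_edited = to_latent(y_edited)  # manipulated latent, fed back into the DNN
```

Because the map is invertible, the edit changes exactly one coordinate in the disentangled space while remaining a valid latent vector, which is what allows the downstream semantic output to be probed for a predictable change.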
Keywords: Machine Learning, Latent Space Disentanglement, Yamnet, Auditory Processing, Invertible Neural Network, Normalizing Flow