Deep learning-based single point sound source localization in spherical microphone arrays



In this contribution, we present a high-resolution and accurate sound source localization method based on a deep learning framework. Although spherical microphone arrays can be used to produce omnidirectional beams, conventional spherical harmonics beamforming (SHB) is widely known to be limited in spatial resolution. To achieve high-resolution and precise sound source localization, we propose a convolutional neural network (CNN)-based source localization model as a data-driven approach. We first present a novel way to define a source distribution map that spatially represents a single point source’s position and strength. Using a paired dataset of spherical harmonics beamforming maps and the proposed high-resolution maps, we develop a fully convolutional neural network with an encoder-decoder structure to establish an image-to-image transformation model. Both quantitative and qualitative results are presented to demonstrate the effectiveness of the proposed data-driven source localization model.
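
The abstract does not give the exact definition of the source distribution map. As a minimal NumPy sketch, one common choice is a Gaussian encoding on an angular grid, where the peak location marks the source direction and the peak height its strength; the function name, grid, and `sigma` width below are all illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def source_distribution_map(az_deg, el_deg, strength, grid_az, grid_el, sigma=5.0):
    """Hypothetical source distribution map: a Gaussian blob on an
    (azimuth, elevation) grid whose peak position encodes the source
    direction and whose peak height encodes the source strength."""
    A, E = np.meshgrid(grid_az, grid_el, indexing="ij")
    d2 = (A - az_deg) ** 2 + (E - el_deg) ** 2
    return strength * np.exp(-d2 / (2.0 * sigma ** 2))

# Illustrative grid: 2-degree steps over the full sphere of directions.
grid_az = np.arange(0.0, 360.0, 2.0)
grid_el = np.arange(-90.0, 91.0, 2.0)

# Single point source at azimuth 120 deg, elevation 30 deg, strength 0.8.
m = source_distribution_map(120.0, 30.0, 0.8, grid_az, grid_el)
i, j = np.unravel_index(np.argmax(m), m.shape)
print(grid_az[i], grid_el[j], m.max())  # -> 120.0 30.0 0.8
```

A map of this kind would serve as the sharp training target paired with the blurrier SHB map, so the encoder-decoder network learns a deblurring-style image-to-image mapping.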