Investigating the influence of microphone mismatch for acoustic traffic monitoring

Abstract

Developing robust machine-learning algorithms for acoustic traffic monitoring (ATM) faces several challenges. The biggest is collecting and annotating large, high-quality datasets for training and evaluation. Such a dataset must cover a broad variety of vehicle sounds, since the emitted noise depends on many factors, such as engine noise at different speeds and road conditions. In addition, the characteristics of the microphones used strongly influence the recorded data. If microphones with different directionality and frequency responses are used during model development and final deployment, the resulting data mismatch can degrade the performance of machine learning algorithms. In this paper, we investigate the influence of mismatched recording locations and microphone characteristics on the proposed ATM system. To evaluate these effects, we implement state-of-the-art convolutional neural networks that detect passing vehicles, classify their type, and estimate their speed and direction of movement. The evaluated models perform well on low- and high-quality recordings at different locations when the same recording device is used for training and testing. However, the results indicate that microphone mismatch causes several issues that need to be carefully addressed.