ConvTasNet-based anomalous noise separation for intelligent noise monitoring



Noise pollution has become a growing concern in public health. The availability of low-cost wireless acoustic sensor networks permits continuous monitoring of noise. However, real acoustic scenes contain irrelevant sources (anomalous noise) that overlap with the monitored noise, which biases the evaluation and invites controversy. We study one classical scene: in road traffic noise assessment, non-traffic noise (e.g., speech, thunder) should be excluded to obtain a reliable evaluation. Because anomalous noise is diverse, occasional, and unpredictable in real-life scenes, removing it from the mixture is a challenge. We explore a fully convolutional time-domain audio separation network (ConvTasNet) for arbitrary sound separation. ConvTasNet is trained on a large dataset of more than 150 hours of environmental sounds, speech, and music. After training, the scale-invariant signal-to-distortion ratio (SI-SDR) improves by 11.40 dB on average on an independent test dataset. ConvTasNet is then applied to separate anomalous noise from traffic noise scenes. We mix traffic noise and anomalous noise at a random SNR between -10 dB and 0 dB. Separation is especially effective for salient and long-term anomalous noise, whose removal smooths the overall sound pressure level curve over time. The results emphasize the importance of anomalous noise separation for reliable evaluation.
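To make the two quantities used above concrete, the following is a minimal sketch of how SI-SDR improvement and mixing at a random SNR could be computed. It is not the authors' pipeline. It uses the standard SI-SDR definition, and it assumes that traffic noise plays the role of the "signal" and anomalous noise the role of the "noise" in the SNR, which the text does not state explicitly. The function names `si_sdr` and `mix_at_snr`, the synthetic test signals, the 16 kHz sample rate, and the hand-made "estimate" standing in for a separator output are all hypothetical.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def mix_at_snr(traffic, anomalous, snr_db):
    """Scale `anomalous` so that traffic power / anomalous power equals snr_db.

    Assumption: traffic noise is treated as the signal in the SNR.
    Returns the mixture and the scaled anomalous component.
    """
    traffic_power = np.mean(traffic ** 2)
    anomalous_power = np.mean(anomalous ** 2)
    gain = np.sqrt(traffic_power / (anomalous_power * 10.0 ** (snr_db / 10.0)))
    scaled = gain * anomalous
    return traffic + scaled, scaled


# Synthetic stand-ins for real recordings (hypothetical data).
rng = np.random.default_rng(0)
sample_rate = 16000
traffic = rng.standard_normal(sample_rate)  # broadband noise
anomalous = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)

# Draw a random SNR in [-10, 0] dB, as in the mixing setup described above.
snr_db = rng.uniform(-10.0, 0.0)
mixture, scaled_anomalous = mix_at_snr(traffic, anomalous, snr_db)

# Placeholder for a separator output: traffic plus 10% of the interference.
estimate = traffic + 0.1 * scaled_anomalous

# SI-SDR improvement: score of the estimate minus score of the raw mixture.
baseline = si_sdr(mixture, traffic)
improvement = si_sdr(estimate, traffic) - baseline
print(f"mixing SNR: {snr_db:.2f} dB, SI-SDR improvement: {improvement:.2f} dB")
```

In this sketch the improvement is measured relative to the unprocessed mixture, so it reflects what the separator adds rather than how easy the mixture was to begin with. A real evaluation would replace `estimate` with the network output for each test mixture and average the improvement over the test set.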