Sound Event Detection with Soft Labels using Self-Attention Mechanisms for Global Scene Feature Extraction

Published in Detection and Classification of Acoustic Scenes and Events 2023, 2023

Authors: Nhan Tri-Do, Param Biyani, Zhang Yuxuan, Andrew Koh Jin Jie, Chng Eng Siong

GitHub Repo

This paper presents our approach to Task 4b of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge, which focuses on Sound Event Detection with Soft Labels. Our method builds on a CRNN backbone and uses data augmentation to improve model robustness. We also introduce self-attention mechanisms that capture global context information, improving the model's ability to predict soft label segments accurately. Our experiments demonstrate that incorporating soft labels and self-attention mechanisms results in significant performance gains over traditional methods on data drawn from varying scenarios.
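To make the two central ideas concrete, the following is a minimal illustrative sketch, not the implementation from the paper. It assumes that frame-level features (for example, the output of a CRNN) are available as a list of vectors, applies single-head scaled dot-product self-attention with identity projections (no learned weights) so that every frame can draw on the whole clip, and scores predictions against soft labels with binary cross-entropy. The function names and the choice of loss are assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over time frames.

    Assumption: queries, keys, and values are the frames themselves
    (identity projections). A trained model would learn these projections.
    """
    d = len(frames[0])
    out = []
    for q in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        # Each output frame is a weighted average of all frames (global context).
        out.append([sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)])
    return out


def soft_label_bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy where targets are soft labels in [0, 1].

    Assumption: BCE is used here for illustration only.
    """
    total, count = 0.0, 0
    for p_row, t_row in zip(probs, targets):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Hypothetical toy input: 3 time frames, 2 feature dimensions.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(frames)

# Soft labels express annotator agreement rather than a hard 0/1 decision.
loss = soft_label_bce([[0.5]], [[0.3]])
```

With soft targets, the loss is lowest when the predicted probability matches the soft label itself rather than 0 or 1, which is what lets a model learn graded event activity.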