Voice Attacker Leveraging Multi-Head Factorized Attentive Reconstructor and Gradient Reversal for Random Prosody Anonymization
Published in Paper, 2025
This is the report of Team 04-SpeechWorld for the First VoicePrivacy Attacker Challenge. The attack methods aim to verify speakers whose voices were anonymized by two main types of anonymization system: STTTS-based and NAC-based. Characteristics of the original audio were reconstructed using speaker embeddings from WavLM-Ecapa and, for the NAC system, codecs. In addition, gradient reversal layers were incorporated to remove the attacker's dependence on prosody features, which the anonymization models simulate at random. The proposed attackers achieved a relative reduction of 26.49% in Equal Error Rate (EER) over the baseline, lowering it from 43.22% to 31.77% for the T12-5 attacker system.
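As a rough illustration of the gradient reversal idea mentioned above, the following minimal sketch shows the core behaviour of such a layer: it is the identity in the forward pass and negates (and scales) the gradient in the backward pass, so an upstream encoder is discouraged from encoding whatever an auxiliary branch tries to predict. This is not the team's implementation. The class name `GradientReversal`, the strength parameter `lam`, and the toy vectors are assumptions made purely for illustration, and the sketch uses plain Python lists rather than a deep learning framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not a value from the report

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the auxiliary (e.g. prosody) branch unchanged.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The auxiliary-loss gradient is negated and scaled before it flows
        # back to the encoder, pushing the encoder away from that information.
        return [-self.lam * g for g in grad_output]


# Toy usage with made-up numbers.
grl = GradientReversal(lam=0.5)
embedding = [0.2, -1.0, 3.0]
forwarded = grl.forward(embedding)
reversed_grad = grl.backward([1.0, -2.0, 0.5])
```

In a real training setup the same behaviour would normally be expressed as a custom autograd operation in the chosen framework, with `lam` often scheduled over training. Whether the report does this is not stated in the abstract.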