Abstract
The key to audio deepfake detection is distinguishing bonafide speech from carefully generated spoofed speech. The more distinguishable they are, the better and more generalizable the detection becomes. In this work, we propose a novel approach to enhance this distinguishability in the latent space. Inspired by one-class classification, we formulate an objective function that encourages the contraction of bonafide samples while dispersing fake speech samples during training. Our objective consists of two key components: a Bonafide-Pair Learning (BPL) loss and an Extended One-Class Softmax (EOC-S) loss. The BPL loss reduces intra-class variance by aligning the embeddings of augmented bonafide pairs, while the EOC-S loss leverages Adam-based centroid updates and margin constraints to reinforce separability from spoofed data. Experimental results on ASVspoof datasets demonstrate that our proposed approach enhances detection performance across diverse attack scenarios.
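The two-part objective described in the abstract can be illustrated with a minimal NumPy sketch. It is an assumption-laden illustration, not the paper's formulation: the abstract gives no equations, so `bonafide_pair_loss` uses a plain cosine-alignment term as a stand-in for BPL, and `one_class_softmax_loss` follows the general shape of a one-class softmax margin loss rather than the extended EOC-S variant. The function names, the margins `m_bona` and `m_spoof`, the scale `alpha`, and the weight `lam` are all hypothetical. The Adam-based centroid update is not shown; in practice the centroid would be a trainable parameter handled by the optimizer.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def bonafide_pair_loss(z_a, z_b):
    """Hypothetical BPL stand-in: mean (1 - cosine similarity) between the
    embeddings of two augmented views of the same bonafide utterances."""
    cos = np.sum(l2_normalize(z_a) * l2_normalize(z_b), axis=-1)
    return float(np.mean(1.0 - cos))

def one_class_softmax_loss(z, labels, centroid,
                           m_bona=0.9, m_spoof=0.2, alpha=20.0):
    """One-class-softmax-style margin loss (assumed form).
    labels: 0 = bonafide, 1 = spoof. Bonafide scores are pushed above
    m_bona, spoof scores below m_spoof."""
    scores = l2_normalize(z) @ l2_normalize(centroid)   # cosine to centroid
    margins = np.where(labels == 0, m_bona, m_spoof)
    signs = np.where(labels == 0, 1.0, -1.0)
    # log(1 + exp(x)) computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, alpha * (margins - scores) * signs)))

def total_loss(z, labels, centroid, z_view_a, z_view_b, lam=1.0):
    """Weighted sum of the two terms; lam is a hypothetical weight."""
    return (one_class_softmax_loss(z, labels, centroid)
            + lam * bonafide_pair_loss(z_view_a, z_view_b))
```

In this sketch the pair term shrinks the spread of bonafide embeddings, while the margin term leaves a gap between the two classes' similarity to the centroid, which matches the contraction and dispersion behaviour the abstract describes.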
| Original language | English |
|---|---|
| Pages (from-to) | 2250-2254 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| State | Published - 2025 |
| Event | 26th Interspeech Conference 2025, Rotterdam, Netherlands (17 Aug 2025 to 21 Aug 2025) |
Keywords
- anti-spoofing
- audio deepfake detection
- contrastive learning
- one-class classification
Title
Enhancing Audio Deepfake Detection by Improving Representation Similarity of Bonafide Speech