Enhancing Audio Deepfake Detection by Improving Representation Similarity of Bonafide Speech

  • Seung Bin Kim
  • Hyun Seo Shin
  • Jungwoo Heo
  • Chan Yeong Lim
  • Kyo Won Koo
  • Jisoo Son
  • Sanghyun Hong
  • Souhwan Jung
  • Ha Jin Yu

Research output: Contribution to journal › Conference article › peer-review

Abstract

The key to audio deepfake detection is distinguishing bonafide speech from carefully generated spoofed speech. The more distinguishable they are, the better and more generalizable the detection becomes. In this work, we propose a novel approach to enhance this distinguishability in the latent space. Inspired by one-class classification, we formulate an objective function that encourages the contraction of bonafide samples while dispersing fake speech samples during training. Our objective consists of two key components: Bonafide-Pair Learning (BPL) loss and an Extended One-Class Softmax (EOC-S) loss. The BPL reduces intra-class variance by aligning the embeddings of augmented bonafide pairs, while the EOC-S leverages Adam-based centroid updates and margin constraints to reinforce separability from spoofed data. Experimental results on ASVspoof datasets demonstrate that our proposed approach enhances detection performance across diverse attack scenarios.
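The abstract names the two loss terms but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a cosine-distance alignment term for augmented bonafide pairs and a one-class softmax-style margin term around a single bonafide centroid. The function names, the margins `m_real` and `m_fake`, and the scale `alpha` are illustrative assumptions; the Adam-based centroid update mentioned in the abstract is not modeled here (the centroid is simply passed in).

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def pair_alignment_loss(view_a, view_b):
    """Hypothetical pair term: mean (1 - cosine) between embeddings of two
    augmented views of the same bonafide utterances."""
    return sum(1.0 - cosine(a, b) for a, b in zip(view_a, view_b)) / len(view_a)

def one_class_margin_loss(embeddings, is_spoof, centroid,
                          m_real=0.9, m_fake=0.2, alpha=20.0):
    """Hypothetical one-class term: bonafide embeddings are pushed above
    m_real in cosine similarity to the centroid, spoofed ones below m_fake.
    Margin and scale values are assumptions for illustration only."""
    total = 0.0
    for emb, spoof in zip(embeddings, is_spoof):
        score = cosine(centroid, emb)
        if spoof:
            total += softplus(alpha * (score - m_fake))
        else:
            total += softplus(alpha * (m_real - score))
    return total / len(embeddings)

# Toy usage with 2-D embeddings and a fixed centroid.
centroid = [1.0, 0.0]
bonafide_a = [[1.0, 0.1], [0.9, -0.1]]
bonafide_b = [[1.0, 0.0], [1.0, -0.2]]
batch = bonafide_a + [[-1.0, 0.3]]
labels = [False, False, True]
total_loss = (pair_alignment_loss(bonafide_a, bonafide_b)
              + one_class_margin_loss(batch, labels, centroid))
```

In this sketch the pair term shrinks as the two augmented views of a bonafide utterance map to the same direction, while the margin term penalizes bonafide embeddings that drift from the centroid and spoofed embeddings that approach it. How the two terms are weighted in the paper is not stated in the abstract.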

Original language: English
Pages (from-to): 2250-2254
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 to 21 Aug 2025

Keywords

  • anti-spoofing
  • audio deepfake detection
  • contrastive learning
  • one-class classification
