Self-supervised pre-training with acoustic configurations for replay spoofing detection

Hye Jin Shim, Hee Soo Heo, Jee Weon Jung, Ha Jin Yu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed


Abstract

Constructing a dataset for replay spoofing detection requires a physical process of playing an utterance and re-recording it, presenting a challenge to the collection of large-scale datasets. In this study, we propose a self-supervised framework for pre-training acoustic configurations using datasets published for other tasks, such as speaker verification. Here, acoustic configurations refer to the environmental factors generated during the process of voice recording but not the voice itself, including microphone types, place and ambient noise levels. Specifically, we select pairs of segments from utterances and train deep neural networks to determine whether the acoustic configurations of the two segments are identical. We validate the effectiveness of the proposed method based on the ASVspoof 2019 physical access dataset utilizing two well-performing systems. The experimental results demonstrate that the proposed method outperforms the baseline approach by 30%.
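The pair-selection step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this record: two segments cropped from the same utterance are treated as sharing an acoustic configuration (label 1), and segments from two different utterances are treated as differing (label 0), which can be a noisy label when two recordings happen to share a device and room. The names `crop`, `sample_pair`, and `p_same` are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def crop(samples: Sequence[float], segment_len: int, rng: random.Random) -> Sequence[float]:
    """Return a random contiguous crop of `segment_len` samples."""
    if len(samples) < segment_len:
        raise ValueError("utterance is shorter than segment_len")
    start = rng.randint(0, len(samples) - segment_len)  # inclusive bounds
    return samples[start:start + segment_len]


def sample_pair(
    utterances: Dict[str, List[float]],
    segment_len: int,
    rng: random.Random,
    p_same: float = 0.5,
) -> Tuple[Sequence[float], Sequence[float], int]:
    """Draw two segments and a binary label.

    ASSUMPTION: same utterance -> same acoustic configuration (label 1);
    different utterances -> different configuration (label 0).
    """
    ids = list(utterances)
    if len(ids) < 2:
        raise ValueError("need at least two utterances to form negative pairs")
    if rng.random() < p_same:
        # Positive pair: both crops come from one utterance.
        utt = utterances[rng.choice(ids)]
        return crop(utt, segment_len, rng), crop(utt, segment_len, rng), 1
    # Negative pair: crops come from two distinct utterances.
    id_a, id_b = rng.sample(ids, 2)
    return (
        crop(utterances[id_a], segment_len, rng),
        crop(utterances[id_b], segment_len, rng),
        0,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "waveforms": each utterance is a constant signal, for illustration only.
    toy = {"utt_a": [0.0] * 100, "utt_b": [1.0] * 100, "utt_c": [2.0] * 100}
    seg_a, seg_b, label = sample_pair(toy, segment_len=16, rng=rng)
    print(len(seg_a), len(seg_b), label)
```

In a pre-training setup of this kind, each pair and its label would be passed to a network trained as a binary classifier; the architecture and loss are outside the scope of this sketch.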

Original language: English
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 1091-1095
Number of pages: 5
ISBN (Print): 9781713820697
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20

Keywords

  • ASVspoof
  • Replay attack detection
  • Self-supervised learning
  • Speaker verification
  • Transfer learning
