BPCNN: Bi-Point Input for Convolutional Neural Networks in Speaker Spoofing Detection

Sunghyun Yoon, Ha Jin Yu

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

We propose a method, called bi-point input, for convolutional neural networks (CNNs) that handle variable-length input features (e.g., speech utterances). Feeding features into a CNN in mini-batches requires that all features in a mini-batch have the same shape, so a set of variable-length features cannot be fed into a CNN directly. Feature segmentation is a dominant method for CNNs to handle variable-length features: each feature is decomposed into fixed-length segments, and the CNN receives one segment as input at a time. As a result, the CNN can consider only the information in one segment at a time, not the entire feature. This drawback limits the amount of information available at one time and consequently leads to suboptimal solutions. The proposed method alleviates this problem by increasing the amount of information available at one time: the CNN receives a pair of segments obtained from one feature as a single input. The two segments generally cover different time ranges and therefore carry different information. We also propose various combination methods and provide rough guidance for setting a proper segment length without evaluation. We evaluate the proposed method on spoofing detection tasks using the ASVspoof 2019 database under various conditions. The experimental results show that the proposed method achieves relative reductions in equal error rate (EER) of approximately 17.2% and 43.8% on average for the logical access (LA) and physical access (PA) tasks, respectively.
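The idea of feeding a pair of segments can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bi_point_input`, the choice to take one segment from the start and one from the end of the feature, and the repetition of short features until they reach the segment length are all assumptions made for illustration. The abstract states only that two segments are taken from one feature and that they generally cover different time ranges. A feature is represented here as a plain list of frames; a real pipeline would more likely use an array library.

```python
from typing import List, Tuple

Frame = List[float]  # one time frame, e.g. a vector of spectral bins


def bi_point_input(feature: List[Frame], seg_len: int) -> Tuple[List[Frame], List[Frame]]:
    """Return two fixed-length segments taken from one variable-length feature.

    Assumption (not stated in the abstract): the first segment starts at the
    beginning of the feature and the second ends at its last frame.
    """
    if not feature or seg_len <= 0:
        raise ValueError("feature must be non-empty and seg_len must be positive")

    # Assumption: a feature shorter than seg_len is repeated along time
    # until it is long enough to cut a full segment.
    frames = list(feature)
    while len(frames) < seg_len:
        frames = frames + list(feature)

    head = frames[:seg_len]   # covers the earliest frames
    tail = frames[-seg_len:]  # covers the latest frames
    return head, tail


# Usage: features of different lengths all yield segments of the same shape,
# so the resulting pairs can be collected into one mini-batch.
features = [[[float(t)] for t in range(n)] for n in (50, 120, 300)]
batch = [bi_point_input(f, seg_len=100) for f in features]
```

How the two segments are then merged into a single CNN input (for example, stacked as two channels or concatenated along an axis) is left open here, since the abstract mentions only that various combination methods are proposed.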

Original language: English
Article number: 4483
Journal: Sensors
Volume: 22
Issue number: 12
State: Published - 1 Jun 2022

Keywords

  • bi-point input
  • bidirectional feature segmentation
  • convolutional neural network (CNN)
  • spoofing detection
  • variable-length features
