DCasenet: An integrated pretrained deep neural network for detecting and classifying acoustic scenes and events

Jee Weon Jung, Hye Jin Shim, Ju Ho Kim, Ha Jin Yu

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

20 Scopus citations

Abstract

Although acoustic scenes and events include many related tasks, their combined detection and classification have been scarcely investigated. We propose three architectures of deep neural networks that are integrated to simultaneously perform acoustic scene classification, audio tagging, and sound event detection. The first two architectures are inspired by human cognitive processes. The first architecture resembles the short-term perception for scene classification of adults, who can detect various sound events that are then used to identify the acoustic scene. The second architecture resembles the long-term learning of babies, being also the concept underlying self-supervised learning. Babies first observe the effects of abstract notions such as gravity and then learn specific tasks using such perceptions. The third architecture adds a few layers to the second one that solely perform a single task before its corresponding output layer. The aim is to build an integrated system that can serve as a pretrained model to perform the three abovementioned tasks. Experiments on three datasets demonstrate that the proposed architecture, called DcaseNet, can be either directly used for any of the tasks while providing suitable results or fine-tuned to improve the performance of one task. The code and pretrained DcaseNet weights are available at https://github.com/Jungjee/DcaseNet.

Original languageEnglish
Title of host publicationICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages621-625
Number of pages5
ISBN (Electronic)9781728176055
DOIs
StatePublished - 2021
Event2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 202111 Jun 2021

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2021-June
ISSN (Print)1520-6149

Conference

Conference2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Country/TerritoryCanada
CityVirtual, Toronto
Period6/06/2111/06/21

Keywords

  • Acoustic scene classification
  • Audio tagging
  • Deep neural networks
  • Sound event detection

Fingerprint

Dive into the research topics of 'DCasenet: An integrated pretrained deep neural network for detecting and classifying acoustic scenes and events'. Together they form a unique fingerprint.

Cite this