Capturing discriminative information using a deep architecture in acoustic scene classification

Hye Jin Shim, Jee Weon Jung, Ju Ho Kim, Ha Jin Yu

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Acoustic scene classification contains frequently misclassified pairs of classes that share many acoustic properties. Specific details can provide vital clues for distinguishing such pairs; however, these details are generally subtle and hard to generalize across different data distributions. In this study, we investigate various methods for capturing discriminative information while simultaneously improving generalization. We adopt the max feature map (MFM) operation in place of conventional non-linear activation functions in deep neural networks, so that an element-wise comparison is made between different filters of a convolution layer's output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power. Various experiments are conducted on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-a dataset to validate the proposed methods. The results show that the proposed system consistently outperforms the baseline, achieving an accuracy of 70.4% compared with 65.1% for the baseline.
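To illustrate the max feature map idea, here is a minimal Python sketch of MFM as it is commonly defined for light convolutional neural networks (LightCNN). The C output channels of a convolution layer are split into two halves, and each output channel keeps the element-wise maximum of a pair of filter outputs, which halves the channel count. This is an illustration under stated assumptions, not the authors' implementation: the channels-first nested-list layout and the function name `max_feature_map` are hypothetical, and the exact MFM configuration used in the paper is not given in this abstract.

```python
from typing import List

# Hypothetical layout assumed for this sketch: [channels][height][width].
FeatureMap = List[List[List[float]]]


def max_feature_map(x: FeatureMap) -> FeatureMap:
    """Max feature map (MFM) activation on a channels-first feature map.

    Splits the channels into two halves and keeps, at every position,
    the larger of the paired values: out[k] = max(x[k], x[k + C/2]).
    """
    num_channels = len(x)
    if num_channels % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = num_channels // 2
    out: FeatureMap = []
    for k in range(half):
        first, second = x[k], x[k + half]
        # Element-wise comparison between the two paired filter outputs.
        out.append([
            [max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ])
    return out


if __name__ == "__main__":
    # Four 2x2 channels in, two 2x2 channels out.
    fmap = [
        [[1.0, -2.0], [0.5, 3.0]],
        [[0.0, 4.0], [-1.0, 2.0]],
        [[2.0, -3.0], [0.0, 1.0]],
        [[-5.0, 6.0], [1.0, 0.0]],
    ]
    print(max_feature_map(fmap))
    # prints [[[2.0, -2.0], [0.5, 3.0]], [[0.0, 6.0], [1.0, 2.0]]]
```

Unlike ReLU, which thresholds each value independently at zero, MFM keeps whichever of two competing filters responds more strongly at each position, so negative values can survive and half of the filter responses are suppressed by competition. In a real model the same operation would be applied to framework tensors by splitting along the channel axis and taking an element-wise maximum.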

Original language: English
Article number: 8361
Journal: Applied Sciences (Switzerland)
Issue number: 18
State: Published - Sep 2021


  • Acoustic scene classification
  • Deep neural networks
  • Light convolutional neural networks

