Acoustic scene classification using teacher-student learning with soft-labels

Hee Soo Heo, Jee Weon Jung, Hye Jin Shim, Ha Jin Yu

Research output: Contribution to journal › Conference article › peer-review

11 Scopus citations


Acoustic scene classification assigns an input segment to one of a set of pre-defined classes using spectral information. The spectral information of acoustic scenes may not be mutually exclusive because different classes share common acoustic properties, such as the babble noise present in both airports and shopping malls. However, the conventional training procedure based on one-hot labels does not consider the similarities between different acoustic scenes. We exploit teacher-student learning to derive soft-labels that reflect common acoustic properties among different acoustic scenes. In teacher-student learning, the teacher network produces soft-labels, on which the student network is then trained. We investigate various methods to extract soft-labels that better represent similarities across different scenes. These include extracting soft-labels from multiple audio segments that are labeled as the same acoustic scene. Experimental results demonstrate the potential of our approach, showing a classification accuracy of 77.36% on the DCASE 2018 task 1 validation set.
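The soft-label idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the temperature parameter, the per-class averaging rule, the function names, and the toy logits are all assumptions chosen for illustration. It shows a teacher's posteriors being averaged over segments that share a scene label, and a student loss computed against the resulting soft-label instead of a one-hot target.

```python
import math
from collections import defaultdict


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def class_averaged_soft_labels(teacher_logits, hard_labels, temperature=1.0):
    """Average teacher posteriors over all segments sharing a scene label.

    Assumption: simple arithmetic averaging; the source does not specify
    how soft-labels from multiple segments are combined.
    """
    sums = {}
    counts = defaultdict(int)
    for logits, label in zip(teacher_logits, hard_labels):
        probs = softmax(logits, temperature)
        if label not in sums:
            sums[label] = [0.0] * len(probs)
        sums[label] = [s + p for s, p in zip(sums[label], probs)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def soft_cross_entropy(student_logits, soft_label):
    """Cross-entropy between a soft-label target and the student's prediction."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs) if t > 0)


# Hypothetical teacher logits for three classes:
# index 0 = airport, 1 = shopping mall, 2 = park.
teacher_logits = [
    [2.0, 1.5, -1.0],   # airport segment
    [1.8, 1.2, -0.5],   # airport segment
    [1.0, 2.2, -1.2],   # shopping mall segment
    [-1.0, -0.8, 3.0],  # park segment
]
hard_labels = [0, 0, 1, 2]

soft = class_averaged_soft_labels(teacher_logits, hard_labels, temperature=2.0)
loss = soft_cross_entropy([1.0, 0.5, -0.5], soft[0])
```

In this toy setup the airport soft-label keeps noticeable probability mass on the shopping mall class, which is the kind of inter-class similarity that a one-hot target discards.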


  • Acoustic scene classification
  • Deep neural networks
  • Knowledge distillation
  • Teacher-student learning


