Emergency situation detection and speech recognition enhancement utilizing Whisper

Original title (Korean): Whisper를 활용한 위급 상황 탐지 및 음성 인식 성능 향상

Research output: Contribution to journal › Article › peer-review

Abstract

This study proposes a model that promptly detects and reports emergency situations that may occur in single-person or elderly households. To this end, we modified the architecture proposed in the Whisper Audio Tagging (Whisper-AT) paper, which is built on the Whisper model, so that it both classifies emergency situations and predicts when they occur. In addition, Whisper and the classification model were fine-tuned jointly, so that Automatic Speech Recognition (ASR) training on emergency-situation data is carried out together with emergency-situation classification. The proposed method achieved an accuracy of 97.70% in classifying 16 types of emergency situations. Furthermore, compared with fine-tuning Whisper alone, adding emergency-situation classification during training improved ASR performance, reducing the Character Error Rate (CER) from 12.03 to 10.11. The proposed model detects emergency situations with a latency of only 4.2 s.
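ASR quality above is reported as Character Error Rate (CER). For reference, CER is conventionally the character-level edit distance (substitutions + deletions + insertions) between the recognized text and the reference transcript, divided by the number of reference characters. The following is a minimal sketch of that standard computation, not the authors' evaluation script. Two choices in it are assumptions: whitespace is removed before scoring, and the result is expressed as a percentage (the abstract does not state the unit of 12.03 and 10.11). The helper names `edit_distance` and `cer` are hypothetical.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions
    needed to turn `ref` into `hyp`, computed with a rolling DP row."""
    prev = list(range(len(hyp) + 1))  # distance from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)   # distance from ref[:i] to ""
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if chars match)
            )
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str], ignore_spaces: bool = True) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(refs, hyps):
        # NFC keeps each Hangul syllable as a single code point ("character").
        ref = unicodedata.normalize("NFC", ref)
        hyp = unicodedata.normalize("NFC", hyp)
        if ignore_spaces:  # assumption: word spacing is not scored
            ref = ref.replace(" ", "")
            hyp = hyp.replace(" ", "")
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    return 100.0 * total_edits / total_chars


print(cer(["도와주세요"], ["도와줘요"]))  # 1 substitution + 1 deletion over 5 chars
```

Errors are summed over the whole test set before dividing, rather than averaging per-utterance CERs, so that long utterances are weighted by their length; whether the paper aggregates this way is not stated.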

Original language: Korean
Pages (from-to): 132-143
Number of pages: 12
Journal: Journal of the Acoustical Society of Korea
Volume: 44
Issue number: 2
State: Published, 2025

Keywords

  • Acoustic detection
  • Deep learning
  • Emergency situation
  • Speech recognition
