KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR

  • Hajung Kim
  • Chanhwi Kim
  • Hoonick Lee
  • Kyochul Jang
  • Jiwoo Lee
  • Kyungjae Lee
  • Gangwoo Kim
  • Jaewoo Kang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions, that is, questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handles out-of-domain questions and verifies the generated queries through query execution. Our framework begins by standardizing the structure of questions into a templated format. We then use a large language model (LLM), GPT-3.5, fine-tuned with detailed prompts that include the table schemas of the EHR database system. Our experimental results demonstrate the effectiveness of our framework on the EHRSQL-2024 benchmark, a shared task in the ClinicalNLP workshop. Although straightforward fine-tuning of GPT shows promising results on the development set, it struggles with the out-of-domain questions in the test set. With our framework, we improve our system's adaptability and achieve competitive performance on the official leaderboard of the EHRSQL-2024 challenge.
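The abstract does not detail how execution-based verification rejects questions, so the Python sketch below shows one plausible way to combine an LLM-generated query with an execution check. It assumes a SQLite database and the string "null" as the abstention answer for unanswerable questions. `execute_or_none`, `answer`, the toy `patients` table, and the stub generator are hypothetical illustrations, not the authors' code; the stub stands in for the fine-tuned GPT-3.5 model.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Assumption: unanswerable questions are answered with the string "null".
ABSTAIN = "null"


def execute_or_none(sql: str, conn: sqlite3.Connection) -> Optional[List[Tuple]]:
    """Run a candidate query; return its rows, or None on an error or empty result."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # Syntax errors, unknown tables or columns, etc. count as failed verification.
        return None
    # Assumption: an empty result is also treated as a failed verification.
    return rows if rows else None


def answer(question: str, generate_sql: Callable[[str], str],
           conn: sqlite3.Connection) -> str:
    """Generate SQL for a question and keep it only if it executes successfully."""
    sql = generate_sql(question).strip()
    if sql == ABSTAIN:  # the generator itself rejected the question
        return ABSTAIN
    if execute_or_none(sql, conn) is None:  # execution-based rejection
        return ABSTAIN
    return sql


# Toy demo: a hypothetical lookup table stands in for the fine-tuned LLM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT)")
conn.execute("INSERT INTO patients VALUES (1, 'F')")
stub = {
    "gender of patient 1?": "SELECT gender FROM patients WHERE subject_id = 1",
    "weather tomorrow?": "null",
    "admissions of patient 1?": "SELECT * FROM admissions WHERE subject_id = 1",
}
for q in stub:
    print(q, "->", answer(q, lambda s: stub[s], conn))
```

Treating empty results as failures is a deliberately conservative choice in this sketch: it trades some valid-but-empty answers for fewer confidently wrong queries on out-of-domain questions.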

Original language: English
Title of host publication: ClinicalNLP 2024 - 6th Workshop on Clinical Natural Language Processing, Proceedings of the Workshop
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Publisher: Association for Computational Linguistics (ACL)
Pages: 672-686
Number of pages: 15
ISBN (Electronic): 9798891761094
State: Published - 2024
Event: 6th Workshop on Clinical Natural Language Processing, ClinicalNLP 2024, held at NAACL 2024 - Mexico City, Mexico
Duration: 21 Jun 2024 → …

Publication series

Name: ClinicalNLP 2024 - 6th Workshop on Clinical Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 6th Workshop on Clinical Natural Language Processing, ClinicalNLP 2024, held at NAACL 2024
Country/Territory: Mexico
City: Mexico City
Period: 21/06/24 → …
