TY - JOUR
T1 - Auto-summarization of the texts of construction dispute precedents
AU - Seo, Wonkyoung
AU - Kang, Youngcheol
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/5
Y1 - 2025/5
AB - Advancements in text analysis are driving the adoption of document automation in the construction industry. Despite significant financial losses from construction disputes, efforts to automate document processes in this domain remain limited. Effective dispute management requires the rapid identification of relevant precedent cases to help practitioners respond appropriately. However, the complexity and length of such texts pose challenges to quick comprehension. This study presents a natural language processing (NLP) model for automatically summarizing construction dispute case texts. The model was tested on 300 U.S. construction dispute cases sourced from the Westlaw database. Various NLP models, including large language models (LLMs) such as OpenAI's models and BERT, were evaluated, achieving an F-score of approximately 0.39 based on the ROUGE-L metric. To accomplish the domain-specific objective of summarizing construction precedent cases, this study explored multiple approaches, including data preprocessing, fine-tuning, and model engineering using LangChain. Furthermore, this study aims to develop models for summarizing legal precedent texts and investigates methods to capture the distinctive characteristics of construction dispute data compared to general legal texts. The models were validated through domain experts who recognize the unique nature of construction disputes, enhancing the reliability of the evaluation process. The findings contribute significantly to the automation of construction dispute document summarization, enabling practitioners to manage such cases more efficiently.
KW - Construction dispute
KW - Dispute precedent
KW - Large language model
KW - Natural language processing
KW - Text summarization
UR - https://www.scopus.com/pages/publications/105003204138
U2 - 10.1016/j.aei.2025.103381
DO - 10.1016/j.aei.2025.103381
M3 - Article
AN - SCOPUS:105003204138
SN - 1474-0346
VL - 65
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
M1 - 103381
ER -