TY - GEN
T1 - Data-efficient End-to-end Information Extraction for Statistical Legal Analysis
AU - Hwang, Wonseok
AU - Eom, Saehee
AU - Lee, Hanuhl
AU - Park, Hai Jin
AU - Seo, Minjoon
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Legal practitioners often face a vast number of documents. Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which could lead to information overload. This also makes their statistical analysis challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that our IE system can achieve competent scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task and higher scores (+5.4 on average) with 200 examples. Finally, our statistical analysis on two case categories - drunk driving and fraud - with 35k precedents reveals that the resulting structured information from our IE system faithfully reflects the macroscopic features of the Korean legal system.
AB - Legal practitioners often face a vast number of documents. Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which could lead to information overload. This also makes their statistical analysis challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that our IE system can achieve competent scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task and higher scores (+5.4 on average) with 200 examples. Finally, our statistical analysis on two case categories - drunk driving and fraud - with 35k precedents reveals that the resulting structured information from our IE system faithfully reflects the macroscopic features of the Korean legal system.
UR - http://www.scopus.com/inward/record.url?scp=85154577068&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85154577068
T3 - NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop
SP - 143
EP - 152
BT - NLLP 2022 - Natural Legal Language Processing Workshop 2022, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 4th Natural Legal Language Processing Workshop, NLLP 2022, co-located with the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Y2 - 8 December 2022
ER -