TY - GEN
T1 - BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
AU - Hong, Teakgyu
AU - Kim, Donghyun
AU - Ji, Mingi
AU - Hwang, Wonseok
AU - Nam, Daehyun
AU - Park, Sungrae
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2022/6/30
Y1 - 2022/6/30
N2 - Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models focused on combining visual features from document images with texts and their layout. This paper, by contrast, tackles the problem by going back to the basics: an effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes the relative positions of texts in 2D space and learns from unlabeled documents with an area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS shows comparable or better performance than previous methods on four KIE benchmarks (FUNSD, SROIE, CORD, and SciTSR) without relying on visual features. This paper also reveals two real-world challenges in KIE tasks, (1) minimizing the error from incorrect text ordering and (2) efficient learning from fewer downstream examples, and demonstrates the superiority of BROS over previous methods.
UR - http://www.scopus.com/inward/record.url?scp=85135871929&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85135871929
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 10767
EP - 10775
BT - AAAI-22 Technical Tracks 10
PB - Association for the Advancement of Artificial Intelligence
Y2 - 22 February 2022 through 1 March 2022
ER -