TY - JOUR
T1 - About relationship between business text patterns and financial performance in corporate data
AU - Lee, Bang Rae
AU - Park, Jun Hwan
AU - Kwon, Leenam
AU - Moon, Young Ho
AU - Shin, Young Ho
AU - Kim, Gyu Seok
AU - Kim, Han Joon
N1 - Publisher Copyright:
© The Author(s). 2018.
PY - 2018/2/2
Y1 - 2018/2/2
N2 - This study uses text and data mining to investigate the relationship between the text patterns of annual reports published by US-listed companies and their sales performance. Going a step beyond previous research, we posit that although annual reports present only past and present financial information, analyzing their text content can identify sentences or patterns that indicate a company's future business performance. First, we examine the relationship between business risk factors and current business performance. For this purpose, we select companies in two US SIC (Standard Industrial Classification) categories in the IT sector, 7370 and 7373, which include Twitter, Facebook, Google, and Yahoo, among others. We manually collect sales and business risk information for a total of 54 companies in these two categories that submitted an annual report (Form 10-K) for the last three years. To establish a correlation between text patterns and sales performance, we set and test four hypotheses. To verify them, we use statistical analysis of sales, statistical analysis of text sentences, sentence-level sentiment analysis, clustering, dendrogram visualization, keyword extraction, and word-cloud visualization. The results show that text length has some correlation with sales performance and that patterns of frequently appearing words are correlated with sales performance. However, sentiment analysis indicates that the positive or negative tone of a report is not related to sales performance.
AB - This study uses text and data mining to investigate the relationship between the text patterns of annual reports published by US-listed companies and their sales performance. Going a step beyond previous research, we posit that although annual reports present only past and present financial information, analyzing their text content can identify sentences or patterns that indicate a company's future business performance. First, we examine the relationship between business risk factors and current business performance. For this purpose, we select companies in two US SIC (Standard Industrial Classification) categories in the IT sector, 7370 and 7373, which include Twitter, Facebook, Google, and Yahoo, among others. We manually collect sales and business risk information for a total of 54 companies in these two categories that submitted an annual report (Form 10-K) for the last three years. To establish a correlation between text patterns and sales performance, we set and test four hypotheses. To verify them, we use statistical analysis of sales, statistical analysis of text sentences, sentence-level sentiment analysis, clustering, dendrogram visualization, keyword extraction, and word-cloud visualization. The results show that text length has some correlation with sales performance and that patterns of frequently appearing words are correlated with sales performance. However, sentiment analysis indicates that the positive or negative tone of a report is not related to sales performance.
KW - 10-K
KW - Business keyword
KW - Corporate annual report
KW - Correlation coefficient
KW - Dendrogram
KW - Financial performance
KW - Hierarchical clustering
KW - Keyword trends
KW - Sentiment analysis
KW - Text mining
KW - Word cloud
UR - http://www.scopus.com/inward/record.url?scp=85045750276&partnerID=8YFLogxK
U2 - 10.1186/s40852-018-0080-9
DO - 10.1186/s40852-018-0080-9
M3 - Article
AN - SCOPUS:85045750276
SN - 2199-8531
VL - 4
JO - Journal of Open Innovation: Technology, Market, and Complexity
JF - Journal of Open Innovation: Technology, Market, and Complexity
IS - 1
M1 - 3
ER -