About relationship between business text patterns and financial performance in corporate data

Bang Rae Lee, Jun Hwan Park, Leenam Kwon, Young Ho Moon, Young Ho Shin, Gyu Seok Kim, Han Joon Kim

Research output: Contribution to journal › Article › peer-review

11 Scopus citations


This study uses text and data mining to investigate the relationship between the text patterns of annual reports published by US listed companies and their sales performance. Although annual reports show only past and present financial information, analyzing their text content can, taking previous research a step further, identify sentences or patterns that indicate a company's future business performance. First, we examine the relationship between business risk factors and current business performance. For this purpose, we select companies belonging to two categories of the US SIC (Standard Industrial Classification) in the IT sector, 7370 and 7373, which include Twitter, Facebook, Google, Yahoo, etc. We manually collect sales and business risk information for a total of 54 companies in these two categories that submitted an annual report (Form 10-K) in each of the last three years. To establish a correlation between text patterns and sales performance, we set and test four hypotheses. To verify the hypotheses, we use statistical analysis of sales, statistical analysis of text sentences, sentiment analysis of sentences, clustering, dendrogram visualization, keyword extraction, and word-cloud visualization. The results show that text length has some correlation with sales performance, and that patterns of frequently appearing words are also correlated with sales performance. However, sentiment analysis indicates that the positive or negative tone of a report is not related to sales performance.
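As a rough illustration of the kind of test the abstract describes for text length and sales, the sketch below computes a Pearson correlation coefficient between the word count of a risk-factor text and a sales figure. It is a minimal example under stated assumptions, not the authors' procedure: the abstract does not say which correlation measure or length metric was used, and the texts, sales values, and the names `pearson` and `word_count` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def word_count(text):
    """Length of a text measured in whitespace-separated tokens (an assumed metric)."""
    return len(text.split())


# Hypothetical toy data: made-up risk-factor snippets and made-up sales figures.
# None of these values come from the study.
risk_sections = [
    "We face competition.",
    "We face intense competition and depend on advertising revenue.",
    "Our business depends on user growth, advertising revenue, data "
    "security, regulatory compliance, and international expansion.",
]
sales = [10.0, 55.0, 120.0]  # arbitrary units

lengths = [word_count(text) for text in risk_sections]
r = pearson(lengths, sales)
print(f"word counts: {lengths}")
print(f"Pearson r between text length and sales: {r:.3f}")
```

With only three made-up points the coefficient carries no statistical meaning. A real analysis would run over all company-year observations and report a significance test alongside the coefficient.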

Original language: English
Article number: 3
Journal: Journal of Open Innovation: Technology, Market, and Complexity
Issue number: 1
State: Published - 2 Feb 2018


  • 10-k
  • Business keyword
  • Corporate annual report
  • Correlation coefficient
  • Dendrogram
  • Financial performance
  • Hierarchical clustering
  • Keyword trends
  • Sentiment analysis
  • Text mining
  • Word cloud

