Wikipedia-based concept networks: A probabilistic approach

Han Joon Kim, Ga Hui Lee

Research output: Contribution to journal › Article › peer-review

Abstract

This paper proposes a novel way of automatically building a concept network containing hierarchical 'ISA' and associative 'ASSO' relationships by probabilistically analyzing the hyperlinks that connect Wikipedia articles. The concept network is built by connecting four types of concept pairs: relational concept pairs, infobox concept pairs, category concept pairs and synopsis anchor concept pairs. The 'ISA' relationship of a concept pair is determined by computing the subsumption probabilities between the incoming links of upper concepts and the outgoing links of lower concepts; these probabilities are represented internally as a partial ordering matrix. If the difference between the subsumption probabilities of two concepts is smaller than a given threshold, the concept pair is assigned the 'ASSO' relationship. Our prototype system produces a highly reasonable concept network that contains not only noun-level concepts but also proper noun-level concepts from Wikipedia articles. We confirm that the concept network can be used as a knowledge base for improving various types of text mining applications.
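The abstract does not give the exact formula for the subsumption probability, so the following is only a minimal sketch under stated assumptions. It assumes the probability that one concept subsumes another can be estimated as the share of the lower concept's outgoing links that also appear among the upper concept's incoming links. The function names, the toy link sets, and the threshold value of 0.2 are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple


def subsumption(upper_in: Set[str], lower_out: Set[str]) -> float:
    """Assumed estimator: fraction of the lower concept's outgoing links
    that are also incoming links of the candidate upper concept."""
    if not lower_out:
        return 0.0
    return len(upper_in & lower_out) / len(lower_out)


def classify(
    a: str,
    b: str,
    incoming: Dict[str, Set[str]],
    outgoing: Dict[str, Set[str]],
    threshold: float = 0.2,  # hypothetical value, not from the paper
) -> Tuple[str, str, str]:
    """Return ('ISA', lower, upper) or ('ASSO', a, b) for a concept pair."""
    p_a_upper = subsumption(incoming[a], outgoing[b])  # a as upper, b as lower
    p_b_upper = subsumption(incoming[b], outgoing[a])  # b as upper, a as lower
    # A small difference between the two directions is read as association.
    if abs(p_a_upper - p_b_upper) < threshold:
        return ("ASSO", a, b)
    # Otherwise the stronger direction decides which concept is the upper one.
    if p_a_upper > p_b_upper:
        return ("ISA", b, a)
    return ("ISA", a, b)


# Toy link sets (invented for illustration only).
incoming = {
    "Animal": {"Dog", "Cat", "Mammal"},
    "Dog": {"Puppy", "Pet"},
    "Cat": {"Kitten", "Pet"},
}
outgoing = {
    "Animal": {"Organism", "Biology"},
    "Dog": {"Mammal", "Cat", "Pet", "Bark"},
    "Cat": {"Mammal", "Pet", "Meow", "Dog"},
}

print(classify("Animal", "Dog", incoming, outgoing))
print(classify("Dog", "Cat", incoming, outgoing))
```

In this toy data the pair (Animal, Dog) has strongly asymmetric probabilities and is labeled 'ISA', while (Dog, Cat) has equal probabilities in both directions and is labeled 'ASSO'. Storing the pairwise probabilities in a square matrix indexed by concept would be one way to approximate the partial ordering matrix the abstract mentions, though the paper's actual representation may differ.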

Original language: English
Pages (from-to): 7387-7397
Number of pages: 11
Journal: Information
Volume: 20
Issue number: 10
State: Published - 2017

Keywords

  • Computational linguistics
  • Concept
  • Probability
  • Wikipedia
