A novel algorithm for scalable k -nearest neighbour graph construction

Youngki Park, Heasoo Hwang, Sang Goo Lee

Research output: Contribution to journalArticlepeer-review

5 Scopus citations

Abstract

Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations with wide application in various areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features have Zipfian distribution, a characteristic observed in majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms other algorithms in various types of high-dimensional datasets.

Original languageEnglish
Pages (from-to)274-288
Number of pages15
JournalJournal of Information Science
Volume42
Issue number2
DOIs
StatePublished - 1 Apr 2016

Keywords

  • collaborative filtering
  • k-nearest neighbour graph construction
  • k-nearest neighbour search

Fingerprint

Dive into the research topics of 'A novel algorithm for scalable k -nearest neighbour graph construction'. Together they form a unique fingerprint.

Cite this