TY - JOUR
T1 - Query-specific signature selection for efficient k -nearest neighbour approximation
AU - Park, Youngki
AU - Hwang, Heasoo
AU - Lee, Sang Goo
N1 - Publisher Copyright:
© The Author(s) 2016.
PY - 2017/8/1
Y1 - 2017/8/1
N2 - Finding k-nearest neighbours (k-NN) is one of the most important primitives of many applications such as search engines and recommendation systems. However, its computational cost is extremely high when searching for k-NN points in a huge collection of high-dimensional points. Locality-sensitive hashing (LSH) has been introduced for an efficient k-NN approximation, but none of the existing LSH approaches clearly outperforms others. We propose a novel LSH approach, Signature Selection LSH (S2LSH), which finds approximate k-NN points very efficiently in various datasets. It first constructs a large pool of highly diversified signature regions with various sizes. Given a query point, it dynamically generates a query-specific signature region by merging highly effective signature regions selected from the signature pool. We also suggest S2LSH-M, a variant of S2LSH, which processes multiple queries more efficiently by using query-specific features and optimization techniques. Extensive experiments show the performance superiority of our approaches in diverse settings.
AB - Finding k-nearest neighbours (k-NN) is one of the most important primitives of many applications such as search engines and recommendation systems. However, its computational cost is extremely high when searching for k-NN points in a huge collection of high-dimensional points. Locality-sensitive hashing (LSH) has been introduced for an efficient k-NN approximation, but none of the existing LSH approaches clearly outperforms others. We propose a novel LSH approach, Signature Selection LSH (S2LSH), which finds approximate k-NN points very efficiently in various datasets. It first constructs a large pool of highly diversified signature regions with various sizes. Given a query point, it dynamically generates a query-specific signature region by merging highly effective signature regions selected from the signature pool. We also suggest S2LSH-M, a variant of S2LSH, which processes multiple queries more efficiently by using query-specific features and optimization techniques. Extensive experiments show the performance superiority of our approaches in diverse settings.
KW - k -nearest neighbour graph
KW - k -nearest neighbour search
KW - locality-sensitive hashing
UR - http://www.scopus.com/inward/record.url?scp=85021922490&partnerID=8YFLogxK
U2 - 10.1177/0165551516644176
DO - 10.1177/0165551516644176
M3 - Article
AN - SCOPUS:85021922490
SN - 0165-5515
VL - 43
SP - 440
EP - 457
JO - Journal of Information Science
JF - Journal of Information Science
IS - 4
ER -