TY - GEN
T1 - A fast k-nearest neighbor search using query-specific signature selection
AU - Park, Youngki
AU - Hwang, Heasoo
AU - Lee, Sang Goo
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/10/17
Y1 - 2015/10/17
N2 - k-nearest neighbor (k-NN) search aims at finding k points nearest to a query point in a given dataset. k-NN search is important in various applications, but it becomes extremely expensive in a high-dimensional large dataset. To address this performance issue, locality-sensitive hashing (LSH) is suggested as a method of probabilistic dimension reduction while preserving the relative distances between points. However, the performance of existing LSH schemes is still inconsistent, requiring a large amount of search time in some datasets while the k-NN approximation accuracy is low. In this paper, we target on improving the performance of k-NN search and achieving a consistent k-NN search that performs well in various datasets. In this regard, we propose a novel LSH scheme called Signature Selection LSH (S2LSH). First, we generate a highly diversified signature pool containing signature regions of various sizes and shapes. Then, for a given query point, we rank signature regions of the query and select points in the highly ranked signature regions as k-NN candidates of the query. Extensive experiments show that our approach consistently outperforms the state-of-the-art LSH schemes.
AB - k-nearest neighbor (k-NN) search aims at finding k points nearest to a query point in a given dataset. k-NN search is important in various applications, but it becomes extremely expensive in a high-dimensional large dataset. To address this performance issue, locality-sensitive hashing (LSH) is suggested as a method of probabilistic dimension reduction while preserving the relative distances between points. However, the performance of existing LSH schemes is still inconsistent, requiring a large amount of search time in some datasets while the k-NN approximation accuracy is low. In this paper, we target on improving the performance of k-NN search and achieving a consistent k-NN search that performs well in various datasets. In this regard, we propose a novel LSH scheme called Signature Selection LSH (S2LSH). First, we generate a highly diversified signature pool containing signature regions of various sizes and shapes. Then, for a given query point, we rank signature regions of the query and select points in the highly ranked signature regions as k-NN candidates of the query. Extensive experiments show that our approach consistently outperforms the state-of-the-art LSH schemes.
KW - K-nearest neighbor search
KW - Locality sensitive hashing
UR - http://www.scopus.com/inward/record.url?scp=84958233206&partnerID=8YFLogxK
U2 - 10.1145/2806416.2806632
DO - 10.1145/2806416.2806632
M3 - Conference contribution
AN - SCOPUS:84958233206
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1883
EP - 1886
BT - CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 24th ACM International Conference on Information and Knowledge Management, CIKM 2015
Y2 - 19 October 2015 through 23 October 2015
ER -