An Adaptive Multi-level Hashing Structure for Fast Approximate Similarity Search

  • Alexander Ocsa ICMC, University of Sao Paulo (USP), Brazil
  • Elaine P. M. de Sousa ICMC, University of Sao Paulo (USP), Brazil
Keywords: Access methods, LSH, Multidimentional Index, Similarity Information Retrieval

Abstract

Fast information retrieval is an essential task in data management, mainly due to the increasing availability of data. To address this problem, database researchers have developed indexing techniques to logically organize elements from large datasets in order to answer queries efficiently. In this context, an approximate similarity search algorithm known as Locality Sensitive Hashing (LSH) was recently proposed to query high-dimensional datasets with efficient computational time. The query cost of LSH is sub-linear on the dataset size. However, some drawbacks of LSH have not been solved entirely, such as the need of several hash indexes to achieve accurate query results and the critical dependency on data distribution and parameter values. Therefore, the LSH solution is unsuitable to many real applications involving dynamic datasets. In this paper, we propose an adaptive Multi-level hashing to support dynamic index construction efficiently. By employing a Multi-level scheme it is possible to dynamically adapt the data domain parameters and exploit the resulting multi-resolution index structure to speed up the query process. Experimental studies on large real and synthetic datasets show the similarity search performance of the proposed technique.
Published
2010-09-09
Section
Regular Articles