DBM-Tree: A Dynamic Metric Access Method Sensitive to Local Density Data
AbstractMetric Access Methods (MAM) are widely employed to speed up the evaluation of similarity queries, such as range and k-nearest neighbor queries. Most of the currently MAM available today achieve reasonable good performance of query evaluation through minimizing the number of disk accesses required to evaluate a query. The classical way of minimizing the number of disk accesses in hierarchical access methods is to keep the height of the structures very short. This class of structures is called height-balanced structures and are widely employed in Database Management Systems (DBMS). Slim-tree and M-tree are examples of MAM that are height-balanced structures which achieve very good performance both in terms of disk accesses and running time mainly because the height of the trees are kept very short. However, the performance of these two structures degrade very easy because the covering radius of nodes and the overlapping between nodes increase in a way that a large number of subtrees have to be analyzed when processing a query. This paper presents a new dynamic MAM called the DBM-tree (Density-Based Metric tree), which can minimize the overlap between "high-density" nodes by, in a controlled way, "relaxing" the height-balancing of the structure. Thus, the height of the tree is higher in denser regions in order to keep a tradeoff between breadth-searching and depth-searching. We also present a new optimization algorithm called Shrink, which improves the performance of already built DBM-trees by reorganizing the elements among their nodes. Experiments performed over several synthetic and real-world datasets showed that the DBM-tree is on average 50% faster than traditional MAM. The DBM-tree also reduces the number of distance calculations by up to 72% and disk accesses by up to 54% for answering queries. After performing the Shrink algorithm, query performance of the DBM-tree improves up to 30% regarding the number of disk accesses. In addition, the DBM-tree scales up well, exhibiting sub-linear performance when increasing the database size.