Analyzing Missing Data in Metric Spaces
Keywords: Distance Concentration, Data Distribution, Missing Attribute Values, Similarity Search
Similarity search in multimedia databases has challenged researchers for the last two decades, and their studies have produced several achievements. However, searching in incomplete databases, i.e., databases with missing attribute values, has received far less attention.
In this article, we present a set of experimental analyses that evaluate the impact of missing data on query performance in metric spaces. The results show that missing data cause severe skew in the metric space even when only 2% of the values are missing, and that they drastically degrade the performance of metric indexing techniques. Interestingly, our analyses, confirmed by the presented experiments, show that data missing not at random are more prone to skew and favor the conditions for the distance concentration phenomenon, in which the distances between pairs of elements in the space become homogeneous. Thus, this study provides an understanding of the issues that arise in metric spaces when indexing incomplete databases, and it lays the groundwork for research on advanced metric access methods that handle missing attribute values.
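As a rough illustration of how distance concentration could be probed, the sketch below compares the relative spread (standard deviation divided by mean) of pairwise Euclidean distances under two missingness mechanisms. Everything in it is an assumption made for illustration and is not taken from the article: the synthetic uniform data, the helper names (`make_data`, `drop_mcar`, `drop_mnar`, `impute_mean`, `relative_spread`), the choice to drop the largest values of each attribute to simulate data missing not at random, and the use of mean imputation to treat the missing values. A lower relative spread would suggest more homogeneous distances.

```python
import math
import random
from itertools import combinations


def make_data(n, dim, seed=0):
    """Synthetic uniform data in [0, 1)^dim (an assumption, not the article's data)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def drop_mcar(data, rate, seed=1):
    """Missing completely at random: each value is dropped with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in data]


def drop_mnar(data, rate):
    """Missing not at random (hypothetical mechanism): drop the largest values of each attribute."""
    n, dim = len(data), len(data[0])
    k = int(round(rate * n))  # number of values dropped per attribute
    out = [row[:] for row in data]
    for j in range(dim):
        largest = sorted(range(n), key=lambda i: data[i][j], reverse=True)[:k]
        for i in largest:
            out[i][j] = None
    return out


def impute_mean(data):
    """Replace each missing value by the mean of the observed values of its attribute.

    Assumes every attribute keeps at least one observed value.
    """
    dim = len(data[0])
    means = []
    for j in range(dim):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def relative_spread(data):
    """Standard deviation over mean of all pairwise Euclidean distances."""
    dists = [math.dist(a, b) for a, b in combinations(data, 2)]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean


if __name__ == "__main__":
    data = make_data(200, 8)
    variants = [
        ("complete", data),
        ("MCAR 2%", drop_mcar(data, 0.02)),
        ("MNAR 2%", drop_mnar(data, 0.02)),
    ]
    for name, incomplete in variants:
        print(name, relative_spread(impute_mean(incomplete)))
```

The printed values depend entirely on the synthetic data and the assumed mechanisms, so they should not be read as a reproduction of the experiments reported here.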