Efficient Processing of Analytical Queries Extended with Similarity Search Predicates over Images in Spark

Cristina Dutra de Aguiar Ciferri; Guilherme Muzzi da Rocha

Authors

Cristina Dutra de Aguiar Ciferri Universidade de São Paulo
Guilherme Muzzi da Rocha Universidade de São Paulo

Abstract

An image data warehouse extends a conventional data warehouse to also handle images, which are represented by feature vectors and by attributes used for similarity search. A key challenge is the efficient processing of analytical queries extended with a similarity search predicate. These queries are computationally expensive because they combine costly star join operations with distance calculations in the same setting. We consider applications that manage huge volumes of data and therefore require parallel and distributed data processing frameworks. In this article, we introduce two methods that address this challenge in Spark. BrOmnImg integrates the broadcast join and the Omni techniques to process the star join operation and the distance calculations, respectively. BrOmnImgCF extends BrOmnImg by using the conventional predicate to further reduce the number of distance calculations. Compared with the closest method available in the literature, BrOmnImg reduced query processing time by up to about 65%. Compared with BrOmnImg, BrOmnImgCF improved performance by up to about 54%.
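
To make the idea of reducing distance calculations concrete, the following is a minimal single-machine sketch, not the authors' Spark implementation. It assumes the commonly described form of Omni-style filtering: distances from every image to a few reference points (foci) are precomputed, and the triangle inequality is used to discard images that cannot fall within the query radius. The Euclidean distance, the toy data, and all names (`build_omni_coords`, `range_query`, `candidates`) are hypothetical and chosen only for illustration. The optional `candidates` argument mimics the BrOmnImgCF idea of letting the conventional predicate narrow the set of images before any distance is computed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_omni_coords(vectors, foci):
    """Precompute, for every image, its distance to each focus."""
    return {img_id: [euclidean(v, f) for f in foci]
            for img_id, v in vectors.items()}

def range_query(query, radius, vectors, foci, omni_coords, candidates=None):
    """Return (sorted ids within radius, number of full distances computed).

    candidates: optional ids that already satisfy the conventional
    predicate (hypothetical stand-in for the star join result).
    """
    q_coords = [euclidean(query, f) for f in foci]
    ids = vectors.keys() if candidates is None else candidates
    hits, computed = [], 0
    for img_id in ids:
        # Triangle inequality: |d(q, f) - d(o, f)| <= d(q, o).
        # If the left side exceeds the radius for any focus, prune.
        if any(abs(qc - oc) > radius
               for qc, oc in zip(q_coords, omni_coords[img_id])):
            continue
        computed += 1  # only survivors pay for a full distance calculation
        if euclidean(query, vectors[img_id]) <= radius:
            hits.append(img_id)
    return sorted(hits), computed

# Toy data: image id -> 2-D feature vector (illustrative only).
vectors = {1: (0, 0), 2: (1, 0), 3: (5, 5), 4: (10, 10), 5: (0.5, 0.5)}
foci = [(0, 0), (10, 0)]
omni_coords = build_omni_coords(vectors, foci)

# Similarity predicate only: images 3 and 4 are pruned without a distance call.
hits_all, computed_all = range_query((0, 0), 1.5, vectors, foci, omni_coords)

# Conventional predicate applied first: only images 2 and 3 are candidates.
hits_cf, computed_cf = range_query((0, 0), 1.5, vectors, foci, omni_coords,
                                   candidates=[2, 3])
```

In a Spark setting, the small dimension tables of the star join would typically be shipped to every executor with a broadcast join, and a filter of this kind would then run per partition of the fact data; how the two are combined in BrOmnImg and BrOmnImgCF is described in the article itself.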
