Using Pivots to Speed-Up k-Medoids Clustering

  • Adriano Arantes Paterlini ICMC - USP
  • Mario A. Nascimento University of Alberta
  • Caetano Traina Jr. ICMC - USP
Keywords: Clustering, k-medoids, Metric space, Pivot-based technique

Abstract

Clustering is a key technique within the KDD process, with k-means, and the more general k-medoids, being well-known incremental partition-based clustering algorithms. A fundamental issue within this class of algorithms is to find an initial set of medians (or medoids) that improves the efficiency of the algorithms (e.g., accelerating its convergence to a solution), at the same time that it improves its effectiveness (e.g., finding more meaningful clusters). Thus, in this article we aim at providing a technique that, given a set of elements, quickly finds a very small number of elements as medoid candidates for this set, allowing to improve both the efficiency and effectiveness of existing clustering algorithms. We target the class of k-medoids algorithms in general, and propose a technique that selects a well-positioned subset of central elements to serve as the initial set of medoids for the clustering process. Our technique leads to a substantially smaller amount of distance calculations, thus improving the algorithm's efficiency when compared to existing methods, without sacrificing effectiveness.  A salient feature of our proposed technique is that it is not a new k-medoid clustering algorithm per se, rather, it can be used in conjunction with any existing clustering algorithm that is based on the k-medoid paradigm. Experimental results, using both synthetic and real datasets, confirm the efficiency, effectiveness and scalability of the proposed technique. 

Author Biographies

Adriano Arantes Paterlini, ICMC - USP

Systems Analyst at Department of Computer Science - ICMC - USP

Mario A. Nascimento, University of Alberta

Ph.D. Associate Professor at Department of Computing Science- University of Alberta

Caetano Traina Jr., ICMC - USP
Ph.D. Titular Professor at Department of Computer Science - ICMC - USP
Published
2011-09-12