Using Pivots to Speed-Up k-Medoids Clustering

Adriano Arantes Paterlini; Mario A. Nascimento; Caetano Traina Jr.

Authors

Adriano Arantes Paterlini ICMC - USP
Mario A. Nascimento University of Alberta
Caetano Traina Jr. ICMC - USP

Keywords:

Clustering, k-medoids, Metric space, Pivot-based technique

Abstract

Clustering is a key technique within the KDD process, with k-means, and the more general k-medoids, being well-known incremental partition-based clustering algorithms. A fundamental issue within this class of algorithms is to find an initial set of medians (or medoids) that improves the efficiency of the algorithms (e.g., accelerating its convergence to a solution), at the same time that it improves its effectiveness (e.g., finding more meaningful clusters). Thus, in this article we aim at providing a technique that, given a set of elements, quickly finds a very small number of elements as medoid candidates for this set, allowing to improve both the efficiency and effectiveness of existing clustering algorithms. We target the class of k-medoids algorithms in general, and propose a technique that selects a well-positioned subset of central elements to serve as the initial set of medoids for the clustering process. Our technique leads to a substantially smaller amount of distance calculations, thus improving the algorithm's efficiency when compared to existing methods, without sacrificing effectiveness. A salient feature of our proposed technique is that it is not a new k-medoid clustering algorithm per se, rather, it can be used in conjunction with any existing clustering algorithm that is based on the k-medoid paradigm. Experimental results, using both synthetic and real datasets, confirm the efficiency, effectiveness and scalability of the proposed technique.

Author Biographies

Adriano Arantes Paterlini, ICMC - USP

Systems Analyst at Department of Computer Science - ICMC - USP
Mario A. Nascimento, University of Alberta

Ph.D. Associate Professor at Department of Computing Science- University of Alberta
Caetano Traina Jr., ICMC - USP

Ph.D. Titular Professor at Department of Computer Science - ICMC - USP

Using Pivots to Speed-Up k-Medoids Clustering

Authors

Keywords:

Abstract

Author Biographies

Downloads

Published

Issue

Section

Developed By

Language

Information