Towards a Better Quality Metric for Graph Cluster Evaluation

  • Hélio Almeida Universidade Federal de Minas Gerais
  • Dorgival Guedes Neto Universidade Federal de Minas Gerais
  • Wagner Meira Jr Universidade Federal de Minas Gerais
  • Mohammed J. Zaki Rensselaer Polytechnic Institute
Keywords: Graph Clustering, Quality Metrics, Quality Evaluation, Graph Mining


The process of discovering groups of similar vertices in a graph, known as graph clustering, has interesting applications in many di?erent scenarios, such as marketing and recommendation systems. One of the most important aspects of graph clustering is the evaluation of cluster quality, which is important not only to measure the e?ectiveness of clustering algorithms, but also to give insights on the dynamics of relationships in a given network. Many quality metrics for graph clustering evaluation exist, but the most popular ones have strong biases and structural inconsistencies that cause the quality of their results to be, at least, doubtful. Our studies showed that, while in general those popular quality metrics do a good job evaluating the external sparsity between clusters, they do poorly when evaluating the internal density of those clusters, ignoring essential information (such as a cluster's vertex count) or having its internal density component ignored in practice because of its computational cost. In this article, we propose a new method forevaluating the internal density of a given cluster, one that not only uses more complete information to evaluate that density, but also takes into consideration structural characteristics of the original graph. With our proposed method, the internal density of a cluster is evaluated in terms of the expected density of similar clusters in that same graph, in contrast to the traditional quality metrics available, where clusters from di?erent graphs are compared by the same standards. We believe that, if used in conjunction with a good external sparsity evaluation metric, like conductance,this method will help to obtain better, more signi?cant graph clustering evaluation results.
SBBD Articles