RDBMS as an Efficient Tool to Mine Cliques on Complex Networks
Keywords: cliques, cluster coefficient, graph mining, power law, RDBMS
Complex networks are intrinsically present in a wide range of applications.
Real world networks have several unique properties, such as, sparsity, node degree distribution, which follow a power law and a large amount of triangles that further form larger cliques.
Triangles and cluster coefficient, which are usually used to find groups, are not always enough to distinguish a different node neighborhood topology.
By using cliques of sizes 4 and 5, it is possible to study how triangles become involved to form large cliques.
To retrieve these cliques called k4 and k5 a novel technique called ``FCR - Fast Clique Retrieval'' has been developed, taking advantage of the data management and optimization techniques of a relational database management system and SQL to query cliques of sizes 4 and 5. This paper demonstrates that cliques (3, 4 and 5) follow interesting power laws that allow identifying nodes with suspicious behaviors. It also presents an extension of the cluster coefficient formula, which may become a valuable equation to identify nodes that most influence the network first eigenvalue.