Distributed Community Detection Using Synthetic Coordinates

Introduction

This work is focused on finding the entire community structure of a network, based on local interactions between neighboring nodes and on an unsupervised distributed hierarchical clustering algorithm (see Figure 1). The novelty of the proposed approach is the fact that the algorithm is based on the use of network coordinates computed by a distributed algorithm (see Figure 2). We reduce the graph clustering problem (see Figure 1) to points clustering (see Fig. 2) that can be solved using centralized clustering algorithms like k-means [1-2] or distributed clustering algorithms [3].

Experimental results and comparisons with other methods from literature are presented for a variety of benchmark graphs with known community structure, derived by varying a number of graph parameters and real data sets. The experimental results and comparisons to existing methods with similar computation cost on real and synthetic data sets demonstrate the high performance and robustness of the proposed scheme.

 

Figure 1. A network graph of  4-communities.

Figure 2. The Synthetic coordinates representation of a 4-communites network. 

                                                  


Methodology

The proposed local community finding algorithm [3] comprises the following steps:
• The position estimation algorithm, which is a distributed algorithm inspired by Vivaldi [7].
• The community detection algorithm using hierarchical clustering.

 

Phase 1: Network coordinates

  • Each node gets a position in high-dimensional Euclidean space using Vivaldi algorithm. Vivaldi is a fully decentralized, light-weight, adaptive network coordinate algorithm that predicts Internet latencies with low error.

  • Vivaldi simulates a network of physical springs. We place imaginary springs between selected pairs of connected nodes (traction) and non-connected nodes (turning away).
     

Figure 3. The two types of  physical springs for connected nodes (traction) and non-connected nodes (turning away) .

 

Phase 2: Distributed Clustering algorithm

  • Each node is considered as a (singleton) cluster. We successively merge two clusters a, b:
    if a is b’s closest neighbor and b is a’s closest neighbor and the distance between the two clusters is less that a threshold.

                                                


Downloads

  • You can download the executable for Community Detection Using Synthetic Coordinates  and the synthetic datasets used in [3]. See readme.txt files for instructions. You can use them only for non-commercial purposes. If you use them, please cite [3] or a part of our related papers [1-6].

 


Related Publications

[1]. H. Papadakis, C. Panagiotakis and P. Fragopoulou, Locating Communities on Real Dataset Graphs Using Synthetic Coordinates, Parallel Processing Letters (PPL), vol. 22, no. 1, Jan. 2012.

[2]. H. Papadakis, C. Panagiotakis and P. Fragopoulou, Locating Communities on Graphs with variations in Community Sizes, Journal of Supercomputing, vol. 185, no.1, pp. 9-15, Jan. 2012.

[3]. H. Papadakis, C. Panagiotakis and P. Fragopoulou, Distributed Community Detection in a Complex World Using Synthetic Coordinates, Journal of Statistical Mechanics, 2014 (under revision).
 

[4]. H. Papadakis, C. Panagiotakis and P. Fragopoulou, Local Community Finding using Synthetic Coordinates, International Conference on Future Information Technology (FutureTech 2011),  2011 (Best Paper Award).

[5]. A. Kalaitzakis, H. Papadakis, C. Panagiotakis and P. Fragopoulou, Community Detection in Collaborative Environments: A Comparative Analysis, International Conference on PErvasive Technologies Related to Assistive Environments, 2011.

[6]. H. Papadakis, C. Panagiotakis and P. Fragopoulou, Distributed Community Detection: Finding neighborhoods in a complex world using synthetic coordinates, IEEE symposium on Computers and Communications, 2011.

[7]. Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris. Vivaldi: A decentralized network coordinate system. In Proceedings of the ACM SIGCOMM ’04 Conference, August 2004.

[8] C. Panagiotakis, H. Papadakis, E. Grinias, N. Komodakis, P. Fragopoulou and G. Tziritas, Interactive Image Segmentation Based on Synthetic Graph Coordinates, Pattern Recognition, vol. 46, no. 11, pp. 2940-2952, Nov. 2013.

[9] C. Panagiotakis, H. Papadakis, E. Grinias, N. Komodakis, P. Fragopoulou and G. Tziritas, Interactive Image Segmentation via Graph Clustering and Synthetic Coordinates Modeling, International Conference on Computer Analysis of Images and Patterns, 2013. 

 

Acknowledgments: This work is partially supported by the  “ARCHIMEDE III: Education and Lifelong Learning” (Project’s Acronym: P2PCOORD) project.