TY - GEN
T1 - Mining communities in networks
T2 - 2009 9th ACM SIGCOMM Internet Measurement Conference, IMC 2009
AU - Kwak, Haewoon
AU - Choi, Yoonchan
AU - Eom, Young Ho
AU - Jeong, Hawoong
AU - Moon, Sue
PY - 2009
Y1 - 2009
N2 - Online social networks pose significant challenges to computer scientists, physicists, and sociologists alike, for their massive size, fast evolution, and uncharted potential for social computing. One particular problem that has interested us is community identification. Many algorithms based on various metrics have been proposed for identifying communities in networks [18, 24], but a few algorithms scale to very large networks. Three recent community identification algorithms, namely CNM [16],Wakita [59], and Louvain [10], stand out for their scalability to a few millions of nodes. All of them use modularity as the metric of optimization. However, all three algorithms produce inconsistent communities every time the input ordering of nodes to the algorithms changes. We propose two quantitative metrics to represent the level of consistency across multiple runs of an algorithm: pairwise membership probability and consistency. Based on these two metrics, we propose a solution that improves the consistency without compromising the modularity. We demonstrate that our solution to use pairwise membership probabilities as link weights generates consistent communities within six or fewer cycles for most networks. However, our iterative, pairwise membership reinforcing approach does not deliver convergence for Flickr, Orkut, and Cyworld networks as well for the rest of the networks. Our approach is empirically driven and is yet to be shown to produce consistent output analytically. We leave further investigation into the topological structure and its impact on the consistency as future work. In order to evaluate the quality of clustering, we have looked at 3 of the 48 communities identified in the AS graph. Surprisingly, they all have either hierarchical, geographical, or topological interpretations to their groupings. Our preliminary evaluation of the quality of communities is promising. We plan to conduct more thorough evaluation of the communities and study network structures and their evolutions using our approach.
AB - Online social networks pose significant challenges to computer scientists, physicists, and sociologists alike, for their massive size, fast evolution, and uncharted potential for social computing. One particular problem that has interested us is community identification. Many algorithms based on various metrics have been proposed for identifying communities in networks [18, 24], but a few algorithms scale to very large networks. Three recent community identification algorithms, namely CNM [16],Wakita [59], and Louvain [10], stand out for their scalability to a few millions of nodes. All of them use modularity as the metric of optimization. However, all three algorithms produce inconsistent communities every time the input ordering of nodes to the algorithms changes. We propose two quantitative metrics to represent the level of consistency across multiple runs of an algorithm: pairwise membership probability and consistency. Based on these two metrics, we propose a solution that improves the consistency without compromising the modularity. We demonstrate that our solution to use pairwise membership probabilities as link weights generates consistent communities within six or fewer cycles for most networks. However, our iterative, pairwise membership reinforcing approach does not deliver convergence for Flickr, Orkut, and Cyworld networks as well for the rest of the networks. Our approach is empirically driven and is yet to be shown to produce consistent output analytically. We leave further investigation into the topological structure and its impact on the consistency as future work. In order to evaluate the quality of clustering, we have looked at 3 of the 48 communities identified in the AS graph. Surprisingly, they all have either hierarchical, geographical, or topological interpretations to their groupings. Our preliminary evaluation of the quality of communities is promising. We plan to conduct more thorough evaluation of the communities and study network structures and their evolutions using our approach.
KW - AS graph
KW - CNM
KW - Community
KW - Consistent community identification
KW - Louvain
KW - Modularity
KW - Social networks
KW - Wakita
UR - http://www.scopus.com/inward/record.url?scp=84870691488&partnerID=8YFLogxK
U2 - 10.1145/1644893.1644930
DO - 10.1145/1644893.1644930
M3 - Conference contribution
AN - SCOPUS:84870691488
SN - 9781605587707
T3 - Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC
SP - 301
EP - 314
BT - IMC 2009 - Proceedings of the 2009 ACM SIGCOMM Internet Measurement Conference
Y2 - 4 November 2009 through 6 November 2009
ER -