Community detection finds sets of nodes, i.e., that are highly connected inside the community and coarsely connected outside the community.

Community: a group of individuals sharing common interests.

A community is a subset CVC\subseteq V of the nodes VV. Non-overlapping communities or partitions are pairwise disjoint set Ci,CjC_i,C_j s.t. CiCj=C_i\cap C_j=\empty for each i,ji,j. Overlapping communities are not necessarily pairwise disjoint.

Non-overlapping Community Detection

Disjoint aka partitions.

Two common algorithms for non-overlapping community detection are Spectral clustering and Modularity optimization.

Modularity Optimization

Modularity is a measure of cohesiveness 凝聚力 among communities.

Modularity Q\mathcal Q compares the density inside communities with a null model of density. The null model assumes that nodes can form random connections with any other node.

QCC\mathcal Q\varpropto\sum_{C\in\mathcal C} [number of edges within community CC - expected number of edges within community CC]

The main question is how do we quantify the edges within a community and what is a good null model.

Edge within community. In order to find how many edges belong to the same community we simply sum the edges in each community and normalize by the total number of edges.

12mCCiCjCaij\frac{1}{2m}\sum_{C\in\mathcal C}\sum_{i\in C}\sum_{j\in C}a_{ij}

We consider all pairs of nodes, we need to normalize by twice number of edges 2m2m as we count both ijij and jiji.

Modularity’s null model. Given a graph GG with n=Vn=|V| vertices and m=Em=|E| edges, construct a rewired graph GG'. The rewired graph has the same degree distribution but the connections among nodes appear at random. By ensuring the same degree, we assert that we do not compare the graph GG with a completely random one.

Node with large degrees represents celebrities. Altering their degrees might end up transforming a star into a ordinary person.

If nodes are allowed to form any connection with any other node as long as the degree is preserved, the probability that given one node ii the other one jj is

pij=didj4m2p_{ij}=\frac{d_id_j}{4m^2}

where di,djd_i,d_j are the degrees of ii and jj respectively. The expected number of edges between nodes ii and jj is 2mpij=2mpij=didj2m2m\cdot p_{ij}=2mp_{ij}=\frac{d_id_j}{2m}. As such, the expected number of edges in GG' is

12iVjVdidj2m=1212miVdijVdj=14m2m2m=m\frac{1}{2}\sum_{i\in V}\sum_{j\in V}\frac{d_id_j}{2m}=\frac{1}{2}\cdot\frac{1}{2m}\sum_{i\in V}d_i\sum_{j\in V}d_j\\ =\frac{1}{4m}2m\cdot2m=m

Modularity. Putting all together, the Modularity of a partition C\mathcal C of GG is

Q(G,C)=12mCCiCjC(aijdidj2m)\mathcal Q(G,\mathcal C)=\frac{1}{2m}\sum_{C\in\mathcal C}\sum_{i\in C}\sum_{j\in C}(a_{ij}-\frac{d_id_j}{2m})

This value is taken in [1,1][-1,1]. A modularity close to 11 indicates a good partition, while a negative value indicates a partition in which connected nodes are separated. A strong community structure is usually considered between Q=0.3\mathcal Q=0.3 and 0.70.7,

The direct modularity optimization is NP-hard. The Louvain algorithm propose a greedy algorithm that runs in O(nlogn)\mathcal O(n\log n). The Louvain method works in a bottom-up fashion by progressively merging the two communities that bring the largest increase in modularity.

Spectral Modularity

A spectral method for modularity maximization.

First, with a bi-partition of the graph i.e. dividing the graph in 2 communities C1,C2C_1,C_2. Let cRn\mathbf c\in\mathbb R^n be a real vector that indicates whether a node belongs to one or the other community. An entry cic_i is

ci={1 if viC11 if viC2c_i=\begin{cases} 1\text{ if }v_i\in C_1\\ -1\text{ if }v_i\in C_2 \end{cases}

We can rewrite equation for Q\mathcal Q as

Q=12mi,j(aijdidj2m)δ(ci,cj)\mathcal Q=\frac{1}{2m}\sum_{i,j}(a_{ij}-\frac{d_id_j}{2m})\delta(c_i,c_j)

where δ(ci,cj)=1\delta(c_i,c_j)=1 if ci=cjc_i=c_j and 00 otherwise.

Notice that δ(ci,cj)=12(cicj+1)\delta(c_i,c_j)=\frac{1}{2}(c_ic_j+1). Thus

Q=14mi,j(aijdidj2m)cicj=14mcBc\mathcal Q=\frac{1}{4m}\sum_{i,j}(a_{ij}-\frac{d_id_j}{2m})c_ic_j=\frac{1}{4m}\mathbf c^\top\mathbf{Bc}

where B\mathbf B with entries Bij=aijdidj2mB_{ij}=a_{ij}-\frac{d_id_j}{2m} is referred to as modularity matrix.

The objective is very close to spectral clustering. However, in this case we try to maximize the quantity. Since B\mathbf B is symmetric, the variational characterization of the eigenvalues tells that the solution of the optimization maxxMxxx\max\frac{\mathbf x^\top\mathbf{Mx}}{\mathbf x^\top\mathbf x} is the eigenvector that corresponds to the largest of the eigenvalues of the modularity matrix.

In order to compute the partition c\mathbf c, we relax the vector c\mathbf c to take any real value. Our object (omitting 14m1\over4m) becomes

maxccBcs.t. cc=nccBc=0c[cBcβ(ncc)]2Bc=2βc\max_\mathbf c \mathbf{c^\top Bc}\\ \text{s.t. }\mathbf{c^\top c}=n\\ \Rarr\frac{\partial}{\partial\mathbf c}\mathbf{c^\top Bc}=0\\ \Rarr\frac{\partial}{\partial\mathbf c}[\mathbf{c^\top Bc}-\beta(n-\mathbf{c^\top c})]\Rarr2\mathbf{Bc}=2\beta c

The last equality is an eigenvalue - eigenvalue equation, where β\beta is an eigenvalue for the eigenvector c\mathbf c. If we substitute Bc=βc\mathbf{Bc}=\beta\mathbf c, we will obtain

cBc=cβc=βcc=βn\mathbf{c^\top Bc}=\mathbf{c^\top\beta c}=\beta\mathbf{c^\top c}=\beta n

In order to maximize the objective we need to select the largest eigenvalue and the corresponding eigenvector.

Finally, the partition of the graph is obtained by

ci={1 if vi>01 otherwisec_i=\begin{cases} 1\text{ if }v_i\gt0\\ -1\text{ otherwise} \end{cases}

Detecting multiple communities. Similar to spectral clustering, we would be tempted to generalize the spectral modularity to the case of multiple communities by recursive bi-partitioning. This method, however, assumes that the graph is cut into two parts and that the edges among the two partitions are removed. Unfortunately, removing edges changes also the degree of the nodes, and in turns the null model that forms the matrix B\mathbf B, ending up maximizing the wrong quantity. As a remedy, we consider the change ΔQ\Delta\mathcal Q in modularity after splitting a community CC into two parts

ΔQ=12m[12i,jCBij(cicj+1)i,jCBij]=14m[i,jCBijcicji,jCBij]=14mi,jC[BijδijkCBik]cicj=14mcB(C)c\Delta\mathcal Q=\frac{1}{2m}[\frac{1}{2}\sum_{i,j\in C}B_{ij}(c_ic_j+1)-\sum_{i,j\in C}B_{ij}]\\ =\frac{1}{4m}[\sum_{i,j\in C}B_{ij}c_ic_j-\sum_{i,j\in C}B_{ij}]\\ =\frac{1}{4m}\sum_{i,j\in C}[B_{ij}-\delta_{ij}\sum_{k\in C}B_{ik}]c_ic_j\\ =\frac{1}{4m}\mathbf c^\top \mathbf B^{(C)}\mathbf c

where δij=1\delta_{ij}=1 if i=ji=j and 00 otherwise, is called the Kronecker delta and Bij(C)=BijδijkCBikB_{ij}^{(C)}=B_{ij}-\delta_{ij}\sum_{k\in C}B_{ik}.

The above is generalized spectral method, in which we can repeatedly find eigenvectors in the modified matrix B(C)\mathbf B^{(C)}. It also provides a convenient stopping criteria: Once the increment in modularity is negative, there is no other community that will increase the modularity further and the method can stop.

Pros and Cons

👍

  • Clusters can have arbitrary shape and size, i.e., clusters are not restricted to have convex shapes.
  • Efficient in common real graphs.
  • Robust to noise.

👎

  • Inefficient on dense graphs with O(n3)\mathcal O(n^3) in the worse case.
  • Resolution limit: When merging communities if the number of edges is larges.

Overlapping Community Detection

Nodes can be shared among communities. Harder than Non-overlapping.

Clique Percolation

Clique percolation detects overlapping communities by exploiting the definition of clique.

A clique is a fully connected subgraph. In other words, a clique is a set of nodes that have an edge with any other node in the clique.

A kk-clique is a complete subgraph on kk nodes.

A kk-clique is adjacent to another clique if it shares k1k-1 vertices. In clique percolation we define a community the union of adjacent kk-cliques. Given such a definition it’s easy to formulate our first algorithm CFinder.

The CFinder algorithm starts with a kk-clique defining a community, where kk is a user parameter, and expands the community until there is no more adjacent cliques to add.

Pseudo Code

Input: Graph GG, Size of cliques kk

  1. Start with a kk-clique
  2. Roll the clique over adjacent cliques
  3. A kk-cliques community is the largest subgraph obtained by the union of all adjacent kk-cliques
  4. Other kk-cliques that can not be reached correspond to other clique-communities

Pros and Cons

👍

  • Easy to implement and understand
  • Finding kk-cliques is polynomial
  • Communities are easily explainable

👎

  • Requires size of cliques: kk
  • Rigid communities: the results depend on the existence of the cliques.

AGM/BigCLAM

This method instead take a probabilistic approach.

AGM and BigCLAM are two different models but with very similar premises.

Intuition: graph is a sample of an unknown distribution in which we flip a coin on the existence of an edge.

The existence probability is determined by the community structure: the more communities two nodes share, the higher the probability.

Given a graph GG with communities C\mathcal C each of them existing with probability pCp_C, a membership set Mi={CCviC}M_i=\{C\in\mathcal C|v_i\in C\}. The probability of existence of an edge in this model is

P(i,j)=1CMiMj(1pC)P(i,j)=1-\prod_{C\in M_i\cap M_j}(1-p_C)

The above probability ensures that an edge exists if the nodes share at least one community. The above is an instance of a noisy-OR. If the nodes share no community, we assign a fixed small probability ϵ\epsilon.

Our model assumes that each edge is generated independently by any other edge. As such, given the parameters Θ=(C,M,{pC})\Theta=(\mathcal C,M,\{p_C\}), the likelihood that our graph GG is sampled by the AGM model is

P(GC,M,{pC})=(i,j)EP(i,j)(i,j)E(1P(i,j))P(G|\mathcal C,M,\{p_C\})=\prod_{(i,j)\in E}P(i,j)\prod_{(i,j)\notin E}(1-P(i,j))

AGM is a rather flexible model and can express a variety of community structures, such as non-overlapping and hierarchical communities.

The model assumes that we know already the parameters in order to compute the likelihood of a network.

By maximizing this equation, we can find the parameters of the model that maximize the likelihood. Bit how to find this model? Finding Θ\Theta means finding a community membership for each node. The task is hard.

From AGM to BigCLAM. AGM model is hard to optimize. BigCLAM drops some of the requirements of AGM to attain efficiency.

Under BigCLAM each node belongs to some community CjC_j with some strength FijF_{ij}. If Fij=0F_{ij}=0, node ii is not a member of community jj. The probability pCp_C that an edge between ii and jj exists if they belong to community CC is

pC(i,j)=1exp(FiCFjC).p_C(i,j)=1-\exp(-F_{iC}F_{jC}).

This means that the probability that an edge exists is proportional to the product of the strengths. If one node does not belong to the community CC (FiC=0F_{iC}=0) then its probability is 00. The probability that at least one common community CC links two nodes is

P(i,j)=1C(1pC(i,j))=1exp(FiFj)P(i,j)=1-\prod_C(1-p_C(i,j))=1-\exp(-\mathbf F_{i\cdot}\mathbf F_{j\cdot}^\top)

where Fi\mathbf F_{i\cdot} is the row vector of the matrix FRn×C\mathbf F\in\mathbb R^{n\times|C|}.

To compute F\mathbf F to fit the model to the data, we use a maximum likelihood approach

argmaxFi,jEp(i,j)(i,j)E(1p(i,j)).\arg\max_\mathbf F\prod_{i,j\in E}p(i,j)\prod_{(i,j)\notin E}(1-p(i,j)).

Instead of maximizing the likelihood, we maximize the log-likelihood

P(GF)=(i,j)Elog(1exp(FiFj))(i,j)E(FiFj).P(G|\mathbf F)=\sum_{(i,j)\in E}\log(1-\exp(-\mathbf F_{i\cdot}\mathbf F_{j\cdot}^\top))-\sum_{(i,j)\notin E}(\mathbf F_{i\cdot}\mathbf F_{j\cdot}^\top).

Since matrix F\mathbf F is real, the model can be optimized with coordinate ascent over the rows.

Efficiency optimization. The gradient can be optimized by caching and compute an iteration in linear time.

P(GF)=vN(i)Fvexp(FiFv))1exp(FiFv))vN(i)Fv.\nabla P(G|\mathbf F)=\sum_{v\in N(i)}\mathbf F_v\cdot\frac{\exp(-\mathbf F_{i\cdot}\mathbf F_{v\cdot}^\top))}{1-\exp(-\mathbf F_{i\cdot}\mathbf F_{v\cdot}^\top))}-\sum_{v\notin N(i)}\mathbf F_v.

The factor vN(i)Fv\sum_{v\notin N(i)}\mathbf F_v can be written as

vFvFivN(i)Fv\sum_v\mathbf F_{v\cdot}-\mathbf F_{i\cdot}-\sum_{v\in N(i)}\mathbf F_{v\cdot}

which runs in linear time as both vFvFi\sum_v\mathbf F_{v\cdot}-\mathbf F_{i\cdot} can be cashed in advance.

Pros and Cons

👍

  • Detect overlapping and non-overlapping communities
  • Cluster can have arbitrary shapes
  • Nearly linear Complexity
  • Flexible model.

👎

  • Requires the number of communities upfront
  • Hard to find the threshold for determining the membership of each nodes.