This module introduces graph mining.

Data points often come from networks: graphs of connections.

Purpose: extract patterns from graphs.

3 different areas of graph mining:

  1. Community detection: finding groups of nodes with similar characteristics.
  2. Link analysis: understanding which nodes are more important than others.
  3. Embedding: capturing similarities among nodes.

A graph $G=(V,E)$ is a set of vertices $V$ connected through edges $E\subseteq V\times V$.

Graphs can be either undirected or directed.

Graph representations. A graph can be represented by adjacency lists or an adjacency matrix.
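As a minimal illustration (the small graph below is made up for the example), the same undirected graph can be stored either way:

```python
import numpy as np

# A small, made-up undirected graph on 4 nodes: 0-1, 0-2, 1-2, 2-3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency list: for each node, the list of its neighbors
adj_list = {i: [] for i in range(n)}
for i, j in edges:
    adj_list[i].append(j)
    adj_list[j].append(i)

# Adjacency matrix: a_{i,j} = 1 if (i, j) is an edge (symmetric for undirected graphs)
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

print(adj_list)  # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(A)
```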

Degree of a node: the number of edges incident to it.

Graph construction (a code sketch follows the list):

  • KNN graph: connect every point $\mathbf x$ with its closest $k$ points.
  • $\epsilon$-neighborhood: connect each point $\mathbf x$ with all points within distance $\epsilon$.
  • Fully-connected graph: choose a distance $dist$, then connect every point to every other point. The weight of each edge $(i,j)$ is $1-dist(i,j)$.
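A minimal numpy sketch of these three constructions, assuming Euclidean distance between points and, for the fully-connected graph, distances rescaled to $[0,1]$ so that $1-dist$ is a sensible weight:

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distance matrix between all pairs of points (an assumed distance choice)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def knn_graph(X, k):
    # Connect every point with its k closest points (excluding itself)
    D = pairwise_dist(X)
    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(D[i])[1:k + 1]   # skip index 0, the point itself
        A[i, nearest] = 1
    return np.maximum(A, A.T)                  # symmetrize so the graph is undirected

def eps_graph(X, eps):
    # Connect each point with all points within distance eps
    D = pairwise_dist(X)
    A = (D <= eps).astype(float)
    np.fill_diagonal(A, 0)
    return A

def full_graph(X):
    # Fully connected graph: weight of edge (i, j) is 1 - dist(i, j),
    # with distances rescaled to [0, 1] (other similarity functions are common)
    D = pairwise_dist(X)
    W = 1 - D / D.max()
    np.fill_diagonal(W, 0)
    return W
```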

The Adjacency Matrix

Consider undirected graphs. Their adjacency matrix is symmetric.

Consider multiplying a vector $\mathbf x\in\mathbb R^n$ with the adjacency matrix $\mathbf A$.

$\mathbf x$ can be thought of as the vector form of a function $f:V\rightarrow\mathbb R$ assigning a value to each node.

$$\mathbf{Ax}=\mathbf y \;\Rightarrow\; \begin{pmatrix} a_{1,1} & \cdots & a_{1,n}\\ \vdots & \ddots & \vdots\\ a_{n,1} & \cdots & a_{n,n} \end{pmatrix} \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}$$

The $i$-th element of the vector $\mathbf y$ is $y_i=\sum_{(i,j)\in E}x_j$. Therefore, the entry $y_i$ is the sum of the labels $x_j$ of the neighbors of $i$. This means that $\mathbf A$ is a matrix that transforms the value of a node into the sum of the values of its neighbors. If we repeat this aggregation multiple times we obtain a propagation of the initial labels $\mathbf x$ over the network: $\mathbf x^{(t)}=\mathbf{Ax}^{(t-1)}$.
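A small sketch of this propagation on a made-up 4-node graph, using plain numpy:

```python
import numpy as np

# Propagate initial labels x over the network: x^(t) = A x^(t-1)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

x = np.array([1.0, 0.0, 0.0, 0.0])   # initial labels (made-up seed on node 0)
for t in range(3):
    x = A @ x                        # each node becomes the sum of its neighbors' values
    print(t + 1, x)
```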

There exists a fixed-point in this iteration:

$$\exists\lambda:\ \mathbf{Ax}=\lambda\mathbf x$$

We want to find an eigenvector $\mathbf x$ and eigenvalue $\lambda$ of the matrix $\mathbf A$.
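One standard way to approximate such a fixed point (not spelled out in these notes) is power iteration: repeatedly multiply by $\mathbf A$ and renormalize, which for a generic starting vector converges to the dominant eigenvector. A minimal numpy sketch:

```python
import numpy as np

def power_iteration(A, iters=100):
    # Repeatedly apply A and renormalize; for a generic starting vector this
    # converges to the eigenvector of the eigenvalue with largest magnitude.
    x = np.random.default_rng(0).normal(size=A.shape[0])
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    lam = x @ A @ x        # Rayleigh quotient of a unit vector = eigenvalue estimate
    return lam, x
```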

Spectral Graph Theory

Analyzing properties of the graph by inspecting the eigenvalues and eigenvectors of matrices that represent the graph structure.

Given a graph $G$ and a matrix $\mathbf M$ that describes the graph's structure, the graph spectrum is the set of $\mathbf M$'s eigenvalues $\Lambda=\{\lambda_1,\dots,\lambda_n\}$ sorted in descending order $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_n$.

$\lambda\in\mathbb C$ is an eigenvalue if there exists a vector $\mathbf x\in\mathbb C^n,\ \mathbf x\neq 0$, s.t. $\mathbf{Ax}=\lambda\mathbf x\Leftrightarrow(\mathbf A-\lambda\mathbf I)\mathbf x=0\Rightarrow\det(\mathbf A-\lambda\mathbf I)=0$.

For symmetric matrices: $\mathbf A$ is symmetric, meaning $a_{j,i}=a_{i,j}\ \forall i,j$. Thus,

  • All eigenvalues are real numbers, and the eigenvectors are orthogonal ($\mathbf u^\top\mathbf v=0$).

  • If $\mathbf A$ is additionally positive semi-definite, i.e. $\mathbf x^\top\mathbf{Ax}\ge0\ \forall\mathbf x$, then all eigenvalues are non-negative ($\lambda\ge0$) and there exists a matrix $\mathbf N$ s.t. $\mathbf A=\mathbf N^\top\mathbf N$.

A graph is $d$-regular if every vertex has degree $d$.

In a $d$-regular graph, $\mathbf 1$ is an eigenvector (with eigenvalue $d$). Since the eigenvectors of a symmetric matrix are orthogonal, all the other eigenvectors $\mathbf u$ satisfy $\mathbf 1^\top\mathbf u=0\Rightarrow\sum_i u_i=0$. This is a very important fact that will be used later.

Graph Matrices

The degree matrix, denoted $\mathbf\Delta$, is an $n\times n$ diagonal matrix where $\Delta_{i,i}=\sum_j a_{i,j}$ is the degree of node $i$.

Unnormalized Laplacian Matrix, or $L$, is an $n\times n$ matrix: $L=\mathbf\Delta-\mathbf A$.

Normalized Symmetric Laplacian, or $L^{sym}=\mathbf\Delta^{-1/2}L\mathbf\Delta^{-1/2}=\mathbf I-\mathbf\Delta^{-1/2}\mathbf A\mathbf\Delta^{-1/2}$, is the normalized version of $L$.

Asymmetric (Random Walk) Laplacian, or $L^{rw}=\mathbf I-\mathbf\Delta^{-1}\mathbf A$.

Laplacian’s properties: it has a trivial eigenvector $\mathbf 1$ corresponding to the eigenvalue $0$, since $L\mathbf 1=\mathbf 0$. Moreover, the Laplacian is symmetric and positive semi-definite, so all the eigenvalues are non-negative real numbers and the eigenvectors are real and orthogonal.

$$\mathbf x^\top L\mathbf x=\sum_{(i,j)\in E}(x_i-x_j)^2$$
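The following sketch builds these matrices for the same made-up 4-node graph as above and checks both the trivial eigenvector and the quadratic-form identity numerically:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)
Delta = np.diag(deg)                                  # degree matrix
L = Delta - A                                         # unnormalized Laplacian
D_inv_sqrt = np.diag(1 / np.sqrt(deg))
L_sym = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt       # normalized symmetric Laplacian
L_rw = np.eye(4) - np.diag(1 / deg) @ A               # random-walk Laplacian

# The trivial eigenvector: L @ 1 = 0
print(L @ np.ones(4))                                 # -> [0. 0. 0. 0.]

# Quadratic-form identity: x^T L x = sum over edges (x_i - x_j)^2
x = np.array([0.5, -1.0, 2.0, 0.0])
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(x @ L @ x, sum((x[i] - x[j]) ** 2 for i, j in edges))   # both give 17.5
```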

Spectral Clustering

A cut is the set of edges with exactly one endpoint in each group. Given a partition into groups $A$ and $B$, the weight of the cut is

$$W(A,B)=\sum_{i\in A,\,j\in B}a_{i,j}$$

If we directly minimize the cut we might end up with an unfortunate solution in which one partition contains only a single node. To avoid this, we can minimize a normalized cut

$$J_{rc}(C)=\frac{W(A,B)}{|A|}+\frac{W(A,B)}{|B|}$$

where $C=\{A,B\}$. Computing optimal normalized cuts is NP-hard.

Another way to write the cut is to consider a variable $x_i$ for each vertex $i$, s.t.

$$x_i=\begin{cases} +1 & (i\in A)\\ -1 & (i\in B) \end{cases}$$

Thus $W(A,B)=\frac12\sum_{(i,j)\in E}|x_i-x_j|$, since each crossing edge contributes $|x_i-x_j|=2$ and every other edge contributes $0$.

So, to minimize the cut, we can minimize the sum of absolute differences among the entries of $\mathbf x$. It's more convenient instead to minimize $\sum_{(i,j)\in E}(x_i-x_j)^2$, which is still proportional to the cut (each crossing edge now contributes $4$). As we have seen, this corresponds to $\mathbf x^\top L\mathbf x$.

Similarly, we can see that

$$\frac{\mathbf x^\top L\mathbf x}{\mathbf x^\top\mathbf x}$$

corresponds to a normalized version of the cut and is called the Rayleigh quotient.

Minimum-cut Relates to Spectral Properties

Variational characterization of eigenvalues. Consider a real and symmetric matrix $\mathbf M$, with eigenvalues sorted in decreasing order. The variational characterization of the eigenvalues states that the smallest eigenvalue corresponds to

$$\lambda_n=\min_{\mathbf x\neq 0}\frac{\mathbf x^\top\mathbf{Mx}}{\mathbf x^\top\mathbf x}=\min_{\mathbf x\neq0}\frac{\sum_{i,j}m_{i,j}x_ix_j}{\sum_i x_i^2}$$

where $\mathbf x_n$ is the eigenvector corresponding to the eigenvalue $\lambda_n$. Similarly, the second smallest eigenvalue $\lambda_{n-1}$ is the solution of

$$\lambda_{n-1}=\min_{\mathbf x\neq0,\ \mathbf x^\top\mathbf x_n=0}\frac{\mathbf x^\top\mathbf{Mx}}{\mathbf x^\top\mathbf x}$$

And so on for all the eigenvalues. Finally, the largest eigenvalue $\lambda_1$ is

$$\lambda_1=\max_{\mathbf x\neq0}\frac{\mathbf x^\top\mathbf{Mx}}{\mathbf x^\top\mathbf x}$$
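A quick numerical sanity check of this characterization, using the Laplacian of the small graph from above as the real symmetric matrix $\mathbf M$ (note that numpy's `eigh` returns eigenvalues in ascending order, so its indexing is reversed with respect to the ordering used in these notes):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = np.diag(A.sum(axis=1)) - A                  # the Laplacian is real and symmetric

vals, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order

def rayleigh(x):
    return (x @ M @ x) / (x @ x)

# The Rayleigh quotient evaluated at an eigenvector recovers its eigenvalue
print(vals[0], rayleigh(vecs[:, 0]))            # smallest eigenvalue  (lambda_n in the notes)
print(vals[1], rayleigh(vecs[:, 1]))            # second smallest      (lambda_{n-1})
print(vals[-1], rayleigh(vecs[:, -1]))          # largest eigenvalue   (lambda_1)
```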

Minimum cut as eigenvalues of the Laplacian.

  • Recall that all eigenvectors of a symmetric matrix are orthogonal, and that the Laplacian matrix of an undirected graph is symmetric.
  • The smallest eigenvalue $\lambda_n$ of the graph's Laplacian $L$ is $0$, corresponding to the eigenvector $\mathbf 1$. This corresponds to placing all nodes into a single partition.
  • The second smallest eigenvalue $\lambda_{n-1}$ corresponds to an eigenvector $\mathbf x$ which is orthogonal to the eigenvector $\mathbf 1$, i.e. $\sum_i x_i=0$.
  • This eigenvector is called the Fiedler vector and minimizes a relaxed version of the normalized cut.

Spectral Clustering Algorithm

Spectral clustering works in 3 phases:

  1. Preprocessing: Construct a matrix representation of the graph.
  2. Decomposition: Compute the eigenvalues and eigenvectors of the matrix. Map each point to a lower-dimensional representation based on one or more eigenvectors.
  3. Grouping: Assign points to two or more clusters, based on the new representation.

The Spectral Partitioning Algorithm finds a partition of the graph that is close to the optimal one. In the decomposition phase, spectral clustering maps each vertex to the corresponding component of the eigenvector associated with $\lambda_{n-1}$ (the Fiedler vector). In the grouping phase, we sort the components of this new one-dimensional representation and split the nodes into positive and negative, or split at the median value.
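A minimal sketch of the whole pipeline, assuming the unnormalized Laplacian and a sign-based split of the Fiedler vector (the function name and the example graph, two triangles joined by one edge, are made up for illustration):

```python
import numpy as np

def spectral_bipartition(A):
    # Preprocessing: Laplacian of the (undirected) graph
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    # Decomposition: eigh sorts eigenvalues in ascending order, so column 1 is the
    # eigenvector of the second smallest eigenvalue (the Fiedler vector)
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    # Grouping: split nodes by the sign of their Fiedler-vector component
    return fiedler >= 0

# Two triangles joined by a single edge: the split recovers them
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(spectral_bipartition(A))   # e.g. [ True  True  True False False False ]
```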

$k$-way Spectral Clustering

We extend spectral clustering to the case of finding $k$ partitions.

One strategy is to recursively split the graph into two partitions until $k$ partitions are returned.

Another strategy leverages multiple eigenvectors besides the one corresponding to the second smallest eigenvalue. By stacking the eigenvectors as columns, each node is embedded into a multi-dimensional space. Finally, we run $k$-means in this new space to find the $k$ partitions.
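A sketch of this second strategy, assuming the unnormalized Laplacian, the $k$ smallest eigenvectors as the embedding, and scikit-learn's `KMeans` for the grouping phase (the function name is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def kway_spectral_clustering(A, k):
    # Preprocessing: unnormalized Laplacian (normalized variants are also common)
    L = np.diag(A.sum(axis=1)) - A
    # Decomposition: stack the eigenvectors of the k smallest eigenvalues as columns,
    # embedding each node into a k-dimensional space
    vals, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k]
    # Grouping: run k-means in the embedded space
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```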

Advantages of using multiple eigenvectors:

  • Approximates the optimal cut
  • Emphasizes cohesive clusters by increasing the unevenness in the distribution of the data.
  • Provides a well-separated space. It transforms data to a new embedded space consisting of $k$ orthogonal basis vectors.
  • Multiple eigenvectors prevent instability caused by information loss.

Pros and Cons

👍

  • Clusters can have arbitrary shapes and sizes and are not restricted to convex shapes.
  • Efficient on sparse graphs.
  • Robust to noise; theoretically grounded and well connected to graph properties.

👎

  • Inefficient on dense graphs: $O(n^3)$ worst case.
  • Need to provide the number of clusters $k$.