layout | title |
---|---|
post | Spectral Clustering |
Here we study the important class of spectral methods for understanding networks on a global level. By "spectral" we mean the spectrum, or eigenvalues, of matrices derived from graphs, which will give us insight into the structure of the graphs themselves. In particular, we will explore spectral clustering algorithms, which take advantage of these tools for clustering nodes in graphs.
The spectral clustering algorithms we will explore generally consist of three basic stages.
Let's formalize the task we would like to solve. We start out with an undirected graph $$G(V, E)$$. Our goal is to partition $$V$$ into two disjoint groups $$A, B$$ (so $$A \cap B = \emptyset$$ and $$A \cup B = V$$) in a way that maximizes the number of connections internal to the groups and minimizes the number of connections between the two groups.
To further formalize the objective, let's introduce some terminology:
Since we want to minimize the number of connections between $$A$$ and $$B$$, we might decide to make the minimum cut our objective. However, we find that we end up with very unintuitive clusters this way -- we can often simply set $$A$$ to be a single node with very few outgoing connections, and $$B$$ to be the rest of the network, to get a very small cut. What we need is a measure that also considers internal cluster connectivity.
Enter the conductance, which balances between-group and within-group connectivity concerns. We define $$\phi(A, B) = \frac{cut(A, B)}{min(vol(A), vol(B))}$$ where $$vol(A) = \sum_{i \in A} k_i$$, the total (weighted) degree of the nodes in $$A$$. We can roughly think of conductance as analogous to a surface area to volume ratio: the numerator is the area of the shared surface between $$A$$ and $$B$$, and the denominator measures volume while trying to ensure $$A$$ and $$B$$ have similar volumes. Because of this nuanced measure, picking $$A$$ and $$B$$ to minimize the conductance results in more balanced partitions than minimizing the cut. The challenge then becomes to efficiently find a good partition, since minimizing conductance is NP-hard.
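To make the difference between cut and conductance concrete, here is a minimal sketch in Python with NumPy. The toy graph (two triangles joined by one edge, plus a pendant node) and the helper names `cut_size`, `volume` and `conductance` are assumptions made for illustration only.

```python
import numpy as np

def cut_size(adj, in_a):
    """Total weight of edges with one endpoint in A and the other in B."""
    in_a = np.asarray(in_a, dtype=bool)
    return adj[np.ix_(in_a, ~in_a)].sum()

def volume(adj, mask):
    """Sum of the (weighted) degrees of the nodes selected by mask."""
    return adj[np.asarray(mask, dtype=bool)].sum()

def conductance(adj, in_a):
    """cut(A, B) / min(vol(A), vol(B)) for the partition given by in_a."""
    in_a = np.asarray(in_a, dtype=bool)
    return cut_size(adj, in_a) / min(volume(adj, in_a), volume(adj, ~in_a))

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the
# edge (2, 3), plus a pendant node 6 attached to node 0.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (0, 6)]
adj = np.zeros((7, 7))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

balanced = np.array([1, 1, 1, 0, 0, 0, 1], dtype=bool)  # A = {0, 1, 2, 6}
pendant = np.array([0, 0, 0, 0, 0, 0, 1], dtype=bool)   # A = {6}

cut_balanced, cut_pendant = cut_size(adj, balanced), cut_size(adj, pendant)
phi_balanced, phi_pendant = conductance(adj, balanced), conductance(adj, pendant)
print(cut_balanced, cut_pendant)   # both partitions cut exactly one edge
print(phi_balanced, phi_pendant)   # but their conductances differ
```

Both partitions cut a single edge, so the cut objective cannot tell them apart, while the conductance is $$1/7$$ for the balanced split and $$1$$ for the split that isolates the pendant node.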
Enter spectral graph partitioning, a method that will allow us to find partitions with provably low conductance using eigenvectors. We'll start by introducing some basic techniques in spectral graph theory.
The goal of spectral graph theory is to analyze the "spectrum" of matrices representing graphs. By spectrum we mean the set $$\Lambda = \{\lambda_1, \ldots, \lambda_n\}$$ of eigenvalues $$\lambda_i$$ of a matrix representing a graph, in order of their magnitudes, along with their corresponding eigenvectors. For example, the largest eigenvector/eigenvalue pair for the adjacency matrix of a d-regular graph is the all-ones vector $$x = (1, 1, \ldots, 1)$$, with eigenvalue $$\lambda = d$$. Exercise: what are some eigenvectors for a disconnected graph with two components, each component d-regular? Note that by the spectral theorem, the adjacency matrix (which is real and symmetric) has a complete set of orthogonal eigenvectors.
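The d-regular example can be checked numerically. The sketch below assumes a 5-node cycle (a 2-regular graph chosen only for illustration) and uses `np.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors.

```python
import numpy as np

# Hypothetical example graph: a cycle on 5 nodes, which is 2-regular.
n, d = 5, 2
adj = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    adj[i, j] = adj[j, i] = 1.0

ones = np.ones(n)
# The all-ones vector is an eigenvector with eigenvalue d: A @ 1 = d * 1.
ones_is_eigenvector = np.allclose(adj @ ones, d * ones)

# eigh is meant for real symmetric matrices; eigenvalues come back ascending.
eigenvalues, eigenvectors = np.linalg.eigh(adj)
largest = eigenvalues[-1]

# Spectral theorem: the eigenvectors form an orthonormal basis.
gram = eigenvectors.T @ eigenvectors

print(ones_is_eigenvector, largest)
print(np.allclose(gram, np.eye(n)))
```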
What kinds of matrices can we analyze using spectral graph theory?
In particular, $$\lambda_2$$, the second smallest eigenvalue of $$L$$, is already fascinating and studying it will let us make big strides in understanding graph clustering. By the theory of Rayleigh quotients, we have that $$\lambda_2 = \min_{x: x^T w_1 = 0} \frac{x^T L x}{x^T x}$$ where $$w_1$$ is the eigenvector corresponding to eigenvalue $$\lambda_1$$; in other words, we minimize the objective in the subspace of vectors orthogonal to the first eigenvector in order to find the second eigenvector (remember that $$L$$ is symmetric and thus has an orthogonal basis of eigenvectors). On a high level, Rayleigh quotients frame the eigenvector search as an optimization problem, letting us bring optimization techniques to bear. Note that the objective value does not depend on the magnitude of $$x$$, so we can constrain its magnitude to be 1. Note additionally that we know that the first eigenvector of $$L$$ is the all-ones vector with eigenvalue 0, so saying that $$x$$ is orthogonal to this vector is equivalent to saying that $$\sum_i x_i = 0$$.
Using these properties and the definition of $$L$$, we can write out a more concrete formula for $$\lambda_2$$: $$\lambda_2 = \min_x \frac{\sum_{(i, j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$$, subject to the constraint $$\sum_i x_i = 0$$. If we additionally constrain $$x$$ to have unit length, the objective turns into simply $$\min_x \sum_{(i, j) \in E} (x_i - x_j)^2$$.
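A quick numerical check of this formula, as a sketch: it assumes the unnormalized Laplacian $$L = D - A$$ (degree matrix minus adjacency matrix) and a small toy graph invented for illustration. It verifies that $$x^T L x$$ equals the sum of squared differences over the edges, and that the Rayleigh quotient of an arbitrary zero-sum vector is never below $$\lambda_2$$.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by an edge, plus a pendant node.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (0, 6)]
n = 7
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Assumption: L is the unnormalized Laplacian D - A.
lap = np.diag(adj.sum(axis=1)) - adj

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
x -= x.mean()  # enforce the constraint sum_i x_i = 0

quad_form = x @ lap @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)
rayleigh = quad_form / (x @ x)

eigenvalues = np.linalg.eigvalsh(lap)  # ascending order
lambda_1, lambda_2 = eigenvalues[0], eigenvalues[1]

print(np.isclose(quad_form, edge_sum))  # the two expressions agree
print(rayleigh >= lambda_2)             # lambda_2 is the constrained minimum
```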
How does $$\lambda_2$$ relate to our original objective of finding a best partition of our graph? Let's express our partition $$(A, B)$$ as a vector $$y$$ defined by $$y_i = 1$$ if $$i \in A$$ and $$y_i = -1$$ if $$i \in B$$. Instead of using the conductance here, let's first try to minimize the cut while handling the balance problem by enforcing $$\vert A\vert = \vert B\vert$$, which amounts to constraining $$\sum_i y_i = 0$$. Given this size constraint, let's minimize the cut of the partition, i.e. find $$y$$ that minimizes $$\sum_{(i, j) \in E} (y_i - y_j)^2$$; each edge inside a group contributes 0 and each edge across the cut contributes 4, so this sum is exactly four times the cut. Note that the entries of $$y$$ must be $$+1$$ or $$-1$$, which has the consequence that the length of $$y$$ is fixed. *This optimization problem looks a lot like the definition of $$\lambda_2$$!* Indeed, if we relax $$y$$ to take arbitrary real values, our findings above show that the minimum of this objective is $$\lambda_2$$ of our Laplacian (scaled by the fixed squared length of $$y$$), attained by the corresponding eigenvector, known as the **Fiedler vector**. The Fiedler vector therefore gives an approximate solution to the balanced cut problem, and we read a clustering off it from the signs of its entries.
Now that we have a link between an eigenvalue of $$L$$ and graph partitioning, let's push the connection further and see if we can get rid of the hard $$\vert A\vert = \vert B\vert$$ constraint -- maybe there is a link between the more flexible conductance measure and $$\lambda_2$$. Let's rephrase conductance here in the following way: if a graph $$G$$ is partitioned into $$A$$ and $$B$$ where $$\vert A\vert \leq \vert B\vert$$, then the conductance of the cut is defined as $$\beta = cut(A, B)/\vert A\vert$$. A result called the Cheeger inequality links $$\beta$$ to $$\lambda_2$$: in particular, $$\frac{\beta^2}{2k_{max}} \leq \lambda_2 \leq 2\beta$$ where $$k_{max}$$ is the maximum node degree in the graph. The upper bound on $$\lambda_2$$ is most useful to us for graph partitioning, since we are trying to minimize the conductance; it says that $$\lambda_2$$ gives us a good estimate of the conductance -- it never overestimates it by more than a factor of 2! The upper bound is obtained by plugging a specific vector $$x$$ into the Rayleigh quotient, defined by $$x_i = -1/a$$ if $$i \in A$$ and $$x_i = 1/b$$ if $$i \in B$$, where $$a = \vert A\vert$$ and $$b = \vert B\vert$$; the signs of the entries of $$x$$ give us the partition assignments of each node.
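To see where the upper bound comes from, we can work it out by hand. This is a short sketch of the standard argument, writing $$a = \vert A\vert$$, $$b = \vert B\vert$$ with $$a \leq b$$, and $$e = cut(A, B)$$ (the symbol $$e$$ is introduced here only for the derivation). First, the vector $$x$$ with $$x_i = -1/a$$ for $$i \in A$$ and $$x_i = 1/b$$ for $$i \in B$$ satisfies the constraint:

$$\sum_i x_i = a \cdot \left(-\frac{1}{a}\right) + b \cdot \frac{1}{b} = 0$$

Edges inside $$A$$ or inside $$B$$ contribute nothing to the numerator of the Rayleigh quotient, while each of the $$e$$ edges across the cut contributes $$\left(\frac{1}{a} + \frac{1}{b}\right)^2$$. The denominator is $$a \cdot \frac{1}{a^2} + b \cdot \frac{1}{b^2} = \frac{1}{a} + \frac{1}{b}$$. Since $$\lambda_2$$ is the minimum of the quotient over all valid $$x$$, we get:

$$\lambda_2 \leq \frac{e \left(\frac{1}{a} + \frac{1}{b}\right)^2}{\frac{1}{a} + \frac{1}{b}} = e \left(\frac{1}{a} + \frac{1}{b}\right) \leq \frac{2e}{a} = 2\beta$$

The last inequality uses $$\frac{1}{b} \leq \frac{1}{a}$$, which holds because $$a \leq b$$.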
Let's put all our findings together to state the spectral partitioning algorithm.
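As a rough illustration of how these findings combine, here is a minimal sketch of one possible spectral bipartition routine. It assumes the unnormalized Laplacian $$L = D - A$$ and a plain sign split of the Fiedler vector; the function name `spectral_bipartition` and the toy graph are hypothetical, and other splitting rules are possible.

```python
import numpy as np

def spectral_bipartition(adj):
    """Split nodes by the sign of the Fiedler vector of L = D - A.

    Returns a boolean label per node and the eigenvalue lambda_2.
    Assumes adj is a symmetric adjacency matrix of a connected graph.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second smallest eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(lap)
    fiedler = eigenvectors[:, 1]
    return fiedler > 0, eigenvalues[1]

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

labels, lambda_2 = spectral_bipartition(adj)
group_a = np.flatnonzero(labels).tolist()
group_b = np.flatnonzero(~labels).tolist()
print(group_a, group_b, lambda_2)
```

The overall sign of an eigenvector is arbitrary, so which triangle ends up labelled `True` may differ between runs or library versions; only the grouping itself is meaningful.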
Some practical considerations emerge.
What if we want to cluster by higher-level patterns than raw edges? We can instead cluster graph motifs into "modules". We can do everything in an analogous way. Let's start by proposing analogous definitions for cut, volume and conductance:
How do we find clusters of motifs? Given a motif $$M$$ and graph $$G$$, we'd like to find a set of nodes $$S$$ that minimizes $$\phi_M(S)$$. This problem is NP-hard, so we will again make use of spectral methods, namely motif spectral clustering:
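One ingredient of such a method can be sketched in code. The example below assumes the motif $$M$$ is an undirected triangle and weights each edge by the number of triangles it participates in; the function name `triangle_motif_adjacency` and the toy graph are hypothetical, and other motifs (for example directed or signed ones) need their own counting logic.

```python
import numpy as np

def triangle_motif_adjacency(adj):
    """Weight each edge (i, j) by the number of triangles containing it.

    For a 0/1 symmetric adjacency matrix, (adj @ adj)[i, j] counts the
    common neighbors of i and j; multiplying elementwise by adj keeps
    only pairs that are actually connected by an edge.
    """
    common_neighbors = adj @ adj
    return adj * common_neighbors

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

motif_adj = triangle_motif_adjacency(adj)
print(motif_adj[0, 1])  # edge inside a triangle
print(motif_adj[2, 3])  # bridge edge, part of no triangle
```

In this toy graph the bridge edge gets weight 0, so the weighted graph falls apart into the two triangles, which is the grouping we would hope a triangle-based clustering recovers.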
Again, we can prove a motif version of the Cheeger inequality to show that the motif conductance found by our algorithm is bounded above by $$4\sqrt{\phi_M^*}$$, where $$\phi_M^*$$ is the optimal conductance.
We can apply this method to cluster the food web (which has motifs dictated by biology) and gene regulatory networks (in which directed, signed triads play an important role).