Read the paper Proportionally Fair Clustering Revisited by Micha and Shah.

$k$ cluster centers must be placed, given $n$ points in a metric space.

The cost to each point is its distance to the nearest cluster center.

This paper: focus on the case where cluster centers can be placed anywhere in the metric space.

E.g., for the $L^2$ distance over $\mathbb R^t$, the approximation ratio of greedy capture improves to $2$.

For the $L^1$ and $L^\infty$ metrics, the approximation ratio remains $1+\sqrt 2$.

The paper gives universal lower bounds which apply to all algorithms.

Introduction

Points $\mathcal N$, possible cluster centers $\mathcal M$.

Given $k\in\mathbb N$, the task is to select a set $X\subseteq\mathcal M$ consisting of $|X|=k$ cluster centers.

Group fairness.

Proportional fairness.

Contribution

Focus on the case where the metric consists of usual distance functions such as $L^1,L^2,L^\infty$ over $\mathbb R^t$.

⚠️Cluster centers can be placed anywhere in the infinite metric space: $\mathcal M=\mathbb R^t$.

Note that the candidate cluster centers here differ from those in the previous paper, so the resulting approximation lower bounds differ as well.

Greedy Capture by Chen et al., where $d\in\{L^1,L^2,L^\infty\}$ and $\mathcal M=\mathbb R^t$.

For $d=L^2$ and $\mathcal M=\mathbb R^t$, Greedy Capture provides $2$-PROP. The approximation is obtained via the Apollonius radius.

For $L^1$ and $L^\infty$, the approximation ratio cannot be better than $1+\sqrt2$.

Universal lower bounds: for $d=L^2$ and $\mathcal M=\mathbb R^t$, no algorithm achieves better than $\frac{2}{\sqrt 3}\approx1.155$-PROP, whereas for $d\in\{L^1,L^\infty\}$ and $\mathcal M=\mathbb R^t$, the lower bound is $1.4$.

Also, the paper considers the case where $\mathcal M$ is the set of nodes of an unweighted graph, and $d$ measures the shortest-path distance between two nodes on the graph.

  • If the graph is a tree, a PROP clustering always exists, and there is an efficient algorithm to compute it.
  • If the graph is arbitrary and $k\ge n/2$, a PROP clustering always exists and can be computed efficiently.
  • Open problem: whether an exact PROP clustering exists for all graphs.

For $d=L^2$ and $\mathcal M=\mathbb R^t$ with $t\ge2$, checking whether a clustering is PROP is NP-hard.

When $t$ is large, even greedy capture is NP-hard. When $t$ is constant, the problem becomes efficiently solvable.

When $t$ is large, using a PTAS as a subroutine of greedy capture, we can efficiently compute a $2(1+\epsilon)$-PROP fair clustering for any fixed $\epsilon>0$.

Problem of generalization: would a clustering that is PROP w.r.t. samples drawn from $\mathcal N$ remain approximately PROP w.r.t. the entire set $\mathcal N$? Chen et al. proved that the answer is yes when $\mathcal M$ is finite. Using the VC dimension, the answer is also yes for $\mathcal M=\mathbb R^t$.

Preliminaries

$\mathcal N$: the set of $n$ data points lying in a metric space $(\mathcal X,d)$, where $d:\mathcal X\times\mathcal X\to\mathbb R$ is a distance function satisfying the triangle inequality.

Consider $\mathcal X=\mathbb R^t$ for $t\in\mathbb N$.

Distance function: $L^2$, $L^1$, or $L^\infty$.

$\mathcal M$: set of candidate cluster centers. In this paper, $\mathcal M=\mathcal X$.

Given $k\in\mathbb N$, a $k$-clustering is a set $X\in\mathcal M^k$.

Each $x\in X$ is called an open cluster center.

The cost to agent $i\in\mathcal N$ induced by a cluster center $x\in\mathcal M$ is the distance $d(i,x)$. The overall cost to agent $i$ is the minimal distance over open centers:

$d(i,X)=\min_{x\in X}d(i,x)$

Agent $i$ is interested in minimizing this cost.

If a set of points $S\subseteq\mathcal N$ containing at least $\lceil n/k\rceil$ points can find a new cluster center that is better for each of them, this is a violation of fairness ($S$ forms a blocking coalition).

📖Definition 1. $\rho$-PROP, blocking coalition. Omitted; a reconstruction is sketched below.
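
For completeness, here is the definition as I recall it from Chen et al. (it can also be recovered from Definition 12 below by setting $\epsilon=0$): a $k$-clustering $X$ is $\rho$-PROP if for all $S\subseteq\mathcal N$ with $|S|\ge\lceil n/k\rceil$ and all $y\in\mathcal M$, there exists $i\in S$ with $\rho\cdot d(i,y)\ge d(i,X)$. A pair $(S,y)$ witnessing a violation is a blocking coalition.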

PROP on graphs: given an undirected graph $G=(V,E)$, $\mathcal N\subseteq V$, $\mathcal M=V$. The distance between $u,v\in V$ is the length of the shortest path.

Greedy Capture

Details are omitted here; a rough sketch follows.
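
As a refresher, here is a minimal sketch of the finite-$\mathcal M$ version of Greedy Capture as I understand it from Chen et al. (the paper's $\mathcal M=\mathbb R^t$ variant instead grows balls around every point of space, which is what makes it hard to implement; cf. Theorem 11). Balls grow around all candidate centers at a uniform rate; a candidate opens once its ball captures $\lceil n/k\rceil$ unmatched points, and open centers absorb any unmatched point their ball reaches. All names below are mine.

```python
import math
import numpy as np

def greedy_capture(points, candidates, k):
    """Sketch of Greedy Capture for finite candidate sets (L2 metric).

    points: (n, t) array of data points; candidates: (m, t) array.
    Returns the list of opened centers (rows of `candidates`).
    """
    n, m = len(points), len(candidates)
    threshold = math.ceil(n / k)
    dist = np.linalg.norm(points[:, None, :] - candidates[None, :, :], axis=2)

    # Event-driven simulation of uniformly growing balls: process
    # (radius, point, candidate) events in increasing order of radius.
    events = sorted((dist[i, j], i, j) for i in range(n) for j in range(m))
    unmatched = set(range(n))
    opened = []
    reached = [set() for _ in range(m)]   # points currently inside ball j
    for _, i, j in events:
        if i not in unmatched:
            continue
        if j in opened:
            unmatched.discard(i)          # an open center absorbs the point
        else:
            reached[j].add(i)
            captured = reached[j] & unmatched
            if len(captured) >= threshold:
                opened.append(j)          # ball j captured enough points: open it
                unmatched -= captured
        if not unmatched:
            break
    return [candidates[j] for j in opened]
```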

$(1+\sqrt2)$-PROP

This paper gives a refined analysis of Greedy Capture.

📖Definition 2 (Apollonius radius). Given $\rho\ge1$, the $\rho$-Apollonius radius of a metric $(\mathcal X,d)$ is defined as $A_{\mathcal X,d}(\rho)=\sup_{x,y\in\mathcal X}\Delta(\rho,x,y)/d(x,y)$, where $\Delta(\rho,x,y)$ is the radius of the smallest ball centered at some point in $\mathcal X$ that contains the entire set $\{p\in\mathcal X:\rho\cdot d(p,y)\le d(p,x)\}$.

When $\rho>1$, this set is bounded; under $L^2$ it is exactly a ball (the classical Apollonius ball).

🎯Theorem 3. For any metric $(\mathcal X,d)$ and $\mathcal M=\mathcal X$, greedy capture finds a $\rho$-PROP clustering, where $\rho\ge1$ is the smallest positive number satisfying $A_{\mathcal X,d}(\rho)\cdot\frac{\rho+1}{\rho}\le1$.

$\mathcal M=\mathcal X$: candidate cluster centers can be any point of the space.

Proof via the triangle inequality: one shows that if $X$ is not $\rho$-PROP, then $A_{\mathcal X,d}(\rho)\cdot\frac{\rho+1}{\rho}>1$.

🎯Theorem 4. For any metric $(\mathcal X,d)$, the $\rho$-Apollonius radius satisfies $A_{\mathcal X,d}(\rho)\le\frac{1}{\rho-1}$. Hence, greedy capture finds a $(1+\sqrt2)$-PROP fair clustering for every metric.
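
The "hence" step is a one-line check of Theorem 3's condition (my computation, for the record):

$$\frac{1}{\rho-1}\cdot\frac{\rho+1}{\rho}\le1\iff\rho+1\le\rho^2-\rho\iff\rho^2-2\rho-1\ge0\iff\rho\ge1+\sqrt2.$$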

In my view, this paper defines a rather involved concept, the Apollonius radius, to formally bound the guarantee of the algorithm's output.

But the definition of this Apollonius radius feels overly complicated.

Next, the paper shows that for $d=L^2$ the $\rho$-Apollonius radius is smaller, yielding a $2$-PROP guarantee.

🎯Theorem 5. For the metric space $(\mathbb R^t,L^2)$, where $t\in\mathbb N$, the $\rho$-Apollonius radius satisfies $A_{\mathbb R^t,L^2}(\rho)\le\frac{\rho}{\rho^2-1}$, and hence, greedy capture finds a $2$-PROP clustering.

Proof omitted.
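
Although the proof is omitted, the radius bound is easy to sanity-check (my sketch, not the paper's argument). Under $L^2$ with $\rho>1$, the set $\{p:\rho\cdot d(p,y)\le d(p,x)\}$ is the classical Apollonius ball, whose diameter lies on the line through $x$ and $y$. Taking $x=0$ and $y=D=d(x,y)$ on that line, its extreme points solve $|p|=\rho|p-D|$:

$$p\in\left\{\frac{\rho D}{\rho+1},\frac{\rho D}{\rho-1}\right\}\implies\Delta(\rho,x,y)=\frac12\left(\frac{\rho D}{\rho-1}-\frac{\rho D}{\rho+1}\right)=\frac{\rho D}{\rho^2-1}.$$

Plugging $A(\rho)=\frac{\rho}{\rho^2-1}$ into Theorem 3's condition gives $\frac{\rho}{\rho^2-1}\cdot\frac{\rho+1}{\rho}=\frac{1}{\rho-1}\le1\iff\rho\ge2$.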

Does the Apollonius radius improve the approximation bound for other distance metrics? For $L^1$ and $L^\infty$, the answer is unfortunately no: the approximation remains $1+\sqrt2$.

🎯Theorem 6. For the metric space $(\mathbb R^t,d)$ where $t\ge2$ and $d\in\{L^1,L^\infty\}$, and $\mathcal M=\mathbb R^t$, there exists an example in which the clustering produced by greedy capture is not $\rho$-PROP for $\rho<1+\sqrt2$.

Universal Lower Bounds

This paper gives universal lower bounds on the approximation to PROP that apply to ALL ALGORITHMS.

Chen et al. showed that $\rho<2$ cannot be guaranteed; for $\mathcal N=\mathcal M$, they showed the weaker lower bound of $1.5$.

This paper: the case where $\mathcal M=\mathcal X=\mathbb R^t$ and $d\in\{L^1,L^2,L^\infty\}$.

Again, recall that $\mathcal M=\mathcal X$: candidate cluster centers can be any point of the space.

  • When $t=1$, it’s easy to see that PROP clusterings always exist.
  • When $t\ge2$: a lower bound of $2/\sqrt3$ for $d=L^2$ and a lower bound of $1.4$ for $d\in\{L^1,L^\infty\}$.

🎯Theorem 7. For the metric space $(\mathbb R^t,L^2)$ where $t\ge2$ and $\mathcal M=\mathbb R^t$, there is an example in which no clustering is $\rho$-PROP for $\rho<2/\sqrt3\approx1.155$.

Closing the gap between the lower bound $1.155$ and the upper bound $2$ for $L^2$ is an open problem.

🎯Theorem 8. For the metric space $(\mathbb R^t,d)$, where $t\ge2$ and $d\in\{L^1,L^\infty\}$, and $\mathcal M=\mathbb R^t$, there is an example in which no clustering is $\rho$-PROP for $\rho<1.4$.

Clustering in Graphs

In this section, we consider the special case where the metric space $(\mathcal X,d)$ is induced by an undirected graph $G=(V,E)$.

Consider the case $\mathcal M=\mathcal X=V$. The distance $d(x,y)$ is the length of the shortest path between $x$ and $y$.

All nodes are candidate cluster centers; the distance is defined as shortest-path length.

When $G$ is a tree, an exact PROP clustering always exists and can be computed by an efficient algorithm.

PROP Clustering for Trees

First root the tree at an arbitrary node $r$ to obtain a rooted tree $(G,r)$.

Denote by $h$ the height of the rooted tree, and by $level(x)$ the level (depth) of node $x$ relative to the root $r$, with $level(r)=1$.

Let $ST(x)$ denote the sub-tree rooted at node $x$.

The algorithm starts from the leaves, opens a center every time it finds a node whose sub-tree contains at least $\lceil n/k\rceil$ nodes, then deletes this sub-tree from the graph. At the end, the cost to each node is still defined using the closest node at which a center was opened by the algorithm.

Pseudo Code - PROP Clustering for Trees

  1. Root the tree $G$ at an arbitrary node $r$.
  2. $X\gets\emptyset$
  3. $G^h\gets G$
  4. for $\ell=h$ to $1$ do
    1. $G^{\ell-1}\gets G^\ell$
    2. for every $x\in V$ with $level(x)=\ell$ and $|ST(x)|\ge\lceil n/k\rceil$ do
      • $X\gets X\cup\{x\}$
      • $G^{\ell-1}\gets G^{\ell-1}\setminus ST(x)$
  5. if $G^0\neq\emptyset$ then $X\gets X\cup\{r\}$
  6. return $X$
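
For concreteness, here is a runnable sketch of the pseudocode above, under the assumption $\mathcal N=V$ (every node is a data point). Processing nodes in reverse DFS preorder visits children before parents, which is equivalent to sweeping levels $h$ down to $1$:

```python
import math

def prop_tree_clustering(adj, root, k):
    """PROP clustering on a tree (sketch; assumes N = V).

    adj: dict mapping each node to a list of neighbors; root: any node.
    Returns the list X of opened centers.
    """
    n = len(adj)
    threshold = math.ceil(n / k)

    # Orient the tree away from the root with an iterative DFS.
    parent = {root: None}
    preorder, stack = [root], [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                preorder.append(v)
                stack.append(v)

    X = []
    size = {u: 1 for u in adj}        # remaining |ST(u)| after deletions below u
    for u in reversed(preorder):      # children before parents = levels h..1
        if u != root and size[u] >= threshold:
            X.append(u)               # open u and delete its remaining subtree
            size[u] = 0
        if parent[u] is not None:
            size[parent[u]] += size[u]
    if size[root] > 0:                # leftover nodes are captured by the root
        X.append(root)
    return X

# Example: the path a-b-c-d-e with k = 2 (threshold 3) opens c, then the root a.
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c', 'e'], 'e': ['d']}
print(prop_tree_clustering(adj, 'a', 2))  # ['c', 'a']
```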

🎯Theorem 9. Let $G=(V,E)$ be an undirected tree, $(V,d)$ the metric induced by $G$, $\mathcal N\subseteq V$, $\mathcal M=V$, and $k\in\mathbb N$. Then the algorithm yields a PROP clustering.

Hard proof by contradiction.

What about Non-trees?

Consider the universal lower bound for the $(\mathbb R^2,L^1)$ metric from Theorem 8.

If the graph is dense enough, we can derive the same lower bound of $1.4$.

❓Whether better lower bounds exist is an open problem.

❓For the special case where $\mathcal N=V$, we do not know whether a PROP clustering always exists.

If $G$ is connected and we want to place a large number of centers, $k\ge n/2$, then a PROP clustering always exists. This is because a dominating set of any size $k\ge n/2$ is guaranteed to exist in a graph with $n$ nodes and can be computed efficiently. (Why this suffices when $n/2\le k<n$: a blocking coalition needs $\lceil n/k\rceil=2$ distinct nodes; under a dominating set every node has cost at most $1$, and a deviation $y$ strictly improves only for the node $y$ itself, so no two nodes can deviate together. For $k\ge n$, simply open every node.)

Definition of dominating set: A set of nodes is called a dominating set if every node in the graph is either in this set or adjacent to a node in this set.
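
The paper's construction is not in my notes, but one classical way to get a dominating set of size at most $n/2$ (my sketch, an assumption): in a BFS tree of a connected graph with $n\ge2$ nodes, each depth-parity class is a dominating set, so the smaller one has size at most $n/2$; pad it with arbitrary extra nodes to reach any $k\ge n/2$.

```python
from collections import deque

def small_dominating_set(adj, root=None):
    """Dominating set of size <= n/2 for a connected graph with n >= 2 (sketch).

    adj: dict mapping each node to a list of neighbors.
    """
    nodes = list(adj)
    root = root if root is not None else nodes[0]

    # Compute BFS-tree depths from the root.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)

    # Each parity class dominates: every node not in a class is adjacent
    # to its BFS-tree parent (or, for the root, to a depth-1 child).
    even = [u for u in nodes if depth[u] % 2 == 0]
    odd = [u for u in nodes if depth[u] % 2 == 1]
    return min(even, odd, key=len) if odd else even
```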

Thus the problem of finding a PROP clustering in general graphs with $\mathcal N=\mathcal M=V$ is trivial when $k\ge n/2$, but remains open when $k<n/2$.

Computational Aspects

Two problems:

  1. Checking whether a PROP clustering exists.
  2. Implementing Greedy Capture when $\mathcal M=\mathbb R^t$.

🎯Theorem 10. Given a finite $\mathcal N$, a finite $\mathcal M$, $k\in\mathbb N$, and $d=L^2$, checking whether a PROP clustering exists is NP-hard.

Polynomial-time reduction from the planar monotone rectilinear 3-SAT problem.

Planar monotone rectilinear 3-SAT problem: given a 3-SAT formula in which each clause $c_j$ consists of only positive or only negative literals, the graph connecting clauses to the variables they contain is planar and has a planar embedding in which each variable $v_i$ is represented by a rectangle on the $x$-axis and each positive/negative clause is represented by a rectangle above/below the $x$-axis with 3 vertical lines or legs to its 3 variables.

What even is this bizarre, convoluted NP-hard problem? They must be counting on me not knowing how to prove reductions.

🎯Theorem 11. Let $t\in\mathbb N$, a finite $\mathcal N\subset\mathbb R^t$, and $k\in\mathbb N$ be given as input. Suppose $\mathcal M=\mathbb R^t$ and $d=L^2$. Then the following hold:

  1. The clustering returned by the Greedy Capture algorithm cannot be computed in polynomial time unless $P=NP$.
  2. If $t$ is constant, then it can be computed in polynomial time.
  3. Even if $t$ is not constant, for any constant $\epsilon>0$, there exists a polynomial-time algorithm which finds a $(2+\epsilon)$-PROP clustering.

Learning Fair Clustering by Sampling

VERY DIFFICULT!

Would a clustering that is approximately PROP w.r.t. random samples taken from an underlying population remain approximately PROP w.r.t. the whole population?

YES!

📖Definition 12. We say that a $k$-clustering $X$ is $\rho$-PROP to $(1+\epsilon)$-deviations w.r.t. $\mathcal N$ if for all $S\subseteq\mathcal N$ with $|S|\ge|\mathcal N|\cdot(1+\epsilon)/k$ and all $y\in\mathcal M$, there exists at least one $i\in S$ s.t. $\rho\cdot d(i,y)\ge d(i,X)$.

📖Definition 13. Given a set of points $\mathcal N$, a $k$-clustering $X$, and $y\in\mathcal M$, define the binary classifier $h_{X,y}:\mathcal N\to\{0,1\}$ s.t. $h_{X,y}(i)=1$ iff $\rho\cdot d(i,y)<d(i,X)$. Define the error of this classifier on a set of points $S\subseteq\mathcal N$ as $err_S(h_{X,y})=(1/|S|)\cdot\sum_{i\in S}h_{X,y}(i)$.

The classifier equals $1$ iff $i$ could be part of a coalition that complains about the unfairness of $X$ by demonstrating $y$ as a location that provides them a factor-$\rho$ improvement.

$X$ is $\rho$-PROP to $(1+\epsilon)$-deviations w.r.t. $\mathcal N$ iff $err_{\mathcal N}(h_{X,y})\le\frac{1+\epsilon}{k}$ for all $y\in\mathcal M$.
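
A minimal numpy sketch of this check under the $L^2$ metric (names are mine):

```python
import numpy as np

def err(points, X, y, rho):
    """err_S(h_{X,y}) from Definition 13: the fraction of points in S that
    strictly prefer deviating to y by a factor rho over their cost under X."""
    d_X = np.linalg.norm(points[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    d_y = np.linalg.norm(points - y, axis=1)
    return float(np.mean(rho * d_y < d_X))

# X is rho-PROP to (1+eps)-deviations w.r.t. points iff
# err(points, X, y, rho) <= (1 + eps) / k for every candidate y.
```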

Goal: show that given a large enough random sample $N\subseteq\mathcal N$, if we have a clustering $X$ that is $\rho$-PROP w.r.t. $N$, then it is $\rho$-PROP to $(1+\epsilon)$-deviations w.r.t. $\mathcal N$ with high probability. Since there is no control over $X$ and $y$, this calls for a uniform convergence guarantee:

$|err_N(h_{X,y})-err_{\mathcal N}(h_{X,y})|$ is bounded for all $X,y$.

📖Definition 14 (VC Dimension). Let $\mathcal N$ be a set of points. Let $\mathcal H$ be a family of binary classifiers over $\mathcal N$. We say that $\mathcal H$ shatters $S\subseteq\mathcal N$ if for every labeling $\ell:S\to\{0,1\}$, there exists a classifier $h\in\mathcal H$ s.t. $h(i)=\ell(i)$ for all $i\in S$. The VC dimension of $\mathcal H$, denoted $\dim^{VC}(\mathcal H)$, is the size of the largest $S$ that can be shattered by $\mathcal H$.

🤔Proposition 15. Let $\mathcal H$ be a family of binary classifiers over a set of points $\mathcal N$. If $N\subseteq\mathcal N$ is a uniformly random sample with $|N|\ge\Omega\left(\frac{\dim^{VC}(\mathcal H)+\ln(1/\delta)}{\epsilon^2}\right)$, then with probability at least $1-\delta$, $|err_N(h)-err_{\mathcal N}(h)|\le\epsilon$ for all $h\in\mathcal H$.

🎯Theorem 16. Fix $\epsilon,\delta>0$, $\rho\ge1$, $k,t\in\mathbb N$, and the metric $(\mathcal X,d)$ where $\mathcal X=\mathbb R^t$ and $d=L^2$. Let $\mathcal N$ be a set of points and $\mathcal M=\mathbb R^t$ the set of candidate centers. Let $N\subseteq\mathcal N$ be sampled uniformly at random with $|N|\ge\Omega\left(\frac{k^2\cdot(tk\ln k+\ln(1/\delta))}{\epsilon^2}\right)$. Then with probability at least $1-\delta$, every $k$-clustering $X\in\mathcal M^k$ that is $\rho$-PROP w.r.t. $N$ is $\rho$-PROP to $(1+\epsilon)$-deviations w.r.t. $\mathcal N$.

Open Questions

This paper mainly considers the case where $\mathcal M$ is the entire metric space.

There are some open questions.

Bridging Gaps

Can we bridge the gap between the lower bounds and the $(1+\sqrt2)$ upper bound on the approximation ratio to PROP?

Conjecture: for $L^2$, the lower bound $2/\sqrt3$ may be achievable.

Maybe by Jung's theorem: for the $L^2$ distance in $\mathbb R^2$, any set of points with diameter at most $1$ is contained in a ball of radius at most $1/\sqrt3$. This could be useful in closing the gap for $L^2$.
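
For reference, the general form of Jung's theorem (a standard fact, not something the paper states in this form): any set of diameter $D$ in $(\mathbb R^t,L^2)$ is contained in a ball of radius $r\le D\sqrt{\frac{t}{2(t+1)}}$; for $t=2$ this gives the $1/\sqrt3$ above.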

On Graphs

Does a PROP clustering always exist for all graphs when $\mathcal N=\mathcal M$ is the set of all nodes of the graph?

Computational Hardness

What is the computational hardness for other distance metrics?

Other ML Models

Can we adapt PROP fairness to other machine learning settings such as regression or classification?

Some Personal Unremarkable Rants

This paper revisits Greedy Capture. But remember that $\mathcal M$ is the whole metric space, which gives different bounds.

Some dazzling new concepts like “Apollonius radius”, “shatter”, etc.

Impressive, spectacular showmanship in the hardness proof.